Skip to main content

Using machine learning involving diagnoses and medications as a risk prediction tool for post-acute sequelae of COVID-19 (PASC) in primary care

Abstract

Background

The aim of our study was to determine whether the application of machine learning could predict PASC by using diagnoses from primary care and prescribed medication 1 year prior to PASC diagnosis.

Methods

This population-based case–control study included subjects aged 18–65 years from Sweden. Stochastic gradient boosting was used to develop a predictive model using diagnoses received in primary care, hospitalization due to acute COVID- 19, and prescribed medication. The variables with normalized relative influence (NRI) ≥ 1% showed were considered predictive. Odds ratios of marginal effects (ORME) were calculated.

Results

The study included 47,568 PASC cases and controls. More females (n = 5113) than males (n = 2815) were diagnosed with PASC. Key predictive factors identified in both sexes included prior hospitalization due to acute COVID- 19 (NRI 16.1%, ORME 18.8 for females; NRI 41.7%, ORME 31.6 for males), malaise and fatigue (NRI 14.5%, ORME 4.6 for females; NRI 11.5%, ORME 7.9 for males), and post-viral and related fatigue syndromes (NRI 10.1%, ORME 21.1 for females; NRI 6.4%, ORME 28.4 for males).

Conclusions

Machine learning can predict PASC based on previous diagnoses and medications. Use of this AI method could support diagnostics of PASC in primary care and provide insight into PASC etiology.

Peer Review reports

Background

Coronavirus disease 2019 (COVID- 19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV- 2), caused a global emergency from 2020 to 2022. Although the widespread outbreak subsided in 2022, infections continue to occur [1]. The majority of those infected recover, but some individuals experience persistent symptoms known as post-acute sequelae of COVID- 19 (PASC). The World Health Organization defines PASC as illness that occurs within 3 months of COVID- 19 infection and cannot be explained by alternative diagnoses [2]. PASC is characterized by a wide range of symptoms. It can significantly impair daily functioning and is currently a common cause of sick leave [3,4,5,6]. At the time being, there are no objective diagnostic tests or accessible biomarkers for the condition, and the diagnosis is based on organ dysfunction, various symptoms and their duration. Nor has the pathophysiology of PASC been clearly elucidated [7, 8]. Finding diagnostic tools and understanding the etiology of the condition is a challenge; such developments might potentially improve management and outcomes for those affected.

There is growing evidence indicating that females have a higher risk of developing PASC compared to males [9, 10]. Other identified risk factors for PASC include belonging to certain demographic groups (e.g., females aged 35–50 years and socioeconomically deprived individuals), having pre-existing health conditions, such as obesity and cardiovascular disease, experience of more severe acute illness, and being unvaccinated [9, 10]. These risk factors have mainly been based on follow-up studies of hospitalized patients [11, 12] and questionnaire data [13,14,15,16]. Epidemiological studies on PASC are difficult to interpret and combine, because inclusion criteria, diagnostic criteria, and methodologies vary [10]. A Swedish study based on register data from secondary care found that symptoms of dyspnea and fatigue, and abnormal pulmonary imaging or findings, were associated with a PASC diagnosis [17]. Research to identify subtypes of PASC has yielded variable results thus far [4, 18]. Little is known about individuals who have developed PASC and were not hospitalized, were only in contact with primary care, or were not even in contact with healthcare in regard to COVID- 19.

Recently, machine learning methods have been applied to structure large amounts of patient data from multiple sources, improving the identification of chronic diseases, including new-onset diabetes [19], cardiovascular disease [20,21,22] and cancer [23, 24]. In the COVID- 19 context, machine learning models have been tested to describe the nature of PASC in terms of demographic features, symptom severity, and duration [25, 26]. A few previous studies have used machine learning to predict risk factors associated with PASC [27, 28]. For example, two studies demonstrated that the majority of individuals with PASC were female, with severe acute COVID- 19 and comorbidities including depression, type 2 diabetes, chronic kidney disease, and chronic pulmonary disease [27, 28]. In the context of PASC, machine learning outperforms traditional statistical models when predictive accuracy is the main goal because it can capture non-linear relationships and complex interactions between variables. Machine learning uses adaptive complex relationship through algorithms in our case thousands of decision trees, that perform better than variables in regression models that often have problems with collinearity [25].

Early studies during the pandemic predominantly used traditional statistics, such as logistic regression to identify predictors and risk factors of PASC [29]. Although, logistic regression is better for interpreting and understanding a hypothetical relationship that may be causal, introduces potential biases due to its simplicity, limited ability to capture complex interactions when the number of variables increase, and challenges in handling missing data [25, 30].These shortcomings are particularly significant for PASC, a condition with heterogeneous and overlapping symptoms. In contrast, machine learning demonstrates robust performance by leveraging data-driven feature selection and capturing non-linear relationships, enabling robust modeling of complex interactions among predictors [25, 30] underscored the potential of machine learning to identify novel, unexpected predictors in multifaceted conditions like PASC, where traditional models may falter. However, despite these advantages, the clinical relevance of machine learning for PASC remains underexplored [26].

Our study builds on prior research by applying stochastic gradient boosting (SGB), a machine learning method well-suited for analyzing high-dimensional data and identifying significant predictors. While earlier studies have predominantly focused on hospitalized populations or secondary care data, our work incorporates primary care diagnoses and prescription medications, offering a broader perspective on PASC predictors [25]. This approach highlights the potential of machine learning in diverse healthcare settings and may provide insights into the complex nature and potential drivers of PASC.

One of the machine learning-methods that can be used to predict medical conditions is stochastic gradient boosting (SGB) [25]. This technique is well-suited for analyzing high-dimensional datasets, as it can incorporate numerous variables while capturing complex, non-linear interactions among predictors. Unlike traditional statistical models, such as logistic regression, which often rely on linear assumptions and struggle with multi-collinearity or missing data, SGB is robust against these limitations. It employs iterative learning to minimize errors and improve prediction accuracy by combining the strengths of multiple weak learners (decision trees).

Moreover, SGB offers the capability to rank variables by their predictive importance, enabling a nuanced understanding of which factors contribute most to the outcome. This attribute is especially critical in multifaceted conditions like PASC, where predictors may interact in unexpected ways. Previous applications of SGB in our research demonstrated its effectiveness in identifying risk factors for chronic diseases, including colorectal cancer [23, 31] and diabetes [19]. These studies underline the suitability of SGB for modeling complex relationships in healthcare data, making it an ideal choice for exploring PASC predictors using primary care data and prescription history.

Given this background and knowledge gaps, we sought to determine whether the application of a machine learning model, SGB, could predict risk factors of PASC diagnosis. The model included all diagnoses from primary healthcare (PHC) consultations and dispensed prescribed medication during the year before PASC diagnosis. Previous hospitalization due to acute COVID- 19 before PASC diagnosis was also used in the model. The VAL database from Region Stockholm, which encompasses register data from primary care settings, was thought to be suitable. We hypothesized that this machine learning tool could be used as diagnostic support in primary care settings and identify predictors that could potentially play a role in the etiology of PASC.

Methods

Study design

This population-based case–control study encompassed subjects 18–65 years old who were registered at PHC centers (PHCCs) in the Stockholm Region in Sweden. The Stockholm Region is the largest metropolitan area in Sweden and has a total population of 2.5 million residents [32]. Register data for this study were gathered from the VAL database, which includes all registered diagnoses based on the International Classification of Diseases, Tenth Revision (ICD- 10), and all dispensed prescription drugs based on their Anatomical Therapeutic Chemical (ATC) code [33].

Study population

The study cases included all individuals who had received the diagnosis post-COVID condition, unspecified (PASC, ICD- 10: U09.9) in any healthcare setting between 2020 and 2022. Each case was matched by age and sex with up to five controls who had not been diagnosed with PASC during the study period. Data on all diagnoses from physician consultations at PHCCs in Region Stockholm and dispensed prescribed medications were collected. We used prescribed medication as a proxy for chronic conditions. Hospitalization due to COVID- 19 before PASC diagnosis was also included.

Subjects who did not seek healthcare during the study period were excluded because they lacked data on visits and diagnoses, which are essential variables for the analysis. Including such individuals would not allow for meaningful comparisons between those with and without COVID- 19, as their absence from the healthcare system renders them non-contributory to predictive modeling. While we recognize that this exclusion introduces selection bias and limits the generalizability of our findings to those who engage with healthcare services, it was a necessary step to ensure the study’s objectives could be met. Future studies incorporating community-level data or self-reported health metrics could address this limitation and provide a different view of the population.

It was assumed that most of the population in Region Stockholm and the individuals in this study had had a COVID- 19 infection and were vaccinated for COVID- 19. Therefore, the diagnoses including COVID- 19 (ICD- 10: U07, U08), immunization against COVID- 19 (ICD- 10: U11) and adverse effects of COVID- 19 vaccines (ICD- 10: U12) were not included as predictors.

Variables

We collected age at PASC diagnosis, sex, diagnoses (ICD-10 codes) from PHCCs, and dispensed prescribed medications reported during the 12 months prior to the index date (PASC diagnosis date). ICD codes for chronic diseases and conditions representing similar clinical features were merged into common clinical groups in accordance with previous studies (for further information see Additional File: Table 1). All other ICD codes were used as three-character codes, except Postviral fatigue, ICD code G93.3, which was deemed to have particular clinical relevance. It was therefore used as a four-character code, see Additional File: Table 1. A similar approach was used for ATC codes. All ATC-codes were one letter and two digits, and to distinguish medications of particular interest, they were in higher resolution, see Additional File: Table 2.

Table 1 Demographic characteristics of PASC cases and controls
Table 2 Confusion matrix for predicting presence of PASC among females and males in the test dataset using the stochastic gradient boosting model created from the training dataset

Statistical methods

This study used the SGB technique for data analysis, an effective form of AI formerly utilized in similar research [34]. It has previously been applied by our research group to analyze factors influencing lung and colorectal cancer risk [23], and more recently diabetes and hypertension in primary care [19, 35]. The SGB model employed in this study is inherently capable of handling missing data by incorporating them as a separate category in the model. This feature ensures that individuals with incomplete data are not excluded from the analysis. The models were developed for males and females separately. For each of the two sets of training data, diagnoses and medications with at least 50 occurrences were selected. The optimal number of trees to use for prediction was estimated using tenfold cross-validation to ensure model robustness and prevent overfitting. Other hyperparameters, such as learning rate, maximum depth, and subsampling rate, were chosen based on prior studies by our group [19, 31], and validated for this dataset to align with best practices. This approach ensures that the parameters were both evidence-based and suitable for the current study context.

In this study, the top 2000 most common diagnoses registered in primary care were used for all 47,568 individuals. All dispensed drugs prescribed in primary and secondary care were included in the model. The diagnostic codes issued, and medications prescribed during the year before index date were used as predictors. By the use of this model, this resulted in 78 diagnoses and 52 medications for males and 125 diagnoses and 69 medications for females with at least 50 occurrences.

Next, the dataset was divided by sex, resulting in a group of 30,678 females and a group of 16,890 males. Applying a training-test approach for each group, we created a randomly 48 selected training set for each sub-dataset. Thus, 70% of the cases (n = 3579 females and n = 1964 males) with their matched controls (n = 21,475 females and n = 11,823 males) were used for training the SGB model. The remaining 30% of the cases (n = 1534 females and n = 851 males) with their controls (n = 9203 females and n =5067 males) were used for evaluating the model’s performance. The proportions of individuals with PASC were equal in the training and test datasets.

The performances of the final models were evaluated using area under the receiver operator characteristics (ROC) curve, sensitivity, and specificity. The SGB model was then applied to each test dataset to obtain patient-specific probabilities of being diagnosed with PASC. The probability that maximized the sum of sensitivity and specificity was used as a cut-off value such that patients with a probability higher than this cut-off were classified as being diagnosed with PASC. The results are presented in a confusion matrix, with the performance of the prediction given by sensitivity and specificity. The ROC curve shows the trade-off between true positive rate (sensitivity) and false positive rate (1—specificity) at various thresholds. The area under the curve (AUC, ranging from 0 to 1) summarizes the overall accuracy of the model. An AUC of 1 indicates perfect prediction and 0 no better than random prediction. These metrics provide valuable insights into the model's ability to distinguish between positive and negative instances.

From the SGB model, we obtained a ranking of the diagnoses most often related to the PASC diagnosis, presented as the normalized relative influence (NRI) with a corresponding odds ratio of marginal effects (ORME) for being diagnosed with PASC. Based on our previous studies with the machine learning SGB [19, 35], we assumed 1% NRI as our cut-off threshold for clinically relevant diagnoses and prescribed medications. For each diagnosis, the odds ratio was calculated using the probability of being diagnosed with PASC, obtained by integrating out all other variables from the model using the weighted tree traversal method [34].

The analyses were performed using R version 4.2.1 [36].

Results

Study population

In total, there were 47,568 study subjects and controls, of whom 39,640 were controls matched by age and sex with the post-acute sequelae of COVID- 19 (PASC) cases. There were more females (n = 5113) than males (n = 2815) diagnosed with PASC between 2020 and 2022 (Table 1). The data for PASC cases and controls were divided into training and test datasets. The training dataset for females encompassed 21,475 subjects, whereas the test dataset for females encompassed 9203 subjects. The training and test datasets for males included 11,823 and 5067 subjects, respectively.

Predictive ability of the SGB model

The SGB model showed that PASC was predicted with relatively good accuracy, with an AUC of 0.804 (95% CI: 0.789–0.819) for females and 0.839 (0.820–0.858) for males; see Fig. 1A and B, respectively. Thus, there were 80% correct predictions in females and 84% in males.

Fig. 1
figure 1

Receiver operator characteristics curve for the optimal stochastic gradient boosting model applied to A females and B males in the test dataset. AUC, area under the curve

The SGB model showed good predictive ability. As seen in Table 2, for female patients, the model identified 5863 out of 6210 females not to have PASC. It also correctly identified 1187 out of 2993 females with PASC. This means that 95% of the females with no PASC diagnoses were correctly identified, and 40% of those diagnosed with PASC. For males, the model identified 3505 out of 3678 males not to have PASC, correctly classifying 95%. Furthermore, it identified 678 out of 1398 males with PASC, in total 48% of those who were diagnosed.

Variable importance

The diagnoses and prescribed medications that were significantly predictive in PASC had an NRI ≥ 1% and are presented in Table 3 for females and Table 4 for males. All variables included in the machine learning model are presented in Additional file: Tables 1 and 2. The number of variables surpassing the 1% NRI threshold differed slightly between sexes, with 17 variables for females and 15 for males. This reflects differences in the predictive relevance of certain diagnoses and prescribed medications, emphasizing the importance of sex-stratified analyses in understanding PASC,

Table 3 The 17 variables with highest NRIs for predicting presence of PASC among females, based on the optimal stochastic gradient boosting model with 15,377 trees, together with ORME during the year before PASC diagnosis
Table 4 The 15 variables with highest NRIs for predicting presence of PASC among males, based on the optimal stochastic gradient boosting model with 11,221 trees, together with ORME during the year before PASC diagnosis

Among females, 125 diagnoses and 69 medications showed an NRI of more than 0%, with 17 of these diagnoses and drugs combined having a relative influence of over 1%. The five diagnoses with the highest NRIs were COVID- 19 with inpatient care (hospitalization) at 41.7%, malaise and fatigue at 14.5%, post-viral and related fatigue syndromes at 10.1%, dyspnea at 8.4%, and upper respiratory tract infections at 5.9%. top prescribed medications with the highest NRIs were adrenergics, inhalants and other drugs for obstructive airway diseases, and inhalant medicines at 2.8%, and hormonal contraceptives for systemic use at 1.3%.

Among males, 78 diagnoses and 52 medications showed an NRI above 0%, and 15 of these had a relative important influence of over 1%. The five diagnoses with the highest NRIs were COVID- 19 with inpatient care at 41.7%, malaise and fatigue at 11.5%, post-viral and related fatigue syndromes at 6.4%, dyspnea at 8.4%, and cough at 4.1%. The prescribed medications with the highest NRIs were adrenergics, inhalants and other drugs for obstructive airway diseases, and inhalant medicines at 1.5%.

Marginal effects

The results for the sex-stratified statistical models showed that the top five diagnoses with the highest NRIs had an ORME above 1. For females, these diagnoses were disturbances of smell and taste (ORME 28.9), post-viral and related fatigue syndromes (ORME 21.1), COVID- 19 in inpatient care (ORME 18.8), dyspnea (ORME 6.2), and personal history of other diseases (ORME 5.6) (Table 3). For males, the top five diagnoses were COVID- 19 with inpatient care (ORME 31.6), post-viral and related fatigue syndromes (ORME 28,4), malaise and fatigue (ORME 7.9), dyspnea (ORME 6.5), and tachycardia (ORME 5.2) (Table 4).

Discussion

In the present study, with a large study population of subjects registered at primary health care centers (PHCCs)s in Region Stockholm, data was analyzed included diagnoses recorded at PHC consultation and prescribed medication from the year before PASC diagnosis. Our findings suggest that the machine learning SGB models have promising potential for identifying subjects at risk of PASC and uncovering associations between various factors and PASC. As this is an observational study, these results should be interpreted as identifying patterns and correlations rather than establishing causal relationships. We found that females were at higher risk of being diagnosed with PASC than males. Previous hospitalization due to acute COVID- 19 was strongly associated with an increased risk of PASC in both sexes. Several diagnoses from primary care physicians were significantly linked to higher risk of PASC. These included post-viral and related fatigue syndrome, symptom diagnoses, such as malaise, fatigue, dyspnea, impaired smell and taste, tachycardia, cough and headache, reactions to acute and severe stress in both sexes, as well as anxiety in females and asthma in males. Among the prescribed medications, adrenergic inhalants and other inhalants for obstructive airway diseases in both sexes, and hormonal medication in females, were also linked to higher risk of PASC.

As we hypothesized, the presented method could be applied as a prediction tool for PASC. Our presented machine learning model could be clinically relevant, as it can support diagnostic of PASC in primary care, as long as there are no biomarkers or objective diagnostic tests for PASC. Our findings of several PASC predictors may provide some insight into PASC etiology, which is currently unknown. The model is robust and reproducible, and when more data sources are identified, it can be retrained in further studies.

We found that prior hospitalization due to COVID-19 was the strongest predictor associated with PASC in both sexes. However, a recent meta-analysis suggests that association between hospitalization and PASC is still inconclusive due to studies including mixed cohorts, with both hospitalized and non-hospitalized patients, and some not having data on whether individuals were treated at intensive care [37]. This discrepancy might be attributed to that the meta-analysis using only statistical linear models, while our analysis allowed us to capture linear and nonlinear relationships and interactions among predictors that are typical for a multifactorial condition like PASC. In fact, follow-up studies of PASC have primarily been initiated with hospitalized patients, stating the risk of PASC increased with the severity of initial COVID-19 infection [28, 38, 39]. A meta-analysis showed that long hospitalization and having received intensive care more than doubled the risk of PASC [40]. Even if the association among hospitalized patients is complicated, as the studies usually have not considered post-intensive care syndrome as a differential diagnosis to PASC [41].

We demonstrated that a symptom diagnosis of malaise and fatigue or post-viral fatigue syndrome (ICD-10 G933), which encompasses myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS), were strongly associated with the increased the risk of PASC in both sexes. PASC and ME/CFS share major symptoms, including chronic fatigue. Diagnostics are based on the presence and duration of symptoms and exclusion of other causes [42]. Previous cross-sectional studies have suggested that 43–58% of PASC patients meet the ME/CFS diagnostic criteria [43,44,45]. In those studies, ME/CFS and PASC were more prevalent in the non-hospitalized female population. The underlying mechanism for this observation is not fully understood but are thought to involve a combination of sex-specific factors including stronger immune response in females after infection and hormonal influences [45]. The clinical similarities between ME/CFS and PASC allow us to suggest a multifactorial etiology and pathobiology, including a preceding viral illness, increase in inflammation cytokines, neuroinflammation, mitochondrial dysfunction, and alteration in natural killer cell function [42].

We also found that symptom diagnoses, such as dyspnea, disturbances in smell and taste, tachycardia, cough, and headache, as well as upper respiratory tract infections, were predictors of PASC in both sexes. Similar respiratory symptomatology is well-described in other respiratory viral syndromes, including those from severe acute respiratory syndrome, respiratory syncytial virus, and influenza [46, 47]. Further, SARS-CoV-2 is primarily a respiratory virus, meaning that long-term respiratory symptoms are not surprising, and associations have been shown in several previous studies [4, 13,14,15, 48, 49]. Other symptoms, including tachycardia and headache, have been reported as symptoms of PASC and are included in WHO’s case definition [2].

Furthermore, our results showed an association between PASC and reactions to acute and severe stress in both sexes, and also anxiety in females. In line with this, a large epidemiological study showed that pre-infection psychological distress, including depression, anxiety, and loneliness, was a risk factor for PASC [50]. Mounting evidence supports the link between psychiatric disorders and immune system dysregulation, suggesting that they are closely intertwined and possibly power each other in a bidirectional loop [51, 52].

Our results showing that females appeared to be at greater risk than males of developing PASC, with a ratio 1.8:1 being consistent with prior research, including meta-analyses [38, 53]. Those studies showed that the mortality and hospitalization rates due to COVID- 19 were lower in females than males. It was suggested that sociodemographic aspects played a role in sex differences of persistent symptoms following COVID-19 [47]. Female patients are more likely than male patients to seek healthcare for both physical and mental symptoms [54]. Beyond sociodemographic aspects, biological factors, including hormones and immune responses, may also influence the higher reported prevalence of PASC among females. Female immune systems exhibit stronger innate and adaptive responses compared to males, which could lead to greater immune activation and a higher likelihood of prolonged post-viral symptoms. Sex hormones have been shown to modulate immune responses, potentially leading to differences in inflammation, autoimmunity, and tissue repair following infection [55]. In this study, we also demonstrated that prescribed female sex hormones (ATC code G03), including contraceptives and hormonal replacement therapy, were associated with an increased risk of PASC. Studies on associations between hormonal drugs and PASC are scarce. A previous study based on a nationwide internet-based survey among Swedish women showed that they self-reported feeling that access to contraceptives decreased during the pandemic and that there was an overall decrease in current use of contraceptives compared with pre-pandemic levels [56]. Therefore, in light of other results, our findings suggest that receiving female sex hormones might play a role in PASC etiology.

Another finding was that adrenergic inhalants and other inhalants for obstructive airway diseases were associated with PASC. Furthermore, we showed that asthma contributed to the prediction of PASC in males, but not in females, in concordance with another study [57]. In addition, meta-analyses have reported contradictory results regarding susceptibility to COVID-19 in patients with asthma [58,59,60]. There are indications that age and severity of disease in asthma affects the outcome of COVID-19 [60, 61]. There is a lack of studies on post-COVID- 19 status in relation to asthma. In a UK-based survey in patients with asthma, 10.5% reported COVID- 19, of these, 56% reported having PASC [62].

In contrast to other researchers, we did not find that other chronic diseases—such as obesity, type 2 diabetes, chronic obstructive pulmonary disease, or ischemic heart diseases—were risk factors for PASC [27, 28, 40, 58]. This might be because we based our analysis only on diagnostic codes registered during the year before the index date, and this was during the pandemic, when many non-urgent healthcare and follow-up visits were postponed. Thus, diagnosing of chronic conditions might have been limited. Nor did we see that dispensing of medications for these conditions, which we in this study saw as a proxy for chronic conditions, was associated with PASC. This was observed despite the fact that a Swedish study showed that there was an increase in the volume of dispensed medication early in the pandemic, possibly due to individuals with chronic disease having decided to dispense extra supplies of medication in case of lockdown [63]. Furthermore, in Sweden, a large proportion (70%) of the population with chronic disease is diagnosed and followed up annually in primary care [64].

The clinical relevance of using SGB, as we did, is that it enables analyses of large amounts of complex data and therefore identify previously unknown relationships in females and males separately. The rationale for dividing the population into female and male patients was based on known sex differences in PASC presentation and symptoms [38, 53]. The model demonstrated strong predictive performance, accurately identifying the majority of patients without PASC.

Although the pandemic has ended, COVID-19 is still circulating and the cumulative incidence of PACS is still substantial [65]. It is therefore important to continue studies on the consequences of COVID-19, in order to learn for future outbreaks. This knowledge could potentially assist health care personnel in prioritizing patients when allocating limited healthcare resources.

For future directions, integrating this method into clinical workflows will require external validation in other Swedish and international populations. Additionally, in future studies, incorporating data beyond February 2022 will help assess the model’s robustness across different pandemic phases. Future studies should also explore potential biomarkers, healthcare utilization, cost-effectiveness, resource requirements, and practical strategies for integrating such tools into existing healthcare systems. Furthermore, from a clinical perspective incorporating detailed hospitalization data, such as length of stay, use of intensive care, or specific treatments received, would help clarify the relationship between acute illness severity and PASC risk.

Strengths and limitations

The main strength of this study is the large size of population-based data from primary care settings, where studies are scarce. The study included individuals of different ages and both sexes from both urban and rural areas within Region Stockholm, Sweden. The region represents a fast growing population of 2.5 million [32], amounting to approximately 25% of entire population in Sweden [66]. Sweden has a unique infrastructure with register databases. However, primary care data are only registered locally. Our dataset is unique and included all ICD-10 diagnoses recorded by primary care physicians from the VAL databases. Unlike previous studies, which primarily focused on hospitalized cohorts or secondary care data, our work incorporates primary care data and prescribed medications, offering a broader perspective on PASC predictors. Additionally, we used stochastic gradient boosting (SGB), which allows for the inclusion of numerous variables and the modeling of complex interactions between them. This approach enables the identification of both known and novel predictors of PASC, such as hormonal medication in female patients, which were not highlighted in earlier studies. By focusing on primary care data, our methodology provides a more comprehensive and nuanced understanding of PASC predictors, advancing beyond existing approaches that are limited to secondary care settings.

Another strength of our study is the use of the SGB method, where all included variables, such as diagnoses and medications, were incorporated into the model along with their interactions with each other. The SGB model can be considered as adjusted for hundreds of variables, enabling it to account for complex, non-linear relationships and interactions that might act as confounders. While we stratified by age and sex, which are known key factors in PASC, we recognize that other potential confounders—such as socioeconomic status, vaccination status, and pre-existing comorbidities—were not explicitly adjusted for in this study.

The use of SGB allows the inclusion of individuals with missing data, mitigating the impact of exclusion of patients with incomplete records. However, as with any registry-based study, underreporting of diagnoses or healthcare contacts is a potential limitation. For example, individuals who only occasionally seek healthcare may have fewer registered data points, which could impact predictor completeness. Despite this, the robustness of the SGB model ensures that the analysis can incorporate such cases. Our model demonstrated strong predictive performance, in identifying over 90% of patients without PASC. Its accuracy in identifying patients with PASC was 40% of the female and 48% male patients classified at risk of PASC actually had PASC, indicating that those identified by the model are at high risk of PASC. These findings highlight the model’s potential for clinical use, while also underscoring the need for further refinement.

There are several limitations in this study. One is that general practitioners may not document all diagnoses, particularly symptom diagnoses, presented during healthcare visit [67]. Instead, they may focus on the specific reasons for each particular visit, which should be taken into consideration when interpreting our results. Furthermore, conditions such as chronic obstructive pulmonary disease and obesity are known to be underdiagnosed. Further, some of the diagnoses associated with PASC in our study can be difficult to interpret, such as ICD code Z86.1: Personal history of COVID-19, which was used in Sweden only during the period June 1, 2020 to December 31, 2021 [68].

Another limitation of our study is that we do not differentiate between hospitalized patients and patients treated in intensive care. The strong association between prior hospitalization due to COVID- 19 and PASC likely reflects the heightened risk associated with more severe acute illness. However, our inability to differentiate between levels of care intensity during hospitalization limits our ability to fully interpret this finding.

Furthermore, the data were based on diagnostic codes (ICD-10) and prescription drug codes (ATC). We were not able to assess some potential confounders that could influence PASC predictors including other sociodemographic factors except for age and sex, self-reported lifestyle factors such as smoking habits, symptom severity, or the potential impact on quality of life. We did not evaluate the impact of vaccines, which are known to play a protective role against PASC [40]. Furthermore, the study did not assess the effect of therapeutics, such as Paxlovid, on the likelihood of PASC diagnosis, nor did we evaluate the association with COVID- 19 reinfection or consider that the controls could be diagnosed with PASC outside the timeframe of this study. This residual confounding was mitigated in the study by using SGB modeling, which allows for the inclusion of multiple variables and complex interactions, improving the model’s ability to account for unmeasured confounders.

Another weakness is that no specific ICD- 10 code for PASC is available, possibly resulting in variations of usage of the diagnostic code U09.9 between physicians and healthcare facilities. For example, the highest usage of the diagnosis code U09.9 among Swedish metropolitan areas was reported in the Region Stockholm, where this study was conducted [17]. Thus, there may have been misclassification and underreporting of PASC in other regions of Sweden. As both the diagnosis code and the syndrome PASC are novel, the use of the diagnosis will likely have evolved during the study period. Clinicians may be more likely to report this diagnosis now than at the beginning of the study period. Future research should explore these acknowledge biases to validate findings and further refine predictive models in later cohorts.

Conclusions

This study demonstrated that the SGB model can identify associations between PASC and registered diagnoses, as well as prescribed medications, during the year before a PASC diagnosis. Known risk factors, such as previous hospitalization due to COVID-19, respiratory, neurological, and cognitive symptoms, and the use of inhalation medicines in both sexes, as well as asthma in male patients, were verified. Additionally, novel predictors, such as hormonal medication in female patients, were identified and warrant further investigation. While these findings highlight the potential of machine learning for exploring PASC predictors, the model requires external validation and further refinement before it can be considered for implementation in clinical settings.

Data availability

No datasets were generated or analysed during the current study.

Abbreviations

PASC:

Post-acute sequelae of COVID- 19

NRI:

Normalized relative influence

ORME :

Odds ratios of marginal effects

COVID- 19:

Coronavirus disease 2019

SARS-CoV- 2:

Severe acute respiratory syndrome coronavirus 2

SGB:

Stochastic gradient boosting

PHC:

Primary healthcare

PHCC:

PHC centers

ICD- 10:

International Classification of Diseases, Tenth Revision

ATC code:

Anatomical Therapeutic Chemical code

ROC curve:

Receiver operator characteristics curve

AUC:

Area under the curve

ME/CFS:

Myalgic encephalomyelitis/chronic fatigue syndrome

References

  1. Rajan S, Khunti K, Alwan N, Steves C, MacDermott N, Morsella A, et al. In the wake of the pandemic: preparing for long COVID. Copenhagen (Denmark): European Observatory Policy Briefs; 2021.

    Google Scholar 

  2. WHO. A clinical case definition of post COVID-19 condition by a Delphi consensus, https://www.who.int/publications/i/item/WHO-2019-nCoV-Post_COVID-19_condition-Clinical_case_definition-2021.1 Last seen 20231013. 2021.

  3. O’Mahoney LL, Routen A, Gillies C, Ekezie W, Welford A, Zhang A, et al. The prevalence and long-term health effects of Long Covid among hospitalised and non-hospitalised populations: a systematic review and meta-analysis. EClinicalMedicine. 2023;55:101762.

    Article  PubMed  Google Scholar 

  4. Kisiel MA, Lee S, Malmquist S, Rykatkin O, Holgert S, Janols H, et al. Clustering analysis identified three long COVID phenotypes and their association with general health status and working ability. J Clin Med. 2023;12(11):3617.

    Article  PubMed  PubMed Central  Google Scholar 

  5. Kisiel MA, Lee S, Janols H, Faramarzi A. Absenteeism Costs Due to COVID-19 and their predictors in non-hospitalized patients in sweden: a poisson regression analysis. Int J Environ Res Public Health. 2023;20(22):7052.

    Article  PubMed  PubMed Central  Google Scholar 

  6. Kisiel MA, Nordqvist T, Westman G, Svartengren M, Malinovschi A, Janols H. Patterns and predictors of sick leave among Swedish non-hospitalized healthcare and residential care workers with Covid-19 during the early phase of the pandemic. PLoS ONE. 2021;16(12):e0260652.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. Katsoularis I, Fonseca-Rodriguez O, Farrington P, Lindmark K, Connolly AF. COVID-19 and myocardial infarction - Authors’ reply. Lancet. 2021;398(10315):1964.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Katsoularis I, Fonseca-Rodriguez O, Farrington P, Jerndal H, Lundevaller EH, Sund M, et al. Risks of deep vein thrombosis, pulmonary embolism, and bleeding after covid-19: nationwide self-controlled cases series and matched cohort study. BMJ. 2022;377: e069590.

    Article  PubMed  Google Scholar 

  9. Crook H, Raza S, Nowell J, Young M, Edison P. Long covid-mechanisms, risk factors, and management. BMJ. 2021;374:n1648.

    Article  PubMed  Google Scholar 

  10. Greenhalgh T, Sivan M, Perlowski A, Nikolich JZ. Long COVID: a clinical update. Lancet. 2024;404(10453):707–24.

    Article  CAS  PubMed  Google Scholar 

  11. Sigfrid L, Drake TM, Pauley E, Jesudason EC, Olliaro P, Lim WS, et al. Long covid in adults discharged from UK hospitals after Covid-19: a prospective, multicentre cohort study using the ISARIC WHO Clinical Characterisation Protocol. Lancet Reg Health Eur. 2021;8:100186.

    Article  PubMed  PubMed Central  Google Scholar 

  12. Fernandez-de-Las-Penas C, Raveendran AV, Giordano R, Arendt-Nielsen L. Long COVID or post-COVID-19 condition: past, present and future research directions. Microorganisms. 2023;11(12):2959.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Sudre CH, Murray B, Varsavsky T, Graham MS, Penfold RS, Bowyer RC, et al. Attributes and predictors of long COVID. Nat Med. 2021;27(4):626–31.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. Al-Aly Z, Bowe B, Xie Y. Long COVID after breakthrough SARS-CoV-2 infection. Nat Med. 2022;28(7):1461–7.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  15. Lopez-Leon S, Wegman-Ostrosky T, Perelman C, Sepulveda R, Rebolledo PA, Cuapio A, Villapol S. More than 50 long-term effects of COVID-19: a systematic review and meta-analysis. Sci Rep. 2021;11(1):16144.

  16. Kisiel MA, Janols H, Nordqvist T, Bergquist J, Hagfeldt S, Malinovschi A, et al. Predictors of post-COVID-19 and the impact of persistent symptoms in non-hospitalized patients 12 months after COVID-19, with a focus on work ability. Ups J Med Sci. 2022;127:10–48101.

    Article  Google Scholar 

  17. Ollila HM F-Rodríguez O, Caspersen IH, et al. How do clinicians use post-COVID syndrome diagnosis? Analysis of clinical features in a Swedish COVID-19 cohort with 18 months’ follow-up: a national observational cohort and matched cohort study. BMJ Public Health. 2024;2:e000336. https://doiorg.publicaciones.saludcastillayleon.es/10.1136/bmjph-2023-000336.

  18. Al-Aly Z, Davis H, McCorkell L, Soares L, Wulf-Hanson S, Iwasaki A, et al. Long COVID science, research and policy. Nat Med. 2024;30(8):2148–64.

    Article  CAS  PubMed  Google Scholar 

  19. Wandell P, Carlsson AC, Wierzbicka M, Sigurdsson K, Arnlov J, Eriksson J, et al. A machine learning tool for identifying patients with newly diagnosed diabetes in primary care. Prim Care Diabetes. 2024;18(5):501–5.

    Article  PubMed  Google Scholar 

  20. Ainiwaer A, Hou WQ, Qi Q, Kadier K, Qin L, Rehemuding R, et al. Deep learning of heart-sound signals for efficient prediction of obstructive coronary artery disease. Heliyon. 2024;10(1):e23354.

    Article  PubMed  Google Scholar 

  21. Abdar M, Ksiazek W, Acharya UR, Tan RS, Makarenkov V, Plawiak P. A new machine learning technique for an accurate diagnosis of coronary artery disease. Comput Methods Programs Biomed. 2019;179: 104992.

    Article  PubMed  Google Scholar 

  22. Ainiwaer A, Hou WQ, Kadier K, Rehemuding R, Liu PF, Maimaiti H, et al. A Machine Learning Framework for Diagnosing and Predicting the Severity of Coronary Artery Disease. Rev Cardiovasc Med. 2023;24(6): 168.

    Article  PubMed  PubMed Central  Google Scholar 

  23. Nemlander E, Ewing M, Abedi E, Hasselstrom J, Sjovall A, Carlsson AC, et al. A machine learning tool for identifying non-metastatic colorectal cancer in primary care. Eur J Cancer. 2023;182:100–6.

    Article  CAS  PubMed  Google Scholar 

  24. Nemlander E, Ewing M, Carlsson AC, Rosenblad A. Transforming early cancer detection in primary care: harnessing the power of machine learning. Oncoscience. 2023;10:20–1.

    Article  PubMed  PubMed Central  Google Scholar 

  25. Wynants L, Van Calster B, Collins GS, Riley RD, Heinze G, Schuit E, et al. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ. 2020;369:m1328.

    Article  PubMed  PubMed Central  Google Scholar 

  26. Ahmad I, Amelio A, Merla A, Scozzari F. A survey on the role of artificial intelligence in managing Long COVID. Front Artif Intell. 2023;6:1292466.

    Article  PubMed  Google Scholar 

  27. Pfaff ER, Girvin AT, Bennett TD, Bhatia A, Brooks IM, Deer RR. Identifying who has long COVID in the USA: a machine learning approach using N3C data. The Lancet Digital Health. 2022;4(7):E532–41.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Hill EL, Mehta HB, Sharma S, Mane K, Singh SK, Xie C, et al. Risk factors associated with post-acute sequelae of SARS-CoV-2: an N3C and NIH RECOVER study. BMC Public Health. 2023;23(1):2103.

    Article  PubMed  PubMed Central  Google Scholar 

  29. Stasenko SV, Kovalchuk AV, Eremin EV, Drugova OV, Zarechnova NV, Tsirkova MM, et al. Using machine learning algorithms to determine the post-COVID state of a person by their rhythmogram. Sensors (Basel). 2023;23(11):5272.

    Article  CAS  PubMed  Google Scholar 

  30. Frondelius T, Atkova I, Miettunen J, Rello J, Vesty G, Chew HSJ, et al. Early prediction of ventilator-associated pneumonia with machine learning models: a systematic review and meta-analysis of prediction model performance(✰). Eur J Intern Med. 2024;121:76–87.

    Article  PubMed  Google Scholar 

  31. Nemlander E, Rosenblad A, Abedi E, Ekman S, Hasselstrom J, Eriksson LE, et al. Lung cancer prediction using machine learning on data from a symptom e-questionnaire for never smokers, formers smokers and current smokers. PLoS ONE. 2022;17(10): e0276703.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. https://www.regionstockholm.se/nyheter/2024/09/befolkningsokningen-bedoms-bli-mindre-visar-arets-prognos/. 2024. February 24, 2025.

  33. VAL-databaserna – Region Stockholm Stockholm: Centrum för epidemiologi och samhällsmedicin, Region Stockholm; 2024. https://www.folkhalsokollen.se/datakallor/val-databaserna/. Accessed 3 May 2024.

  34. Friedman JH. Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics. 2001;29(No. 5 (Oct., 2001)):1189-232 (44 pages).

  35. Norrman A, Hasselstrom J, Ljunggren G, Wachtler C, Eriksson J, Kahan T, et al. Predicting new cases of hypertension in Swedish primary care with a machine learning tool. Prev Med Rep. 2024;44:102806.

    Article  PubMed  PubMed Central  Google Scholar 

  36. Team RC. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2016. http://www.R-project.org/.

  37. Fernandez-de-Las-Penas C, Notarte KI, Macasaet R, Velasco JV, Catahay JA, Ver AT, et al. Persistence of post-COVID symptoms in the general population two years after SARS-CoV-2 infection: A systematic review and meta-analysis. J Infect. 2024;88(2):77–88.

    Article  PubMed  Google Scholar 

  38. Notarte KI, de Oliveira MHS, Peligro PJ, Velasco JV, Macaranas I, Ver AT, et al. Age, sex and previous comorbidities as risk factors not associated with SARS-CoV-2 infection for long COVID-19: a systematic review and meta-analysis. J Clin Med. 2022;11(24):7314.

    Article  PubMed  PubMed Central  Google Scholar 

  39. Sahu AK, Mathew R, Aggarwal P, Nayer J, Bhoi S, Satapathy S, et al. Clinical determinants of severe COVID-19 disease - a systematic review and meta-analysis. J Glob Infect Dis. 2021;13(1):13–9.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  40. Tsampasian V, Elghazaly H, Chattopadhyay R, Debski M, Naing TKP, Garg P, et al. Risk Factors Associated With Post-COVID-19 condition: a systematic review and meta-analysis. JAMA Intern Med. 2023;183(6):566–80.

    Article  PubMed  PubMed Central  Google Scholar 

  41. Bangash MN, Owen A, Alderman JE, Chotalia M, Patel JM, Parekh D. COVID-19 recovery: potential treatments for post-intensive care syndrome. Lancet Respir Med. 2020;8(11):1071–3.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  42. Komaroff AL, Lipkin WI. ME/CFS and Long COVID share similar symptoms and biological abnormalities: road map to the literature. Front Med (Lausanne). 2023;10:1187163.

    Article  PubMed  Google Scholar 

  43. Jason LA, Dorri JA. ME/CFS and Post-Exertional Malaise among Patients with Long COVID. Neurol Int. 2022;15(1):1–11.

    Article  PubMed  PubMed Central  Google Scholar 

  44. Mancini DM, Brunjes DL, Lala A, Trivieri MG, Contreras JP, Natelson BH. Use of Cardiopulmonary Stress Testing for Patients With Unexplained Dyspnea Post-Coronavirus Disease. JACC Heart Fail. 2021;9(12):927–37.

    Article  PubMed  PubMed Central  Google Scholar 

  45. Bonilla H, Quach TC, Tiwari A, Bonilla AE, Miglis M, Yang PC, et al. Myalgic Encephalomyelitis/Chronic Fatigue Syndrome is common in post-acute sequelae of SARS-CoV-2 infection (PASC): Results from a post-COVID-19 multidisciplinary clinic. Front Neurol. 2023;14: 1090747.

    Article  PubMed  PubMed Central  Google Scholar 

  46. Ngai JC, Ko FW, Ng SS, To KW, Tong M, Hui DS. The long-term impact of severe acute respiratory syndrome on pulmonary function, exercise capacity and health status. Respirology. 2010;15(3):543–50.

    Article  PubMed  PubMed Central  Google Scholar 

  47. Fauroux B, Simoes EAF, Checchia PA, Paes B, Figueras-Aloy J, Manzoni P, et al. The Burden and Long-term Respiratory Morbidity Associated with Respiratory Syncytial Virus Infection in Early Childhood. Infect Dis Ther. 2017;6(2):173–97.

    Article  PubMed  PubMed Central  Google Scholar 

  48. Brunvoll SH, Nygaard AB, Fagerland MW, Holland P, Ellingjord-Dale M, Dahl JA, et al. Post-acute symptoms 3–15 months after COVID-19 among unvaccinated and vaccinated individuals with a breakthrough infection. Int J Infect Dis. 2023;126:10–3.

    Article  PubMed  Google Scholar 

  49. Ballering AV, van Zon SKR, Olde Hartman TC, Rosmalen JGM, Lifelines Corona Research I. Persistence of somatic symptoms after COVID-19 in the Netherlands: an observational cohort study. Lancet. 2022;400(10350):452–61.

    Article  Google Scholar 

  50. Wang S, Quan L, Chavarro JE, Slopen N, Kubzansky LD, Koenen KC, et al. Associations of depression, anxiety, worry, perceived stress, and loneliness prior to infection with risk of post-COVID-19 conditions. JAMA Psychiat. 2022;79(11):1081–91.

    Article  Google Scholar 

  51. Menard C, Pfau ML, Hodes GE, Russo SJ. Immune and Neuroendocrine Mechanisms of Stress Vulnerability and Resilience. Neuropsychopharmacology. 2017;42(1):62–80.

    Article  CAS  PubMed  Google Scholar 

  52. Bauer ME, Teixeira AL. Inflammation in psychiatric disorders: what comes first? Ann N Y Acad Sci. 2019;1437(1):57–67.

    Article  CAS  PubMed  Google Scholar 

  53. Mehta HB, Li S, Goodwin JS. Risk factors associated With SARS-CoV-2 infections, hospitalization, and mortality among US nursing home residents. JAMA Netw Open. 2021;4(3):e216315.

    Article  PubMed  PubMed Central  Google Scholar 

  54. Thompson AE, Anisimowicz Y, Miedema B, Hogg W, Wodchis WP, Aubrey-Bassler K. The influence of gender and other patient characteristics on health care-seeking behaviour: a QUALICOPC study. BMC Fam Pract. 2016;17:38.

    Article  PubMed  PubMed Central  Google Scholar 

  55. Sciarra F, Campolo F, Franceschini E, Carlomagno F, Venneri MA. Gender-specific impact of sex hormones on the immune system. Int J Mol Sci. 2023;24(7):6302.

  56. Envall N, Gemzell Danielsson K, Kopp KH. The use and access to contraception in Sweden during the COVID-19 pandemic period. Eur J Contracept Reprod Health Care. 2023;28(5):275–81.

    Article  PubMed  Google Scholar 

  57. Hedberg P, Naucler P. Post-COVID-19 condition after SARS-CoV-2 infections during the omicron surge vs the delta, alpha, and wild type periods in Stockholm, Sweden. J Infect Dis. 2024;229(1):133–6.

    Article  PubMed  Google Scholar 

  58. Zhang H, Zang C, Xu Z, et al. Data-driven identification of post-acute SARS-CoV-2 infection subphenotypes. Nat Med. 2023;29:226–35.

  59. Liu S, Cao Y, Du T, Zhi Y. Prevalence of Comorbid Asthma and Related Outcomes in COVID-19: A Systematic Review and Meta-Analysis. J Allergy Clin Immunol Pract. 2021;9(2):693–701.

    Article  CAS  PubMed  Google Scholar 

  60. Uruma Y, Manabe T, Fujikura Y, Iikura M, Hojo M, Kudo K. Effect of asthma, COPD, and ACO on COVID-19: a systematic review and meta-analysis. PLoS ONE. 2022;17(11):e0276774.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  61. Karlsson Sundbaum J, Konradsen JR, Vanfleteren L, Axelsson Fisk S, Pedroletti C, Sjoo Y, et al. Uncontrolled asthma predicts severe COVID-19: a report from the Swedish National Airway Register. Ther Adv Respir Dis. 2022;16:17534666221091184.

    Article  PubMed  PubMed Central  Google Scholar 

  62. Philip KEJ, Buttery S, Williams P, Vijayakumar B, Tonkin J, Cumella A, et al. Impact of COVID-19 on people with asthma: a mixed methods analysis from a UK wide survey. BMJ Open Respir Res. 2022;9(1):e001056.

    Article  PubMed  PubMed Central  Google Scholar 

  63. Karlsson P, Nakitanda AO, Lofling L, Cesta CE. Patterns of prescription dispensation and over-the-counter medication sales in Sweden during the COVID-19 pandemic. PLoS ONE. 2021;16(8):e0253944.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  64. Forslund T WBS. Primärvårdens roll i sjukvårdssystemet. Stockholm: Region Stockholm, Hälso- och sjukvårdsförvaltningen; 2019.

  65. Organization WH. Coronavirus (COVID-19) dashboard. Geneva. 2024. https://covid19.who.int/.

  66. Region S. Europe's most attractive metropolitan region. https://stockholmregion.org/2024.

  67. Ford E, Nicholson A, Koeling R, Tate A, Carroll J, Axelrod L, et al. Optimising the use of electronic health records to estimate the incidence of rheumatoid arthritis in primary care: what information is hidden in free text? BMC Med Res Methodol. 2013;13:105.

    Article  PubMed  PubMed Central  Google Scholar 

  68. Statistik om tillstånd efter COVID-19. Primärvård och specialiserad vård. Socialstyrelsen; 2021. Contract No.: Art. No. 2021–6–7495. https://www.socialstyrelsen.se/globalassets/sharepoint-dokument/artikelkatalog/ovrigt/2021-4-7353.pdf.

Download references

Acknowledgements

None

Funding

Open access funding provided by Uppsala University. This research was funded by Åke Wibergs Stiftelse (M23 - 0133).

Author information

Authors and Affiliations

Authors

Contributions

Conceptualization, ACC, SJ; methodology, ACC; software, AO and JE; validation, all authors; investigation PL, ÅW, SL, JB; resources, ACC; data curation, AO, JE, ACC, SE, MAK; writing—original draft preparation, SL, ACC, MAK; writing—review and editing, all authors.; visualization, SL; supervision, MAK, ACC, CJ, AM, CW, LW, MR, JB, ÅW; project administration, ACC, MAK; funding acquisition, MAK. All authors have read and agreed to the published version of the manuscript.

Authors’ Twitter handles

Twiiter handles: @AxelCCarlsson (Axel C. Carlsson)

Corresponding author

Correspondence to Marta A. Kisiel.

Ethics declarations

Ethics approval and consent to participate

The study has been approved by the Swedish Ethical Review Authority Dnr. 2021–01016 with amendments Dnr 2021–05735 - 02, Dnr 2022–04729 - 02, Dnr 2023–07166 - 02, and Dnr 2024–05462 - 02. All data were pseudonymized to protect patient privacy. The data used in the present study are available for research purposes after ethical approval from Stockholm Region at halsodata.rst@regionstockholm.se.

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

12916_2025_4050_MOESM1_ESM.docx

Additional file 1: Table S1. [Merged ICD- 10 codes]. Table SS. [Medication of interest as ATC-codes, in high resolution or merged].

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Lee, S., Kisiel, M.A., Lindberg, P. et al. Using machine learning involving diagnoses and medications as a risk prediction tool for post-acute sequelae of COVID-19 (PASC) in primary care. BMC Med 23, 251 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12916-025-04050-w

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12916-025-04050-w

Keywords