Unveiling lipoprotein subfractions signature in high-FNPO PCOS: implications for PCOM diagnosis and risk assessment using advanced machine learning models
BMC Medicine volume 23, Article number: 289 (2025)
Abstract
Background
Polycystic ovary syndrome (PCOS) is a common reproductive and metabolic disorder in reproductive-age women. The 2023 international evidence-based guideline for the assessment and management of PCOS now suggests raising the follicle number per ovary (FNPO) threshold from 12 to 20 to define its key feature, polycystic ovarian morphology (PCOM). However, understanding of the low- and high-FNPO PCOS cases defined by this cutoff is very limited. Given that lipoprotein subfraction measures are biomarkers of several common diseases, this study aims to explore clinical characteristics and lipoprotein subfractions in low- and high-FNPO PCOS and to develop a diagnostic model.
Methods
Clinical data were collected from 1918 women: 974 PCOS cases (792 low-FNPO and 182 high-FNPO, grouped by the FNPO threshold of the 2023 international evidence-based guideline) and 944 controls. Plasma samples from 66 low-FNPO and 24 high-FNPO PCOS cases and 22 controls matched for BMI and age were used to measure 112 lipoprotein subfractions by nuclear magnetic resonance spectroscopy. Partial least squares discriminant analysis (PLS-DA) and logistic regression analysis were used to identify key lipoprotein subfractions. Ten machine learning algorithms and recursive feature elimination with logistic regression were used to construct a model to predict PCOM based on the new guideline. Models were validated with bootstrap resampling.
Results
High-FNPO PCOS cases presented worse lipid parameters than low-FNPO cases and controls. Based on the results of PLS-DA and logistic regression analysis, seven key lipoprotein subfractions were selected: V2TG, V3TG, V4TG, V2CH, V3CH, V3PL, and V4PL. Adding them to the anti-Müllerian hormone (AMH) model for predicting high-FNPO PCOS significantly improved model performance (AUC increased from 0.750 to 0.874). Even when only V3TG was added to the AMH model, the AUC increased to 0.807.
Conclusions
Lipid metabolism, particularly seven key lipoprotein subfractions, has been identified as a major risk factor for high-FNPO PCOS. Among these, the V3TG subfraction warrants special attention, from the perspective of both disease risk and precision diagnosis. Because effective external validation is lacking at this stage, validation in larger samples is necessary before the model is applied more widely.
Background
Polycystic ovary syndrome (PCOS) is the most common reproductive and metabolic disease affecting reproductive-aged women [1]. Clinical characteristics of PCOS include clinical and/or biochemical hyperandrogenism (HA), irregular menstruation (IM), polycystic ovarian morphology (PCOM), and metabolic disorders such as insulin resistance [2], type 2 diabetes [3], cardiovascular disease [4], and dyslipidemia [5].
In the Rotterdam diagnostic criteria, PCOM is identified by a follicle number per ovary (FNPO) ≥ 12 [6]. With technological development, transvaginal ultrasound transducers with a frequency ≥ 8 MHz have come into use in recent years, and these can detect a higher FNPO [7]. Hence, the Rotterdam criterion for PCOM is not appropriate when high-resolution ultrasound is used, and a more accurate threshold for distinguishing normal ovaries from PCOM is needed. In the 2023 international evidence-based guideline for the assessment and management of polycystic ovary syndrome, a threshold of FNPO ≥ 20 was recommended for PCOM diagnosis in adult women [8]. In addition, the serum anti-Müllerian hormone (AMH) level was recommended as an alternative to ultrasound for diagnosing PCOM in adults. It has been reported that women with PCOS who met the new guideline had a higher risk of metabolic syndrome [9]. A recent study of DNA methylation in women with PCOS found that genes annotated to differentially methylated probes in high-FNPO PCOS cases (FNPO ≥ 20) were significantly enriched in the regulation of triglyceride (TG) biosynthetic and metabolic processes, suggesting that PCOM is closely related to dyslipidemia [10].
Dyslipidemia is a common metabolic complication of PCOS, presenting as higher levels of serum TG and very low-density lipoprotein (VLDL) and lower levels of serum high-density lipoprotein (HDL) [11]. The lipid measurements traditionally used in clinical practice detect only total TG, total cholesterol (CH), low-density lipoprotein cholesterol (LDL-C), and high-density lipoprotein cholesterol (HDL-C), and therefore usually miss subtle alterations in lipoprotein subfractions [12]. Nuclear magnetic resonance (NMR) spectroscopy is a non-destructive and highly reproducible tool that can quantify the absolute concentrations of lipoprotein subfractions [13,14,15]. A detailed lipoprotein subfraction profile provides a comprehensive and systematic way to investigate several complex and heterogeneous diseases, such as type 2 diabetes (T2DM), chronic kidney failure, and liver disease [16]. Adding lipoproteins could improve the discrimination of a T2DM risk prediction model [17]. In women with PCOS, obesity and HA both contribute to dyslipidemia and to the alteration of VLDL subfractions, which negatively affects their long-term health [18]. Women with PCOS who have an atherogenic lipoprotein subfraction profile may be at increased cardiovascular risk throughout their lifetime [19]. However, the relationship between PCOM and the lipoprotein subfraction profile in PCOS is still unknown.
In this study, we aimed to investigate the clinical characteristics and the plasma lipoprotein subfraction profiles of low- and high-FNPO PCOS cases using NMR spectroscopy. Moreover, we examined whether the key lipoprotein subfractions, together with AMH, could improve the prediction of PCOM based on the new guideline compared with AMH alone.
Methods
Subjects
The study participants included 974 PCOS cases recruited at the Center for Reproductive Medicine, Shandong University, from 2014 to 2017. PCOS was diagnosed according to the Rotterdam Consensus proposed in 2003. IM was defined as a menstrual cycle longer than 35 days or a history of ≤ 8 menstrual cycles in a year. HA was confirmed if there was evidence of hyperandrogenemia and/or hirsutism. PCOM was defined as 12 or more follicles measuring 2–9 mm on transvaginal ultrasonography. Based on the 2023 international evidence-based guideline for the assessment and management of polycystic ovary syndrome, women aged < 20 years or > 40 years were excluded. Patients with other etiologies of HA and ovulatory dysfunction were also excluded, e.g., congenital adrenal hyperplasia, 21-hydroxylase deficiency, androgen-secreting tumors, Cushing’s syndrome, thyroid disease, and hyperprolactinemia. For the measurement of lipoprotein subfractions, plasma was newly collected from 90 PCOS cases and 22 controls matched for age and BMI. PCOS cases were divided into low-FNPO (12 ≤ FNPO < 20) and high-FNPO (FNPO ≥ 20) groups according to the FNPO threshold of 20 recommended in the new international guideline.
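The grouping rule for PCOS cases can be written as a small function. This is only an illustrative sketch: the function name and the error for FNPO below 12 are assumptions made here, and the rule covers only the FNPO cutoffs, not the other inclusion and exclusion criteria.

```python
def pcos_fnpo_group(fnpo: int) -> str:
    """Assign a PCOS case to the low- or high-FNPO group.

    Cutoffs follow the text: low-FNPO is 12 <= FNPO < 20,
    high-FNPO is FNPO >= 20. The function name is hypothetical.
    """
    if fnpo < 12:
        # PCOS cases in this study all had FNPO >= 12 (Rotterdam PCOM).
        raise ValueError("FNPO < 12 is outside the PCOS groups defined here")
    return "high-FNPO" if fnpo >= 20 else "low-FNPO"
```

For example, a case with 19 follicles per ovary falls in the low-FNPO group, and one with 20 falls in the high-FNPO group.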
A total of 944 control women who were referred for routine physical examination or tubal factor infertility were enrolled. All controls had regular menstrual cycles (21–35 days), normal steroid hormone levels, and normal ovarian morphology (FNPO < 12).
In our study, transvaginal ultrasonography examinations were carried out with wide-band frequency (5–9 MHz) transducers with automatic optimization and a center frequency of 8 MHz (E8, GE Healthcare, Milwaukee, WI, USA). A two-dimensional evaluation was performed during a period of oligomenorrhea or amenorrhea, the early follicular phase of the menstrual cycle, or a period of prolonged vaginal bleeding. Real-time counts of all visible follicles were performed by gynecologists, and image scans were stored in an electronic recording system (INFINITT PACS, Phillipsburg, NJ, USA). The ovary was visualized in the plane that gave the best image quality, and antral follicles measuring 2–9 mm were counted by scanning each ovary from the inner to the outer margin to obtain the number of all countable follicles.
All experimental protocols performed in studies involving human participants were in accordance with the ethical standards of the Ethics Committee of Shandong University and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. Written informed consent was obtained from each patient. All experimental protocols were performed in accordance with relevant guidelines and regulations approved by the Institutional Review Board of Center for Reproductive Medicine, Shandong University (IRB2021-98).
Clinical characteristic data collection
The clinical and biochemical data of 974 PCOS and 944 control women were collected from the medical record system and included (i) anthropometric parameters such as age, height, body weight, and menstrual cycle; (ii) endocrine parameters (measured between the 2nd and 5th day of the menstrual cycle), including follicle stimulating hormone (FSH), luteinizing hormone (LH), estradiol (E2), progesterone (P), prolactin (PRL), total testosterone (T), thyroid stimulating hormone (TSH), AMH, and dehydroepiandrosterone sulfate (DHEAS); and (iii) metabolic parameters, including glucose, insulin, TG, CH, LDL-C, and HDL-C.
The body mass index (BMI) was calculated using the following formula: weight (kg)/height (m)². The homeostasis model assessment of insulin resistance (HOMA-IR) was calculated as fasting plasma glucose (mmol/L) × fasting insulin (mIU/L)/22.5 [20].
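The two formulas above translate directly into code. The sketch below is illustrative; the function and parameter names are chosen here and are not taken from the study.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def homa_ir(glucose_mmol_l: float, insulin_miu_l: float) -> float:
    """HOMA-IR: fasting glucose (mmol/L) x fasting insulin (mIU/L) / 22.5."""
    return glucose_mmol_l * insulin_miu_l / 22.5
```

Note that the units matter: the constant 22.5 applies only when glucose is expressed in mmol/L and insulin in mIU/L, as stated in the text.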
Sample preparation
Fasting blood samples were collected from 90 PCOS cases and 22 controls matched for age and BMI following standard in-hospital procedures. The blood samples were centrifuged at 1500 rpm for 15 min at room temperature. The plasma was then aliquoted into microfuge tubes and stored at − 80 °C until measurement.
Lipoprotein subfractions were measured by 1H-NMR spectroscopy. Briefly, sodium phosphate buffer was mixed with thawed plasma at a 1:1 ratio in a Bruker SampleJet NMR tube and vortexed before NMR analysis. A pool of all individual plasma samples was used as the quality control (QC) sample.
Nuclear magnetic resonance (NMR) spectroscopy
NMR spectroscopy was performed on a BRUKER AVANCE IVDr spectrometer (Bruker BioSpin GmbH, Rheinstetten, Germany). The 1H-NMR spectra were acquired at 310 K at a proton Larmor frequency of 600.13 MHz using a 5-mm BBI probe.
Data analysis
Quality control analysis was performed regularly using the analysis package Bruker IVDr BioBank QC (B.I.BioBankQC™). All the lipids and lipoprotein subfractions were quantified with the Bruker IVDr Lipoprotein Subclass Analysis (B.I.LISA™) platform. The 112 lipid parameters include triglycerides (TG), cholesterol (CH), Apo-B (AB), Apo-A1 (A1), Apo-A2 (A2), HDL, LDL, VLDL, and intermediate-density lipoprotein (IDL), as well as subfractions of each lipoprotein, subdivided according to their density and their concentrations of TG, CH, phospholipids (PL), free cholesterol (FC), AB, A1, and A2. In order of increasing density, HDL was divided into HDL 1–4, LDL into LDL 1–6, and VLDL into VLDL 1–5 (Additional file 1: Table S1).
Partial least squares discriminant analysis (PLS-DA)
PLS-DA was performed on the MetaboAnalyst 6.0 web platform [21]. The raw concentrations of lipoprotein subfractions were sum-normalized and then transformed using log10 and z-score methods. The transformed values were used as the independent variables, and group membership was used as the dependent variable. Lipoprotein subfractions with a variable importance in projection (VIP) score > 2 were considered key lipoprotein subfractions.
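The preprocessing chain described above (sum normalization, log10 transformation, z-scoring) can be sketched in plain Python. This is an assumption-laden illustration rather than the MetaboAnalyst implementation: it assumes sum normalization is applied per sample, z-scoring per subfraction with the sample standard deviation, and strictly positive concentrations. The function name `preprocess` is hypothetical.

```python
import math
import statistics


def preprocess(matrix):
    """Rows are samples, columns are lipoprotein subfractions.

    Assumes all values are > 0 and no column is constant
    (a constant column would give a zero standard deviation).
    """
    # 1) Sum normalization: divide each value by its sample (row) total.
    normed = [[v / sum(row) for v in row] for row in matrix]
    # 2) log10 transformation of the normalized values.
    logged = [[math.log10(v) for v in row] for row in normed]
    # 3) z-score each subfraction (column): subtract mean, divide by SD.
    cols = list(zip(*logged))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [
        [(v - m) / s for v, m, s in zip(row, means, sds)]
        for row in logged
    ]
```

After this step every subfraction has mean 0 and standard deviation 1 across samples, which puts variables with very different concentration ranges on a comparable scale before PLS-DA.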
Logistic regression analysis
Multinomial logistic regression analysis, performed with the “nnet” package in R software, was used to estimate the odds ratios (ORs) and 95% confidence intervals (CIs) for high-FNPO PCOS per standard deviation (SD) increment in lipoprotein subfraction concentration. Lipoprotein subfractions with p < 0.05 in the logistic regression analysis were selected as key lipoprotein subfractions.
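For readers less familiar with how an OR per SD is obtained, the sketch below converts a fitted logistic coefficient (log-odds per 1-SD increase) and its standard error into an OR with a Wald 95% CI. The study itself used the R “nnet” package; this Python function, its name, and the Wald interval are illustrative assumptions, not the authors' code.

```python
import math


def odds_ratio_with_ci(beta: float, se: float, z: float = 1.96):
    """Return (OR, lower, upper) from a log-odds coefficient and its SE.

    beta is the coefficient for a predictor scaled to unit SD, so the OR
    is the multiplicative change in odds per 1-SD increase.
    """
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )
```

The interval is symmetric on the log-odds scale and therefore asymmetric around the OR itself.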
Machine learning model development
To establish a machine learning model capable of distinguishing high-FNPO PCOS cases from all PCOS cases, we then established three types of models: (i) AMH models, which use AMH level as the sole feature, (ii) full models, which incorporate AMH level along with the seven key lipoprotein subfractions as features, and (iii) concise models, which incorporate AMH level along with V3TG as features. The samples were coded based on FNPO, with high-FNPO PCOS cases coded as 1 and low-FNPO cases coded as 0.
The AMH level and the raw concentrations of lipoprotein subfractions were sum-normalized and transformed using log10 and z-score methods before applying machine learning techniques. We applied the synthetic minority over-sampling technique combined with edited nearest neighbors (SMOTEENN) to address class imbalance in the datasets and to improve model performance and robustness. For both the AMH models and the full models, we compared the performance of ten distinct machine learning algorithms: linear discriminant analysis, ridge regression, linear support vector classifier, logistic regression, multi-layer perceptron classifier, Gaussian naïve Bayes, random forest, K-neighbors classifier, extra trees, and gradient boosting. Model performance was assessed by the area under the curve (AUC) of the receiver operating characteristic (ROC) and validated by bootstrap resampling (500 samples with replacement) [22]. The algorithm with the highest AUC (multi-layer perceptron classifier) was selected for further analysis.
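The idea of bootstrap validation of an AUC can be illustrated with a standard-library sketch. It is a simplification under stated assumptions: it resamples a fixed set of predicted scores with replacement and does not refit the classifier or repeat SMOTEENN in each resample, so it may differ from the procedure actually used. The function names are hypothetical.

```python
import random
import statistics


def auc(labels, scores):
    """ROC AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(labels, scores, n_boot=500, seed=0):
    """Mean and SD of the AUC over n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains one class only; draw again
        aucs.append(auc(ys, [scores[i] for i in idx]))
    return statistics.mean(aucs), statistics.stdev(aucs)
```

Here labels follow the coding in the text (1 for high-FNPO, 0 for low-FNPO), and the spread of the resampled AUCs gives a rough sense of how stable the estimate is for a small cohort.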
In the development of the full model, to achieve a clinically applicable model with fewer features yet maintaining optimal performance, we employed recursive feature elimination (RFE) for feature selection. By keeping AMH as a constant feature, we systematically eliminated the lipoprotein subfractions. This approach was undertaken to evaluate the model’s performance with varying numbers of features, determining the optimal balance between feature count and model efficacy. Python (version 3.9.17) was employed for all machine learning tasks.
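The elimination scheme described above, with AMH pinned and the lipoprotein subfractions removed one at a time, can be sketched generically. This is not the authors' implementation: the function name, the `importance` callback (which in practice would refit a logistic regression on the current subset and return, for example, absolute coefficients), and the toy weights are all assumptions. The weights are arbitrary placeholders and not values from the study.

```python
def eliminate_keeping(features, keep, importance):
    """Backward elimination that never drops the features in `keep`.

    importance(subset) must return a dict mapping each feature name in
    `subset` to a score; the lowest-scoring removable feature is dropped
    at each step. Returns the list of feature subsets visited.
    """
    current = list(features)
    path = [tuple(current)]
    while any(f not in keep for f in current):
        scores = importance(current)
        removable = [f for f in current if f not in keep]
        weakest = min(removable, key=lambda f: scores[f])
        current.remove(weakest)
        path.append(tuple(current))
    return path


# Placeholder weights for demonstration only (not study results).
TOY_WEIGHTS = {"AMH": 0.1, "V2TG": 0.2, "V3TG": 0.9, "V4TG": 0.5,
               "V2CH": 0.3, "V3CH": 0.4, "V3PL": 0.6, "V4PL": 0.7}


def toy_importance(subset):
    """Stand-in for a refit model; returns the fixed placeholder weights."""
    return {f: TOY_WEIGHTS[f] for f in subset}


path = eliminate_keeping(list(TOY_WEIGHTS), {"AMH"}, toy_importance)
```

Each entry of `path` is a candidate feature set; evaluating the AUC for each one gives the kind of feature-count versus performance curve the text describes. AMH stays in every subset even though its placeholder weight is the lowest.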
Statistical analysis
Normally distributed data are shown as mean ± standard deviation. Non-normally distributed data were transformed using the natural logarithm, square root, or reciprocal; data that achieved a Gaussian distribution after transformation are shown as means and 95% CIs. Data that remained skewed are shown as the median and interquartile range. Continuous variables were compared with one-way ANOVA followed by a post hoc test, or with a nonparametric test. Categorical variables were compared by the chi-square test. Spearman (for non-normalized data) and Pearson (for normalized data) correlation analyses were used to evaluate the correlation between lipoprotein subfractions measured by NMR spectroscopy and by clinical methods. Two-sided p < 0.05 was considered statistically significant. All the clinical data were analyzed using SPSS 26.0.
Results
Comparison of general characteristics based on the recommended FNPO threshold in the new international guideline
The general characteristics of controls, low-FNPO, and high-FNPO PCOS cases are presented in Table 1.
In terms of the endocrine parameters, both low- and high-FNPO PCOS cases presented significantly higher levels of LH, T, AMH, and DHEAS and significantly lower levels of FSH than controls. Compared with low-FNPO PCOS cases, the high-FNPO PCOS cases exhibited more severe endocrine hormone disturbances.
As for the metabolic parameters, both low- and high-FNPO PCOS cases showed significantly higher insulin levels and HOMA-IR than controls. Furthermore, the high-FNPO PCOS cases displayed significantly higher levels not only of insulin and HOMA-IR but also of TG, CH, and LDL-C than low-FNPO PCOS cases.
Taken together, the high-FNPO PCOS cases were characterized by a more severe endocrine and metabolic dysfunction phenotype.
The lipoprotein subfractions between low- and high-FNPO PCOS cases
To eliminate the effect of age and BMI on lipoprotein subfractions, we selected age- and BMI-matched controls and PCOS cases as the subcohort in which to measure the lipoprotein subfractions. The anthropometric, ultrasonographic, and laboratory parameters of this subcohort are listed in Additional file 1: Table S2. Similarly, the high-FNPO PCOS cases in the subcohort presented worse endocrine profiles. However, the metabolic parameters showed no difference in the high-FNPO PCOS cases, except for a lower glucose level.
Since some metabolic parameters were measured both by NMR spectroscopy and by clinical methods, correlation analysis was first performed to check the consistency between the two methods. The scatter plots showed a close correlation (R² ranged from 0.40 to 0.56, Additional file 2: Fig. S1), suggesting the high reliability of the NMR data.
First, we compared PCOS cases and controls. The PLS-DA scores plot of the NMR data showed a slight discrimination between controls and PCOS cases (component 1 14.3% and component 2 17.9%, Fig. 1B). The most important variables (VIP > 2) contributing to the discrimination between controls and PCOS cases were L5PL, L5PN, L5AB, L5CH, V4TG, L1FC, L5FC, and V5TG (Fig. 1C).
Fig. 1 Multivariate analysis of lipoprotein subfractions. A The flow chart of the study. B PLS-DA scores plot of controls and PCOS cases. C Variable importance in projection (VIP) scores of PLS-DA of controls and PCOS cases. D PLS-DA scores plot of controls, low-FNPO, and high-FNPO PCOS cases. E VIP scores of PLS-DA of controls, low-FNPO, and high-FNPO PCOS cases
Subsequently, we focused on low- and high-FNPO PCOS. Here, the PLS-DA scores plot showed a clear discrimination among controls, low-FNPO, and high-FNPO PCOS cases (component 1 28.3% and component 2 18.1%, Fig. 1D). The most important variables (VIP > 2) contributing to the discrimination among these three groups were V4TG, V3TG, and V4PL (Fig. 1E).
To further explore the association between lipoprotein subfractions and the risk of high-FNPO PCOS, logistic regression analysis was performed (Fig. 2 and Additional file 2: Figs. S2–S4). V2TG, V3TG, V2CH, V3CH, and V3PL were associated with an increased risk of high-FNPO PCOS. Their ORs per SD increment were 1.59, 1.80, 1.62, 1.67, and 1.65, respectively (Fig. 2).
Fig. 2 Associations of VLDL lipoprotein subfractions with risk of high-FNPO PCOS. ORs (with 95% CIs) are presented per SD higher lipoprotein subfraction concentration and were estimated by multinomial logistic regression analysis. VLDL was divided into five subfractions, VLDL-1, 2, 3, 4, and 5 (V1, V2, V3, V4, and V5), numbered according to increasing density. Abbreviations: OR, odds ratio; CI, confidence interval; VLDL, very low-density lipoprotein; TG, triglycerides; CH, cholesterol; FC, free cholesterol; PL, phospholipid
Based on the variables with a VIP score > 2 in PLS-DA or p < 0.05 in logistic regression analysis, seven key lipoprotein subfractions were selected: V2TG, V3TG, V4TG, V2CH, V3CH, V3PL, and V4PL. Among them, the concentration of V3TG in high-FNPO PCOS cases was significantly higher than that in controls (p = 0.049) and low-FNPO PCOS cases (p = 0.036, Additional file 2: Fig. S5A). Additionally, the other six lipoprotein subfractions also showed an increasing trend in high-FNPO PCOS cases (Additional file 2: Fig. S5B–G).
Machine learning prediction model for high-FNPO PCOS cases and internal validation
Considering that the new guidelines have proposed including AMH as a diagnostic criterion to replace ultrasound-based PCOM, we initially constructed the AMH models using the AMH marker to predict high-FNPO PCOS cases using ten machine learning algorithms. The AUCs of AMH models with different algorithms were between 0.522 and 0.750 (Additional file 2: Fig. S6A).
Furthermore, we incorporated the above-mentioned seven key lipoprotein subfractions (V2TG, V3TG, V4TG, V2CH, V3CH, V3PL, and V4PL) into the AMH models to form the full models. The full models, built from the seven key lipoprotein subfractions and AMH, achieved better discrimination, with AUCs between 0.695 and 0.874 across the ten algorithms (Additional file 2: Fig. S6B and Fig. 3A). Among them, the full model established using the multi-layer perceptron classifier exhibited the best performance (AUC = 0.874, Fig. 3A). The bootstrapped results showed high reproducibility in both the AMH and full models (Additional file 2: Fig. S7).
To obtain a simpler and easily applicable “concise model” with a higher AUC, recursive feature elimination with logistic regression was implemented, and the results revealed increasing AUCs as more lipoprotein subfractions were added to the AMH model (Additional file 2: Fig. S8A). Even when only one lipoprotein subfraction (V3TG) was added to the AMH model, the AUC increased to 0.807 (Fig. 3B). Bootstrapped results showed a similar AUC (0.808 ± 0.050, Additional file 2: Fig. S8B).
Discussion
In this study, we found worse metabolic parameters in high-FNPO PCOS cases defined under the new international guideline. Furthermore, the detailed lipoprotein subfractions of low- and high-FNPO PCOS cases were investigated. Seven key lipoprotein subfractions were identified based on PLS-DA and logistic regression analysis. Machine learning prediction model development suggested that adding key lipoprotein subfractions, especially V3TG, to the AMH model might improve its ability to discriminate high-FNPO PCOS cases.
PCOS is a complex endocrine and metabolic disorder in reproductive-age women, and PCOM is one of its diagnostic criteria. Owing to improvements in ultrasound technology, defining PCOM as FNPO ≥ 12 has become controversial. Therefore, the new international guideline suggested an FNPO of 20 as the cutoff. At present, several studies have explored the impact of the stricter PCOM threshold from different aspects, such as diagnostic status for adult women with PCOS [23], AR expression in granulosa cells [24], AR CAG length in serum [25], and blood DNA methylation [10]. However, the metabolic profile under the stricter PCOM threshold remains largely unexplored.
Various studies have shown that metabolic dysregulation is involved in the development of PCOM. Animal experiments validated that a high-fat diet could induce atretic and cystic follicles [26,27,28] and influence follicle development [29], features similar to those of PCOS. Human studies have reported that small follicles (5–8 mm) were positively correlated with markers of metabolic dysfunction [30] and that PCOM was associated with insulin resistance [31]. Consistent with those studies, our study found adverse metabolic parameters in high-FNPO PCOS cases.
Our study further concentrated on the alteration of lipid metabolism between low- and high-FNPO PCOS cases and investigated various lipoprotein subfractions simultaneously by NMR spectroscopy, aiming to provide more evidence on metabolic biomarkers and potential mechanisms of PCOM based on the new guideline. A previous study compared metabolic parameters and showed higher TG and CH levels in high-FNPO PCOS cases [9]. However, those results might have been influenced by the higher BMI of high-FNPO PCOS cases and were limited by the small number of clinical measurements. Our study eliminated the impact of age and BMI and measured lipoprotein subfractions in PCOS to explore subtle but crucial metabolic changes in depth.
VLDL, which contains TG, CH, CH ester, and PL, is secreted into the circulation from the liver [32, 33]. Our results not only supported previous studies showing increased VLDL concentration in PCOS cases [18, 34] but also added information on the VLDL subfractions in PCOM diagnosed by the new international guideline. First, higher TG, CH, and PL in VLDL were observed in high-FNPO PCOS cases compared with age- and BMI-matched controls and low-FNPO PCOS cases. Second, we are the first to propose that TG, CH, and PL in VLDL contribute to the classification between low- and high-FNPO PCOS cases, suggesting that the dysregulation of VLDL subfractions might participate in the development of PCOM through a currently unknown mechanism. Notably, both PLS-DA and logistic regression analysis identified V3TG, suggesting an important role of TG in medium VLDL particles. This implies that VLDL, as well as its subfractions, deserves attention in the diagnosis of high-FNPO PCOS.
The underlying mechanism for the abnormal VLDL subfractions in women with high-FNPO PCOS is not well understood. Considering that testosterone levels were positively associated with PL, TG, and CH in larger VLDL subfractions, and with the concentrations and mean diameters of larger VLDL subfractions [18], we reasonably suspect that the dysregulated VLDL subfraction profile results from hyperandrogenism. A randomized crossover study showed that high physiological testosterone could increase hepatic TG synthesis and VLDL-TG secretion [35], which may be activated by an AMP-activated protein kinase–dependent pathway or by fluctuations in hepatocellular Ca2+ concentrations. Given that androgen excess enhances follicle development and the dysfunctional formation of antral follicles leading to PCOS [36], it is reasonable to suspect that androgen might be a potential confounder between abnormal VLDL subfractions and PCOM.
Additionally, women with PCOS are at high risk of developing several metabolic disorders in the long term, such as cardiovascular disease, type 2 diabetes, and metabolic associated fatty liver disease [4, 37, 38]. Previous studies reported that insulin resistance, abdominal obesity, and hyperandrogenism all contributed to the lipoprotein profiles in PCOS [19]. Likewise, several studies have reported that dysfunctional metabolism of VLDL, particularly of the TG in VLDL, was positively associated with the risk of these metabolic complications of PCOS [39,40,41]. Thus, metabolic diseases may also be mediators between VLDL subfractions and PCOM. The specific biological mechanisms involved remain to be confirmed by further experiments.
AMH, a member of the transforming growth factor beta family, is a polypeptide secreted by granulosa cells [42, 43]. AMH levels in women with PCOS are significantly higher than in control women and are strongly associated with antral follicle counts [44]. Because of the challenges of diagnosing PCOM by ultrasound, several studies and the new guideline proposed that serum AMH could be an alternative to ultrasound [45, 46]. However, its diagnostic accuracy remains to be improved [47]. To address this issue, a group of key lipoprotein subfractions was added to the AMH model to predict PCOM under the new guideline, which increased model performance. To obtain a concise model, V3TG alone was added to the AMH model, which also improved the predictive ability. Our current results indicate the potential of key lipoprotein subfractions, particularly V3TG, to predict PCOM. The model combining AMH and V3TG reflects endocrine and metabolic dysfunction simultaneously and might therefore be a better predictive model for PCOM based on the new guideline.
Predicting polycystic ovarian morphology with appropriate plasma markers instead of ultrasonography is a current goal in the clinical management of PCOS. Plasma measures are much more convenient than ultrasound examination, as most women with PCOS undergo blood tests in the clinic. In addition, plasma measures could offer a better option for adolescents with PCOS and for women who are not sexually active, for whom transabdominal ultrasound may be less accurate and transvaginal ultrasound is painful. We also acknowledge that there is a long way to go before the measurement of plasma lipoprotein subfractions can be promoted in the clinic, since both the instrumentation and the cost of the test need to be evaluated in depth for effectiveness.
However, the quantification of these markers using NMR spectroscopy remains limited to research settings and has not yet been implemented in routine clinical practice. Consequently, the associated costs are not well-established, and a formal cost-effectiveness analysis could not be conducted in this study. Future research should evaluate the economic feasibility of incorporating lipoprotein subfractions into clinical workflows by computing the incremental cost-effectiveness ratio, that is, the comparison of the costs and the improved diagnostic accuracy from the model with AMH only to our diagnostic model (AMH and lipoprotein subfractions). With technological advancements and broader adoption, the implementation of a fast and low-cost method for VLDL subfraction determination remains an unmet challenge. Until then, the present study primarily aims to establish the scientific and diagnostic value of these markers, which serves as a foundation for future cost-effectiveness studies.
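The incremental cost-effectiveness ratio mentioned above is, in its standard form, the difference in cost divided by the difference in effect between two strategies. The sketch below is a generic illustration: the function name is hypothetical, and the choice of effect measure (for instance, a diagnostic accuracy metric) and any cost figures are assumptions that a future study would have to define.

```python
def icer(cost_new: float, cost_ref: float,
         effect_new: float, effect_ref: float) -> float:
    """Incremental cost-effectiveness ratio of a new strategy vs a reference.

    Here the reference would be the AMH-only model and the new strategy
    the model with AMH plus lipoprotein subfractions. Returns extra cost
    per unit of additional effect.
    """
    delta_effect = effect_new - effect_ref
    if delta_effect == 0:
        raise ValueError("effects are equal; the ratio is undefined")
    return (cost_new - cost_ref) / delta_effect
```

A lower ratio means less additional cost for each unit of improvement, which is the quantity a formal economic evaluation would compare against a willingness-to-pay threshold.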
This study has some limitations that call for future work. Because of the relatively small sample size for lipoprotein subfraction measurement, we employed bootstrap resampling to internally validate the performance of the proposed diagnostic model. Bootstrap validation is a widely used method for optimizing models and generating unbiased estimates of performance metrics, particularly when external cohorts are unavailable. However, external validation in independent and diverse patient populations remains an important next step to establish the model’s generalizability and clinical utility. Future research should focus on validating the model in larger-scale and multi-center cohorts to ensure its broader applicability. In addition, a post hoc power analysis indicated that the sample size in the current study may be insufficient to detect small-to-moderate effect sizes for certain lipoprotein subfractions with high statistical power. This limitation highlights the exploratory nature of the study and underscores the need for larger and more diverse cohorts in future research. Despite this limitation, the performance of the machine learning models, supported by bootstrap resampling, suggests that the identified associations are robust and provide a foundation for future investigations.
Conclusions
In summary, this study comprehensively and systematically described the clinical characteristics of low- and high-FNPO PCOS cases and their lipoprotein subfraction profiles. We identified distinct lipoprotein subfractions associated with high-FNPO PCOS. Subsequently, we developed improved models with these lipoprotein subfractions to predict PCOM under the new guideline, although the generalizability and clinical utility of the models still need to be externally validated in an independent and diverse population. Collectively, our study enriches the understanding of PCOM from the metabolic aspect, providing insight into the effects of lipoprotein subfractions on PCOM.
Data availability
All data generated or analyzed during this study are included in this published article.
Abbreviations
- PCOS: Polycystic ovary syndrome
- PCOM: Polycystic ovarian morphology
- FNPO: Follicle number per ovary
- PLS-DA: Partial least squares discriminant analysis
- VIP: Variable importance in projection
- AMH: Anti-Müllerian hormone
- NMR: Nuclear magnetic resonance
- V(1–5)TG: Triglyceride subfractions in very-low-density lipoprotein (numbering according to increasing density)
- V(1–5)CH: Cholesterol subfractions in very-low-density lipoprotein (numbering according to increasing density)
- V(1–5)PL: Phospholipid subfractions in very-low-density lipoprotein (numbering according to increasing density)
Acknowledgements
We thank all the participants who took part in the study. We also thank NUTRIEASE for its support of this study.
Funding
This study was supported by the National Key Research and Development Program of China (2024YFC2707300, 2021YFC2700400), the Fundamental Research Funds of Shandong University (2023QNTD004), the National Natural Science Foundation of China (82421004, 32588201, 32370916, 82101707, and 82071606), the Natural Science Foundation of Shandong Province for Excellent Youth Scholars (ZR2023YQ061), CAMS Innovation Fund for Medical Sciences (2021-I2M-5-001), Shandong Provincial Key Research and Development Program (2024CXPT087), Ningxia Hui Autonomous Region Key Research and Developmental Program (2024BEG02019), the specific research fund of The Innovation Platform for Academicians of Hainan Province (YSPTZX202310), the Program for Chang Jiang Scholars (Q2022144), and the Taishan Scholars Program of Shandong Province (ts20190988).
Author information
Contributions
All authors contributed to the study conception and design. Sample collection: G.F. and S.L.; Formal analysis: Y.L. and Y.S.; Data analysis: X.Y. and Z.Y.; Writing – original draft preparation: X.Y., Z.Y., and H.Z.; Writing – review and editing: X.G. and S.Z.; Supervision: H.Z. and J.M. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
All experimental protocols performed in studies involving human participants were in accordance with the ethical standards of the Ethics Committee of Shandong University and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. All experimental protocols were performed in accordance with relevant guidelines and regulations approved by the Institutional Review Board of the Center for Reproductive Medicine, Shandong University (IRB2021-98). Written informed consent was obtained from each patient.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
12916_2025_4120_MOESM1_ESM.rar
Additional file 1. Table S1 Annotation of lipoprotein subfractions. Table S2 Anthropometric, ultrasonographic, and laboratory parameters of low- and high-FNPO PCOS and controls.
12916_2025_4120_MOESM2_ESM.rar
Additional file 2. Fig. S1 Scatterplots of four traits measured by traditional clinical methods (x axis) and NMR spectroscopy (y axis). Fig. S2 Associations of LDL lipoprotein subfractions with risk of high-FNPO PCOS. Fig. S3 Associations of HDL lipoprotein subfractions with risk of high-FNPO PCOS. Fig. S4 Associations of other lipoprotein subfractions with risk of high-FNPO PCOS. Fig. S5 Univariate analysis of seven lipoprotein subfractions among controls, low-FNPO, and high-FNPO PCOS. Fig. S6 ROC curves for model development with ten machine learning algorithms (multi-layer perceptron classifier, logistic regression, ridge regression, linear discriminant analysis, linear support vector classifier (SVC), Gaussian naïve Bayes (NB), random forest, K-neighbors classifier, extra trees, and gradient boosting). Fig. S7 ROC curves for internal validation with ten machine learning algorithms (multi-layer perceptron classifier, logistic regression, ridge regression, linear discriminant analysis, linear support vector classifier (SVC), Gaussian naïve Bayes (NB), random forest, K-neighbors classifier, extra trees, and gradient boosting). Fig. S8 Feature selection and validation of the model built with AMH and V3TG.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yan, X., Yang, Z., Zhao, H. et al. Unveiling lipoprotein subfractions signature in high-FNPO PCOS: implications for PCOM diagnosis and risk assessment using advanced machine learning models. BMC Med 23, 289 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12916-025-04120-z
DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12916-025-04120-z