
Machine learning technique-based four-autoantibody test for early detection of esophageal squamous cell carcinoma: a multicenter, retrospective study with a nested case–control study

Abstract

Background

Autoantibodies represent promising diagnostic blood-based biomarkers that may be generated prior to the first clinically detectable signs of cancers. In the present study, we aimed to identify a novel optimized autoantibody panel with high diagnostic accuracy for clinical and preclinical esophageal squamous cell carcinoma (ESCC) using machine learning (ML) algorithms.

Methods

We identified potential autoantibodies against tumor-associated antigens with serological proteome analysis. Serum autoantibody levels were measured by ELISA. Using a training set (n = 513), 102 models based on ML algorithms were constructed, and the Partial Least Squares Generalized Linear Model (plsRglm) was selected using receiver operating characteristic (ROC) analysis, the Kolmogorov–Smirnov (K-S) test, and the Population Stability Index (PSI). It was further validated in an internal validation set (n = 413), external validation set 1 (n = 371), and external validation set 2 (n = 202). We then validated the ability of the plsRglm model to predict preclinical ESCC in a nested case–control study (24 preclinical ESCCs and 112 matched controls) within a population-based prospective cohort study.

Results

ROC analysis, the K-S test, and PSI showed that the plsRglm model based on four autoantibodies (ALDOA, ENO1, p53, and NY-ESO-1) exhibited the best diagnostic performance and robustness. It diagnosed ESCC with AUCs (sensitivities and specificities) of 0.860 (68.8% and 90.4%) in the training set, 0.826 (65.3% and 89.1%) in the internal validation set, and 0.851 (69.2% and 87.3%) in external validation set 1. For early-stage ESCC, this signature also maintained diagnostic performance [0.817 (62.3% and 90.4%) in the training set; 0.842 (62.5% and 89.1%) in the internal validation set; 0.854 (63.2% and 87.3%) in external validation set 1; and 0.850 (67.3% and 90.1%) in external validation set 2]. In the nested case–control study, the plsRglm model detected preclinical ESCC with an AUC of 0.723, a sensitivity of 54.2%, and a specificity of 86.6%.

Conclusions

Our findings indicated that the plsRglm model based on four autoantibodies might help identify preclinical and early-stage ESCC.


Background

As one of the most aggressive malignancies, esophageal cancer (EC) ranks as the seventh most deadly cancer worldwide [1]. The incidence rates and histopathological forms of EC vary by geographic location and ethnicity [2]. In China, esophageal squamous cell carcinoma (ESCC) accounts for over 90% of all EC [3]. Despite improvements in treatment, the overall survival of ESCC patients remains unsatisfactory, especially when the disease is diagnosed at an advanced stage. The estimated 5-year survival rate of ESCC in China during 2000–2014 was reported to be less than 30% [4]. Therefore, reliable non-invasive biomarkers are urgently needed for the early detection of ESCC.

An increasing number of studies have indicated that serum autoantibodies to tumor-associated antigens (TAAs), as potent serum biomarkers, could be used for the diagnosis of early malignancy or pre-malignancy [5,6,7,8,9,10,11,12,13]. Moreover, a panel combining several TAAs may be needed to achieve adequate diagnostic performance [6,7,8,9,10,11,12,13]. Many efforts have been made to identify autoantibody targets for EC [14,15,16,17,18,19]. However, only one study has provided evidence that autoantibodies could detect ESCC in the preclinical stage [20]. For early-stage ESCC, our previous study also reported that a panel of autoantibodies to six TAAs had significant diagnostic value [18]. This finding underscored that the use of autoantibodies for ESCC screening and diagnosis seems to rely on the construction of an optimized autoantibody panel. However, biomarkers with higher sensitivity are still needed so that fewer cases are missed. Therefore, there is a great need to identify another optimized autoantibody panel that could enhance the diagnostic efficiency for ESCC, and to initiate a multicenter study with prospective evidence to assess the diagnostic relevance of the autoantibody assay.

Over the past decade, artificial intelligence has contributed noticeably to cancer research, including the diagnosis of endometrial, colorectal, and pancreatic cancers [21,22,23,24]. In esophageal cancer research, machine learning (ML) techniques have also been increasingly applied to early detection [25,26,27]. Therefore, in the current study, after using a proteomic approach, we aimed to identify a novel autoantibody signature comprising ALDOA, ENO1, p53, and NY-ESO-1 using 102 ML techniques to detect early ESCC within a training set, and validated its diagnostic value in three independent validation sets. Moreover, to test the value of the selected ML model for identifying preclinical disease, we conducted a nested case–control study using prospectively collected sera from ESCC patients and matched controls. Finally, we found that the newly constructed ML model might be useful for post-treatment monitoring of ESCC.

Methods

Study design and participants

A multi-phase study including discovery, training, and validation phases was conducted (Fig. 1). ESCC was defined as in our previous study [18]. Patients with other pathological types of esophageal cancer (e.g., esophageal adenocarcinoma) were excluded. Normal controls in each center were those who had received a medical check-up, had no evidence of malignancy, and had never received any treatment for malignancy. The discovery phase was conducted using serological proteome analysis (SERPA) in 8 ESCC samples and 8 normal controls obtained from the Cancer Hospital of Shantou University Medical College (SUMC) in February 2011. From July 2011 to July 2012, blood samples of 388 ESCC patients and 125 normal controls from the Cancer Hospital of SUMC were consecutively collected as a training set for the identification of the autoantibody diagnostic signature. The validation stage had four sets: one internal validation set and three external validation sets. The internal validation set contained 312 ESCC patients and 101 normal controls collected from August 2012 to February 2014 at the Cancer Hospital of SUMC. Research subjects in external validation sets 1 and 2 were obtained from Shantou Central Hospital and the Cancer Center of Sun Yat-sen University (SYSU), respectively. The normal controls in external validation set 2 were age- and sex-matched to the patients with early-stage ESCC. To further assess the detection ability of the constructed autoantibody signature in prediagnostic ESCC, we carried out a nested case–control study, serving as external validation set 3, based on a population-based prospective cohort study of esophageal cancer and precancerous lesions in high-risk areas (ClinicalTrials.gov Identifier: NCT02094105). Additional details regarding the cohort design and methods were published elsewhere [28].
In external validation set 3, subjects whose pathological diagnosis by endoscopic biopsy was normal at baseline, who then developed ESCC during the follow-up period, and who had an archived baseline blood sample were selected as the case group. For each ESCC case, we selected four (or five) sex-matched (male, 54.2% vs. 53.6%), age-matched (mean age at blood draw, 55.0 vs. 55.0 years), and date-of-blood-draw-matched (mean time, 2.4 vs. 2.6 years) participants whose baseline pathological diagnosis was normal and who had not developed cancer when the ESCC case patient was diagnosed. In total, we included 24 ESCC cases and 112 matched controls in external validation set 3. Approval for this study was obtained from the institutional ethics review committee at each study center, and informed consent was obtained from all participants. This work complied with the principles of the Declaration of Helsinki.

Fig. 1

Study profile. ESCC, esophageal squamous cell carcinoma; ELISA, enzyme-linked immunosorbent assay; SUMC, Shantou University Medical College; SYSU, Sun Yat-sen University; CAMS & PUMC, Chinese Academy of Medical Sciences and Peking Union Medical College; AUC, area under the receiver operating characteristic curve; K-S value, Kolmogorov–Smirnov test; PSI, Population Stability Index

Peripheral blood samples from prediagnostic, newly diagnosed, and postoperative patients and from normal controls were collected and allowed to clot for 30 min at room temperature before centrifugation for 10 min at 2500 g. The isolated sera were then stored at −80 °C until the experiment began. Tumor stage, determined from resected tumors, was defined according to the eighth edition of the American Joint Committee on Cancer (AJCC) Cancer Staging Manual [29]. As described in our previous study, we classified ESCC with AJCC stage 0 + I + IIA as early-stage ESCC [18]. The characteristics of the study subjects are summarized in Table 1.

Table 1 Patient details and clinicopathological characteristics

Two-dimensional polyacrylamide gel electrophoresis and western blotting

We used SERPA for the discovery of TAAs generating autoantibodies as described previously (see Additional file 1: Additional methods for detail) [30, 31].

Protein identification

Protein identification was done as described previously [30]. Proteins of interest were manually picked from the gels, destained, dehydrated in acetonitrile, and further dried for 20 min at 37 °C. Digestion was performed overnight at 37 °C in 50 mM ammonium bicarbonate buffer with 20 ng/μL of trypsin. The tryptic peptide mixture was then extracted in 60 μL of 90.0% acetonitrile/2.5% trifluoroacetic acid and the combined extracts were lyophilized at − 20 °C until the mass spectrometry analysis. Digested peptides were then processed for identification on the basis of their mass fingerprint obtained using an Ultraflex MALDI-TOF/TOF mass spectrometer (Bruker Daltonics Inc.). We identified proteins by using Mascot software (http://www.matrixscience.com) based on the entire NCBI and SwissProt protein databases.

Expression and purification of recombinant TAAs

The coding sequence regions for p53 (NM_001276760.1), NY-ESO-1 (NM_001327.2), ALDOA (NM_000034), ENO1 (NM_001428), TPI1 (NM_000365), GAPDH (NM_001256799), Prx VI (NM_004905.2), Bmi-1 (NM_005180.8), MMP7 (NM_002423.3), and Hsp70 (NM_005345.5) were subcloned into the pDEST17 expression vector (Invitrogen, cat. no. 11803–012). The recombinant proteins were expressed, purified, and analyzed as previously described [9, 18].

Enzyme-linked immunosorbent assay (ELISA)

ELISA for autoantibodies and experimental data analysis and processing were performed by two researchers (Yi-Wei Xu and Yu-Hui Peng) who were blinded to the status of the samples as previously described (see Additional file 1: Additional methods for detail) [9, 18]. Data were unblinded by other investigators in each study center. All cancer and normal samples were assayed in duplicate.

The construction and evaluation of machine learning models

Stepwise logistic regression, with an entry probability of 0.05 and a removal probability of 0.10, was applied in the training set to select an optimized panel of autoantibodies. It combined the newly generated data for autoantibodies against ALDOA, ENO1, TPI1, and GAPDH with previously generated data for autoantibodies against Hsp70, MMP7, Prx VI, p53, NY-ESO-1, and Bmi-1 [18]. After the significant biomarkers were selected, 102 models from the caret package in R were used for model construction in the training set. To evaluate the diagnostic performance of individual autoantibodies and of the ML models based on combined autoantibody signatures in all datasets, we performed receiver operating characteristic (ROC) analysis to obtain the area under the ROC curve (AUC). Because a review suggested that an AUC of 0.700–0.800 is considered acceptable [32], we retained only models with AUCs over 0.700 in all datasets as candidates for the diagnosis of ESCC. The Kolmogorov–Smirnov (K-S) test and the Population Stability Index (PSI) were used to evaluate discrimination and robustness, respectively [33, 34]. A K-S value over 0.2 indicates that the model has discrimination ability, and a PSI lower than 0.1 indicates that the model is highly robust; a higher K-S value together with a lower PSI therefore indicates better discrimination and robustness. The cut-off value for positive reactivity was defined in the training set by achieving the maximum sensitivity at a specificity over 90% and by minimizing the distance from the point to the top-left corner of the ROC curve. We chose a specificity of over 90% because a test with this specificity would be adequate for early detection and economically viable [35]. Calibration curves were created using the margin effect and the average prediction probability of the ML model. Decision curve analysis (DCA) was applied to evaluate the net benefit of ML model prediction and actual discrimination.
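The two selection metrics described above can be sketched as follows. This is a minimal Python illustration, not the authors' R code: the K-S value is taken as the maximum gap between the empirical score distributions of cases and controls, and the PSI compares the binned score distributions of two sample sets. The ten equal-width bins on [0, 1] and the small floor `eps` for empty bins are assumptions made for this sketch; the source does not state its binning scheme.

```python
import math

def ks_statistic(scores_pos, scores_neg):
    """Largest gap between the empirical CDFs of two groups of scores."""
    thresholds = sorted(set(scores_pos) | set(scores_neg))
    best = 0.0
    for t in thresholds:
        cdf_pos = sum(s <= t for s in scores_pos) / len(scores_pos)
        cdf_neg = sum(s <= t for s in scores_neg) / len(scores_neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best

def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between two samples of scores in [0, 1].

    Assumption: equal-width bins; empty bins are floored at eps so that
    the logarithm is defined.
    """
    def shares(scores):
        counts = [0] * n_bins
        for s in scores:
            idx = min(int(s * n_bins), n_bins - 1)  # score 1.0 goes in the last bin
            counts[idx] += 1
        return [max(c / len(scores), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))
```

Under the thresholds quoted in the text, a model would be retained when `ks_statistic(...)` exceeds 0.2 and `psi(training_scores, validation_scores)` is below 0.1.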

Statistical analysis

Based on data from preliminary experiments, we calculated that at least 81 patients and 35 normal controls were needed in each retrospective study set in order to detect an assumed sensitivity of 70% and specificity of 90% (two-sided 95% confidence interval (normal approximation)) [36]. Thus, the numbers of ESCC patients and controls in retrospective sets all met the sample size target. To improve the power of the nested case–control study, we increased the number of matched controls approaching a ratio of about 5/1, which does not lead to bias [37].
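The sample size targets above are consistent with the usual normal-approximation formula for estimating a proportion, n = z² p(1 − p) / d². The sketch below is an illustration only: the half-width d = 0.10 of the confidence interval is an assumption, chosen because it reproduces the stated numbers, and is not given in the text.

```python
import math
from statistics import NormalDist

def n_for_proportion(p, half_width, confidence=0.95):
    """Subjects needed so that a two-sided normal-approximation CI
    around an assumed proportion p has the given half-width."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z * z * p * (1 - p) / half_width ** 2)

# Assumed half-width of 0.10 (not stated in the source).
n_cases = n_for_proportion(0.70, 0.10)     # assumed sensitivity of 70%
n_controls = n_for_proportion(0.90, 0.10)  # assumed specificity of 90%
```

With these inputs the function returns 81 cases and 35 controls, matching the minimum numbers quoted in the text.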

The models applied in this research were derived from the caret package in R, which provides the available models through its train function with different tuning parameters. Models of the classification type were selected and trained in the training set. The parameters of the ML models were tuned automatically without restriction. As there were no missing values, data pre-processing such as missing data handling was skipped and model training was based on the original data. We used the Mann–Whitney U test to compare the serum levels of individual autoantibodies between normal controls and ESCC patients. The Wilcoxon signed-rank test was used to compare serum autoantibody levels in ESCC patients before and after surgical resection. To test the clinical significance of individual and combined antibody assay positivity, we carried out chi-squared tests or Fisher’s exact tests. All statistical analyses were conducted with GraphPad Prism software, SPSS, and R. Calibration curves and DCA curves were plotted using the runway and dcurves packages, respectively. For convenient use, the shiny package was used to develop a Shiny app hosted on the “shinyapps.io” platform. To control the type 1 error rate, we applied multiple testing correction using the Bonferroni method (0.05/number of comparisons). All P values were two-sided.

Results

Serological proteome analysis for identification of autoantibodies

In the discovery phase, sera from 8 ESCC patients and 8 matched controls were screened individually to investigate the presence of autoantibodies against proteins from human ESCC Eca109 cells. Proteins of reactive spot numbers 1, 2, 5, 6, and 7 were observed in 2 of 8 ESCC patients, and spot numbers 3 and 4 were observed in 3 of 8 ESCC patients, whereas no such reaction was observed with 8 normal controls (Additional file 2: Fig. S1). As shown in Additional file 3: Table S1, a total of seven protein spots were selected and identified as ALDOA, ENO1, TPI1, and GAPDH. Intriguingly, these four proteins were all listed in the glycolysis gene set.

Establishment of a serum autoantibody diagnostic signature

The levels of serum autoantibodies against ALDOA, ENO1, TPI1, and GAPDH were significantly higher in ESCC patients than in normal controls in the training set (all P < 0.0001, Fig. 2, Additional file 4: Fig. S2a). We re-analyzed the previously published data regarding autoantibodies to p53, NY-ESO-1, MMP7, Hsp70, Prx VI, and Bmi-1 from the same sample set. We then performed a stepwise logistic regression analysis to select a subgroup of “informative” autoantibodies from the training set, identified four autoantibodies (ALDOA, ENO1, p53, and NY-ESO-1), and derived a formula to calculate a predicted probability (p) for ESCC from the expression values of the 4 autoantibodies: ln(p/(1 − p)) = 4.883 × (ALDOA) + 3.378 × (ENO1) + 8.070 × (p53) + 3.516 × (NY-ESO-1) − 2.657.
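The stepwise logistic regression formula above can be evaluated directly. The following Python sketch uses the coefficients and intercept given in the text; the function name and the dictionary input format are illustrative assumptions. Note that this is the logistic regression formula, not the plsRglm model that was later selected.

```python
import math

# Coefficients and intercept as reported in the text.
COEF = {"ALDOA": 4.883, "ENO1": 3.378, "p53": 8.070, "NY-ESO-1": 3.516}
INTERCEPT = -2.657

def escc_probability(od):
    """Predicted probability p from ln(p / (1 - p)) = sum(coef * OD) + intercept.

    `od` maps each autoantibody name to its measured expression value.
    """
    logit = INTERCEPT + sum(COEF[name] * od[name] for name in COEF)
    return 1.0 / (1.0 + math.exp(-logit))
```

Because all four coefficients are positive, a higher level of any of the four autoantibodies raises the predicted probability.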

Fig. 2

Violin plots of optical density (OD) values of autoantibodies to ALDOA, ENO1, p53, and NY-ESO-1 from esophageal squamous cell carcinoma (ESCC) patients and normal controls

Then, 102 ML models from the caret package were built using these four autoantibodies in the training set. Diagnostic performance was evaluated by ROC analysis, and the parameters and AUCs of these 102 ML models in the training set and the four validation sets are shown in Additional file 5: Table S2. Twenty ML models were outstanding classifiers in the training set, with AUCs over 0.900. However, their robustness was poor: their AUCs in the validation sets ranged from 0.590 to 0.844. After filtering for AUCs over 0.700 in all datasets, 43 ML models remained. Apart from the Robust Linear Discriminant Analysis (Linda) model, and leaving aside external validation set 3, all AUCs exceeded 0.800 in the other four datasets, which meant that these 42 ML models performed excellently in diagnosing ESCC.

Because external validation set 3 was a nested case–control study with a different design from the other sets, we evaluated diagnostic power and robustness with ROC analysis, the K-S test, and PSI in the other four datasets only: the training set, the internal validation set, and external validation sets 1 and 2. For the convenience of calculation, the three validation sets were combined into a new, larger validation set with 650 ESCC patients and 336 normal volunteers, namely the combined validation set. The K-S values were all greater than 0.2 in the training and combined validation sets, indicating that all of the ML models had strong risk differentiation ability (Additional file 6: Table S3). Except for the Rotation Forest and Boosted Tree models, the PSI values of the other 40 ML models were lower than 0.1, which meant these models were highly robust. To choose an optimized ML model for further analysis, we selected the model with the highest K-S value among the five with the lowest PSI values (the AUC, sensitivity, and specificity of these five models are shown in Additional file 7: Table S4). On this comprehensive assessment, the Partial Least Squares Generalized Linear Model (plsRglm) was selected.
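The final selection rule (highest K-S value among the five models with the lowest PSI) can be written in a few lines. In this Python sketch the model names and metric values are hypothetical placeholders, not figures from Additional file 6: Table S3.

```python
def select_model(metrics, n_lowest_psi=5):
    """metrics maps model name -> (ks_value, psi_value).

    Shortlist the models with the lowest PSI, then return the one
    with the highest K-S value among them.
    """
    shortlist = sorted(metrics, key=lambda m: metrics[m][1])[:n_lowest_psi]
    return max(shortlist, key=lambda m: metrics[m][0])

# Hypothetical values for illustration only.
example_metrics = {
    "model_a": (0.60, 0.030),
    "model_b": (0.58, 0.020),
    "model_c": (0.65, 0.090),  # best K-S, but not among the five lowest PSI
    "model_d": (0.55, 0.040),
    "model_e": (0.57, 0.050),
    "model_f": (0.59, 0.045),
}
chosen = select_model(example_metrics)
```

Ranking on PSI first puts stability across populations ahead of raw discrimination, so a model such as `model_c` is passed over despite its higher K-S value.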

As shown in Fig. 3 and Table 2, after internal cross-validation, the AUC of the plsRglm model in the training set was 0.860 (95% CI: 0.828–0.891). With a cut-off point of 0.214 (values lower than 0.214 were considered positive), the plsRglm model offered a sensitivity of 68.8% and a specificity of 90.4% in the training set. When applied to the 53 early-stage ESCC patients (pTNM 0/I/IIA) in the training set (Fig. 3), the plsRglm model achieved an AUC of 0.817 with 56.6% sensitivity and 90.4% specificity. The plsRglm model discriminated ESCC, including early-stage disease, from normal controls with greater AUCs, sensitivities, and specificities than the individual autoantibody assays.

Fig. 3

Receiver operating characteristic (ROC) curve analysis of the ML models based on four-autoantibody signature for the diagnosis of esophageal squamous cell carcinoma (ESCC) and early-stage ESCC

Table 2 Measurement of the plsRglm model based on the four-autoantibody signature in the diagnosis of ESCC and preclinical ESCC

The ROC curves for individual autoantibodies are also shown in Supplementary Figs. 2B and 3. In the training set, the AUC values of individual autoantibodies varied from 0.581 (Bmi-1) to 0.773 (ALDOA), with sensitivity values ranging between 18.3% (Hsp70) and 46.9% (ALDOA, Supplementary Tables 5–6). We noted that diagnostic performance for early-stage ESCC patients was similar to that for all ESCC patients in the training set (Additional file 4: Fig. S2b, Additional file 8: Fig. S3, and Additional file 9: Table S5). Test performance by cancer stage for the plsRglm model and individual autoantibodies is shown in Additional file 10: Table S6 and Additional file 11: Table S7. Likelihood ratios for assessing the diagnostic value of individual autoantibodies and the plsRglm model are presented in Table 2 and Additional files 9–11: Tables S5–S7.

Verification of the plsRglm models based on four autoantibodies

Individual autoantibodies against ALDOA, ENO1, p53, and NY-ESO-1 were further evaluated in three validation sets and were shown to be significantly elevated in patients with ESCC compared with the normal controls (Fig. 2; all P < 0.0001). We then applied the plsRglm model derived from the training set to assess the risk of being diagnosed with ESCC in the three retrospective validation sets. Similar results in these sets confirmed that the plsRglm model was of diagnostic value for ESCC (Fig. 3 and Table 2). The DCA curve and calibration curve showed that the plsRglm model could distinguish ESCC from normal controls (Additional file 12: Fig. S4). Importantly, the power of the plsRglm model for diagnosing early-stage ESCC was also corroborated (Fig. 3 and Table 2).

Performance of the plsRglm model to detect preclinical ESCC

In the nested case–control study (external validation set 3), the levels of autoantibodies against ALDOA, ENO1, p53, and NY-ESO-1 were all significantly higher in prediagnostic sera from ESCC patients than in those from matched normal controls (Fig. 4; all P < 0.05). Importantly, the plsRglm model combining the four autoantibodies had the best AUC, 0.723 (95% CI: 0.611–0.834), for discriminating prediagnostic ESCC individuals from the matched normal controls (Fig. 4 and Table 2). With the cut-off value in prediagnostic sera set at 0.214, the sensitivity/specificity was 54.2%/86.6%. Using 0.214 as the boundary for positive results, 13 of the 24 ESCC cases tested positive in samples collected from 6 months to 6 years before clinical diagnosis (Table 3).
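The reported sensitivity and specificity can be checked from counts. In the Python sketch below, 13 of 24 positive cases comes from the text; the 97 negative controls out of 112 is back-calculated from the reported specificity of 86.6% and is an assumption, not a count given in the source. The likelihood ratios follow from the standard definitions.

```python
def diagnostic_summary(tp, fn, tn, fp):
    """Sensitivity, specificity, and likelihood ratios from a 2x2 table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": sens / (1 - spec),  # positive likelihood ratio
        "lr_neg": (1 - sens) / spec,  # negative likelihood ratio
    }

# tp and fn are from the text (13 of 24 cases positive);
# tn = 97 and fp = 15 are inferred from 86.6% specificity in 112 controls.
summary = diagnostic_summary(tp=13, fn=11, tn=97, fp=15)
```

Rounded to one decimal place, these counts give the 54.2% sensitivity and 86.6% specificity quoted for the nested case–control study.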

Fig. 4

Performance of the four-autoantibody signature in preclinical esophageal squamous cell carcinoma (ESCC). a Violin plots of optical density (OD) values of autoantibodies to ALDOA, ENO1, p53, and NY-ESO-1 from preclinical ESCC patients and matched normal controls. b Receiver operating characteristic (ROC) analysis for diagnostic performance of the four-autoantibody signature to distinguish individuals with ESCC from matched controls at 6 months to 6 years before clinical diagnosis

Table 3 Results of plsRglm model for individual prediagnostic ESCC patients

Diagnostic assessment of the plsRglm model by using a higher assay specificity

When the specificity was increased to more than 95% by ROC analysis of the plsRglm model in the training set, the corresponding cut-off value was 0.153. With this cut-off, the sensitivities of the plsRglm model remained high for the diagnosis of ESCC in the training set and the four validation sets (Table 2 and Additional file 11: Table S7). These results suggested the robustness of the plsRglm model. Diagnostic values for autoantibodies against individual TAAs under the same condition (i.e., specificity over 95%) are listed in Additional files 9–11: Tables S5–S7.

Comparison of the plsRglm model and traditional markers to diagnose early-stage ESCC

The diagnostic performance of the plsRglm model in early-stage ESCC was compared with Cyfra21-1, CEA, and the combined biomarker test (i.e., Cyfra21-1 + CEA). Forty-five normal controls and 45 early-stage ESCC patients were randomly selected from external validation set 2 between 2018 and 2019. The combined biomarker test was calculated by an equation constructed through a logistic regression model: ln(p/(1 − p)) = 0.068 × (CEA) + 0.271 × (Cyfra21-1) − 0.895. The AUC for the plsRglm model was significantly greater than those for CEA, Cyfra21-1, or the two-biomarker panel (Additional file 13: Fig. S5).

Effect of clinical variables on autoantibody assay

Almost no significant differences were observed in the positivity of individual autoantibodies or the plsRglm model when the ESCC samples were subdivided by gender, age, smoking status, and other clinicopathological parameters (Additional files 14–18: Tables S8–S12).

Application of the plsRglm model

In order to make it more practical for clinical use, the plsRglm model was integrated into an online web app that can be accessed through a “shinyapps.io” server. The web interface enables users to enter the values of four autoantibodies for a single sample, and subsequently, the web app is capable of providing the probability of ESCC diagnosis. The online calculation is available for free at https://liucantong.shinyapps.io/Esophageal_cancer_prediction_tool/. When the predicted value exceeds 78.6%, it indicates a higher likelihood of the subject suffering from ESCC, and further endoscopic examination is recommended.

Discussion

In this study, we constructed a four-autoantibody signature using machine learning methods that could potentially be used to diagnose early-stage ESCC. One hundred two ML models were constructed and compared, and the plsRglm model, which showed strong performance and robustness, was finally selected. The diagnostic value of this model was verified in the training set and in independent validation sets. The plsRglm model could even identify ESCC as early as 5 years before diagnosis in the nested case–control study.

To reduce cancer mortality, early detection is recognized as one of the most promising approaches. Endoscopy with biopsy has been widely applied in clinical practice for the identification of esophageal cancers and preinvasive lesions. A recent prospective study indicated for the first time that one-time endoscopic screening and intervention would lead to a significant reduction in mortality and incidence of esophageal cancers [28]. However, the invasiveness of the examination and its dependence on the skill of local endoscopists and pathologists impede its wide use for screening asymptomatic people. It is conceivable that a robust serum biomarker-based test, used as a primary screening method in the early or asymptomatic stage of esophageal cancer, would concentrate the at-risk population and reduce unnecessary invasive procedures. In recent decades, many blood biomarkers, such as DNA methylation, microRNAs, long non-coding RNAs, and autoantibodies, have shown potential for the early detection of cancers [38,39,40,41,42,43,44]. Among these biomarkers, autoantibodies are attractive for the early detection of cancer [7,8,9,10,11,12,13,14,15,16,17,18,19, 45,46,47].

In the past decade, numerous attempts have been made to identify robust autoantibody biomarkers for the early detection of ESCC, and the results suggested that an optimized autoantibody panel might be used to detect early esophageal cancers [14,15,16,17,18,19]. However, almost all these reports had limitations, such as poor efficiency of early diagnosis, small sample size, single-center study design, no independent validation, and a lack of preclinical data. High-quality research with large and independent patient cohorts is therefore needed to assess autoantibody biomarkers for the diagnosis and detection of early or preclinical ESCC. This study was aimed at the detection of early-stage ESCC. Recently, a research team built a novel ML model, ESCCPred, based on five autoantibodies, which showed performance similar to our results [27]. In comparison, our newly constructed plsRglm model used a larger sample size (1651 vs. 1309) and compared more ML models (102 vs. 12). Application of the four-autoantibody signature provided high sensitivity and specificity in discriminating between early ESCC patients and controls. These results suggest that detection of the four-autoantibody signature in patient serum improved the diagnostic efficiency for ESCC or early ESCC compared with our previous report [18]. The high sensitivity and specificity of the plsRglm model suggest that it could contribute to the screening and diagnosis of early ESCC patients. Furthermore, the plsRglm model identified in our study was validated in four independent cohorts from different medical centers. In addition, our study included a nested case–control validation that analyzed prospectively collected, archived preclinical blood samples from ESCC cases and matched controls.
Our present study indicates the potential of this autoantibody signature as a non-invasive assessment of a subsequent diagnosis of ESCC within several years of blood collection.

Enhanced glycolysis is a metabolic characteristic of cancer, which was first suggested by Otto Warburg [48]. Indeed, later studies confirmed that protein levels and enzymatic activities of many glycolytic enzymes were upregulated in many cancers [49, 50]. A recent breast cancer study provided evidence of strong reactivity of autoantibodies against nine proteins involved in glycolysis (including ALDOA and ENO1) in prediagnostic plasma from women with breast cancer [51]. Here, we also demonstrated that our autoantibody signature involving ALDOA and ENO1 could differentiate individuals with preclinical ESCC from controls (Table 2).

Autoantibodies are regarded as immunological biomarkers of aberrant cellular mechanisms existing during carcinogenesis. However, the specific mechanisms underlying the autoantibody response remain to be elucidated. A growing number of studies indicate that mutations, protein overexpression, aberrant post-translational modifications, and changes in protein location or abundance can render tumor-associated antigens immunogenic, which may lead to the production of tumor-associated autoantibodies [52]. In our study, the panel involves autoantibodies against p53, NY-ESO-1, ALDOA, and ENO1. The p53 autoantibody is the most commonly studied autoantibody. The generation of p53 autoantibodies is hypothesized to be related to p53 protein overexpression and to missense mutations in the TP53 gene that lead to a loss of function of the protein [53, 54]. The NY-ESO-1 autoantibody is another commonly studied autoantibody. A report suggested that the NY-ESO-1 antigen elicits humoral immune responses not only through mutant but also through wild-type epitopes [55]. For ENO1 autoantibodies, post-translational modifications may contribute to their generation [56]. So far, there is no relevant study on the mechanisms underlying the production of ALDOA autoantibodies in cancer. Future studies should explore in more depth how these tumor-associated antigens trigger humoral immune responses, which is crucial for translating these autoantibodies into diagnostic biomarkers for real-world clinical applications.

Although the plsRglm model was eventually selected, the other 39 ML models also performed well (Additional file 5: Table S2). For final model selection, we used the K-S value to evaluate discrimination ability and the PSI to evaluate robustness. The K-S value, which functions like the AUC, is used to evaluate the ability to distinguish one group from another [33, 57, 58]. PSI is a metric used to measure distributional changes between two sample sets. It is applied in monitoring the stability of models between different populations [34, 59]. Taken together, a greater K-S value and a smaller PSI value indicate a highly discriminating and robust model. In the current study, we finally selected the plsRglm model. We noted that the AUC of the plsRglm model (0.860) was similar to that of the plr model (0.861), a kind of traditional logistic regression, but the PSI of the plsRglm model (0.034) was lower than that of the plr model (0.054), which means the plsRglm model would be more stable.

We observed that none of the six autoantibodies from our previous study (autoantibodies against p53, NY-ESO-1, MMP-7, Hsp70, Prx VI, and Bmi-1) was identified by SERPA in the discovery phase of the current study. Two circumstances may explain this. First, only one cell line, Eca109, was used in SERPA. Given the heterogeneity of esophageal squamous cell carcinoma [60], not all autoantibody biomarkers can be detected in a single cell line. Second, not all proteins in the reactive spots in SERPA were selected for identification by mass spectrometry, so potential autoantibodies could not be screened comprehensively. Some of the six autoantibodies from our previous study might correspond to reactive spots that were not selected for identification.

When this four-autoantibody test is implemented in population-based mass screening, quality control of blood sample collection and of the laboratory process should be performed to ensure the reliability and accuracy of experimental results. Specifically, vacuum blood collection tubes without anticoagulant should be used to collect peripheral blood; the samples are then allowed to clot at room temperature for 30 min and centrifuged at 2500 g for 10 min. The serum is removed from the tube for detection of the four autoantibodies. If testing on the same day is not possible, the specimen should be stored in 1.7 ml SafeSeal Microcentrifuge Tubes at 4 or −80 °C until use. Moreover, all serum samples and the quality control sample (a pooled serum sample collected randomly from 100 ESCC patients) should be run in duplicate. Quality control samples are used to monitor the quality of the assay runs by means of Levey-Jennings plots. Finally, all of the above needs to be written into a standard operating procedure (SOP), and all operations should be carried out in accordance with the SOP.
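As an illustration of this monitoring step, the sketch below derives Levey-Jennings control limits from a set of baseline quality control readings and classifies a new run against them. It is a hedged example only: the baseline OD values are invented, the function names are hypothetical, and the ±2 SD warning and ±3 SD rejection limits follow common laboratory convention rather than a rule stated in this study.

```python
from statistics import mean, stdev


def levey_jennings_limits(baseline_values):
    """Return the center line and the +/-2 SD and +/-3 SD control limits."""
    if len(baseline_values) < 2:
        raise ValueError("at least two baseline readings are needed")
    center = mean(baseline_values)
    sd = stdev(baseline_values)  # sample standard deviation
    return {
        "mean": center,
        "sd": sd,
        "warning": (center - 2 * sd, center + 2 * sd),
        "rejection": (center - 3 * sd, center + 3 * sd),
    }


def classify_run(value, limits):
    """Classify one quality control reading against the control limits."""
    low_reject, high_reject = limits["rejection"]
    low_warn, high_warn = limits["warning"]
    if value < low_reject or value > high_reject:
        return "reject"
    if value < low_warn or value > high_warn:
        return "warning"
    return "in control"


# Hypothetical OD readings of the pooled quality control sample.
baseline = [0.50, 0.52, 0.48, 0.51, 0.49]
limits = levey_jennings_limits(baseline)
for reading in (0.50, 0.54, 0.56):
    print(reading, classify_run(reading, limits))
```

Plotting each run's reading against the center line and these limits over time gives the Levey-Jennings chart; a run classified as "reject" would normally be repeated before its sample results are reported.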

There are a few limitations in our study. First, the current study lacks mechanistic insight into the production of autoantibodies against ALDOA, ENO1, p53, and NY-ESO-1 in ESCC patients. Moreover, as the participants recruited in each set differed in sample size and clinical parameters (Fig. 1, Table 1), the diagnostic performance of the plsRglm model differed to some degree between groups (Fig. 3, Table 2). Despite these differences, the plsRglm model showed generally similar diagnostic value in the training set and the validation sets. Whether this plsRglm model has diagnostic value for ESCC in populations from different regions of the world, or for esophageal adenocarcinoma, needs to be further evaluated. Additionally, in the discovery phase, the number of samples used in SERPA to screen for potential autoantibodies was relatively small and only one cell line, Eca109, was used. As a result, autoantibody biomarkers could not be screened more comprehensively, and the results may lack good reproducibility. To identify more novel autoantibodies as potential biomarkers for ESCC diagnosis, future studies using SERPA should include more samples and multiple cell lines combined with clinical samples. Finally, we could include only 24 preclinical ESCC samples in the nested case–control study, in which the sensitivity and specificity of the four-autoantibody signature decreased to some extent. We speculate that most persons in a prospective study would be individuals with precancerous lesions or asymptomatic disease, so the sensitivity of this assay is likely to be lower than reported here. Since the number of prospectively collected samples in the current study is relatively small, further prospective studies with large samples should be performed to verify the results.
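The uncertainty that comes with only 24 preclinical cases can be illustrated with a Wilson score interval for a proportion. In the sketch below, the counts 13 of 24 and 97 of 112 are not reported values; they are inferred here as the only counts consistent with the reported sensitivity of 54.2% and specificity of 86.6% in the nested case–control study. The function name is hypothetical, and the interval method is a choice made for this illustration.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    )
    return center - half_width, center + half_width


# Counts inferred from the reported percentages (assumption, see above).
sens_low, sens_high = wilson_interval(13, 24)   # preclinical ESCC cases
spec_low, spec_high = wilson_interval(97, 112)  # matched controls

print(f"sensitivity 95% CI: {sens_low:.3f} to {sens_high:.3f}")
print(f"specificity 95% CI: {spec_low:.3f} to {spec_high:.3f}")
```

The interval around the sensitivity estimate is much wider than the one around the specificity estimate, which is the quantitative reason a larger prospective sample is needed.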

Conclusions

In conclusion, this study draws on the largest sample of diagnosed ESCC patients to date, together with a valuable collection of prediagnostic ESCC blood specimens, to report the clinical performance of a machine learning-based four-autoantibody signature for ESCC. Our data indicate that this signature has considerable diagnostic value for early ESCC and has potential for the surveillance of preclinical ESCC and for post-operative follow-up.

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

EC:

Esophageal cancer

ESCC:

Esophageal squamous cell carcinoma

TAAs:

Tumor-associated antigens

ML:

Machine learning

SERPA:

Serological proteome analysis

SUMC:

Shantou University Medical College

SYSU:

Sun Yat-sen University

AJCC:

American Joint Committee on Cancer

STR:

Short tandem repeat

PVDF:

Polyvinylidene fluoride

HRP:

Horseradish peroxidase

ELISA:

Enzyme-linked immunosorbent assay

QCS:

Quality control samples

CV:

Coefficients of variation

ROC:

Receiver operating characteristic

AUC:

Area under the ROC curve

CI:

Confidence interval

OD:

Optical density

K-S:

Kolmogorov-Smirnov

PSI:

Population Stability Index

DCA:

Decision curve analysis

PLR:

Positive likelihood ratio

NLR:

Negative likelihood ratio

SOP:

Standard operating procedure

References

  1. Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2024;74:229–63.

  2. Hongo M, Nagasaki Y, Shoji T. Epidemiology of esophageal cancer: Orient to Occident. Effects of chronology, geography and ethnicity. J Gastroenterol Hepatol. 2009;24:729–35.

  3. Yang L, Parkin DM, Ferlay J, Li L, Chen Y. Estimates of cancer incidence in China for 2000 and projections for 2005. Cancer Epidemiol Biomarkers Prev. 2005;14:243–50.

  4. He S, Xia C, Li H, et al. Cancer profiles in China and comparisons with the USA: a comprehensive analysis in the incidence, mortality, survival, staging, and attribution to risk factors. Sci China Life Sci. 2024;67:122–31.

  5. Peng ZY, Wang QS, Li K, et al. Stem signatures associating SOX2 antibody helps to define diagnosis and prognosis prediction with esophageal cancer. Ann Med. 2022;54:921–32.

  6. Wu J, Wang P, Han Z, et al. A novel immunodiagnosis panel for hepatocellular carcinoma based on bioinformatics and the autoantibody-antigen system. Cancer Sci. 2022;113:411–22.

  7. Wang X, Yu J, Sreekumar A, et al. Autoantibody signatures in prostate cancer. N Engl J Med. 2005;353:1224–35.

  8. Qiu J, Choi G, Li L, et al. Occurrence of autoantibodies to annexin I, 14-3-3 theta and LAMR1 in prediagnostic lung cancer sera. J Clin Oncol. 2008;26:5060–6.

  9. Peng YH, Xu YW, Huang LS, et al. Autoantibody signatures combined with Epstein-Barr virus capsid antigen-IgA as a biomarker panel for the detection of nasopharyngeal carcinoma. Cancer Prev Res (Phila). 2015;8:729–36.

  10. Pedersen JW, Gentry-Maharaj A, Nostdal A, et al. Cancer-associated autoantibodies to MUC1 and MUC4–a blinded case-control study of colorectal cancer in UK collaborative trial of ovarian cancer screening. Int J Cancer. 2014;134:2180–8.

  11. Evans RL, Pottala JV, Egland KA. Classifying patients for breast cancer by detection of autoantibodies against a panel of conformation-carrying antigens. Cancer Prev Res (Phila). 2014;7:545–55.

  12. Anderson KS, Cramer DW, Sibani S, et al. Autoantibody signature for the serologic detection of ovarian cancer. J Proteome Res. 2015;14:578–86.

  13. Middleton CH, Irving W, Robertson JF, et al. Serum autoantibody measurement for the detection of hepatocellular carcinoma. PLoS ONE. 2014;9: e103867.

  14. Cheng Y, Xu J, Guo J, et al. Circulating autoantibody to ABCC3 may be a potential biomarker for esophageal squamous cell carcinoma. Clin Transl Oncol. 2013;15:398–402.

  15. Liu WL, Zhang G, Wang JY, et al. Proteomics-based identification of autoantibody against CDC25B as a novel serum marker in esophageal squamous cell carcinoma. Biochem Biophys Res Commun. 2008;375:440–5.

  16. Zhang J, Wang K, Zhang J, Liu SS, Dai L, Zhang JY. Using proteomic approach to identify tumor-associated proteins as biomarkers in human esophageal squamous cell carcinoma. J Proteome Res. 2011;10:2863–72.

  17. Zhou SL, Yue WB, Fan ZM, et al. Autoantibody detection to tumor-associated antigens of P53, IMP1, P16, cyclin B1, P62, C-myc, Survivn, and Koc for the screening of high-risk subjects and early detection of esophageal squamous cell carcinoma. Dis Esophagus. 2014;27:790–7.

  18. Xu YW, Peng YH, Chen B, et al. Autoantibodies as potential biomarkers for the early detection of esophageal squamous cell carcinoma. Am J Gastroenterol. 2014;109:36–45.

  19. Gao H, Zheng Z, Mao Y, et al. Identification of tumor antigens that elicit a humoral immune response in the sera of Chinese esophageal squamous cell carcinoma patients by modified serological proteome analysis. Cancer Lett. 2014;344:54–61.

  20. Wang M, Liu F, Pan Y, et al. Tumor-associated autoantibodies in ESCC screening: detecting prevalent early-stage malignancy or predicting future cancer risk? EBioMedicine. 2021;73: 103674.

  21. Chen ZH, Lin L, Wu CF, Li CF, Xu RH, Sun Y. Artificial intelligence for assisting cancer diagnosis and treatment in the era of precision medicine. Cancer Commun (Lond). 2021;41:1100–15.

  22. Liu W, Ma J, Zhang J, et al. Identification and validation of serum metabolite biomarkers for endometrial cancer diagnosis. EMBO Mol Med. 2024;16:988–1003.

  23. Huang Z, Deng C, Ma C, et al. Identification and validation of the surface proteins FIBG, PDGF-beta, and TGF-beta on serum extracellular vesicles for non-invasive detection of colorectal cancer: experimental study. Int J Surg. 2024;110:4672–87.

  24. Pan X, Zhang Z, Yun Y, et al. Machine learning-assisted high-throughput identification and quantification of protein biomarkers with printed heterochains. J Am Chem Soc. 2024;146:19239–48.

  25. Visaggi P, Barberio B, Ghisa M, et al. Modern diagnosis of early esophageal cancer: from blood biomarkers to advanced endoscopy and artificial intelligence. Cancers (Basel). 2021;13:3162.

  26. Gao Y, Xin L, Lin H, et al. Machine learning-based automated sponge cytology for screening of oesophageal squamous cell carcinoma and adenocarcinoma of the oesophagogastric junction: a nationwide, multicohort, prospective study. Lancet Gastroenterol Hepatol. 2023;8:432–45.

  27. Li T, Sun G, Ye H, et al. ESCCPred: a machine learning model for diagnostic prediction of early esophageal squamous cell carcinoma using autoantibody profiles. Br J Cancer. 2024;131:883–94.

  28. Wei WQ, Chen ZF, He YT, et al. Long-term follow-up of a community assignment, one-time endoscopic screening study of esophageal cancer in China. J Clin Oncol. 2015;33:1951–7.

  29. Rice TW, Gress DM, Patil DT, Hofstetter WL, Kelsen DP, Blackstone EH. Cancer of the esophagus and esophagogastric junction-major changes in the American Joint Committee on Cancer eighth edition cancer staging manual. CA Cancer J Clin. 2017;67:304–17.

  30. Fujita Y, Nakanishi T, Hiramatsu M, et al. Proteomics-based approach identifying autoantibody against peroxiredoxin VI as a novel serum marker in esophageal squamous cell carcinoma. Clin Cancer Res. 2006;12:6415–20.

  31. Shimada Y, Imamura M, Wagata T, Yamaguchi N, Tobe T. Characterization of 21 newly established esophageal cancer cell lines. Cancer. 1992;69:277–84.

  32. Mandrekar JN. Receiver operating characteristic curve in diagnostic test assessment. J Thorac Oncol. 2010;5:1315–6.

  33. Zheng B, Liang T, Mei J, et al. Prediction of 90 day readmission in heart failure with preserved ejection fraction by interpretable machine learning. ESC Heart Fail. 2024;11:4267–76.

  34. Taplin R, Hunt C. The population accuracy index: a new measure of population stability for model monitoring. Risks. 2019;7:53.

  35. Boyle P, Chapman CJ, Holdenrieder S, et al. Clinical validation of an autoantibody test for lung cancer. Ann Oncol. 2011;22:383–9.

  36. Arkin CF, Wachtel MS. How many patients are necessary to assess test performance? JAMA. 1990;263:275–8.

  37. Grimes DA, Schulz KF. Compared to what? Finding controls for case-control studies. Lancet. 2005;365:1429–33.

  38. Wagner PD, Verma M, Srivastava S. Challenges for biomarkers in cancer detection. Ann N Y Acad Sci. 2004;1022:9–16.

  39. Mabert K, Cojoc M, Peitzsch C, Kurth I, Souchelnytskyi S, Dubrovska A. Cancer biomarker discovery: current status and future perspectives. Int J Radiat Biol. 2014;90:659–77.

  40. Wittenberger T, Sleigh S, Reisel D, et al. DNA methylation markers for early detection of women’s cancer: promise and challenges. Epigenomics. 2014;6:311–27.

  41. Li SQ, Chen FJ, Cao XF. Distinctive microRNAs in esophageal tumor: early diagnosis, prognosis judgment, and tumor treatment. Dis Esophagus. 2013;26:288–98.

  42. Tong YS, Wang XW, Zhou XL, et al. Identification of the long non-coding RNA POU3F3 in plasma as a novel biomarker for diagnosis of esophageal squamous cell carcinoma. Mol Cancer. 2015;14:3.

  43. Werner S, Chen H, Tao S, Brenner H. Systematic review: serum autoantibodies in the early detection of gastric cancer. Int J Cancer. 2015;136:2243–52.

  44. Zhang H, Xia J, Wang K, Zhang J. Serum autoantibodies in the early detection of esophageal cancer: a systematic review. Tumour Biol. 2015;36:95–109.

  45. Lam S, Boyle P, Healey GF, et al. EarlyCDT-Lung: an immunobiomarker test as an aid to early detection of lung cancer. Cancer Prev Res (Phila). 2011;4:1126–34.

  46. Chapman CJ, Healey GF, Murray A, et al. EarlyCDT(R)-Lung test: improved clinical utility through additional autoantibody assays. Tumour Biol. 2012;33:1319–26.

  47. Jett JR, Peek LJ, Fredericks L, Jewell W, Pingleton WW, Robertson JF. Audit of the autoantibody test, EarlyCDT(R)-lung, in 1600 patients: an evaluation of its performance in routine clinical practice. Lung Cancer. 2014;83:51–5.

  48. Warburg O. On respiratory impairment in cancer cells. Science. 1956;124:269–70.

  49. Durany N, Joseph J, Campo E, Molina R, Carreras J. Phosphoglycerate mutase, 2,3-bisphosphoglycerate phosphatase and enolase activity and isoenzymes in lung, colon and liver carcinomas. Br J Cancer. 1997;75:969–77.

  50. Altenberg B, Greulich KO. Genes of glycolysis are ubiquitously overexpressed in 24 cancer classes. Genomics. 2004;84:1014–20.

  51. Ladd JJ, Chao T, Johnson MM, et al. Autoantibody signatures involving glycolysis and splicesome proteins precede a diagnosis of breast cancer among postmenopausal women. Cancer Res. 2013;73:1502–13.

  52. Sexauer D, Gray E, Zaenker P. Tumour-associated autoantibodies as prognostic cancer biomarkers - a review. Autoimmun Rev. 2022;21: 103041.

  53. Brosh R, Rotter V. When mutants gain new powers: news from the mutant p53 field. Nat Rev Cancer. 2009;9:701–13.

  54. Butt J, Blot WJ, Visvanathan K, et al. Auto-antibodies to p53 and the subsequent development of colorectal cancer in a U.S. prospective cohort consortium. Cancer Epidemiol Biomarkers Prev. 2020;29:2729–34.

  55. Chen WS, Haynes WA, Waitz R, et al. Autoantibody landscape in patients with advanced prostate cancer. Clin Cancer Res. 2020;26:6204–14.

  56. Almaguel FA, Sanchez TW, Ortiz-Hernandez GL, et al. Emerging tumor-associated antigen, cancer biomarker, and oncotherapeutic target. Front Genet. 2021;11: 614726.

  57. Gong C, Cai T, Wang Y, et al. Development and validation of a nocturnal hypoglycaemia risk model for patients with type 2 diabetes mellitus. Nurs Open. 2024;11: e70055.

  58. Wang L, Wang Y, Xuan C, et al. Predicting potential microbe-disease associations based on multi-source features and deep learning. Brief Bioinform. 2023;24:bbad255.

  59. Maddalo M, Fanizzi A, Lambri N, et al. Robust machine learning challenge: an AIFM multicentric competition to spread knowledge, identify common pitfalls and recommend best practice. Phys Med. 2024;127: 104834.

  60. Jiang G, Wang Z, Cheng Z, et al. The integrated molecular and histological analysis defines subtypes of esophageal squamous cell carcinoma. Nat Commun. 2024;15:8988.

Acknowledgements

We thank Dr. Frieda Law for manuscript revision.

Funding

This work was supported by the National Natural Science Foundation of China (grant number 82472974); Noncommunicable Chronic Diseases-National Science and Technology Major Project (grant number 2023ZD0501400); Population-based Prospective Cohort Study of Esophageal Cancer and Precancerous Lesions in High-risk Areas (grant number 2016YFC0901404); Guangdong Basic and Applied Basic Research Foundation (grant numbers 2024B1515230005, 2025A1515012575, 2022A1515220116, 2022A1515220180, and 2022A1515220182); Science and Technology Special Fund of Guangdong Province of China (grant numbers STKJ202209069, STKJ2023002); Youth Research Fund Project of Cancer Hospital of Shantou University Medical College (grant number 2023A005); Guangdong Esophageal Cancer Institute Science and Technology Program (grant number M202224); 2024 Central Guidance Local Science and Technology Development Special Fund Project for Shantou Innovative City Construction (grant number STKJ2024078).

Author information

Contributions

Conceptualization: YWX, FCW, WQW, EML, and LYX; data curation: YHP, and HC; formal analysis: YWX, and CTL; funding acquisition: YWX, CTL, FCW, and WQW; investigation: CTL, LYC, and HLC; methodology: CQH, and CTL; project administration: FCW, and WQW; resources: CQH, HLC, HPG, ZYW, and WQW; supervision: WQW, FCW, EML, and LYX; validation: CTL, LYC, and HLC; visualization: YWX, and CTL; writing-original draft: YWX, YHP, and CH; writing-review & editing: YWX, WQW, FCW, EML, and LYX. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Yi-Wei Xu, Wen-Qiang Wei, Li-Yan Xu, Fang-Cai Wu or En-Min Li.

Ethics declarations

Ethics approval and consent to participate

Approval for the study was obtained from the institutional ethics review committees of the Cancer Hospital of Shantou University Medical College (approval number: 2015042419), the Cancer Institute and Hospital of Chinese Academy of Medical Sciences (approval number: 16–171/1250), and the Cancer Center of Sun Yat-sen University (approval number: B2024 - 181–01), and informed consent was obtained from all participants. This work complied with the principles of the Helsinki Declaration.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

12916_2025_4066_MOESM1_ESM.docx

Additional file 1: Additional methods.

Additional file 2: Fig. S1 Autoantibodies to ALDOA, ENO1, TPI1, and GAPDH in serum from a patient with esophageal squamous cell carcinoma (ESCC). (a) Coomassie-stained two-dimensional gel for total protein isolated from the Eca109 ESCC cell line. The protein spots indicated by the black arrow were identified as ALDOA (spot no.1 and no.2), ENO1 (spot no.3, no.4, and no.5), GAPDH (spot no.6), and TPI1 (spot no.7) by mass spectrometry. (b) Eca109 cell lysate proteins were separated by two-dimensional PAGE, transferred to PVDF membrane, and then incubated with diluted sera (1:250) from a patient with ESCC. (c) PVDF membrane incubated with sera from a normal control. PVDF membranes were incubated with appropriate secondary antibodies and visualized by chemiluminescence.

Additional file 3: Table S1 List of tumor proteins detected by proteomic identification.

Additional file 4: Fig. S2 Diagnostic outcomes for autoantibodies to MMP-7, Prx VI, Hsp70, Bmi-1, TPI1, and GAPDH in esophageal squamous cell carcinoma (ESCC). (a) Floating bars of optical density (OD) values of autoantibodies to TPI1, GAPDH, MMP-7, Prx VI, Hsp70, and Bmi-1 from ESCC sera (n = 388) and normal sera (n = 125) in the training set. The Mann–Whitney U test was used to compare levels of individual autoantibodies in serum between ESCC patients and normal controls. (b) Receiver operating characteristic (ROC) curve analysis of autoantibodies to MMP-7, Prx VI, Hsp70, Bmi-1, TPI1, and GAPDH for all patients with ESCC (n = 388) and patients with early-stage ESCC (n = 53) in the training set.

Additional file 5: Table S2 Tuning parameters and AUCs of 102 machine learning models in five datasets.

Additional file 6: Table S3 Evaluation of machine learning models using K-S test and PSI.

Additional file 7: Table S4 Results for measurement of the five ML models with the lowest PSI in the diagnosis of ESCC and preclinical ESCC.

Additional file 8: Fig. S3 Receiver operating characteristic (ROC) curve analysis of individual autoantibodies for the diagnosis of esophageal squamous cell carcinoma (ESCC) and early-stage ESCC.

Additional file 9: Table S5 Results for measurement of autoantibodies to ALDOA, ENO1, p53, and NY-ESO-1 in the diagnosis of ESCC.

Additional file 10: Table S6 Measurement of autoantibodies to TPI1, GAPDH, MMP-7, Hsp70, Bmi-1, and Prx VI in the diagnosis of ESCC in the training set.

Additional file 11: Table S7 Diagnostic performance by stage of ESCC for autoantibodies to ALDOA, ENO1, p53, NY-ESO-1, and the four-autoantibody signature.

Additional file 12: Fig. S4 Clinical decision curves and calibration curves of plsRglm model in training set (A, E), internal validation set (B, F), external validation set 1 (C, G), and external validation set 2 (D, H).

Additional file 13: Fig. S5 Performance of different biomarkers in discriminating early-stage esophageal squamous cell carcinoma (ESCC) patients from normal controls.

Additional file 14: Table S8 Relationship between positive rate of ALDOA autoantibodies and clinicopathologic features in ESCC patients.

Additional file 15: Table S9 Relationship between positive rate of ENO1 autoantibodies and clinicopathologic features in ESCC patients.

Additional file 16: Table S10 Relationship between positive rate of p53 autoantibodies and clinicopathologic features in ESCC patients.

Additional file 17: Table S11 Relationship between positive rate of NY-ESO-1 autoantibodies and clinicopathologic features in ESCC patients.

Additional file 18: Table S12 Relationship between positive rate of the plsRglm model and clinicopathologic features in ESCC patients.

Supplementary Material 2.

Supplementary Material 3.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Xu, YW., Peng, YH., Liu, CT. et al. Machine learning technique-based four-autoantibody test for early detection of esophageal squamous cell carcinoma: a multicenter, retrospective study with a nested case–control study. BMC Med 23, 235 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12916-025-04066-2

  • DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12916-025-04066-2

Keywords