Prognostic nomograms for predicting cause-specific survival and overall survival of stage I–III colon cancer patients: a large population-based study

Background The purpose of this study was to build functional nomograms based on significant clinicopathological features to predict cause-specific survival (CSS) and overall survival (OS) in patients with stage I–III colon cancer. Methods Data on patients diagnosed with stage I–III colon cancer between 2010 and 2015 were downloaded from the Surveillance, Epidemiology, and End Results (SEER) database. Univariate and multivariate Cox analyses were used to identify independent prognostic factors, which were used to construct nomograms to predict the probabilities of CSS and OS. The performance of the nomograms was assessed by C-indexes, receiver operating characteristic (ROC) curves and calibration curves. Decision curve analysis (DCA) was used to compare clinical utility between the nomograms and the tumor–node–metastasis (TNM) staging system. Results Based on the univariate and multivariate analyses, features that correlated with survival outcomes were used to establish nomograms for CSS and OS prediction. The nomograms showed favorable discrimination in predicting 1-, 3-, and 5-year CSS and OS, with a C-index of 0.78 (95% confidence interval (CI) 0.77–0.80) for CSS and 0.74 (95% CI 0.73–0.75) for OS. Calibration curves and ROC curves revealed excellent predictive accuracy. The clinically and statistically significant prognostic performance of the nomograms generated with the entire group of patients and risk scores was validated by a stratified analysis. DCA showed that the nomograms were more clinically useful than TNM stage. Conclusion Novel nomograms based on significant clinicopathological characteristics were developed and can be used as a tool for clinicians to predict CSS and OS in stage I–III colon cancer patients. These models could help facilitate a personalized postoperative evaluation.

which made colon cancer a serious problem for public health.
The prognosis of colon cancer is associated with the American Joint Committee on Cancer/International Union against Cancer (AJCC/UICC) tumor-node-metastasis (TNM) staging system. According to stages defined by the TNM system, the 5-year stage-specific survival rates are 93.2% for stage I, 82.5% for stage II, and 59.5% for stage III [3]. Nonetheless, patients with stage I-III colon cancer often have markedly divergent prognoses because of differing genetic and epigenetic backgrounds, even when they are in the same AJCC stage. For example, patients with stage IIb colon cancer (5-year survival rate 72.2%) fare worse than patients with stage IIIa colon cancer (83.4%) and have outcomes closer to those of patients with stage IIIb colon cancer (64.1%) [4]. Although the TNM staging system is the most widely used tool for prognostic assessment and treatment planning in colon cancer patients, such shortcomings still limit its practical application.
Some studies have reported that clinicopathological features such as tumor size, the carcinoembryonic antigen (CEA) level, adjuvant chemotherapy, and the log odds of metastatic lymph nodes (LODDS) may also influence colon cancer patients' survival outcomes [5,6]. In addition to clinicopathological features, various nonbiological factors have been proposed for inclusion in the patient's clinical assessment for malignant tumor therapy. For instance, marital status is a significant factor in clinical decision-making. Socioeconomic status and insurance status are also important when selecting a treatment strategy. Multiple factors are needed to account for the wide range of variability observed in individual patients. Ignoring these significant prognostic parameters may reduce the accuracy of survival predictions. Thus, a comprehensive prognostic assessment system including clinicopathological and demographic factors is required in clinical practice.
In fact, various prognostic analysis methods have been applied in clinical practice. For instance, microsatellite instability (MSI) or mismatch repair deficiency (dMMR) status is considered the most important biomarker in colon cancer patients [7,8]. Chromosomal instability (CIN) and CpG island methylator phenotype (CIMP) are also widely accepted as biomarkers for metastasis risk and prognostic analysis [9]. In addition, certain genes and molecules, such as the KRAS gene, the APC gene, the p53 gene, CD44, CD133, and MEK, have been found to be indicators of prognosis in colon cancer patients [10][11][12]. However, these detection methods can be invasive for the patient and carry considerable economic costs. Nomograms, which are convenient and economical graphical representations of statistical prediction models, combine several significant variables to predict a specific endpoint and have been built to meet this demand. By integrating clinical and pathological features, a nomogram reduces a complicated computational model to a single numerical estimate of the probability of an event, such as death or disease recurrence, that is tailored to the individual patient. Therefore, a nomogram might be used as a dependable instrument for predicting patients' survival outcomes and supporting decisions with regard to surgery, surveillance, and adjuvant treatments. Recently, some researchers have reported that nomogram scoring systems perform well in predicting prognosis [13,14]. However, most nomograms used to predict the prognosis of patients with colon cancer were developed with limited sample sizes and required a combination of molecular biology tests, which increased the time and cost for the patient. This research aimed to develop nomograms that require only clinical features combined with the patient's socioeconomic status, which are easy to obtain.
The Surveillance, Epidemiology, and End Results (SEER) program provides extensive information on different cancers from 20 cancer registries that cover approximately 28% of the US population. Based on the SEER database, researchers have conducted several studies on the prognosis of cancer [15]. In the present research, information on stage I-III colon cancer was collected from the SEER database to build a nomogram that is intuitive and convenient for predicting the prognosis of colon cancer patients.

Patient selection
In this study, a total of 167,333 patients with colon cancer were identified in the SEER database. The detailed workflow for patient selection is shown in Fig. 1. All colon cancer patients treated with radical surgery between January 1, 2010 and December 31, 2015, were assessed for inclusion in the retrospective analysis. Patients were excluded if non-colon cancer was stated in the pathology report, if they were diagnosed with TNM stage IV or unknown-stage cancer, or if they had 2 or more malignant tumors. Eighteen variables were extracted from the SEER program in this study, including race, carcinoembryonic antigen (CEA) level, age, year of diagnosis, sex, adjuvant chemotherapy, histological type, grade, tumor size, number of lymph nodes harvested (LNH), regional nodes positive, LODDS stage, marital status, tumor site, tumor deposit, T stage, N stage, and TNM stage. Patients whose races were recorded as Native American, Asian, Pacific Islander or unknown in the SEER database were assigned to the "other" race category for analysis. Patients with missing data for any of these 18 variables were excluded. Patient survival was measured as cause-specific survival (CSS) and overall survival (OS) [16]. Finally, data on 34,432 patients diagnosed with stage I-III colon cancer between 2010 and 2015 were obtained from the SEER database.
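LODDS is described in this paper as the log of the odds between the number of positive and the number of negative lymph nodes. The following is a minimal Python sketch of one common way to compute it. The 0.5 continuity correction, the natural logarithm, and the function name `lodds` are assumptions made for illustration; the cut-offs that map the value to a LODDS stage are not stated in the text and are not reproduced here.

```python
import math


def lodds(positive_nodes, harvested_nodes, correction=0.5):
    """Log odds of positive lymph nodes (illustrative sketch).

    positive_nodes  -- number of regional nodes positive
    harvested_nodes -- number of lymph nodes harvested (LNH)
    correction      -- assumed continuity correction that avoids log(0)
    """
    if positive_nodes < 0 or harvested_nodes < positive_nodes:
        raise ValueError("need 0 <= positive_nodes <= harvested_nodes")
    negative_nodes = harvested_nodes - positive_nodes
    return math.log((positive_nodes + correction) / (negative_nodes + correction))


# Example: 3 positive nodes among 15 harvested nodes.
example_value = lodds(3, 15)
```

Under this definition the value rises with the number of positive nodes and falls as more negative nodes are harvested, so it carries information from both regional nodes positive and LNH.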

Construction and validation of the nomogram
Univariable and multivariable Cox regression analyses were used to calculate the effect of variables on CSS and OS. The effect of each variable on CSS and OS is presented as the hazard ratio (HR) and was used to identify independent risk factors. Based on the multivariable Cox regression analyses, two nomograms incorporating clinicopathological parameters into the TNM staging system were formulated. The total points for each patient in the two survival cohorts were calculated using the established nomograms, after which Cox regression analysis of the whole cohort was performed using the total points as a parameter. Patients were divided into low- and high-risk groups based on the nomogram risk score, using the median risk score as the cut-off point.

The concordance index (C-index), receiver operating characteristic (ROC) and decision curve analysis (DCA)
The discriminatory ability of the nomogram was evaluated by the C-index and ROC curve analysis. The C-index was defined as the proportion of all usable patient pairs in which the predicted and observed outcomes were concordant. The 1-, 3-, and 5-year ROC curves were used to appraise the nomogram's predictive ability over time. DCA was recently proposed as a method of evaluating predictive models and can be used to visualize the clinical consequences of a treatment strategy [17]; thus, DCA was carried out to compare the potential net benefit of the prognostic nomogram in this study.
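The pairwise definition of the C-index can be made concrete with a short sketch. The Python function below follows the usual Harrell-style counting for right-censored data: a pair is usable when the patient with the shorter follow-up time had an observed event, and it is concordant when that patient also had the higher predicted risk. The function name, the handling of tied risk scores (counted as 0.5), and the toy data are assumptions for illustration; the values reported in this study were computed with R packages.

```python
def concordance_index(times, events, risk_scores):
    """Harrell-style C-index for right-censored data (illustrative sketch).

    times       -- follow-up time of each patient
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- predicted risk, higher means worse expected survival
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Usable pair: patient i had an observed event strictly before
            # patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy example: 4 of the 5 usable pairs are ordered correctly, so C = 0.8.
c_index = concordance_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.6, 0.7, 0.2])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of the usable pairs.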

Risk stratification based on the novel nomogram
To verify the independent discriminatory ability of the nomogram, all patients were regrouped into high-, moderate-, and low-risk groups according to their total risk scores. Survival curves for the different risk groups were generated using the Kaplan-Meier method and were compared using the log-rank test.
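For readers unfamiliar with the Kaplan-Meier method, the product-limit estimate can be sketched in a few lines of Python. The function name and toy data are assumptions for illustration, and the log-rank comparison between groups is not implemented here; the curves in this study were produced with R packages.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate (illustrative sketch).

    times  -- follow-up time of each patient
    events -- 1 if the event was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        deaths = 0
        leaving = 0
        # Collect every patient whose follow-up ends at this time point.
        while i < len(data) and data[i][0] == current_time:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((current_time, survival))
        at_risk -= leaving  # events and censored patients leave the risk set
    return curve


# Toy example with two censored patients (at times 2 and 4).
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Applying the function separately to each risk group yields one step curve per group, which is the form of the curves compared by the log-rank test.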

Statistical analyses
R software (version 3.6.0, http://www.r-project.org) was used for all statistical analyses. The R statistical packages "rms", "survival", "Hmisc", "MASS", and "survivalROC" were used to calculate the C-index, plot the calibration and ROC curves, build the nomogram, and draw Kaplan-Meier curves, while the package "rmda" was used to draw the DCA curves. All statistical tests were 2-sided, and p values < 0.05 were considered statistically significant.

Patients' clinical characteristics and survival outcomes
Data on a total of 34,432 patients with stage I-III colon cancer were retrospectively collected from the SEER database. The patients' clinicopathological characteristics and 1-, 3-, and 5-year CSS and OS rates are listed in

Independent prognostic factors in stage I-III colon cancer patients
According to the univariate Cox regression analysis, 13 variables, namely, sex, age at diagnosis, primary tumor site, histological type, pathological grade, adjuvant chemotherapy, LNH, LODDS stage, tumor size, CEA level, marital status, T stage, and N stage, were associated with CSS and OS (Tables 2, 3). In the multivariate Cox regression analysis, twelve parameters, namely, age at diagnosis, primary tumor site, histological type, pathological grade, adjuvant chemotherapy, LNH, LODDS stage, tumor size, CEA level, marital status, T stage, and N stage, were identified as independent prognostic factors predicting the CSS of stage I-III colon cancer patients (Table 2). All thirteen variables (i.e., sex, age at diagnosis, primary tumor site, histological type, pathological grade, adjuvant chemotherapy, LNH, LODDS stage, tumor size, CEA level, marital status, T stage, and N stage) were identified as independent prognostic factors predicting the OS of stage I-III colon cancer patients (Table 3).

Construction and validation of the prognostic prediction nomogram
Considering the results of the multivariable Cox regression analysis for CSS and OS, all of the significant variables were used to create the nomograms for CSS and OS. The prognostic nomogram for 1-, 3-, and 5-year CSS is shown in Fig. 2. The prognostic nomogram for 1-, 3-, and 5-year OS is shown in Fig. 3. By summing the scores associated with each variable and projecting total scores to the bottom scale, the probabilities can be estimated for 1-, 3-, and 5-year CSS and OS. C-index values and ROC curves are ordinarily used to evaluate the discriminatory power of a nomogram. The C-indexes for the prediction of CSS and OS were 0.78 (95% confidence interval (CI) 0.77-0.80) and 0.74 (95% CI 0.73-0.75), respectively. To confirm that the nomogram had higher efficacy in predicting the prognosis of stage I-III colon cancer patients than TNM stage, time-dependent ROC analyses at 1, 3, and 5 years were conducted. The 1-, 3-, and 5-year area under the curve (AUC) values of the nomogram for the prediction of CSS were 0.81, 0.807, and 0.787, respectively, compared with 0.646, 0.680, and 0.683, respectively, for TNM stage (Fig. 4a-c). In addition, the 1-, 3-, and 5-year AUC values of the nomogram for the prediction of OS were 0.782, 0.76, and 0.741, respectively, compared with 0.592, 0.613, and 0.606, respectively, for TNM stage (Fig. 4d-f). Furthermore, calibration curves for the nomogram showed no appreciable deviation from the reference line, indicating a high degree of credibility (Fig. 5a-f).
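The way a nomogram turns Cox regression coefficients into points and then into a survival probability can be illustrated with a small Python sketch. Every number below (the coefficients, the variable levels, and the baseline 5-year survival) is hypothetical and is not taken from Tables 2 and 3 or Figs. 2 and 3; the sketch also assumes that every coefficient is non-negative relative to its reference level. It shows only the general mechanism: the largest coefficient is assigned 100 points, the other levels are scaled proportionally, and the total is mapped back to a probability through the Cox model.

```python
import math

# Hypothetical log hazard ratios relative to each reference level.
# These values are illustrative only and are NOT the paper's estimates.
COEFFICIENTS = {
    "age": {"<60": 0.0, "60-74": 0.4, ">=75": 1.0},
    "t_stage": {"T1": 0.0, "T2": 0.2, "T3": 0.6, "T4": 1.2},
    "cea": {"normal": 0.0, "elevated": 0.5},
}
# Hypothetical 5-year survival of a patient at all reference levels.
BASELINE_SURVIVAL_5Y = 0.95


def points_per_unit(coefficients):
    """Scale factor that assigns 100 points to the largest coefficient."""
    largest = max(max(levels.values()) for levels in coefficients.values())
    return 100.0 / largest


def total_points(patient, coefficients):
    """Sum the points of the patient's level on every variable."""
    scale = points_per_unit(coefficients)
    return sum(coefficients[name][level] * scale for name, level in patient.items())


def survival_probability(points, coefficients, baseline_survival):
    """Map total points back to survival via S(t) = S0(t) ** exp(lp)."""
    linear_predictor = points / points_per_unit(coefficients)
    return baseline_survival ** math.exp(linear_predictor)


patient = {"age": "60-74", "t_stage": "T4", "cea": "elevated"}
points = total_points(patient, COEFFICIENTS)
probability = survival_probability(points, COEFFICIENTS, BASELINE_SURVIVAL_5Y)
```

Reading a printed nomogram performs the same two steps graphically: the per-variable scales give the points, and the bottom scales give the projected probabilities.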
The clinically and statistically significant prognostic performance of the nomogram based on the entire group of patients and risk scores was validated by a stratified analysis, which suggested that the nomogram could be used to predict the prognosis of patients with stage II (Fig. 6a, b) and stage III (Fig. 6c, d) colon cancer, and of patients with stage II-III colon cancer with or without adjuvant chemotherapy (Fig. 6e-h).

Clinical value of the nomogram
DCA is a novel method used to evaluate alternative prognostic strategies and has advantages over the AUC. DCA curves for the novel nomogram and TNM stage are presented in Fig. 7. Compared with the TNM staging system, the nomogram yielded a higher net benefit, indicating that it has better clinical application value than TNM stage.
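The net benefit plotted in a decision curve can be sketched as follows. For a threshold probability pt, net benefit is commonly defined as TP/n - FP/n x pt/(1 - pt), where TP and FP are the numbers of true and false positives among n patients when everyone with a predicted risk of at least pt is classified as high risk. The Python sketch below uses a binary outcome without censoring, which is a simplification of the survival setting of this study; the function names and toy data are assumptions for illustration.

```python
def net_benefit(predicted_risks, outcomes, threshold):
    """Net benefit of a model at one threshold probability (sketch).

    predicted_risks -- predicted probability of the event for each patient
    outcomes        -- 1 if the event occurred, 0 otherwise (no censoring)
    threshold       -- threshold probability pt, strictly between 0 and 1
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    flagged = [(p >= threshold, y) for p, y in zip(predicted_risks, outcomes)]
    true_positives = sum(1 for high, y in flagged if high and y == 1)
    false_positives = sum(1 for high, y in flagged if high and y == 0)
    weight = threshold / (1.0 - threshold)
    return true_positives / n - false_positives / n * weight


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy in which every patient is classified as high risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


risks = [0.9, 0.8, 0.3, 0.2, 0.1]
observed = [1, 1, 0, 1, 0]
model_nb = net_benefit(risks, observed, 0.5)
treat_all_nb = net_benefit_treat_all(observed, 0.5)
```

A decision curve repeats this calculation over a range of thresholds; a model is preferred where its curve lies above both the competing model and the reference strategies of classifying all or no patients as high risk (net benefit 0 for the latter).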

Prognostic nomogram for risk stratification
By regrouping all patients in the CSS and OS cohorts into three subgroups based on the total scores, the cut-off values were defined, and each group represents a distinct prognosis. The Kaplan-Meier survival curves were subsequently plotted and are shown in Fig. 8. In the CSS cohort, Group 1 (low-risk group) had the highest 5-year CSS rate of 95.0%, followed by Group 2 (moderate-risk group; 88.6%) and Group 3 (high-risk group; 64.0%). In the OS cohort, Group 1 (low-risk group) had the highest 5-year OS rate of 89.1%, followed by Group 2 (moderate-risk group; 76.8%) and Group 3 (high-risk group; 51.5%). A statistically significant difference in survival outcomes was observed among the three groups.
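The caption of Fig. 8 indicates that the three subgroups correspond to tertiles of the total score. A minimal Python sketch of such a split is given below; the function name, the group labels, the convention that a score equal to a cut-off falls into the lower group, and the toy scores are assumptions for illustration, and the actual cut-off values used in this study are not stated in the text.

```python
import statistics


def tertile_risk_groups(total_points):
    """Assign each patient to a low-, moderate-, or high-risk tertile (sketch)."""
    # statistics.quantiles with n=3 returns the two tertile cut points.
    lower_cut, upper_cut = statistics.quantiles(total_points, n=3)
    groups = []
    for score in total_points:
        if score <= lower_cut:
            groups.append("low")
        elif score <= upper_cut:
            groups.append("moderate")
        else:
            groups.append("high")
    return groups


# Hypothetical nomogram total points for nine patients.
scores = [90, 10, 50, 30, 70, 20, 60, 40, 80]
groups = tertile_risk_groups(scores)
```

Each resulting group can then be passed to a Kaplan-Meier estimator to obtain group-specific survival curves.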

Discussion
Through this study, a nomogram merging clinicopathological parameters with the TNM staging system was built to estimate the individual 1-, 3-, and 5-year CSS and OS probabilities of stage I-III colon cancer patients. The performance of the nomogram (i.e., discrimination and calibration) was verified. From the perspective of clinical influence, the nomogram had a wide range of threshold probabilities. From the perspective of ROC curve analysis and DCA, the nomogram showed better predictive accuracy and prognostic value in stage I-III colon cancer compared to the current TNM staging system. Moreover, the nomogram was able to divide patients with stage I-III colon cancer into low-, moderate-, and high-risk groups, which indicates that the nomogram can be used as a routine tool for predicting the prognosis of stage I-III colon cancer.
Table abbreviations: HR hazard ratio, CI confidence interval, AD adenocarcinoma, MAD mucinous adenocarcinoma, SRCC signet ring cell carcinoma, LNH lymph nodes harvested, LODDS log of odds between the number of positive lymph nodes and the number of negative lymph nodes, CEA carcinoembryonic antigen, TNM tumor-node-metastasis. a Includes Native American, Asian, Pacific Islander and Unknown
Fig. 2 Nomogram conveying the results of prognostic models using twelve clinicopathological characteristics to predict cause-specific survival of patients with stage I-III colon cancer
In the present study, it was found that the number of young individuals diagnosed with colon cancer has increased. Previous research has revealed that age is an independent prognostic factor in stage I-III colon cancer patients, with a younger age indicating more pronounced outcomes [6]. In addition, an important prognostic factor confirmed by this study was CEA, which is a well-established biomarker for colon cancer recommended by both the American Society of Clinical Oncology (ASCO) and the European Group on Tumor Markers (EGTM) [18][19][20]. Preoperative CEA levels were used to predict prognosis, and routine CEA monitoring during the postoperative follow-up was used to monitor local relapse and distant metastases after colon cancer surgery. As this nomogram showed, stage I-III colon cancer patients with high CEA levels tended to have significantly poorer CSS and OS rates. In addition, left-sided colon cancers (LCCs) and right-sided colon cancers (RCCs) are thought to have different embryological origins [21]. Various differences, such as anatomical structure, function, morphological characteristics, and histochemical reactions, exist between the two. Patients with LCC have a significantly better prognosis than those with RCC in terms of OS, as indicated by this research. In addition, tumor size [5] was validated as an independent factor for OS in patients with colorectal adenocarcinoma of infiltrative and ulcerative types in a previous study. This research suggested that large tumors led to a poor prognosis. Whether adjuvant chemotherapy is suitable for stage I-III colon cancer remains controversial. According to the National Comprehensive Cancer Network (NCCN) guidelines, it is recommended that patients with stage II colon cancer with risk factors and patients with stage III colon cancer receive adjuvant chemotherapy [22,23].
In this study, histological differentiation, grade, right colon, LNH less than 12, LODDS, tumor size, marital status, T stage, and N stage were identified as independent risk factors for stage I-III colon cancer [14]. Histological differentiation was identified as an important feature for evaluating the benefit of adjuvant chemotherapy in a previous study [24]. This nomogram showed that low histological differentiation was associated with a poor prognosis. Low histological grade was considered among the adverse histopathological factors associated with an unfavorable clinical course of colon cancer. A previous study demonstrated that tumor location was associated with prognosis in colon cancer patients [21]. Furthermore, the appropriate staging of colon cancer requires at least 12 lymph nodes to be sampled, as recommended by the NCCN guidelines. Relevant research indicated that stage I-III colon cancer patients with LNH less than 12 tended to have shorter CSS and OS than those with LNH more than 12, which is consistent with the results of this nomogram [25].
This nomogram showed that a high LODDS status was related to poor survival outcomes.
Marital status is another independent prognostic factor for survival in colon cancer. Previous research showed that being married was associated with better outcomes in colon cancer patients, whereas unmarried patients, including single, separated, divorced, and widowed patients, were at a greater risk of mortality [15], a finding that was reproduced in this research. Our nomogram shows that separated, divorced, and widowed patients had a greater risk of mortality.
However, this study still had some limitations. First, treatment information except for surgery was not available in the SEER database and was thus not incorporated into our analysis. Second, the SEER database is devoid of variables such as detailed histological information, mode of presentation, and ECOG prognostic scores and lacks 90% of biomarker expression states (e.g., RAS, BRAF, PIK3CA and genes involved in DNA mismatch repair, which have been proven to predict survival). Last, this study did not contain any external validation cohort. Additional prospective data and the incorporation of other factors are encouraged to improve this model.

Conclusion
In conclusion, we established and validated a nomogram for predicting CSS and OS probabilities in stage I-III colon cancer patients. The simple nomogram had sufficient discriminatory and calibration capability in addition to exceptional clinical effectiveness and could be an easy-to-use tool for clinicians to promote a personalized postoperative prognostic assessment and to identify treatment strategies for patients with stage I-III colon cancer.
Fig. 6 a Kaplan-Meier estimated cause-specific survival in patients with TNM stage II colon cancer stratified by the nomogram risk score. b Kaplan-Meier estimated overall survival in patients with TNM stage II colon cancer stratified by the nomogram risk score. c Kaplan-Meier estimated cause-specific survival in patients with TNM stage III colon cancer stratified by the nomogram risk score. d Kaplan-Meier estimated overall survival in patients with TNM stage III colon cancer stratified by the nomogram risk score. e Kaplan-Meier estimated cause-specific survival in stage II-III colon cancer patients without chemotherapy stratified by the nomogram risk score. f Kaplan-Meier estimated overall survival in stage II-III colon cancer patients without chemotherapy stratified by the nomogram risk score. g Kaplan-Meier estimated cause-specific survival in stage II-III colon cancer patients with chemotherapy stratified by the nomogram risk score. h Kaplan-Meier estimated overall survival in stage II-III colon cancer patients with chemotherapy stratified by the nomogram risk score
Fig. 7 a Decision curve analysis of the nomogram and TNM stage for the cause-specific survival prediction of stage I-III colon cancer patients. b Decision curve analysis of the nomogram and TNM stage for the overall survival prediction of stage I-III colon cancer patients
Fig. 8 a Cause-specific survival in the subgroups according to tertiles of the total score. b Overall survival in the subgroups according to tertiles of the total score