Application of Artificial Intelligence in a Real-World Research for Predicting the Risk of Liver Metastasis in T1 Colorectal Cancer

Tenghui Han (Air Force Medical University Xijing Hospital); Jun Zhu (Air Force Medical University Xijing Hospital); Dong Xu (Xi'an Medical University); Rujie Chen (Air Force Medical University Xijing Hospital); Shuai Wang (Ming gang station hospital, Xi'an Institute of Flight of the Air Force); Xiaoping Chen (Southern Theater Air Force Hospital); Jianyong Zheng (Air Force Medical University Xijing Hospital); Chunsheng Xu (hjb2015@stu.xjtu.edu.cn; Air Force Medical University Xijing Hospital; https://orcid.org/0000-0001-5755-9794)


Introduction
Colorectal cancer (CRC) is one of the most prevalent malignancies of the gastrointestinal tract, with considerably high morbidity and mortality, and it is widely acknowledged to be attracting increasing attention every year [1][2][3]. Metastasis, which occurs in approximately two-thirds of CRC patients, is commonly regarded as a crucial clinical feature of intractable CRC and a risk factor for high mortality [4]. During the progression of CRC, over 50% of patients develop liver metastasis (LM), which is the predominant contributor to the poor prognosis of CRC [4,5].
Endoscopic therapy is a widely accepted and adopted treatment for patients with T1 CRC. However, for early CRC patients with LM, conventional surgical resection and chemoradiotherapy are the most effective and recommended treatments, and they significantly prolong overall survival (OS) [6,7]. Because current early screening methods are inadequate, approximately 90% of CRC patients with LM are not diagnosed accurately at an early stage and therefore undergo incomplete endoscopic resection, which ultimately leads to adverse clinical outcomes [8,9]. Although extensive research has been conducted on metastasis-related signatures in vivo and in vitro, a satisfactory model for predicting LM in early-stage CRC is still lacking [10][11][12]. It is therefore necessary and urgent to develop an easily applicable model that accurately predicts the risk of LM in patients with early-stage CRC.
Currently, there is an increasing and irreversible trend toward integrating medical science and artificial intelligence (AI) [13][14][15], and both the depth and breadth of this integration have grown significantly [14,15]. Researchers have used machine learning (ML) as an entry point to tackle complex clinical prediction problems and have achieved several significant breakthroughs in CRC [16][17][18]. However, most existing studies relied solely on public databases despite apparent differences among populations, which inevitably limits their generalizability.
Consequently, clinical data that provide real-world external validation are vital for constructing a superior prediction model.
In this study, we established, for the first time, a comprehensive recognition model using AI and ML algorithms, which could markedly improve the identification of T1 CRC with LM and thereby improve the prognosis of these patients in clinical practice. In addition, the predictive model was constructed from common, readily accessible clinical parameters and was further validated in an independent CRC cohort.

Clinical Sample Collection
An open-access, publicly available CRC cohort was retrieved from the Surveillance, Epidemiology, and End Results (SEER) Program database of the U.S. National Cancer Institute. This cohort is a powerful resource that helps investigators comprehensively understand the natural history of CRC and improve the quality of healthcare for CRC patients [19,20]. An additional external validation cohort of CRC patients who underwent surgery at Xijing Hospital between 2010 and 2021 was also obtained. The inclusion criteria were as follows: 1) a primary diagnosis of CRC; 2) a diagnosis of T1 CRC; and 3) sufficient clinical data. Patients who had received neoadjuvant radiotherapy were excluded. Written informed consent was obtained from all participants. All aspects of the clinical cohort study were reviewed by the Institutional Ethics Committee of Xijing Hospital.

Study Population
T1 CRC is defined as a tumor that invades only the submucosa, regardless of the presence or absence of lymph node metastasis (LNM). Using the SEER database, which applies the 7th edition of the American Joint Committee on Cancer (AJCC) TNM staging system, we analyzed data from all patients diagnosed with T1 CRC between 2010 and 2016. Basic demographic data, tumor information, and laboratory indexes were extracted using SEER disease codes and then used for model construction.
Basic demographic data included age at diagnosis, gender, race, and marital status. Tumor information included primary site, size, grade, histologic type, and TNM stage. Laboratory indexes included preoperative carcinoembryonic antigen (CEA), tumor deposits, and perineural invasion (PNI). Survival time and status were collected for further evaluation of the predictive model. Additionally, the data of our validation cohort were coded according to the criteria of the SEER database.

Construction of the Predictive Model
In this research, seven ML models were used to predict LM in patients with T1 CRC. As tree-based models, we adopted the Light Gradient Boosting Machine (LGBM), Random Forest (RF), and Classification and Regression Trees (CART).
LGBM is a gradient boosting framework based on tree learning algorithms and has been used to construct medical models in recent years [21,22]. RF is a widely used ML algorithm that handles classification and regression problems by combining multiple decision trees [23]. CART is a classical decision tree algorithm for building classification or regression models [24]. As a basic prediction technique, the K-Nearest Neighbor (KNN) algorithm was applied; KNN is an important supervised classification algorithm that is broadly used in pattern recognition, data mining, and intrusion detection [25]. As a kernel-based model, the Support Vector Machine (SVM) was selected; SVM is a supervised ML model used for two-group classification [26]. The Gaussian Naive Bayes (GNB) algorithm was included as a linear model and is specifically suited to features with continuous values [27]. The Multilayer Perceptron (MLP) is a feed-forward artificial neural network that has been widely applied in various prediction models [28]. After the Bootstrap aggregating (Bagging) algorithm was used to optimize the performance of the established models, stacked regression was used to integrate the seven models into a stacking model with superior output [29,30].
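To make the KNN idea concrete, the sketch below estimates P(LM+) for a new patient as the fraction of LM (+) cases among its k nearest training patients under Euclidean distance. It is a minimal illustration, not the authors' implementation: the function name `knn_predict_proba` and the toy points are hypothetical, and the study's tuned value was k = 200 rather than the small k used here.

```python
import math
from collections import Counter


def knn_predict_proba(train_X, train_y, x, k=5):
    """Estimate P(class 1 | x) as the share of class-1 labels among the
    k training points closest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes[1] / k


# Hypothetical toy data: two well-separated groups of two-feature patients.
train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict_proba(train_X, train_y, [5.5, 5.5], k=3))
```

Because KNN relies on raw distances, features on very different scales (for example tumor size in millimetres versus a binary PNI flag) would normally be standardized first; the sketch omits that step.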
To improve model performance while retaining the authenticity of the data as far as possible, we applied the Synthetic Minority Over-sampling Technique (SMOTE) strictly within the inner training dataset [31]. First, patients in the SEER database were randomly assigned to a training set (80%) and a testing set (20%) so that the proportions of the LM (+) (patients with LM) and LM (-) (patients without LM) groups were nearly identical in both sets. In the training set, k-fold cross-validation (k = 10) was performed, and grid search was used to find the best combination of parameters. For each parameter set, the model was fitted and validated in turn on 8/10 and 2/10 of the data, respectively. Subsequently, our T1 CRC cohort from the Chinese population was used as an additional external validation set to further examine the applicability and efficiency of the model. The overall workflow is exhibited in
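The data-handling order described above (stratified 80/20 split first, SMOTE applied only to the training portion) can be sketched with the standard library as follows. This is an illustrative assumption-laden sketch: the helper names `stratified_split` and `smote`, the neighbour count k = 5 (a common SMOTE default; the study does not report its setting), and the synthetic toy cohort are all hypothetical.

```python
import math
import random


def stratified_split(labels, test_frac=0.2, seed=0):
    """Split indices into train/test sets, shuffling within each class so
    the LM(+)/LM(-) ratio is (nearly) the same in both sets."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test += idx[:n_test]
        train += idx[n_test:]
    return sorted(train), sorted(test)


def smote(minority, n_new, k=5, seed=0):
    """SMOTE: create n_new synthetic minority samples, each lying on the
    segment between a random minority sample and one of its k nearest
    minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(minority[i], minority[j]))
        nn = minority[rng.choice(others[:k])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], nn)])
    return synthetic


# Hypothetical imbalanced toy cohort: 190 LM(-) and 10 LM(+) patients.
rng = random.Random(42)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(190)]
     + [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(10)])
y = [0] * 190 + [1] * 10

train_idx, test_idx = stratified_split(y)
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]

# Oversample the minority class in the training set only; the test set
# keeps its original, imbalanced class distribution.
minority = [x for x, t in zip(X_train, y_train) if t == 1]
n_new = y_train.count(0) - len(minority)
X_train += smote(minority, n_new)
y_train += [1] * n_new
print(y_train.count(0), y_train.count(1), len(test_idx))
```

Keeping SMOTE inside the training data means no synthetic points leak into the held-out evaluation, which is the point of applying it "strictly within the inner training dataset".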

Assessment of Model Performance
To enable a rational comparison of the models, the confusion matrix, area under the curve (AUC), sensitivity, specificity, precision, negative predictive value (NPV), false discovery rate (FDR), accuracy, and average precision (AP) were used as indicators of model performance. The area under the receiver operating characteristic curve (AU-ROC) served as the main performance index, while AP was used as the summary criterion for the precision-recall (PR) curve [32]. The averaged metrics were ultimately computed on the testing set and on the additional external validation set. Survival analysis was further performed to evaluate the model's capability to predict the outcomes of CRC patients.
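The indicators listed above all derive from the confusion matrix or from ranked predicted probabilities. The sketch below computes them with the standard library, assuming binary labels (1 = LM (+)); the function names are hypothetical. AUC is computed as the Mann-Whitney probability that a random LM (+) case scores above a random LM (-) case, and AP as the step-wise sum of precision weighted by recall increments over distinct score thresholds. F1 and the Matthews correlation coefficient, which the results tables also report, are included.

```python
import math


def _ratio(num, den):
    """Safe division: returns NaN when the denominator is zero."""
    return num / den if den else float("nan")


def confusion_metrics(y_true, y_pred):
    """Threshold-based metrics from binary labels and binary predictions."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "TP": tp, "TN": tn, "FP": fp, "FN": fn,
        "sensitivity": sensitivity,             # recall
        "specificity": _ratio(tn, tn + fp),
        "precision": precision,                 # positive predictive value
        "NPV": _ratio(tn, tn + fn),
        "FDR": _ratio(fp, tp + fp),             # equals 1 - precision
        "accuracy": _ratio(tp + tn, len(y_true)),
        "F1": _ratio(2 * precision * sensitivity, precision + sensitivity),
        "MCC": _ratio(tp * tn - fp * fn, mcc_den),
    }


def roc_auc(y_true, scores):
    """AUC via the Mann-Whitney statistic (ties count one half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """AP = sum over distinct thresholds of (R_n - R_{n-1}) * P_n."""
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    n_pos = sum(y_true)
    tp = fp = 0
    ap = prev_recall = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += pairs[i][1] == 1
            fp += pairs[i][1] == 0
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The toy labels and scores in the usage line are hypothetical; on them the rank-based AUC is 0.75.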
Statistical Analysis
SEER*Stat software (version 8.3.6) was used to identify the targeted CRC patients in the SEER database. Python (version 3.6.9) and R (version 4.0.5) were used to perform statistical analyses. Demographic differences between the two groups were tested using Student's t-test or Pearson's chi-square test. Results were considered statistically significant at P ≤ 0.05.
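For a 2 × 2 comparison such as LM (+) versus LM (-) across two patient groups, Pearson's chi-square statistic has the closed form χ² = N(ad − bc)² / ((a+b)(c+d)(a+c)(b+d)) with one degree of freedom. The sketch below is an illustrative stdlib implementation (no continuity correction), not the authors' R or Python code; the function name is hypothetical. As a usage example it takes the marital-status counts reported in the baseline results (single: 167 of 2,611 with LM; married: 376 of 8,918 with LM).

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
        [[a, b],
         [c, d]]
    Returns (statistic, p-value) with 1 degree of freedom."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 df, chi2 = Z**2, so P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: single, married; columns: LM(+), LM(-).
chi2, p = pearson_chi2_2x2(167, 2611 - 167, 376, 8918 - 376)
print(f"chi2 = {chi2:.2f}, P = {p:.2e}")
```

The closed-form p-value only holds for one degree of freedom; tables with more rows or columns (for example tumor grade) need the general chi-square distribution, such as `scipy.stats.chi2_contingency`.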

Case Structures and Clinical Baselines
In the SEER database, LM data were first recorded in 2010 and most recently updated in 2016. In the current study, 262,285 CRC patients diagnosed from 2010 to 2016 were screened. After applying the above inclusion and exclusion criteria, a total of 16,785 patients were ultimately enrolled in the inner dataset, while 326 of 8,226 CRC patients at Xijing Hospital were recruited. The data of these 326 patients were further coded according to SEER database standards. Baseline characteristics of the inner training set, inner testing set, and external validation set are shown in Table 1.
Eleven independent clinical factors were included in the model: age at diagnosis, gender, marital status at diagnosis, primary site, tumor size, tumor grade, tumor type, N stage, CEA level, tumor deposits, and PNI (Table 2). Patients from the SEER database were categorized into an LM (-) group (16,023 patients without LM, 95.5%) and an LM (+) group (762 patients with LM, 4.5%). Among LM (+) patients, age at diagnosis mostly ranged from 40 to 90 years (721/762, 94.6%). The proportion of patients diagnosed at younger than 60 years was significantly higher in the LM (+) group (333/762, 43.7%) than in the LM (-) group (6,553/16,023, 40.9%; P < 0.001). The proportion of men with T1 CRC was significantly higher in the LM (+) group than in the LM (-) group (P = 0.001), whereas race showed no statistically significant difference between the two groups. Interestingly, LM occurred more often in single patients (167/2,611, 6.4%) than in married patients (376/8,918, 4.2%; P < 0.001). The rectum was the most common primary site in both groups, and its proportion was comparatively higher in T1 stage than in other T stages among all CRC patients (P < 0.001). The mean tumor size of the LM (+) group (52.1 mm) was considerably larger than that of the LM (-) group (17.5 mm; P < 0.001). The LM (+) group showed a markedly higher proportion of Grade II-IV tumors than the LM (-) group (92.8% vs. 68%; P < 0.001). Similarly, T1 CRC patients with LM tended to have a more advanced N stage (P < 0.001). Adenocarcinoma (including adenocarcinoma NOS, adenocarcinoma in tubulovillous adenoma, and adenocarcinoma in adenomatous polyp; 12,714/16,785, 75.7%) was the most common histologic type among all patients. Furthermore, the LM (+) group had a significantly higher rate of positive CEA, more tumor deposits, and more PNI than the LM (-) group (P < 0.001). Additionally, the baseline characteristics of the SEER training, SEER testing, and external validation sets are shown in Table 2.

Parameter Tuning in Our Models
We trained LGBM with a maximum depth of five, a learning rate of 0.01, 240 base learners, 16 leaves, and a maximum of 128 bins. For RF and CART, we likewise set the maximum depth of the base trees to 5. For KNN, 200 neighbors performed best. For MLP, we ultimately selected a learning rate of 0.01, 300 epochs, and one hidden layer, together with the Adam optimizer and the ReLU activation function. For SVM, a C value of 0.01 combined with a kernel smoothing parameter of 0.0001 was chosen.
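For readers who want to reproduce the single models, the reported settings can be collected in one configuration mapping. The sketch below is hedged: the argument names follow the scikit-learn estimators (`RandomForestClassifier`, `DecisionTreeClassifier`, `KNeighborsClassifier`, `MLPClassifier`, `SVC`) and LightGBM's `LGBMClassifier`, but that mapping is our assumption, as are treating the SVM "kernel smoothing parameter" as `gamma`, adding `probability=True` so the SVM can emit probabilities for stacking, and leaving GNB at its defaults. The MLP hidden-layer width is not reported, so it is left out.

```python
# Reported hyperparameters keyed by (assumed) scikit-learn / LightGBM
# argument names. Values come from the text; names are a mapping guess.
HYPERPARAMS = {
    "LGBM": {"max_depth": 5, "learning_rate": 0.01, "n_estimators": 240,
             "num_leaves": 16, "max_bin": 128},
    "RF": {"max_depth": 5},
    "CART": {"max_depth": 5},
    "KNN": {"n_neighbors": 200},
    # max_iter counts epochs for the adam solver; one hidden layer was
    # used, but its width is not reported, so hidden_layer_sizes is omitted.
    "MLP": {"learning_rate_init": 0.01, "max_iter": 300,
            "solver": "adam", "activation": "relu"},
    # gamma = "kernel smoothing parameter" and probability=True are assumed.
    "SVM": {"C": 0.01, "gamma": 0.0001, "probability": True},
    "GNB": {},  # defaults assumed; no settings reported
}

for name, params in HYPERPARAMS.items():
    print(name, params)
```

With the libraries installed, a model would then be built as, for example, `LGBMClassifier(**HYPERPARAMS["LGBM"])` or `SVC(**HYPERPARAMS["SVM"])`.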
Finally, each Bagging model, comprising 10 base models, was trained with the same algorithm on different bootstrap samples of the data. The final stacking model consists of the seven Bagging models, whose output probabilities are combined by a GNB meta-classifier.
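The two-level ensemble just described (each algorithm wrapped in a 10-member Bagging model, and a GNB meta-classifier stacked on the Bagging models' probabilities) can be sketched as below. This is a simplified illustration under stated assumptions, not the study's code: the nearest-centroid scorer `fit_centroid` is a hypothetical stand-in for the seven real algorithms, three toy base learners replace seven, the GNB adds a small variance floor, and the meta-classifier is fit on in-sample base predictions for brevity, whereas out-of-fold predictions are the usual way to avoid leakage.

```python
import math
import random
from statistics import fmean, pvariance


class GaussianNB:
    """Minimal Gaussian Naive Bayes: per-class feature means and variances
    plus class priors; used here as the stacking meta-classifier."""

    def fit(self, X, y, var_floor=1e-6):
        self.params = {}
        for c in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == c]
            cols = list(zip(*rows))
            means = [fmean(col) for col in cols]
            variances = [pvariance(col, m) + var_floor for col, m in zip(cols, means)]
            self.params[c] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def predict_proba(self, x):
        """Return P(class = 1 | x) via Bayes' rule on log-likelihoods."""
        log_post = {}
        for c, (log_prior, means, variances) in self.params.items():
            log_post[c] = log_prior + sum(
                -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
                for v, m, s2 in zip(x, means, variances))
        top = max(log_post.values())
        total = sum(math.exp(lp - top) for lp in log_post.values())
        return math.exp(log_post.get(1, -math.inf) - top) / total


def fit_centroid(X, y, features):
    """Hypothetical base learner: nearest-centroid scorer on a feature
    subset. Returns a function x -> estimated P(class 1)."""
    def centroid(c):
        rows = [[x[f] for f in features] for x, t in zip(X, y) if t == c]
        return [fmean(col) for col in zip(*rows)] if rows else None
    c0, c1 = centroid(0), centroid(1)

    def score(x):
        if c0 is None or c1 is None:  # bootstrap sample missed a class
            return 0.5
        z = [x[f] for f in features]
        d0, d1 = math.dist(z, c0), math.dist(z, c1)
        return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)
    return score


def bagging(fit, X, y, n_models=10, seed=0):
    """Bootstrap aggregating: fit n_models copies of one learner on
    bootstrap resamples and average their predicted probabilities."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(fit([X[i] for i in idx], [y[i] for i in idx]))
    return lambda x: fmean(m(x) for m in models)


def stack(bagged_models, X, y):
    """Fit a GNB meta-classifier on the bagged models' probabilities
    (in-sample here; out-of-fold in a careful implementation)."""
    meta = GaussianNB().fit([[m(x) for m in bagged_models] for x in X], y)
    return lambda x: meta.predict_proba([m(x) for m in bagged_models])


# Hypothetical toy data: two separated groups of 30 patients each.
rng = random.Random(1)
X = ([[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(30)]
     + [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(30)])
y = [0] * 30 + [1] * 30
feature_sets = [[0], [1], [0, 1]]  # three toy base learners
bagged = [bagging(lambda Xs, ys, f=f: fit_centroid(Xs, ys, f), X, y, seed=s)
          for s, f in enumerate(feature_sets)]
model = stack(bagged, X, y)
print(round(model([3, 3]), 3), round(model([0, 0]), 3))
```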

Evaluation of Models
To better evaluate the performance of the constructed models, ROC curves and PR curves were plotted during model training. In internal validation, all models showed strong predictive ability (AUC > 0.94). By integrating the seven single models, the stacking model achieved an AUC of 0.9631 (Figure 2A). Except for the GNB model, nearly all models attained relatively good AP values, and the AP of the stacking model reached 0.693 (Figure 2B). Interestingly, performance on the external validation set was even better. All models except MLP exhibited high predictive value, and the stacking model achieved an AUC of 0.992 and an AP of 0.811 (Figure 2C, D).
Additionally, confusion matrices were used to evaluate the models, and the predictive outcomes for both the inner testing set and the external validation set are shown in Table 3.
LGBM produced fewer false negatives (FN) and false positives (FP) than the other models in both testing sets. The stacking model identified nearly all LM (+) patients in both sets. Detailed values of AUC, sensitivity, specificity, precision, NPV, FDR, accuracy, AP, F1 score, and Matthews correlation coefficient for each model in the inner and external validation sets are listed in Table 4 and Table 5, respectively. Five single models reached an accuracy of 0.95, among which LGBM displayed the highest accuracy (0.9657). MLP had the highest specificity and GNB the highest sensitivity among the seven single models. Overall, the stacking model achieved the most satisfactory AUC and sensitivity, indicating clinical value for the early screening of LM, together with excellent precision, NPV, FDR, accuracy, AP, F1 score, and Matthews correlation coefficient in CRC patients.
Furthermore, using survival status and time from the SEER database, we plotted Kaplan-Meier (K-M) curves for the testing set. As is well established, LM was an unfavorable prognostic indicator for CRC patients (Figure 3A). Likewise, the LM status predicted by the stacking model stratified the outcomes of T1 CRC patients in a manner similar to the actual LM status (Figure 3B).
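The K-M curves rest on the product-limit estimator S(t) = ∏ over event times t_i ≤ t of (1 − d_i / n_i), where d_i is the number of deaths at t_i and n_i the number still at risk just before t_i. The stdlib sketch below assumes survival data as (time, status) pairs with status 1 = death and 0 = censored; the function name and toy values are hypothetical. Calling it separately for patients predicted LM (+) and LM (-) would give the two curves of a comparison like Figure 3B.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.
    times: follow-up times; events: 1 = death observed, 0 = censored.
    Returns [(t, S(t))] at each time where at least one death occurs."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # deaths and censored both leave the risk set
    return curve


print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

Censored patients contribute to the risk set up to their last follow-up and then drop out without lowering the curve, which is why the toy patient censored at time 4 produces no step.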

Comparison of Significance of Each Factor
In all single models, tumor size, preoperative CEA level, tumor deposits, N stage, histology, and PNI played a vital role in predicting LM in T1 CRC. Although the AI models performed well, the individual influence of each factor on the result and the underlying relationships among these factors remained unknown. We therefore quantified the importance of each factor in the constructed AI models (Figure 4). Tumor size, preoperative CEA level, tumor deposits, and N stage were the four most important predictors across all models. Notably, tumor size was the most critical predictor in nearly all models.
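The text does not state how factor importance was computed. One common model-agnostic option, shown here purely as a swapped-in illustration rather than the authors' method, is permutation importance: shuffle one feature column, re-score the model, and record the drop in accuracy. The function names, the accuracy metric, and the toy model below are hypothetical.

```python
import random
from statistics import fmean


def accuracy(model, X, y):
    """Fraction of samples whose predicted label equals the true label."""
    return fmean(1.0 if model(x) == t else 0.0 for x, t in zip(X, y))


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled, which
    breaks its link to the outcome while keeping its distribution."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(fmean(drops))
    return importances


# Hypothetical toy model that only uses feature 0 (e.g. a size-like score).
X = [[i, i % 3] for i in range(20)]
y = [int(i >= 10) for i in range(20)]
model = lambda x: int(x[0] >= 10)
print(permutation_importance(model, X, y))
```

A feature the model ignores gets an importance of exactly zero, while shuffling an informative feature lowers accuracy; tree-based libraries such as LightGBM also expose their own split- or gain-based importances, which can rank features differently.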

Discussion
The liver is one of the most common sites of metastasis, and LM is recognized as the most lethal factor for CRC patients [33,34]. Early diagnosis of LM could help clinicians intervene actively and promptly to improve patient prognosis, especially in T1 CRC [35,36]. Patients with T1 CRC can undergo either surgical or endoscopic treatment, depending partly on the status of distant metastasis. Therefore, a convenient and accurate predictive model of LM is urgently needed to guide personalized therapeutic strategies and the evaluation of 5-year OS.
In this study, we established a new and convenient model to predict early LM in T1 CRC by incorporating 11 clinicopathologic parameters and using seven AI methods. Our findings indicated that age, gender, marital status, primary site, tumor size, CEA, tumor type, grade, N stage, and PNI were critical factors for predicting LM in the AI models. We were the first to combine our real-world data with large-scale public online data to comprehensively construct and assess LM predictive models in T1 CRC.
Given that the AUCs of these models exceeded 0.94 and their accuracy was high (up to 0.9657 for LGBM), we concluded that the established models are robust and could yield clinical benefit by helping clinicians efficiently identify CRC patients at potential risk of LM.
Our real-world cohort comprised 326 cases of T1 CRC, among which LM occurred in only eight patients (8/326), significantly fewer than in the SEER database (762/16,785; P < 0.001). This discrepancy in the LM ratio might be attributed to lower diagnostic efficacy in developing countries [37,38]. Interestingly, PNI was more common among T1 CRC patients in our hospital (169/326) than among our CRC patients of all T stages (1,266/8,226), consistent with the SEER database (11,350/16,785; P < 0.001). Abundant evidence indicates that PNI occurs in approximately 10-15% of CRC cases across all T stages. Moreover, PNI is an independent biomarker of aggressive behavior and unfavorable prognosis in CRC [39][40][41][42]. Nonetheless, little literature has explained the high ratio of PNI in T1 CRC, which deserves further investigation. In addition, serum CEA was confirmed to be positively associated with LM.
Accumulating evidence suggests that the CEA level can serve as an independent prognostic indicator for CRC patients [43]. It is therefore unsurprising that preoperative plasma CEA concentrations are significantly higher in CRC patients with LM than in those with primary CRC alone [44][45][46]. Moreover, tumor size is regarded as one of the most important indicators for predicting LM status, and it has been reported to be closely associated with both lymphatic and hepatic metastases of CRC [47]. Furthermore, age may play a non-negligible role in the progression and prognosis of CRC [48]. Despite the increasing number of young CRC patients, younger patients have been reported to have more favorable outcomes than older ones [48]. In contrast, we found that CRC patients younger than 60 years were more likely to develop LM than older patients, which is consistent with several previous studies [49][50][51]. Potential reasons may include the more frequent occurrence of mismatch repair gene mutations and more aggressive tumor biology in younger patients [52].
To date, many researchers have constructed practical models to predict the metastatic potential of CRC. For instance, MS Tang et al. [12] built a novel nomogram to predict LM in CRC patients of all T stages using multivariable Cox regression, and found that synchronous LM was an independent prognostic factor for CRC patients [12]. Likewise, Ji Hyun Ahn et al. [17] developed an innovative model to predict LNM in early-stage CRC using the SEER database and seven AI methods. Nevertheless, these studies were retrospective and single-center and included small numbers of patients. In addition, available data are limited because of the low incidence of LM in early CRC. With recent technical advances in AI, ML models have become increasingly prevalent in tumor diagnosis and prognostic assessment [53,54]. Ichimasa et al. [55] demonstrated that, compared with current guidelines, AI could reduce unnecessary surgery after endoscopic resection of LNM (-) T1 CRC. Nonetheless, few models for predicting LM in T1 CRC patients have been developed and assessed using AI methods. In the current study, we established nine models, validated them in our own dataset, and compared their efficacy in predicting LM in early CRC using easily available clinical and histopathological features. Furthermore, our AI models could not only help clinicians select patients at high risk of LM, but also predict the outcomes of T1 CRC patients about as accurately as actual LM status.
This study has several limitations. First, because the SEER database is a national program of the United States, the newly established models might not apply equally well in other countries. Second, the number of patients enrolled at our hospital was far from sufficient, and only eight of them had LM, which may limit the validation results. Larger and more in-depth studies are therefore needed in the future.

Conclusions
In the present study, we established an innovative stacking model built on Bagging models, which incorporates 11 clinicopathologic features to predict the incidence of LM in T1 CRC. Our findings indicated that age, gender, marital status, primary site, tumor size, CEA, tumor type, grade, N stage, and PNI were crucial factors for predicting LM, among which tumor size mattered most. As

Figure: The workflow of the selection procedure for colorectal cancer patients