- Research
- Open access
Determining the prognosis of lung cancer from mutated genes using a deep learning survival model: a large multi-center study
Cancer Cell International volume 23, Article number: 262 (2023)
Abstract
Background
Gene status has become the focus of prognosis prediction. Furthermore, deep learning has frequently been implemented in medical imaging to diagnose, prognosticate, and evaluate treatment responses in patients with cancer. However, few deep learning survival (DLS) models based on mutational genes that are directly associated with patient prognosis in terms of progression-free survival (PFS) or overall survival (OS) have been reported. Additionally, DLS models have not been applied to determine IO-related prognosis based on mutational genes. Herein, we developed a deep learning method to predict the prognosis of patients with lung cancer treated with or without immunotherapy (IO).
Methods
Samples from 6542 patients from different centers were subjected to genome sequencing. A DLS model based on multi-panels of somatic mutations was trained and validated to predict OS in patients treated without IO and PFS in patients treated with IO.
Results
In patients treated without IO, the DLS model (low vs. high DLS) was trained using the training MSK-MET cohort (HR = 0.241 [0.213–0.273], P < 0.001) and tested in the inter-validation MSK-MET cohort (HR = 0.175 [0.148–0.206], P < 0.001). The DLS model was then validated with the OncoSG, MSK-CSC, and TCGA-LUAD cohorts (HR = 0.420 [0.272–0.649], P < 0.001; HR = 0.550 [0.424–0.714], P < 0.001; HR = 0.215 [0.159–0.291], P < 0.001, respectively). Subsequently, it was fine-tuned and retrained in patients treated with IO. The DLS model (low vs. high DLS) could predict PFS and OS in the MIND, MSKCC, and POPLAR/OAK cohorts (all P < 0.001). Compared with tumor-node-metastasis staging, the COX model, tumor mutational burden, and programmed death-ligand 1 expression, the DLS model had the highest C-index in patients treated with or without IO.
Conclusions
The DLS model based on mutational genes can robustly predict the prognosis of patients with lung cancer treated with or without IO.
Background
To optimize treatment regimens, predicting the prognosis of patients with lung cancer is vital. Accordingly, gene status has gradually become the focus of prognosis prediction. Based on high-throughput sequencing, multi-panels have been routinely evaluated in clinical treatment, revealing various candidate genes. For instance, the KRAS-G12C mutation is associated with poorer outcomes in surgically resected lung adenocarcinoma than wild-type KRAS [1]. Meanwhile, the SMARCA4 mutation is an independent predictive factor for poor prognosis in lung cancers, but it is also associated with immunotherapy (IO) sensitivity [2]. Additionally, mutations in EGFR, STK11, and B2M, or MDM2 amplification, are related to IO resistance or hyperprogressive disease [3,4,5], while TP53, KRAS, and POLE mutations are positively associated with a good response in advanced non-small cell lung cancer (NSCLC) [6,7,8,9].
Deep learning has frequently been implemented in medical imaging (including magnetic resonance imaging, computed tomography, and positron emission tomography) to diagnose, prognosticate, and evaluate treatment responses in patients with cancer [10,11,12]. Previous studies have used several genes or immune cell subtypes to develop models to predict IO or chemo-IO responses by machine learning. These studies achieved highly reliable and accurate results [13,14,15]. However, few deep learning survival (DLS) models based on mutational genes that are directly associated with patient prognosis in terms of progression-free survival (PFS) or overall survival (OS) have been reported, and their potential value remains unclear. Additionally, DLS models have not been applied to determine IO-related prognosis based on mutational genes.
The current study employed a DLS algorithm utilizing a panel of mutated genes to create a robust survival model to identify individuals with lung cancer and good prognosis in several large centers. Based on whole-genome sequencing (WGS), next-generation sequencing (NGS), and whole-exome sequencing (WES) databases, the DLS model was used to predict OS in patients with lung cancer who were treated without IO and to predict PFS in patients with lung cancer who were treated with IO. The predictive ability of the DLS model was compared with that of clinical tumor-node-metastasis (TNM) staging and the COX model. In addition, the ability of the DLS model to predict PFS in those who received IO was compared with that of the COX model, tumor mutational burden (TMB), and programmed death-ligand 1 (PD-L1) expression. A robust survival prediction model based on genomics panels will aid oncologists in implementing appropriate treatment strategies for patients with lung cancer.
Methods
Patients treated without IO
MSK-MET cohort
A total of 25,775 patients with metastatic cancers were included in the MSK-MET cohort [16]. However, 21,711 with other tumors were excluded, resulting in a final cohort comprising 4064 patients with lung cancer. Additionally, 271 patients had incomplete clinical or survival data and were thus excluded from this study. Ultimately, the data for 3793 patients with lung cancer were analyzed. The MSK-MET cohort was classified into training (n = 2504) and inter-validation (n = 1289) cohorts; all tumor samples were evaluated by NGS.
OncoSG cohort
The OncoSG cohort comprised 305 patients from East Asian countries. Eight patients lacking clinical or survival data were excluded [17]. Hence, 297 patients with lung adenocarcinoma were included in an independent validation cohort. All tumor samples were evaluated by WES.
MSK-CSC cohort
This cohort comprised 10,945 patients, of whom 9588 with other tumors were excluded [18]. Further, 417 patients without clinical or survival data were excluded. Thus, 940 patients with lung cancer comprised an independent validation cohort. All tumor samples were assessed by NGS.
TCGA-LUAD cohort
Among the 566 patients with lung adenocarcinoma, 52 were excluded due to a lack of clinical data (https://www.cell.com/pb-assets/consortium/pancanceratlas/pancani3/index.html). Moreover, 26 patients without complete survival data were excluded. Thus, 488 patients with lung adenocarcinoma comprised an independent validation cohort; all tumor samples were assessed by WGS.
Patients treated with IO
MIND cohort
A total of 247 patients with lung cancer from the Memorial Sloan Kettering Cancer Center (MSKCC) cohort were recruited [19]. All patients received anti-PD-1/PD-L1 treatment. One patient was excluded due to a lack of clinical data. Hence, 246 patients were included in this training cohort. All tumor samples were evaluated by NGS.
MSKCC cohort
A total of 349 patients from a clinical trial and retrospective analysis (NCT01454102, NCT01295827) who received anti-PD-1/PD-L1 monotherapy or combinatorial treatment with anti-CTLA4 were included [20]. These patients constituted another validation cohort. All tumor samples were analyzed by NGS.
POPLAR/OAK cohort
The POPLAR and OAK studies (NCT01903993, NCT02008227) recruited 1137 patients with advanced or metastatic NSCLC [21, 22]. Patients treated with docetaxel (n = 568) and those without blood TMB data (n = 140) were excluded. Ultimately, the POPLAR/OAK cohort comprised 429 patients as a validation cohort. All blood samples were tested by NGS.
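The cohort flow described above can be checked with simple arithmetic. The following is a minimal sketch, assuming the exclusion groups within each cohort do not overlap; the dictionary layout and function names are illustrative and not part of the study's code.

```python
# Cohort flow as (starting patients, [exclusions...]); all counts come from the text above.
# Assumption: exclusion groups within a cohort are mutually exclusive.
COHORTS = {
    "MSK-MET": (25775, [21711, 271]),
    "OncoSG": (305, [8]),
    "MSK-CSC": (10945, [9588, 417]),
    "TCGA-LUAD": (566, [52, 26]),
    "MIND": (247, [1]),
    "MSKCC": (349, []),
    "POPLAR/OAK": (1137, [568, 140]),
}


def analyzed(start, exclusions):
    """Number of patients left after subtracting every exclusion group."""
    return start - sum(exclusions)


def cohort_sizes(cohorts=COHORTS):
    """Map each cohort name to its analyzed sample size."""
    return {name: analyzed(start, excl) for name, (start, excl) in cohorts.items()}


if __name__ == "__main__":
    sizes = cohort_sizes()
    for name, n in sizes.items():
        print(f"{name}: {n}")
    print("total:", sum(sizes.values()))
```

Under that assumption, the per-cohort sizes add up to the 6542 patients stated in the Abstract.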
This study (2023-LUNSHEN-02) was approved by the institutional review board of the Second Affiliated Hospital of Guizhou Medical University and was performed in accordance with the Declaration of Helsinki. Informed consent was obtained from all patients for tissue or blood use.
Study design
Figure 1 illustrates the flowchart of the proposed DLS models for predicting OS and PFS. In the MSK-MET (training) cohort, optimal mutated genes were identified by the least absolute shrinkage and selection operator (LASSO) algorithm based on five-fold cross-validation. The selected genes served as input for training the DLS models to predict OS. The training parameters were adjusted, and the DLS models were validated for OS in the MSK-MET (inter-validation), OncoSG, MSK-CSC, and TCGA-LUAD cohorts. The LASSO algorithm was also used to select the mutated genes for predicting PFS in the MIND cohort of patients treated with IO. The trained DLS model was fine-tuned and retrained in the MIND cohort and, subsequently, tested in the MSKCC and POPLAR/OAK cohorts. The COX models were analyzed in patients treated with and without IO. The performance of the DLS model, COX model, and TNM staging for predicting OS in patients treated without IO was compared (via the C-index). Furthermore, the performance of the DLS model, COX model, TMB, and PD-L1 expression level for predicting PFS was compared among patients treated with IO using the C-index.
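The five-fold cross-validation used for gene selection partitions the training cohort into five disjoint validation folds. The sketch below shows one possible way to generate such splits in plain Python; the function name, the shuffling seed, and the round-robin fold assignment are assumptions made for illustration, and the study's actual splitting procedure is not described in the text.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) pairs for k-fold cross-validation.

    Hypothetical helper: indices are shuffled once, then dealt into k folds.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


if __name__ == "__main__":
    # 2504 is the size of the MSK-MET training cohort reported in the text.
    for fold_number, (train, validation) in enumerate(k_fold_indices(2504), start=1):
        print(fold_number, len(train), len(validation))
```

Each patient appears in exactly one validation fold, so every candidate LASSO penalty is scored on data that were held out while the coefficients were fitted.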
TMB, PD-L1 expression analysis, and selection of optimal mutated genes
Based on WES, WGS, and NGS profiling, a TMB ≥ 10 mutations (muts)/Mb or a total number of somatic nonsynonymous mutations ≥ 200 was defined as a high TMB. The tumor cells were considered to have a high PD-L1 expression level when > 50% stained positive. All mutated genes were defined as “1” and wild-type genes were defined as “0.” The optimal mutated genes were selected via LASSO and five-fold cross-validation sampling (Fig. 2). The mutated genes were separately selected to predict OS in patients treated without IO and PFS in patients treated with IO. The selected genes served as input variables for the deep learning model.
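The encoding and thresholds described above can be sketched as follows. The five-gene panel is only an illustrative subset built from genes named in this article, not the selected 45- or 27-gene panels, and all function names are hypothetical.

```python
# Illustrative panel only; the study's selected panels are listed in its supplementary tables.
EXAMPLE_PANEL = ["TP53", "EGFR", "STK11", "KRAS", "KEAP1"]


def encode_mutations(mutated_genes, panel=EXAMPLE_PANEL):
    """Return a 0/1 vector over `panel`: 1 for a mutated gene, 0 for wild type."""
    mutated = set(mutated_genes)
    return [1 if gene in mutated else 0 for gene in panel]


def is_high_tmb(tmb_per_mb=None, nonsynonymous_count=None):
    """High TMB as defined above: >= 10 muts/Mb or >= 200 nonsynonymous mutations."""
    if tmb_per_mb is not None and tmb_per_mb >= 10:
        return True
    if nonsynonymous_count is not None and nonsynonymous_count >= 200:
        return True
    return False


def is_high_pdl1(percent_positive_cells):
    """High PD-L1 expression as defined above: more than 50% of tumor cells stain positive."""
    return percent_positive_cells > 50


if __name__ == "__main__":
    print(encode_mutations(["KRAS", "TP53"]))
    print(is_high_tmb(tmb_per_mb=12.5), is_high_pdl1(30))
```

A vector of this form, restricted to the LASSO-selected genes, is what would be passed to the network as the input for one patient.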
DLS model and implementation
As presented in Fig. 2, the DLS model is a multi-layer perceptron similar to the Faraggi–Simon network (https://github.com/jaredleekatzman/DeepSurv). However, the DLS model comprises multiple additional hidden layers as well as various newer techniques, including weight-decay regularization, batch normalization, rectified linear units, dropout, stochastic gradient descent with gradient clipping, learning rate scheduling, and Nesterov momentum. A single output node estimated the risk function parameterized by the weights of the network. The loss function was set as a negative log-likelihood function represented by Eq. (1):
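Assuming the DLS model follows the DeepSurv formulation it is modeled on, the negative log partial likelihood referred to as Eq. (1) would take the form below. The notation is an assumption introduced here: \hat{h}_\theta(x_i) is the network output for patient i, E_i the event indicator, T_i the follow-up time, \mathcal{R}(T_i) the set of patients still at risk at T_i, N_{E=1} the number of observed events, and \lambda the weight-decay coefficient.

```latex
\ell(\theta) = -\frac{1}{N_{E=1}} \sum_{i:\,E_i=1}
  \left( \hat{h}_\theta(x_i) - \log \sum_{j \in \mathcal{R}(T_i)} e^{\hat{h}_\theta(x_j)} \right)
  + \lambda \lVert \theta \rVert_2^2
  \tag{1}
```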
The selected genes were imported into the DLS model as vectors. The maximum number of epochs was set to 100 to ensure proper implementation of the training procedure. TensorFlow-1.14 in Python (https://www.python.org/) was utilized to implement deep learning. The experiment was conducted in Windows with the following configurations: 3.7 GHz Intel i7-12700KF CPU, NVIDIA GeForce RTX 3090, and 32 GB of RAM.
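To make the loss concrete, the following is a minimal pure-Python sketch of an averaged negative Cox log partial likelihood, the kind of objective used by DeepSurv-style networks. It is not the study's TensorFlow implementation; the function name, the handling of tied times, and the omission of the regularization term are simplifying assumptions.

```python
import math


def neg_log_partial_likelihood(risk_scores, times, events):
    """Average negative Cox log partial likelihood over observed events.

    risk_scores: network outputs (log-risk), one per patient
    times:       follow-up times
    events:      1 if the event was observed, 0 if censored
    """
    n_events = sum(1 for e in events if e)
    if n_events == 0:
        raise ValueError("at least one observed event is required")
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored patients contribute only through risk sets
        # Risk set: everyone whose follow-up time is at least t_i.
        risk_set = [risk_scores[j] for j, t_j in enumerate(times) if t_j >= t_i]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(risk_set)
        log_sum = m + math.log(sum(math.exp(h - m) for h in risk_set))
        total += risk_scores[i] - log_sum
    return -total / n_events


if __name__ == "__main__":
    times = [1.0, 2.0, 3.0]
    events = [1, 1, 1]
    print(neg_log_partial_likelihood([2.0, 1.0, 0.0], times, events))
    print(neg_log_partial_likelihood([0.0, 1.0, 2.0], times, events))
```

The loss is lower when patients who fail earlier receive higher risk scores, which is the ordering the network is trained to produce.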
Statistical analysis
This study employed the LASSO algorithm, which utilized five-fold cross-validation, to select the optimal non-zero coefficients. A deep learning algorithm-based survival model was applied to predict OS in patients treated without IO and PFS in patients treated with IO. The DLS model’s performance was evaluated in the training and other validation cohorts. The optimal cutoff value for predicting OS or PFS was defined with the X-tile software (https://medicine.yale.edu/lab/rimm/research/Software/). The Kaplan–Meier approach was employed to analyze the PFS and OS curves, which were then plotted with the “survminer” package. The COX model was built from the selected genes using the “rms” package. The accuracies of different models were compared using the C-index; a higher C-index indicated more accurate prediction. The statistical analyses for this study were performed utilizing R version 3.5.1 (https://www.r-project.org/) and GraphPad Prism 7.01 (https://www.graphpad.com/). Statistical significance was set at P < 0.05.
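For readers unfamiliar with the C-index, the sketch below computes Harrell's concordance index in plain Python. It is an illustrative implementation rather than the R routine used in the study; pairs with tied follow-up times are ignored for simplicity, and the function name is an assumption.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when patient i had an observed event and a
    shorter follow-up time than patient j. Tied risk scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [1, 2, 3, 4]
    events = [1, 1, 0, 1]
    print(concordance_index(times, events, [0.9, 0.7, 0.4, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why the model with the highest C-index is regarded as the most accurate.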
Results
Characteristics of individuals treated without and with IO
The basic clinical characteristics of patients with NSCLC treated without IO in the MSK-MET, OncoSG, MSK-CSC, and TCGA-LUAD cohorts are shown in Supplementary Table 1. There were 2064 (54.42%), 150 (50.50%), 461 (49.05%), and 229 (46.93%) male patients in the MSK-MET, OncoSG, MSK-CSC, and TCGA-LUAD cohorts, respectively. In the MSK-MET, OncoSG, and TCGA-LUAD cohorts, 2060 (54.31%), 183 (61.62%), and 325 (66.60%) patients were aged > 60 years. Most patients (62.29%) were never smokers in the OncoSG cohort. Moreover, 817 (21.54%), 24 (8.08%), 218 (23.20%), and 173 (35.45%) patients, respectively, had a high TMB (≥ 200 or > 20 muts/Mb) and the TMB status was diverse in the different populations.
The basic clinical features of individuals with NSCLC treated with IO in the MIND, MSKCC, and POPLAR/OAK cohorts are presented in Supplementary Table 2, with 112 (45.53%), 172 (49.28%), and 275 (78.80%) male patients, respectively. In the 3 cohorts, 190 (77.23%), 222 (67.15%), and 265 (75.93%) patients, respectively, were aged > 60 years. Most individuals in the MSKCC (80.51%) and POPLAR/OAK (80.51%) cohorts were current or ever smokers. Additionally, in the 3 cohorts, 15 (3.50%), 71 (20.34%), and 175 (27.22%) patients, respectively, had a high TMB (≥ 200 or > 20 muts/Mb) with diverse TMB status among the populations. In the MIND, MSKCC, and POPLAR/OAK cohorts, 119 (48.37%), 43 (12.32%), and 59 (12.33%) individuals, respectively, had positive PD-L1 expression (> 1%). In these 3 cohorts, 81 (32.93%), 218 (62.46%), and 295 (68.76%) patients, respectively, achieved durable clinical benefits.
Selection of mutational genes associated with prognosis in patients with and without IO
Based on the five-fold cross-validation, LASSO was applied to select the optimal mutated genes from the MSK-MET cohort (training). In total, 45 somatic mutations were selected (Fig. 3a; Supplementary Table 3). High-mutational-frequency genes, such as TP53, EGFR, STK11, KRAS, and KEAP1, were selected in the MSK-MET cohort (training). Similarly, in the MIND cohort, 27 somatic mutations were identified in patients with lung cancer treated with IO (Fig. 3b). The Kyoto Encyclopedia of Genes and Genomes analysis revealed that the 45 mutational genes were associated with various cancer pathways, including hepatocellular carcinoma, head and neck squamous cell carcinoma, and breast cancer (false discovery rate [FDR]: P < 0.001; Fig. 3c). An association was observed between the 27 mutational genes for predicting PFS in the MIND cohort and immunology signaling pathways (FDR: P < 0.001; Fig. 3d), including the regulatory circuits of the STAT3 signaling pathway and cellular response to DNA damage stimuli. Subsequently, a panel of 45 mutational genes was employed to train the deep learning model to predict OS in the MSK-MET cohort (training) treated without IO. The model was then retrained using a panel of 27 mutational genes to predict PFS in the MIND cohort treated with IO.
Training and testing the DLS model for OS in patients treated without IO
The DLS model was run using the TensorFlow 1.14 platform (https://tensorflow.google.cn/install/source). The model was trained on the MSK-MET cohort (training) for 100 epochs, and the MSK-MET cohort (inter-validation) was used for validation (Supplementary Fig. 1). The OncoSG, MSK-CSC, and TCGA-LUAD cohorts were tested using the trained DLS model. According to the cutoff value (0.50) of DLS scores determined with X-tile (https://en.freedownloadmanager.org/Windows-PC/X-tile-FREE.html), individuals with NSCLC treated without IO were stratified into high (> 0.50) and low (≤ 0.50) DLS groups. The high DLS group had a shorter median OS than the low DLS group (24.18 months vs. not reached [NR]; hazard ratio [HR] = 4.13 [3.66–4.67], P < 0.001; Fig. 4a) in the MSK-MET cohort (training) treated without IO. In the MSK-MET cohort (inter-validation), the high DLS group also had a shorter median OS than the low DLS group (19.68 months vs. NR; HR = 5.71 [4.85–6.72], P < 0.001; Fig. 4b). In the OncoSG and MSK-CSC cohorts, the high DLS group likewise had a shorter median OS than the low DLS group (OncoSG: 59.00 months vs. NR; HR = 2.37 [1.54–3.67], P < 0.001; MSK-CSC: 25.40 months vs. NR; HR = 1.82 [1.40–2.35], P < 0.001, Fig. 4c, d). Likewise, in the TCGA-LUAD cohort, the high DLS group had a shorter median OS and PFS than the low DLS group (OS: 32.45 vs. 63.10 months; HR = 4.63 [3.43–6.25], P < 0.001; PFS: 22.49 vs. 51.55 months; HR = 2.08 [1.58–2.75], P < 0.001, Fig. 4e, f).
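The hazard ratios in this section compare the high DLS group with the low DLS group, whereas the Abstract reports low vs. high. Swapping the reference group simply inverts the ratio and swaps the confidence bounds, as the short sketch below illustrates; the function name is hypothetical.

```python
def invert_hr(hr, lower, upper):
    """Flip the reference group of a hazard ratio and its confidence interval.

    Returns (1/hr, 1/upper, 1/lower): the bounds swap because taking the
    reciprocal reverses their order.
    """
    return 1.0 / hr, 1.0 / upper, 1.0 / lower


if __name__ == "__main__":
    # Low vs. high DLS in the MSK-MET training cohort, as reported in the Abstract.
    hr, lower, upper = invert_hr(0.241, 0.213, 0.273)
    print(f"high vs. low: HR = {hr:.2f} [{lower:.2f}-{upper:.2f}]")
```

The inverted values agree with the high vs. low figures reported here up to small differences, which presumably arise from rounding of the published estimates.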
DLS model fine-tuning and retraining for PFS in patients treated with IO
To determine the prognosis of patients receiving anti-PD-1 therapy, the DLS model was fine-tuned and retrained using the 27 selected mutational genes. Individuals with NSCLC treated with IO were categorized into the high (> 0.50) and the low (≤ 0.50) DLS groups. The low DLS group had a longer median PFS than the high DLS group (12.80 vs. 2.00 months; HR = 3.41 [2.58–4.98], P < 0.001; Fig. 5a) in the MIND cohort treated with IO. In the MSKCC and POPLAR/OAK cohorts, the low DLS group exhibited better PFS than the high DLS group (both P < 0.001; Fig. 5b, c). The DLS model’s ability to predict OS in the MIND cohort was also validated; the low DLS group had a considerably longer median OS than the high DLS group (24.50 vs. 7.00 months; HR = 4.34 [3.11–6.06], P < 0.001; Fig. 5d). The low DLS group also had better OS than the high DLS group in the MSKCC and POPLAR/OAK cohorts (both P < 0.001; Fig. 5e, f).
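The grouping and the median survival comparison reported above can be sketched in plain Python. This is a simplified illustration, not the X-tile or R workflow used in the study: the function names are hypothetical, and the median is taken as the first event time at which the Kaplan–Meier estimate falls to 0.5 or below.

```python
def stratify(dls_scores, cutoff=0.50):
    """Label each patient 'high' (score > cutoff) or 'low' (score <= cutoff)."""
    return ["high" if score > cutoff else "low" for score in dls_scores]


def km_median(times, events):
    """Kaplan-Meier median survival time.

    Returns None when the survival estimate never drops to 0.5,
    which corresponds to a median that is "not reached".
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    survival = 1.0
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        failures = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1.0 - failures / at_risk
        if survival <= 0.5:
            return t
    return None


if __name__ == "__main__":
    # Toy data for illustration only; not patient data from the study.
    scores = [0.2, 0.5, 0.51, 0.9]
    print(stratify(scores))
    print(km_median([1, 2, 3, 4, 5], [1, 1, 1, 1, 1]))
    print(km_median([1, 5, 5, 5], [1, 0, 0, 0]))
```

A return value of None mirrors the "NR" entries in the results, where too few events occurred in the low DLS group for the curve to reach 50%.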
Comparison of the DLS model with clinical features and the COX model
In all four cohorts treated without IO, a routine model was developed using the COX method based on the selected panel of 45 mutational genes. The high COX group had a longer median OS than the low COX group (70.67 vs. 32.00 months; HR = 0.48 [0.44–0.53], P < 0.001; Fig. 6a). The C-index of the DLS model was significantly higher than that of the TNM stage or COX model (0.74 vs. 0.60 vs. 0.63). The low DLS group had a better OS than the TNM stage I–II groups (P < 0.010; Fig. 6b). In all three cohorts (MIND, MSKCC, and POPLAR/OAK) treated with IO, the low COX group had a longer median PFS than the high COX group (6.34 vs. 2.37 months; HR = 0.53 [0.47–0.61], P < 0.001; Fig. 6c). The C-index of the DLS model was significantly higher than that of the COX model (0.70 vs. 0.61). The low DLS group had a better PFS than the high PD-L1 group (P < 0.001; Fig. 6d) and the high TMB group (P < 0.001; Fig. 6d). The C-index of the DLS model was significantly higher than those of PD-L1 expression and TMB (0.70 vs. 0.55 vs. 0.54).
Discussion
In this study, deep learning methods were employed using multi-center sequencing data to develop predictive models for OS or PFS in individuals with NSCLC from several cohorts treated with or without IO. To our knowledge, this is the largest study to determine prognosis based on sequencing data from patients with NSCLC. Moreover, to prevent over-fitting of the DLS model, the LASSO algorithm was initially utilized to select optimal genes. Ultimately, 45 somatic mutations were selected to predict OS in patients treated without IO. The DLS model was validated in the MSK-MET (inter-validation), OncoSG, MSK-CSC, and TCGA-LUAD cohorts. After fine-tuning and retraining the parameters, a DLS model based on 27 somatic mutations was applied to predict PFS in the MIND cohort treated with IO. The DLS model was also validated in the MSKCC and POPLAR/OAK cohorts. Further, the COX model and TNM staging were compared with the DLS model in all cohorts treated without IO, revealing that the DLS model had the highest C-index. The DLS model also exhibited superior predictive performance compared to TMB, PD-L1 expression, and the COX model in all cohorts treated with IO.
Although the WGS, NGS, and WES databases have been used increasingly and extensively in cancer research, most studies have focused on several gene panels or sole driver mutational genes. Consequently, the large amount of sequencing data available is not being efficiently utilized, particularly for somatic mutations [23,24,25,26,27,28,29,30]. In contrast, the current study focused primarily on employing a relatively small panel of mutational genes to develop a robust predictive model for disease prognosis. To the best of our knowledge, this is the first study to apply deep learning to somatic mutations, rather than routine images, to predict OS in patients treated without IO. Importantly, different sequencing methods did not affect predictive ability. However, additional research is needed to investigate whether DLS can predict OS from the large amount of data obtained from WES, NGS, or WGS without relying on simple somatic mutations. The genomic sequencing data analyzed in this study were obtained from tumor DNA. Moreover, the trained model was validated with data from four cohorts (MSK-MET [inter-validation], OncoSG, MSK-CSC, and TCGA-LUAD), all of which underwent tumor tissue sequencing. Based on these results, it can be concluded that the DLS model is a feasible and robust method for accurately predicting the OS of patients with NSCLC. Moreover, the DLS model could predict PFS in the TCGA-LUAD cohort undergoing surgery, indicating that this model can be applied to predict recurrence time via sequencing data.
Several machine-learning models have been used to predict PFS and OS in patients who received IO [31,32,33]. However, herein, a deep learning algorithm based on somatic mutations was used for the first time to directly predict PFS. In this study, patients with low DLS had significantly better PFS and OS than did those with high DLS in the MIND, MSKCC, and POPLAR/OAK cohorts. These findings imply that the DLS model could efficiently evaluate clinical prognosis in patients with NSCLC treated with or without IO. In contrast, TMB and PD-L1 expression exhibited unsatisfactory outcomes in predicting PFS and OS in the three cohorts. It is hypothesized that using various detection platforms or different cutoff values for TMB might have led to an uncertain predictive impact. Indeed, the PD-L1 assay may have employed diverse reagents from several manufacturers [34, 35], and the expression levels of PD-L1 from different tumor regions may have differed [36]. Hence, the DLS model is a viable tool that can overcome the drawbacks of TMB or PD-L1 expression levels to predict clinical outcomes in patients with NSCLC treated with IO.
Employing deep learning to predict disease prognosis, involving medical images or clinical features, has gradually been introduced in cancer research [37,38,39]. However, acquiring a large database of clinical features to train models is difficult, especially regarding genomic mutations and patients with cancer who receive IO. Transfer learning is a promising strategy for addressing the issue of small sample sizes [40]. The current study used transfer learning to train the DLS model with similar predictive objectives. The DLS model for predicting OS in patients treated without IO was first trained using larger sequencing data after selecting optimal somatic mutations, avoiding overfitting during training. Although the deep learning method had more parameters and greater complexity, it also had a higher and more consistent ability to predict OS than the COX model (C-index: 0.74 vs. 0.63). Moreover, deep learning based on genomic mutations could better reflect the prognostic status than simple clinical staging. This indicates that analysis of sequencing mutation information would greatly improve the development of molecular typing in lung cancer. Nevertheless, large-scale sequencing data are difficult to acquire, particularly for patients receiving IO or chemotherapy plus IO. In our study, after the DLS model was trained in patients who did not receive IO, it was retrained using a smaller dataset (MIND cohort), which constitutes transfer learning. This method could allow for training with smaller-scale mutational data in other cancers while maintaining model stability. The DLS model also presented higher predictive ability than the COX model in patients who received IO (C-index: 0.70 vs. 0.61). Therefore, this novel deep-learning algorithm has the capacity to greatly improve the identification of associations between prognosis and gene status.
This study has a few limitations. First, although the study included many patients from numerous centers, several clinical variables (e.g., PFS and tumor biomarkers) were missing in the MSK-MET, OncoSG, and MSK-CSC cohorts. Therefore, the DLS model could not incorporate these clinical variables to optimize predictive performance further. Additionally, although a panel of selected somatic mutations based on WES, WGS, or NGS data was employed, copy number variation, mRNA expression, radiomics, and pathology grade were not utilized to predict OS and PFS. A deep learning method based on a multi-omics model could be evaluated. Furthermore, circulating tumor DNA analysis of peripheral blood samples, a noninvasive approach, was conducted only in the POPLAR/OAK cohort. Hence, the predictive performance of the DLS model for prognosis based on circulating tumor DNA could be further investigated.
Conclusions
Herein, deep learning based on a panel of mutational genes served as a novel and reliable algorithm for determining the prognosis in patients with NSCLC who did or did not receive IO. The DLS model can predict OS and PFS better than the COX model, TNM staging, TMB, or PD-L1 expression. Our findings provide new insights for predicting clinical outcomes in patients with NSCLC based on the WGS, NGS, and WES databases. This new deep learning algorithm from high-throughput sequencing can be exploited to inform pan-cancer clinical decisions.
Data Availability
The data supporting the findings of this study are available upon request from the corresponding author.
Abbreviations
- HR: Hazard ratio
- IO: Immunotherapy
- MSKCC: Memorial Sloan Kettering Cancer Center
- NGS: Next-generation sequencing
- NSCLC: Non-small cell lung cancer
- NR: Not reached
- OS: Overall survival
- PD-L1: Programmed death-ligand 1
- PFS: Progression-free survival
- TMB: Tumor mutational burden
- TNM: Tumor-node-metastasis
- WES: Whole-exome sequencing
- WGS: Whole-genome sequencing
References
1. Nadal E, Chen G, Prensner JR, Shiratsuchi H, Sam C, Zhao L, et al. KRAS-G12C mutation is associated with poor outcome in surgically resected lung adenocarcinoma. J Thorac Oncol. 2014;9:1513–22. https://doi.org/10.1097/JTO.0000000000000305.
2. Schoenfeld AJ, Bandlamudi C, Lavery JA, Montecalvo J, Namakydoust A, Rizvi H, et al. The genomic landscape of SMARCA4 alterations and associations with outcomes in patients with lung cancer. Clin Cancer Res. 2020;26:5701–8. https://doi.org/10.1158/1078-0432.CCR-20-1825.
3. Ricciuti B, Arbour KC, Lin JJ, Vajdi A, Vokes N, Hong L, et al. Diminished efficacy of programmed death-(ligand)1 inhibition in STK11- and KEAP1-mutant lung adenocarcinoma is affected by KRAS mutation status. J Thorac Oncol. 2022;17:399–410. https://doi.org/10.1016/j.jtho.2021.10.013.
4. Kato S, Goodman A, Walavalkar V, Barkauskas DA, Sharabi A, Kurzrock R. Hyperprogressors after immunotherapy: analysis of genomic alterations associated with accelerated growth rate. Clin Cancer Res. 2017;23:4242–50. https://doi.org/10.1158/1078-0432.CCR-16-3133.
5. Rizvi H, Sanchez-Vega F, La K, Chatila W, Jonsson P, Halpenny D, et al. Molecular determinants of response to anti-programmed cell death (PD)-1 and anti-programmed death-ligand 1 (PD-L1) blockade in patients with non-small-cell lung cancer profiled with targeted next-generation sequencing. J Clin Oncol. 2018;36:633–41. https://doi.org/10.1200/JCO.2017.75.3384.
6. Biton J, Mansuet-Lupo A, Pécuchet N, Alifano M, Ouakrim H, Arrondeau J, et al. TP53, STK11, and EGFR mutations predict tumor immune profile and the response to anti-PD-1 in lung adenocarcinoma. Clin Cancer Res. 2018;24:5710–23. https://doi.org/10.1158/1078-0432.CCR-18-0163.
7. Vauchier C, Pluvy J, Theou-Anton N, Soussi G, Poté N, Brosseau S, et al. Poor performance status patient with long-lasting major response to pembrolizumab in advanced non-small-cell lung cancer with coexisting POLE mutation and deficient mismatch repair pathway. Lung Cancer. 2021;160:28–31. https://doi.org/10.1016/j.lungcan.2021.07.016.
8. Skoulidis F, Goldberg ME, Greenawalt DM, Hellmann MD, Awad MM, Gainor JF, et al. STK11/LKB1 mutations and PD-1 inhibitor resistance in KRAS-mutant lung adenocarcinoma. Cancer Discov. 2018;8:822–35. https://doi.org/10.1158/2159-8290.CD-18-0099.
9. Dong ZY, Zhong WZ, Zhang XC, Su J, Xie Z, Liu SY, et al. Potential predictive value of TP53 and KRAS mutation status for response to PD-1 blockade immunotherapy in lung adenocarcinoma. Clin Cancer Res. 2017;23:3012–24. https://doi.org/10.1158/1078-0432.CCR-16-2554.
10. Kermany DS, Goldbaum M, Cai W, Valentim CCS, Liang H, Baxter SL, et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell. 2018;172:1122–31e9. https://doi.org/10.1016/j.cell.2018.02.010.
11. Peng J, Kang S, Ning Z, Deng H, Shen J, Xu Y, et al. Residual convolutional neural network for predicting response of transarterial chemoembolization in hepatocellular carcinoma from CT imaging. Eur Radiol. 2020;30:413–24. https://doi.org/10.1007/s00330-019-06318-1.
12. Peng J, Huang J, Huang G, Zhang J. Predicting the initial treatment response to transarterial chemoembolization in intermediate-stage hepatocellular carcinoma by the integration of radiomics and deep learning. Front Oncol. 2021;11:730282. https://doi.org/10.3389/fonc.2021.730282.
13. Peng J, Xiao L, Zou D, Han L. A somatic mutation signature predicts the best overall response to anti-programmed cell death protein-1 treatment in epidermal growth factor receptor/anaplastic lymphoma kinase-negative non-squamous non-small cell lung cancer. Front Med (Lausanne). 2022;9:808378. https://doi.org/10.3389/fmed.2022.808378.
14. Peng J, Zou D, Gong W, Kang S, Han L. Deep neural network classification based on somatic mutations potentially predicts clinical benefit of immune checkpoint blockade in lung adenocarcinoma. Oncoimmunology. 2020;9:1734156. https://doi.org/10.1080/2162402X.2020.1734156.
15. Peng J, Zou D, Han L, Yin Z, Hu X. A support vector machine based on liquid immune profiling predicts major pathological response to chemotherapy plus anti-PD-1/PD-L1 as a neoadjuvant treatment for patients with resectable non-small cell lung cancer. Front Immunol. 2021;12:778276. https://doi.org/10.3389/fimmu.2021.778276.
16. Nguyen B, Fong C, Luthra A, Smith SA, DiNatale RG, Nandakumar S, et al. Genomic characterization of metastatic patterns from prospective clinical sequencing of 25,000 patients. Cell. 2022;185:563–75e11. https://doi.org/10.1016/j.cell.2022.01.003.
17. Chen J, Yang H, Teo ASM, Amer LB, Sherbaf FG, Tan CQ, et al. Genomic landscape of lung adenocarcinoma in East Asians. Nat Genet. 2020;52:177–86. https://doi.org/10.1038/s41588-019-0569-6.
18. Zehir A, Benayed R, Shah RH, Syed A, Middha S, Kim HR, et al. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat Med. 2017;23:703–13. https://doi.org/10.1038/nm.4333.
19. Vanguri RS, Luo J, Aukerman AT, Egger JV, Fong CJ, Horvat N, et al. Multimodal integration of radiology, pathology and genomics for prediction of response to PD-(L)1 blockade in patients with non-small cell lung cancer. Nat Cancer. 2022;3:1151–64. https://doi.org/10.1038/s43018-022-00416-8.
20. Peng J, Zhang J, Zou D, Xiao L, Ma H, Zhang X, et al. Deep learning to estimate durable clinical benefit and prognosis from patients with non-small cell lung cancer treated with PD-1/PD-L1 blockade. Front Immunol. 2022;13:960459. https://doi.org/10.3389/fimmu.2022.960459.
21. Fehrenbacher L, Spira A, Ballinger M, Kowanetz M, Vansteenkiste J, Mazieres J, et al. Atezolizumab versus docetaxel for patients with previously treated non-small-cell lung cancer (POPLAR): a multicentre, open-label, phase 2 randomised controlled trial. Lancet. 2016;387:1837–46. https://doi.org/10.1016/S0140-6736(16)00587-0.
22. Rittmeyer A, Barlesi F, Waterkamp D, Park K, Ciardiello F, von Pawel J, et al. Atezolizumab versus docetaxel in patients with previously treated non-small-cell lung cancer (OAK): a phase 3, open-label, multicentre randomised controlled trial. Lancet. 2017;389:255–65. https://doi.org/10.1016/S0140-6736(16)32517-X.
23. Tsang ES, Grisdale CJ, Pleasance E, Topham JT, Mungall K, Reisle C, et al. Uncovering clinically relevant gene fusions with integrated genomic and transcriptomic profiling of metastatic cancers. Clin Cancer Res. 2021;27:522–31. https://doi.org/10.1158/1078-0432.CCR-20-1900.
24. Harding JJ, Nandakumar S, Armenia J, Khalil DN, Albano M, Ly M, et al. Prospective genotyping of hepatocellular carcinoma: clinical implications of next-generation sequencing for matching patients to targeted and immune therapies. Clin Cancer Res. 2019;25:2116–26. https://doi.org/10.1158/1078-0432.CCR-18-2293.
25. Samur MK, Aktas Samur A, Fulciniti M, Szalat R, Han T, Shammas M, et al. Genome-wide somatic alterations in multiple myeloma reveal a superior outcome group. J Clin Oncol. 2020;38:3107–18. https://doi.org/10.1200/JCO.20.00461.
26. Tsuji J, Li T, Grinshpun A, Coorens T, Russo D, Anderson L, et al. Clinical efficacy and whole-exome sequencing of liquid biopsies in a phase IB/II study of bazedoxifene and palbociclib in advanced hormone receptor-positive breast cancer. Clin Cancer Res. 2022;28:5066–78. https://doi.org/10.1158/1078-0432.CCR-22-2305.
27. Brown LC, Tucker MD, Sedhom R, Schwartz EB, Zhu J, Kao C, et al. LRP1B mutations are associated with favorable outcomes to immune checkpoint inhibitors across multiple cancer types. J Immunother Cancer. 2021;9:e001792. https://doi.org/10.1136/jitc-2020-001792.
28. Long J, Wang D, Yang X, Wang A, Lin Y, Zheng M, et al. Identification of NOTCH4 mutation as a response biomarker for immune checkpoint inhibitor therapy. BMC Med. 2021;19:154. https://doi.org/10.1186/s12916-021-02031-3.
29. Chida K, Kawazoe A, Kawazu M, Suzuki T, Nakamura Y, Nakatsura T, et al. A low tumor mutational burden and PTEN mutations are predictors of a negative response to PD-1 blockade in MSI-H/dMMR gastrointestinal tumors. Clin Cancer Res. 2021;27:3714–24. https://doi.org/10.1158/1078-0432.CCR-21-0401.
Von Felden J, Craig AJ, Garcia-Lezana T, Labgaa I, Haber PK, D’Avola D, et al. Mutations in circulating Tumor DNA predict primary resistance to systemic therapies in advanced hepatocellular carcinoma. Oncogene. 2021;40:140–51. https://doi.org/10.1038/s41388-020-01519-1.
Bai X, Wu DH, Ma SC, Wang J, Tang XR, Kang S, et al. Development and validation of a genomic mutation signature to predict response to PD-1 inhibitors in non-squamous NSCLC: a multicohort study. J Immunother Cancer. 2020;8:e000381. https://doi.org/10.1136/jitc-2019-000381.
Ma SC, Bai X, Guo XJ, Liu L, Xiao LS, Lin Y, et al. Organ-specific metastatic landscape dissects PD-(L)1 blockade efficacy in advanced non-small cell Lung cancer: applicability from clinical trials to real-world practice. BMC Med. 2022;20:120. https://doi.org/10.1186/s12916-022-02315-2.
Ma SC, Tang XR, Long LL, Bai X, Zhou JG, Duan ZJ, et al. Integrative evaluation of primary and metastatic lesion spectrum to guide anti-PD-L1 therapy of non-small cell Lung cancer: results from two randomized studies. Oncoimmunology. 2021;10:1909296. https://doi.org/10.1080/2162402X.2021.1909296.
Marletta S, Fusco N, Munari E, Luchini C, Cimadamore A, Brunelli M, et al. Atlas of PD-L1 for pathologists: indications, scores, diagnostic platforms and reporting systems. J Pers Med. 2022;12:1073. https://doi.org/10.3390/jpm12071073.
Rimm DL, Han G, Taube JM, Yi ES, Bridge JA, Flieder DB, et al. A prospective, multi-institutional, pathologist-based assessment of 4 immunohistochemistry assays for PD-L1 expression in non-small cell Lung cancer. JAMA Oncol. 2017;3:1051–8. https://doi.org/10.1001/jamaoncol.2017.0013.
Ilie M, Long-Mira E, Bence C, Butori C, Lassalle S, Bouhlel L, et al. Comparative study of the PD-L1 status between surgically resected specimens and matched biopsies of NSCLC patients reveal major discordances: a potential issue for anti-PD-L1 therapeutic strategies. Ann Oncol. 2016;27:147–53. https://doi.org/10.1093/annonc/mdv489.
Jiang Y, Zhang Z, Yuan Q, Wang W, Wang H, Li T, et al. Predicting peritoneal recurrence and disease-free survival from CT images in gastric cancer with multitask deep learning: a retrospective study. Lancet Digit Health. 2022;4:e340–50. https://doi.org/10.1016/S2589-7500(22)00040-1.
Foersch S, Eckstein M, Wagner DC, Gach F, Woerl AC, Geiger J, et al. Deep learning for diagnosis and survival prediction in soft tissue sarcoma. Ann Oncol. 2021;32:1178–87. https://doi.org/10.1016/j.annonc.2021.06.007.
Poirion OB, Jing Z, Chaudhary K, Huang S, Garmire LX. DeepProg: an ensemble of deep-learning and machine-learning models for prognosis prediction using multi-omics data. Genome Med. 2021;13:112. https://doi.org/10.1186/s13073-021-00930-x.
Chen J, Wang X, Ma A, Wang QE, Liu B, Li L, et al. Deep transfer learning of cancer drug responses by integrating bulk and single-cell RNA-seq data. Nat Commun. 2022;13:6494. https://doi.org/10.1038/s41467-022-34277-7.
Acknowledgements
None.
Funding
This work was supported by the National Natural Science Foundation of China [grant number: 82060327]; the Science and Technology Foundation of Guizhou Province [grant numbers: Qian ke he ji chu-ZK 2021 and yi ban 454]; the Qian Dong Nan Science and Technology Program [grant number: qdnkhJz [2023] 14]; and the National Natural Science Foundation of China [grant number: 82270225].
Author information
Contributions
Jie Peng, Lushan Xiao, and Hongbo Zhu wrote the main manuscript text and prepared Figs. 1–6. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval and consent to participate
This study (2023-LUNSHEN-02) was approved by the institutional review board of the Second Affiliated Hospital of Guizhou Medical University and was performed in accordance with the Declaration of Helsinki. Informed consent was obtained from all patients for tissue or blood use.
Consent for publication
Not applicable.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
12935_2023_3118_MOESM1_ESM.docx
Supplementary Material 1:
Supplementary Table 1. Characteristics of the patients who did not receive immunotherapy.
Supplementary Table 2. Characteristics of the patients who received immunotherapy.
Supplementary Table 3. Selected mutational genes associated with prognosis in patients who did or did not receive immunotherapy.
Supplementary Fig. 1. Training process for the deep learning survival model based on 45 somatic mutations for predicting overall survival in the MSK-MET cohort (training). KM, Kaplan–Meier.
Supplementary Fig. 2. Training process for the deep learning survival model based on 27 somatic mutations for predicting progression-free survival in the MIND cohort. KM, Kaplan–Meier.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Peng, J., Xiao, L., Zhu, H. et al. Determining the prognosis of Lung cancer from mutated genes using a deep learning survival model: a large multi-center study. Cancer Cell Int 23, 262 (2023). https://doi.org/10.1186/s12935-023-03118-y
DOI: https://doi.org/10.1186/s12935-023-03118-y