
Determining the prognosis of lung cancer from mutated genes using a deep learning survival model: a large multi-center study

Abstract

Background

Gene status has become the focus of prognosis prediction. Furthermore, deep learning has frequently been implemented in medical imaging to diagnose, prognosticate, and evaluate treatment responses in patients with cancer. However, few deep learning survival (DLS) models based on mutational genes that are directly associated with patient prognosis in terms of progression-free survival (PFS) or overall survival (OS) have been reported. Additionally, DLS models have not been applied to determine IO-related prognosis based on mutational genes. Herein, we developed a deep learning method to predict the prognosis of patients with lung cancer treated with or without immunotherapy (IO).

Methods

Samples from 6542 patients from different centers were subjected to genome sequencing. A DLS model based on multi-panels of somatic mutations was trained and validated to predict OS in patients treated without IO and PFS in patients treated with IO.

Results

In patients treated without IO, the DLS model (low vs. high DLS) was trained using the training MSK-MET cohort (HR = 0.241 [0.213–0.273], P < 0.001) and tested in the inter-validation MSK-MET cohort (HR = 0.175 [0.148–0.206], P < 0.001). The DLS model was then validated with the OncoSG, MSK-CSC, and TCGA-LUAD cohorts (HR = 0.420 [0.272–0.649], P < 0.001; HR = 0.550 [0.424–0.714], P < 0.001; HR = 0.215 [0.159–0.291], P < 0.001, respectively). Subsequently, it was fine-tuned and retrained in patients treated with IO. The DLS model (low vs. high DLS) could predict PFS and OS in the MIND, MSKCC, and POPLAR/OAK cohorts (P < 0.001, respectively). Compared with tumor-node-metastasis staging, the COX model, tumor mutational burden, and programmed death-ligand 1 expression, the DLS model had the highest C-index in patients treated with or without IO.

Conclusions

The DLS model based on mutational genes can robustly predict the prognosis of patients with lung cancer treated with or without IO.

Background

To optimize treatment regimens, predicting the prognosis of patients with lung cancer is vital. Accordingly, gene status has gradually become the focus of prognosis prediction. Based on high-throughput sequencing, multi-panels have been routinely evaluated in clinical treatment, revealing various candidate genes. For instance, the KRAS-G12C mutation is associated with poorer outcomes in surgically resected lung adenocarcinoma than wild-type KRAS [1]. Meanwhile, the SMARCA4 mutation is an independent predictive factor for poor prognosis in lung cancers but is also associated with immunotherapy (IO) sensitivity [2]. Additionally, mutations in EGFR, STK11, and B2M, or MDM2 amplification, are related to IO resistance or hyperprogressive disease [3,4,5], while TP53, KRAS, and POLE mutations are positively associated with a good response in advanced non-small cell lung cancer (NSCLC) [6,7,8,9].

Deep learning has frequently been implemented in medical imaging (including magnetic resonance imaging, computed tomography, and positron emission tomography) to diagnose, prognosticate, and evaluate treatment responses in patients with cancer [10,11,12]. Previous studies have used several genes or immune cell subtypes to develop models to predict IO or chemo-IO responses by machine learning. These studies achieved highly reliable and accurate results [13,14,15]. However, few deep learning survival (DLS) models based on mutational genes that are directly associated with patient prognosis in terms of progression-free survival (PFS) or overall survival (OS) have been reported, and their potential value remains unclear. Additionally, DLS models have not been applied to determine IO-related prognosis based on mutational genes.

The current study employed a DLS algorithm utilizing a panel of mutated genes to create a robust survival model to identify individuals with lung cancer and good prognosis in several large centers. Based on whole-genome sequencing (WGS), next-generation sequencing (NGS), and whole-exome sequencing (WES) databases, the DLS model was used to predict OS in patients with lung cancer who were treated without IO and to predict PFS in patients with lung cancer who were treated with IO. The predictive ability of the DLS model was compared with that of clinical tumor-node-metastasis (TNM) staging and the COX model. In addition, the ability of the DLS model to predict PFS in those who received IO was compared with that of the COX model, tumor mutational burden (TMB), and programmed death-ligand 1 (PD-L1) expression. A robust survival prediction model based on genomics panels will aid oncologists in implementing appropriate treatment strategies for patients with lung cancer.

Methods

Patients treated without IO

MSK-MET cohort

A total of 25,775 patients with metastatic cancers were included in the MSK-MET cohort [16]. However, 21,711 with other tumors were excluded, resulting in a final cohort comprising 4064 patients with lung cancer. Additionally, 271 patients had incomplete clinical or survival data and were thus excluded from this study. Ultimately, the data for 3793 patients with lung cancer were analyzed. The MSK-MET cohort was classified into training (n = 2504) and inter-validation (n = 1289) cohorts; all tumor samples were evaluated by NGS.

OncoSG cohort

The OncoSG cohort comprised 305 patients from East Asian countries. Eight patients lacking clinical or survival data were excluded [17]. Hence, 297 patients with lung adenocarcinoma were included in an independent validation cohort. All tumor samples were evaluated by WES.

MSK-CSC cohort

This cohort comprised 10,945 patients, of whom 9588 with other tumors were excluded [18]. Further, 417 patients without clinical or survival data were excluded. Thus, 940 patients with lung cancer comprised an independent validation cohort. All tumor samples were assessed by NGS.

TCGA-LUAD cohort

Among the 566 patients with lung adenocarcinoma, 52 were excluded due to a lack of clinical data (https://www.cell.com/pb-assets/consortium/pancanceratlas/pancani3/index.html). Moreover, 26 patients without complete survival data were excluded. Thus, 488 patients with lung adenocarcinoma comprised an independent validation cohort; all tumor samples were assessed by WGS.

Patients treated with IO

MIND cohort

A total of 247 patients with lung cancer from the Memorial Sloan Kettering Cancer Center (MSKCC) cohort were recruited [19]. All patients received anti-PD-1/PD-L1 treatment. One patient was excluded due to a lack of clinical data. Hence, 246 patients were included in this training cohort. All tumor samples were evaluated by NGS.

MSKCC cohort

A total of 349 patients from a clinical trial and retrospective analysis (NCT01454102, NCT01295827) who received anti-PD-1/PD-L1 monotherapy or combinatorial treatment with anti-CTLA4 were included [20]. These patients constituted another validation cohort. All tumor samples were analyzed by NGS.

POPLAR/OAK cohort

The POPLAR and OAK studies (NCT01903993, NCT02008227) recruited 1137 patients with advanced or metastatic NSCLC [21, 22]. Patients treated with docetaxel (n = 568) and those without blood TMB data (n = 140) were excluded. Ultimately, the POPLAR/OAK cohort comprised 429 patients as a validation cohort. All blood samples were tested by NGS.
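The inclusion and exclusion counts reported for the seven cohorts can be checked with a few lines of arithmetic. The sketch below is only a bookkeeping aid: the counts are copied from the cohort descriptions above, while the names `COHORT_FLOW`, `final_count`, `final_counts`, and `total_patients` are illustrative and do not come from the study.

```python
# Patient counts as reported in the cohort descriptions above.
# Each entry: (initial count, list of exclusion counts in the order reported).
COHORT_FLOW = {
    "MSK-MET": (25775, [21711, 271]),
    "OncoSG": (305, [8]),
    "MSK-CSC": (10945, [9588, 417]),
    "TCGA-LUAD": (566, [52, 26]),
    "MIND": (247, [1]),
    "MSKCC": (349, []),
    "POPLAR/OAK": (1137, [568, 140]),
}


def final_count(initial, exclusions):
    """Return the number of patients left after all exclusions."""
    return initial - sum(exclusions)


final_counts = {
    name: final_count(initial, exclusions)
    for name, (initial, exclusions) in COHORT_FLOW.items()
}
total_patients = sum(final_counts.values())

for name, n in final_counts.items():
    print(f"{name}: {n}")
print("Total:", total_patients)
```

Under these counts, the per-cohort totals sum to the 6542 patients stated in the abstract.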

This study (2023-LUNSHEN-02) was approved by the institutional review board of the Second Affiliated Hospital of Guizhou Medical University and was performed in accordance with the Declaration of Helsinki. Informed consent was obtained from all patients for tissue or blood use.

Study design

Figure 1 illustrates the flowchart of the proposed DLS models for predicting OS and PFS. In the MSK-MET (training) cohort, optimal mutated genes were identified by the least absolute shrinkage and selection operator (LASSO) algorithm based on five-fold cross-validation. The selected genes served as input for training the DLS models to predict OS. The training parameters were adjusted, and the DLS models were validated for OS in the MSK-MET (inter-validation), OncoSG, MSK-CSC, and TCGA-LUAD cohorts. The LASSO algorithm was also used to select the mutated genes for predicting PFS in the MIND cohort of patients treated with IO. The trained DLS model was fine-tuned and retrained in the MIND cohort and, subsequently, tested in the MSKCC and POPLAR/OAK cohorts. The COX models were analyzed in patients treated with and without IO. The performance of the DLS model, COX model, and TNM staging for predicting OS in patients treated without IO was compared (via the C-index). Furthermore, the performance of the DLS model, COX model, TMB, and PD-L1 expression level for predicting PFS was compared among patients treated with IO using the C-index.

Fig. 1

Flowchart of the proposed deep learning survival (DLS) model to determine disease prognosis. The somatic mutational databases were derived from non-small cell lung cancer (NSCLC) samples. In the MSK-MET cohort (training), the selected genes were trained to predict overall survival (OS) using deep learning. After adjusting the training parameters, the DLS models were validated for OS in the MSK-MET (inter-validation), OncoSG, MSK-CSC, and TCGA-LUAD cohorts. The trained DLS model was fine-tuned and re-trained using the MIND cohort. The DLS model was validated in the MSKCC and POPLAR/OAK cohorts. The COX models were analyzed in all patients. The C-indices of the DLS model, COX model, and tumor-node-metastasis (TNM) staging were compared in patients treated without immunotherapy (IO) regarding OS. The C-indices of the DLS model, COX model, tumor mutational burden (TMB), and programmed death-ligand 1 (PD-L1) expression were also compared among patients treated with IO regarding progression-free survival (PFS).

TMB, PD-L1 expression analysis, and selection of optimal mutated genes

Based on WES, WGS, and NGS profiling, a TMB ≥ 10 mutations (muts)/Mb or a total number of somatic nonsynonymous mutations ≥ 200 was defined as a high TMB. The tumor cells were considered to have a high PD-L1 expression level when > 50% stained positive. All mutated genes were defined as “1” and wild-type genes were defined as “0.” The optimal mutated genes were selected via LASSO and five-fold cross-validation sampling (Fig. 2). The mutated genes were separately selected to predict OS in patients treated without IO and PFS in patients treated with IO. The selected genes served as input variables for the deep learning model.
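The binary gene encoding and the biomarker thresholds described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the five-gene panel is an arbitrary subset chosen for the example, and the thresholds are the ones stated in this subsection.

```python
def encode_mutations(mutated_genes, panel):
    """Return a 0/1 vector over `panel`: 1 = mutated, 0 = wild type."""
    mutated = set(mutated_genes)
    return [1 if gene in mutated else 0 for gene in panel]


def is_high_tmb(tmb_per_mb=None, nonsynonymous_count=None):
    """High TMB as defined in the text: >= 10 muts/Mb, or
    >= 200 somatic nonsynonymous mutations when only a count is available."""
    if tmb_per_mb is not None and tmb_per_mb >= 10:
        return True
    if nonsynonymous_count is not None and nonsynonymous_count >= 200:
        return True
    return False


def is_high_pdl1(percent_positive_tumor_cells):
    """High PD-L1 expression as defined in the text: > 50% positive cells."""
    return percent_positive_tumor_cells > 50


# Illustrative panel (an assumption for this example, not the selected panel).
panel = ["TP53", "EGFR", "STK11", "KRAS", "KEAP1"]
patient_mutations = ["KRAS", "STK11"]

feature_vector = encode_mutations(patient_mutations, panel)
print(feature_vector)
print(is_high_tmb(tmb_per_mb=12.5), is_high_pdl1(50))
```

Genes outside the panel are simply ignored by `encode_mutations`, which matches the idea that only the selected genes serve as model inputs.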

Fig. 2

Flowchart of the selection method and the deep neural network architecture. Least absolute shrinkage and selection operator (LASSO) based on five-fold cross-validation was used to select optimal genomics features. The selected genes were imported into the deep learning survival (DLS) model as feature vectors. The DLS contains multiple hidden layers, weight-decay regularization, rectified linear units, batch normalization, dropout, and stochastic gradient descent using Nesterov momentum, gradient clipping, and learning rate scheduling. The network output is a single node that estimates the weight of the risk function parameterized through the network. IO, immunotherapy

DLS model and implementation

As presented in Fig. 2, the DLS model is a multi-layer perceptron similar to the Faraggi–Simon network (https://github.com/jaredleekatzman/DeepSurv). However, the DLS model comprises multiple additional hidden layers as well as various newer techniques, including weight-decay regularization, batch normalization, rectified linear units, dropout, and stochastic gradient descent with gradient clipping, learning rate scheduling, and Nesterov momentum. A single node served as the output of the network and estimated the weight of the risk function parameterized by the network. The loss function was set as the negative log partial likelihood represented by Eq. (1):

$$l\left(\theta \right)=-\sum_{i:{E}_{i}=1}\left({\widehat{h}}_{\theta }\left({x}_{i}\right)-\log \sum_{j\in R\left({T}_{i}\right)}{e}^{{\widehat{h}}_{\theta }\left({x}_{j}\right)}\right)$$
(1)

The selected genes were imported into the DLS model as vectors. The maximum number of epochs was set to 100 to ensure proper implementation of the training procedure. TensorFlow 1.14 in Python (https://www.python.org/) was utilized to implement deep learning. The experiment was conducted in Windows with the following configuration: 3.7 GHz Intel i7-12700KF CPU, NVIDIA GeForce RTX 3090, and 32 GB of RAM.
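To make Eq. (1) concrete, the following sketch evaluates the loss for a small set of network outputs in plain Python. It assumes that the risk set R(T_i) contains every patient whose follow-up time is at least T_i and that E_i = 1 marks an observed event; the function name and the toy inputs are illustrative, and the study itself computed the loss inside TensorFlow.

```python
import math


def neg_log_partial_likelihood(risk_scores, times, events):
    """Evaluate Eq. (1): for every observed event i, subtract
    h_i - log(sum of exp(h_j) over the risk set R(T_i)).

    Assumption: R(T_i) = all patients j with times[j] >= times[i].
    """
    loss = 0.0
    for h_i, t_i, e_i in zip(risk_scores, times, events):
        if e_i != 1:
            continue  # censored patients contribute only through risk sets
        risk_set = [h_j for h_j, t_j in zip(risk_scores, times) if t_j >= t_i]
        # log-sum-exp with the maximum factored out for numerical stability
        m = max(risk_set)
        log_sum = m + math.log(sum(math.exp(h - m) for h in risk_set))
        loss -= h_i - log_sum
    return loss


# Toy example: three patients with identical scores and observed events.
loss_all_events = neg_log_partial_likelihood([0.0, 0.0, 0.0], [1, 2, 3], [1, 1, 1])
print(loss_all_events)
```

With equal scores, each event contributes the log of its risk-set size, so the toy loss equals log 3 + log 2 + log 1. Assigning higher scores to patients with earlier events lowers the loss.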

Statistical analysis

This study employed the LASSO algorithm, which utilized five-fold cross-validation, to select the optimal non-zero coefficients. A deep learning algorithm-based survival model was applied to predict OS in patients treated without IO and PFS in patients treated with IO. The DLS model’s performance was evaluated in the training and other validation cohorts. The optimal cutoff value for predicting OS or PFS was defined with the X-tile software (https://medicine.yale.edu/lab/rimm/research/Software/). The Kaplan–Meier approach was employed to analyze the PFS and OS curves, which were then plotted with the “survminer” package. The COX model was based on selected genes using the “rms” package. The accuracies of different models were compared using the C-index; higher C-indices indicated more accurate model predictive ability. The statistical analyses for this study were performed utilizing R version 3.5.1 (https://www.r-project.org/) and GraphPad Prism 7.01 (https://www.graphpad.com/). Statistical significance was set at P < 0.05.
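As an illustration of the Kaplan–Meier approach mentioned above, the sketch below computes the product-limit survival estimate and the median survival time in plain Python. It is a simplified stand-in for the R packages used in the study: the function names and toy data are assumptions, and a median of `None` corresponds to a median that is "not reached".

```python
def kaplan_meier(times, events):
    """Product-limit estimate: return [(t, S(t))] at each distinct event time.

    `events[i]` is 1 for an observed event and 0 for censoring.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which S(t) <= 0.5, or None if the median is not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


median_a = median_survival(kaplan_meier([1, 2, 3, 4, 5], [1, 1, 1, 1, 1]))
median_b = median_survival(kaplan_meier([5, 6, 7, 8], [1, 0, 0, 0]))
print(median_a, median_b)
```

In the second toy group only one of four patients has an event, so the curve never drops to 0.5 and the median is reported as not reached.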

Results

Characteristics of individuals treated without and with IO

The basic clinical characteristics of patients with NSCLC treated without IO in the MSK-MET, OncoSG, MSK-CSC, and TCGA-LUAD cohorts are shown in Supplementary Table 1. There were 2064 (54.42%), 150 (50.50%), 461 (49.05%), and 229 (46.93%) male patients in the MSK-MET, OncoSG, MSK-CSC, and TCGA-LUAD cohorts, respectively. In the MSK-MET, OncoSG, and TCGA-LUAD cohorts, 2060 (54.31%), 183 (61.62%), and 325 (66.60%) patients were aged > 60 years. Most patients (62.29%) were never smokers in the OncoSG cohort. Moreover, 817 (21.54%), 24 (8.08%), 218 (23.20%), and 173 (35.45%) patients, respectively, had a high TMB (≥ 200 or > 20 muts/Mb) and the TMB status was diverse in the different populations.

The basic clinical features of individuals with NSCLC treated with IO in the MIND, MSKCC, and POPLAR/OAK cohorts are presented in Supplementary Table 2, with 112 (45.53%), 172 (49.28%), and 275 (78.80%) male patients, respectively. In the 3 cohorts, 190 (77.23%), 222 (67.15%), and 265 (75.93%) patients, respectively, were aged > 60 years. Most individuals in the MSKCC (80.51%) and POPLAR/OAK (80.51%) cohorts were current or ever smokers. Additionally, in the 3 cohorts, 15 (3.50%), 71 (20.34%), and 175 (27.22%) patients, respectively, had a high TMB (≥ 200 or > 20 muts/Mb) with diverse TMB status among the populations. In the MIND, MSKCC, and POPLAR/OAK cohorts, 119 (48.37%), 43 (12.32%), and 59 (12.33%) individuals, respectively, had positive PD-L1 expression (> 1%). In these 3 cohorts, 81 (32.93%), 218 (62.46%), and 295 (68.76%) patients, respectively, achieved durable clinical benefits.

Selection of mutational genes associated with prognosis in patients with and without IO

Based on the five-fold cross-validation, LASSO was applied to select the optimal mutated genes from the MSK-MET cohort (training). In total, 45 somatic mutations were selected (Fig. 3a; Supplementary Table 3). High-mutational-frequency genes, such as TP53, EGFR, STK11, KRAS, and KEAP1, were selected in the MSK-MET cohort (training). Similarly, in the MIND cohort, 27 somatic mutations were identified in patients with lung cancer treated with IO (Fig. 3b). The Kyoto Encyclopedia of Genes and Genomes analysis revealed that the 45 mutational genes were associated with various cancer pathways, including hepatocellular carcinoma, head and neck squamous cell carcinoma, and breast cancer (false discovery rate [FDR]: P < 0.001; Fig. 3c). An association was observed between the 27 mutational genes for predicting PFS in the MIND cohort and immunology signaling pathways (FDR: P < 0.001; Fig. 3d), including the regulatory circuits of the STAT3 signaling pathway and cellular response to DNA damage stimuli. Subsequently, a panel of 45 mutational genes was employed to train the model in predicting OS in the MSK-MET cohort (training) treated without IO based on deep learning algorithms. The model was then retrained using a panel of 27 mutational genes to predict PFS in the MIND cohort treated with IO.
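For orientation, LASSO selection for a survival endpoint is commonly written as an L1-penalized Cox partial likelihood. The text does not state the exact objective or software settings used, so the form below is an assumption, with x_i the binary gene vector of patient i, E_i the event indicator, R(T_i) the risk set at time T_i, n the number of patients, and p the number of candidate genes:

$$\widehat{\beta }=\underset{\beta }{\mathrm{arg\,min}}\left\{-\frac{1}{n}\sum_{i:{E}_{i}=1}\left({x}_{i}^{\top }\beta -\log \sum_{j\in R\left({T}_{i}\right)}{e}^{{x}_{j}^{\top }\beta }\right)+\lambda \sum_{k=1}^{p}\left|{\beta }_{k}\right|\right\}$$

Under this formulation, the penalty weight λ would be chosen by five-fold cross-validation, and the genes whose coefficients remain non-zero at the chosen λ form the selected panel (45 genes for OS and 27 genes for PFS in this study).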

Fig. 3

Least absolute shrinkage and selection operator (LASSO) selection of genes and pathway analysis. (a, b) Optimal somatic mutations selected in patients with non-small cell lung cancer who did or did not receive immunotherapy. (c, d) Enrichment analysis of somatic mutations and different signaling pathways

Training and testing the DLS model for OS in patients treated without IO

The DLS model was run using the TensorFlow 1.14 platform (https://tensorflow.google.cn/install/source). The model was trained on the MSK-MET cohort (training) for 100 epochs, and the MSK-MET cohort (inter-validation) was used for validation (Supplementary Fig. 1). The OncoSG, MSK-CSC, and TCGA-LUAD cohorts were tested using the trained DLS model. Using the cutoff value (0.50) of DLS scores determined with X-tile (https://en.freedownloadmanager.org/Windows-PC/X-tile-FREE.html), individuals with NSCLC treated without IO were stratified into high (> 0.50) and low (≤ 0.50) DLS groups. The high DLS group had a shorter median OS than the low DLS group (24.18 months vs. not reached [NR]; hazard ratio [HR] = 4.13 [3.66–4.67], P < 0.001; Fig. 4a) in the MSK-MET cohort (training) treated without IO. In the MSK-MET cohort (inter-validation), the high DLS group also had a shorter median OS than the low DLS group (19.68 months vs. NR; HR = 5.71 [4.85–6.72], P < 0.001; Fig. 4b). In the OncoSG and MSK-CSC cohorts, the high DLS group also had a shorter median OS than the low DLS group (OncoSG: 59.00 months vs. NR; HR = 2.37 [1.54–3.67], P < 0.001; MSK-CSC: 25.40 months vs. NR; HR = 1.82 [1.40–2.35], P < 0.001, Fig. 4c, d). Likewise, in the TCGA-LUAD cohort, the high DLS group had a shorter median OS and PFS than the low DLS group (OS: 32.45 vs. 63.10 months; HR = 4.63 [3.43–6.25], P < 0.001; PFS: 22.49 vs. 51.55 months; HR = 2.08 [1.58–2.75], P < 0.001, Fig. 4e, f).
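The hazard ratios in this section compare the high DLS group with the low DLS group, whereas the abstract reports the same comparisons as low vs. high. The two are reciprocals, with the confidence bounds inverted and swapped. The short sketch below shows the conversion; the function name is illustrative, and because the inputs are already rounded, the results can differ from the abstract in the third decimal place.

```python
def invert_hazard_ratio(hr, ci_low, ci_high):
    """Swap the reference group: HR becomes 1/HR, and the confidence
    bounds are inverted and exchanged so the lower bound stays lower."""
    return 1 / hr, 1 / ci_high, 1 / ci_low


# MSK-MET training cohort, high vs. low DLS, as reported in this section.
hr, low, high = invert_hazard_ratio(4.13, 3.66, 4.67)
print(f"low vs. high DLS: HR = {hr:.3f} [{low:.3f}-{high:.3f}]")
```

The converted values agree, up to rounding, with the HR of 0.241 [0.213–0.273] given in the abstract for the training MSK-MET cohort.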

Fig. 4

Development and validation of the deep learning survival (DLS) model for overall survival (OS). (a, b) DLS model comprising 45 selected genes in the training MSK-MET cohort was tested in the inter-validation MSK-MET cohort. (c, d) OS curves of patients from the OncoSG and MSK-CSC cohorts. (e, f) OS and progression-free survival curves of patients from the TCGA-LUAD cohort

DLS model fine-tuning and retraining for PFS in patients treated with IO

In determining the prognosis of patients receiving anti-PD-1 therapy, the DLS model was fine-tuned and retrained via 27 selected mutational genes. Individuals with NSCLC treated with IO were categorized into the high (> 0.50) and the low (≤ 0.50) DLS groups. The low DLS group had a longer median PFS than the high DLS group (12.80 vs. 2.00 months; HR = 3.41 [2.58–4.98], P < 0.001; Fig. 5a) in the MIND cohort treated with IO. In the MSKCC and POPLAR/OAK cohorts, the low DLS group exhibited better PFS than the high DLS group (both P < 0.001; Fig. 5b, c). The DLS model’s ability to predict OS in the MIND cohort was validated; the low DLS group had a considerably longer median OS duration (24.50 vs. 7.00 months; HR = 4.34 [3.11–6.06], P < 0.001) than that of the high DLS group (Fig. 5d). The low DLS group had better OS than that of the high DLS group in the MSKCC and POPLAR/OAK cohorts (both P < 0.001; Fig. 5e, f).
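The grouping rule used throughout the results (high if the DLS score is > 0.50, low if it is ≤ 0.50) can be written as a small helper. The function names, patient identifiers, and scores below are made up for illustration; only the cutoff and the direction of the inequality come from the text.

```python
def dls_group(score, cutoff=0.50):
    """Assign 'high' when score > cutoff, otherwise 'low' (score <= cutoff)."""
    return "high" if score > cutoff else "low"


def split_by_dls(scores, cutoff=0.50):
    """Split a {patient_id: score} mapping into low and high DLS groups."""
    groups = {"low": [], "high": []}
    for patient_id, score in scores.items():
        groups[dls_group(score, cutoff)].append(patient_id)
    return groups


# Hypothetical scores, not data from the study.
example_scores = {"P1": 0.12, "P2": 0.50, "P3": 0.51, "P4": 0.93}
groups = split_by_dls(example_scores)
print(groups)
```

A score exactly at the cutoff falls in the low group, which matters when scores cluster near 0.50.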

Fig. 5

Development and validation of the deep learning survival (DLS) model for progression-free survival (PFS). (a–c) DLS model for predicting PFS comprising 27 selected genes was constructed in the MIND cohort and validated in the MSKCC and POPLAR/OAK cohorts. (d–f) Overall survival curves of patients in the MIND, MSKCC, and POPLAR/OAK cohorts

Comparison of the DLS model with clinical features and the COX model

In all 4 cohorts treated without IO, a routine model was developed using the COX method based on the selected panel of 45 mutational genes. The high COX group had a longer median OS than that of the low COX group (70.67 vs. 32.00 months; HR = 0.48 [0.44–0.53], P < 0.001; Fig. 6a). The C-index of the DLS model was significantly higher than that of the TNM stage or COX model (0.74 vs. 0.60 vs. 0.63). The low DLS group had a better OS than that of the TNM stage I–II groups (P < 0.010; Fig. 6b). In all three cohorts (MIND, MSKCC, and POPLAR/OAK) treated with IO, the low COX group had a longer median PFS than the high COX group (6.34 vs. 2.37 months; HR = 0.53 [0.47–0.61], P < 0.001; Fig. 6c). The C-index of the DLS model was significantly higher than that of the COX model (0.70 vs. 0.61). The low DLS group had a better PFS than that of the high PD-L1 group (P < 0.001; Fig. 6d) and high TMB group (P < 0.001; Fig. 6d). The C-index of the DLS model was significantly higher than that of the PD-L1 and TMB groups (0.70 vs. 0.55 vs. 0.54).
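The C-index used for these comparisons measures how often a model ranks pairs of patients in the correct order of survival. The sketch below implements Harrell's version in plain Python under two stated simplifications: a pair is comparable only when the patient with the shorter follow-up had an observed event, and pairs with tied times are skipped. The function name and toy data are illustrative, and the study's own values came from R.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs in which the patient
    with the shorter survival time has the higher predicted risk.
    Tied risk scores count as half-concordant; tied times are skipped."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
c_perfect = concordance_index(times, events, [4, 3, 2, 1])
c_reversed = concordance_index(times, events, [1, 2, 3, 4])
c_constant = concordance_index(times, events, [1, 1, 1, 1])
print(c_perfect, c_reversed, c_constant)
```

A value of 0.5 corresponds to uninformative ranking and 1.0 to perfect ranking, which gives a scale for reading the reported 0.74 (DLS), 0.63 (COX), and 0.60 (TNM stage).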

Fig. 6

Deep learning survival (DLS) compared with the COX model and other clinical predictive methods. (a, b) Comparison of the DLS model with the COX model and clinical staging in all cohorts treated without IO in predicting OS. (c, d) Comparison of the DLS model with the COX model, tumor mutational burden (TMB), and programmed death-ligand 1 (PD-L1) expression in all cohorts with IO in predicting PFS.

Discussion

In this study, deep learning methods were employed using multi-center sequencing data to develop predictive models for OS or PFS in individuals with NSCLC from several cohorts treated with or without IO. To our knowledge, this is the largest study to determine prognosis based on sequencing data from patients with NSCLC. Moreover, to prevent over-fitting of the DLS model, the LASSO algorithm was initially utilized to select optimal genes. Ultimately, 45 somatic mutations were selected to predict OS in patients treated without IO. The DLS model was validated in the MSK-MET (inter-validation), OncoSG, MSK-CSC, and TCGA-LUAD cohorts. After fine-tuning and retraining the parameters, a DLS model based on 27 somatic mutations was applied to predict PFS in the MIND cohort treated with IO. The DLS model was also validated in the MSKCC and POPLAR/OAK cohorts. Further, the COX model and TNM staging were compared with the DLS model in all cohorts treated without IO, revealing that the DLS model had the highest C-index. The DLS model also exhibited superior predictive performance compared to the TMB, PD-L1 expression, and COX models in all cohorts.

Although the WGS, NGS, and WES databases have been used increasingly and extensively in cancer research, most studies have focused on several gene panels or sole driver mutational genes. Consequently, the large amount of sequencing data available is not being efficiently utilized, particularly for somatic mutations [23,24,25,26,27,28,29,30]. In contrast, the current study focused primarily on employing a relatively small panel of mutational genes to develop a robust predictive model for disease prognosis. To the best of our knowledge, this is the first study to use deep learning to train somatic mutations for predicting OS in patients treated without IO or routine images. Importantly, different sequencing methods did not affect predictive ability. However, additional research is needed to investigate whether DLS can classify OS prediction utilizing a large amount of data obtained from WES, NGS, or WGS without relying on simple somatic mutations. The genomic sequencing data analyzed in this study were obtained from tumor DNA. Moreover, the training model was validated with data from the other four cohorts (MSK-MET, OncoSG, MSK-CSC, and TCGA-LUAD), all of which underwent tumor tissue sequencing. Based on these results, it can be concluded that the DLS model is a feasible and robust method for accurately predicting the OS of patients with NSCLC. Moreover, the DLS model could predict PFS in the TCGA-LUAD cohort undergoing surgery, indicating that this model can be applied to predict recurrence time via sequencing data.

Several machine-learning models have been used to predict PFS and OS in patients who received IO [31,32,33]. However, herein, a deep learning algorithm based on somatic mutations was used for the first time to directly predict PFS. In this study, patients with low DLS had significantly better PFS and OS than did those with high DLS in the MIND, MSKCC, and POPLAR/OAK cohorts. These findings imply that the DLS model could efficiently evaluate clinical prognosis in patients with NSCLC treated with or without IO. In contrast, TMB and PD-L1 expression exhibited unsatisfactory outcomes in predicting PFS and OS in the three cohorts. It is hypothesized that using various detection platforms or different cutoff values for TMB might have led to an uncertain predictive impact. Indeed, the PD-L1 assay may have employed diverse reagents from several manufacturers [34, 35], and the expression levels of PD-L1 from different tumor regions may have differed [36]. Hence, the DLS model is a viable tool that can overcome the drawbacks of TMB or PD-L1 expression levels to predict clinical outcomes in patients with NSCLC treated with IO.

Employing deep learning to predict disease prognosis, involving medical images or clinical features, has gradually been introduced in cancer research [37,38,39]. However, acquiring a large database of clinical features to train models is difficult, especially regarding genomic mutations and patients with cancer who receive IO. Transfer learning is a promising strategy for addressing the issue of small sample sizes [40]. The current study used transfer learning to train the DLS model with similar predictive objectives. The DLS model for predicting OS in patients treated without IO was first trained using larger sequencing data after selecting optimal somatic mutations, avoiding overfitting during training. Although the deep learning method had more parameters and complexities, it also had a higher and more consistent ability to predict OS than the COX model (C-index: 0.74 vs. 0.63). Moreover, deep learning based on genomic mutations could better reflect the prognostic status than simple clinical staging. This indicates that analysis of sequencing mutation information would greatly improve the development of molecular typing in lung cancer. Nevertheless, large-scale sequencing data is difficult to acquire, particularly for patients receiving IO or chemotherapy plus IO. In our study, after the DLS model was trained in patients who did not receive IO, it was retrained using a smaller dataset (MIND cohort), indicative of transfer learning. This method could allow for training with smaller-scale mutational data in other cancers while maintaining model stability. The DLS model also presented higher predictive ability than that of the COX model in patients who received IO (C-index: 0.70 vs. 0.61). Therefore, this novel deep-learning algorithm has the capacity to increase the identified associations between prognosis and gene status greatly.

This study has several limitations. First, although the study included many patients from numerous centers, several clinical variables (e.g., PFS and tumor biomarkers) were missing in the MSK-MET, OncoSG, and MSK-CSC cohorts. Therefore, the DLS model could not incorporate these clinical variables to optimize predictive performance further. Additionally, although a panel of selected somatic mutations based on WES, WGS, or NGS data was employed, copy number variation, mRNA expression, radiomics, and pathology grade were not utilized to predict OS and PFS. A deep learning method based on a multi-omics model could be evaluated. Furthermore, circulating tumor DNA analysis of peripheral blood samples is a noninvasive approach that was conducted only in the POPLAR/OAK cohort. Hence, the predictive performance of the DLS model for prognosis based on circulating tumor DNA could be further investigated.

Conclusions

Herein, deep learning based on a panel of mutational genes served as a novel and reliable algorithm for determining the prognosis in patients with NSCLC who did or did not receive IO. The DLS model can predict OS and PFS better than the COX model, TNM staging, TMB, or PD-L1 expression. Our findings provide new insights for predicting clinical outcomes in patients with NSCLC based on the WGS, NGS, and WES databases. This new deep learning algorithm from high-throughput sequencing can be exploited to inform pan-cancer clinical decisions.

Data Availability

The data supporting the findings of this study are available upon request from the corresponding author.

Abbreviations

HR: Hazard ratio

IO: Immunotherapy

MSKCC: Memorial Sloan Kettering Cancer Center

NGS: Next-generation sequencing

NSCLC: Non-small cell lung cancer

NR: Not reached

OS: Overall survival

PD-L1: Programmed death-ligand 1

PFS: Progression-free survival

TMB: Tumor mutational burden

TNM: Tumor-node-metastasis

WES: Whole-exome sequencing

WGS: Whole-genome sequencing

References

  1. Nadal E, Chen G, Prensner JR, Shiratsuchi H, Sam C, Zhao L, et al. KRAS-G12C mutation is associated with poor outcome in surgically resected lung adenocarcinoma. J Thorac Oncol. 2014;9:1513–22. https://doi.org/10.1097/JTO.0000000000000305.


  2. Schoenfeld AJ, Bandlamudi C, Lavery JA, Montecalvo J, Namakydoust A, Rizvi H, et al. The genomic landscape of SMARCA4 alterations and associations with outcomes in patients with Lung cancer. Clin Cancer Res. 2020;26:5701–8. https://doi.org/10.1158/1078-0432.CCR-20-1825.


  3. Ricciuti B, Arbour KC, Lin JJ, Vajdi A, Vokes N, Hong L, et al. Diminished efficacy of programmed death-(ligand)1 inhibition in STK11- and KEAP1-mutant lung adenocarcinoma is affected by KRAS mutation status. J Thorac Oncol. 2022;17:399–410. https://doi.org/10.1016/j.jtho.2021.10.013.


  4. Kato S, Goodman A, Walavalkar V, Barkauskas DA, Sharabi A, Kurzrock R. Hyperprogressors after immunotherapy: analysis of genomic alterations associated with accelerated growth rate. Clin Cancer Res. 2017;23:4242–50. https://doi.org/10.1158/1078-0432.CCR-16-3133.


  5. Rizvi H, Sanchez-Vega F, La K, Chatila W, Jonsson P, Halpenny D, et al. Molecular determinants of response to anti-programmed cell death (PD)-1 and anti-programmed death-ligand 1 (PD-L1) blockade in patients with non-small-cell Lung cancer profiled with targeted next-generation sequencing. J Clin Oncol. 2018;36:633–41. https://doi.org/10.1200/JCO.2017.75.3384.


  6. Biton J, Mansuet-Lupo A, Pécuchet N, Alifano M, Ouakrim H, Arrondeau J, et al. TP53, STK11, and EGFR mutations predict Tumor immune profile and the response to anti-PD-1 in lung adenocarcinoma. Clin Cancer Res. 2018;24:5710–23. https://doi.org/10.1158/1078-0432.CCR-18-0163.


  7. Vauchier C, Pluvy J, Theou-Anton N, Soussi G, Poté N, Brosseau S, et al. Poor performance status patient with long-lasting major response to pembrolizumab in advanced non-small-cell Lung cancer with coexisting POLE mutation and deficient mismatch repair pathway. Lung Cancer. 2021;160:28–31. https://doi.org/10.1016/j.lungcan.2021.07.016.

  8. Skoulidis F, Goldberg ME, Greenawalt DM, Hellmann MD, Awad MM, Gainor JF, et al. STK11/LKB1 mutations and PD-1 inhibitor resistance in KRAS-mutant lung adenocarcinoma. Cancer Discov. 2018;8:822–35. https://doi.org/10.1158/2159-8290.CD-18-0099.

  9. Dong ZY, Zhong WZ, Zhang XC, Su J, Xie Z, Liu SY, et al. Potential predictive value of TP53 and KRAS mutation status for response to PD-1 blockade immunotherapy in lung adenocarcinoma. Clin Cancer Res. 2017;23:3012–24. https://doi.org/10.1158/1078-0432.CCR-16-2554.

  10. Kermany DS, Goldbaum M, Cai W, Valentim CCS, Liang H, Baxter SL, et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell. 2018;172:1122–31e9. https://doi.org/10.1016/j.cell.2018.02.010.

  11. Peng J, Kang S, Ning Z, Deng H, Shen J, Xu Y, et al. Residual convolutional neural network for predicting response of transarterial chemoembolization in hepatocellular carcinoma from CT imaging. Eur Radiol. 2020;30:413–24. https://doi.org/10.1007/s00330-019-06318-1.

  12. Peng J, Huang J, Huang G, Zhang J. Predicting the initial treatment response to transarterial chemoembolization in intermediate-stage hepatocellular carcinoma by the integration of radiomics and deep learning. Front Oncol. 2021;11:730282. https://doi.org/10.3389/fonc.2021.730282.

  13. Peng J, Xiao L, Zou D, Han L. A somatic mutation signature predicts the best overall response to anti-programmed cell death protein-1 treatment in epidermal growth factor receptor/anaplastic lymphoma kinase-negative non-squamous non-small cell lung cancer. Front Med (Lausanne). 2022;9:808378. https://doi.org/10.3389/fmed.2022.808378.

  14. Peng J, Zou D, Gong W, Kang S, Han L. Deep neural network classification based on somatic mutations potentially predicts clinical benefit of immune checkpoint blockade in lung adenocarcinoma. Oncoimmunology. 2020;9:1734156. https://doi.org/10.1080/2162402X.2020.1734156.

  15. Peng J, Zou D, Han L, Yin Z, Hu X. A support vector machine based on liquid immune profiling predicts major pathological response to chemotherapy plus anti-PD-1/PD-L1 as a neoadjuvant treatment for patients with resectable non-small cell lung cancer. Front Immunol. 2021;12:778276. https://doi.org/10.3389/fimmu.2021.778276.

  16. Nguyen B, Fong C, Luthra A, Smith SA, DiNatale RG, Nandakumar S, et al. Genomic characterization of metastatic patterns from prospective clinical sequencing of 25,000 patients. Cell. 2022;185:563–75e11. https://doi.org/10.1016/j.cell.2022.01.003.

  17. Chen J, Yang H, Teo ASM, Amer LB, Sherbaf FG, Tan CQ, et al. Genomic landscape of lung adenocarcinoma in East Asians. Nat Genet. 2020;52:177–86. https://doi.org/10.1038/s41588-019-0569-6.

  18. Zehir A, Benayed R, Shah RH, Syed A, Middha S, Kim HR, et al. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat Med. 2017;23:703–13. https://doi.org/10.1038/nm.4333.

  19. Vanguri RS, Luo J, Aukerman AT, Egger JV, Fong CJ, Horvat N, et al. Multimodal integration of radiology, pathology and genomics for prediction of response to PD-(L)1 blockade in patients with non-small cell lung cancer. Nat Cancer. 2022;3:1151–64. https://doi.org/10.1038/s43018-022-00416-8.

  20. Peng J, Zhang J, Zou D, Xiao L, Ma H, Zhang X, et al. Deep learning to estimate durable clinical benefit and prognosis from patients with non-small cell lung cancer treated with PD-1/PD-L1 blockade. Front Immunol. 2022;13:960459. https://doi.org/10.3389/fimmu.2022.960459.

  21. Fehrenbacher L, Spira A, Ballinger M, Kowanetz M, Vansteenkiste J, Mazieres J, et al. Atezolizumab versus docetaxel for patients with previously treated non-small-cell lung cancer (POPLAR): a multicentre, open-label, phase 2 randomised controlled trial. Lancet. 2016;387:1837–46. https://doi.org/10.1016/S0140-6736(16)00587-0.

  22. Rittmeyer A, Barlesi F, Waterkamp D, Park K, Ciardiello F, von Pawel J, et al. Atezolizumab versus docetaxel in patients with previously treated non-small-cell lung cancer (OAK): a phase 3, open-label, multicentre randomised controlled trial. Lancet. 2017;389:255–65. https://doi.org/10.1016/S0140-6736(16)32517-X.

  23. Tsang ES, Grisdale CJ, Pleasance E, Topham JT, Mungall K, Reisle C, et al. Uncovering clinically relevant gene fusions with integrated genomic and transcriptomic profiling of metastatic cancers. Clin Cancer Res. 2021;27:522–31. https://doi.org/10.1158/1078-0432.CCR-20-1900.

  24. Harding JJ, Nandakumar S, Armenia J, Khalil DN, Albano M, Ly M, et al. Prospective genotyping of hepatocellular carcinoma: clinical implications of next-generation sequencing for matching patients to targeted and immune therapies. Clin Cancer Res. 2019;25:2116–26. https://doi.org/10.1158/1078-0432.CCR-18-2293.

  25. Samur MK, Aktas Samur A, Fulciniti M, Szalat R, Han T, Shammas M, et al. Genome-wide somatic alterations in multiple myeloma reveal a superior outcome group. J Clin Oncol. 2020;38:3107–18. https://doi.org/10.1200/JCO.20.00461.

  26. Tsuji J, Li T, Grinshpun A, Coorens T, Russo D, Anderson L, et al. Clinical efficacy and whole-exome sequencing of liquid biopsies in a phase IB/II study of bazedoxifene and palbociclib in advanced hormone receptor-positive breast cancer. Clin Cancer Res. 2022;28:5066–78. https://doi.org/10.1158/1078-0432.CCR-22-2305.

  27. Brown LC, Tucker MD, Sedhom R, Schwartz EB, Zhu J, Kao C, et al. LRP1B mutations are associated with favorable outcomes to immune checkpoint inhibitors across multiple cancer types. J Immunother Cancer. 2021;9:e001792. https://doi.org/10.1136/jitc-2020-001792.

  28. Long J, Wang D, Yang X, Wang A, Lin Y, Zheng M, et al. Identification of NOTCH4 mutation as a response biomarker for immune checkpoint inhibitor therapy. BMC Med. 2021;19:154. https://doi.org/10.1186/s12916-021-02031-3.

  29. Chida K, Kawazoe A, Kawazu M, Suzuki T, Nakamura Y, Nakatsura T, et al. A low tumor mutational burden and PTEN mutations are predictors of a negative response to PD-1 blockade in MSI-H/dMMR gastrointestinal tumors. Clin Cancer Res. 2021;27:3714–24. https://doi.org/10.1158/1078-0432.CCR-21-0401.

  30. Von Felden J, Craig AJ, Garcia-Lezana T, Labgaa I, Haber PK, D’Avola D, et al. Mutations in circulating tumor DNA predict primary resistance to systemic therapies in advanced hepatocellular carcinoma. Oncogene. 2021;40:140–51. https://doi.org/10.1038/s41388-020-01519-1.

  31. Bai X, Wu DH, Ma SC, Wang J, Tang XR, Kang S, et al. Development and validation of a genomic mutation signature to predict response to PD-1 inhibitors in non-squamous NSCLC: a multicohort study. J Immunother Cancer. 2020;8:e000381. https://doi.org/10.1136/jitc-2019-000381.

  32. Ma SC, Bai X, Guo XJ, Liu L, Xiao LS, Lin Y, et al. Organ-specific metastatic landscape dissects PD-(L)1 blockade efficacy in advanced non-small cell lung cancer: applicability from clinical trials to real-world practice. BMC Med. 2022;20:120. https://doi.org/10.1186/s12916-022-02315-2.

  33. Ma SC, Tang XR, Long LL, Bai X, Zhou JG, Duan ZJ, et al. Integrative evaluation of primary and metastatic lesion spectrum to guide anti-PD-L1 therapy of non-small cell lung cancer: results from two randomized studies. Oncoimmunology. 2021;10:1909296. https://doi.org/10.1080/2162402X.2021.1909296.

  34. Marletta S, Fusco N, Munari E, Luchini C, Cimadamore A, Brunelli M, et al. Atlas of PD-L1 for pathologists: indications, scores, diagnostic platforms and reporting systems. J Pers Med. 2022;12:1073. https://doi.org/10.3390/jpm12071073.

  35. Rimm DL, Han G, Taube JM, Yi ES, Bridge JA, Flieder DB, et al. A prospective, multi-institutional, pathologist-based assessment of 4 immunohistochemistry assays for PD-L1 expression in non-small cell lung cancer. JAMA Oncol. 2017;3:1051–8. https://doi.org/10.1001/jamaoncol.2017.0013.

  36. Ilie M, Long-Mira E, Bence C, Butori C, Lassalle S, Bouhlel L, et al. Comparative study of the PD-L1 status between surgically resected specimens and matched biopsies of NSCLC patients reveal major discordances: a potential issue for anti-PD-L1 therapeutic strategies. Ann Oncol. 2016;27:147–53. https://doi.org/10.1093/annonc/mdv489.

  37. Jiang Y, Zhang Z, Yuan Q, Wang W, Wang H, Li T, et al. Predicting peritoneal recurrence and disease-free survival from CT images in gastric cancer with multitask deep learning: a retrospective study. Lancet Digit Health. 2022;4:e340–50. https://doi.org/10.1016/S2589-7500(22)00040-1.

  38. Foersch S, Eckstein M, Wagner DC, Gach F, Woerl AC, Geiger J, et al. Deep learning for diagnosis and survival prediction in soft tissue sarcoma. Ann Oncol. 2021;32:1178–87. https://doi.org/10.1016/j.annonc.2021.06.007.

  39. Poirion OB, Jing Z, Chaudhary K, Huang S, Garmire LX. DeepProg: an ensemble of deep-learning and machine-learning models for prognosis prediction using multi-omics data. Genome Med. 2021;13:112. https://doi.org/10.1186/s13073-021-00930-x.

  40. Chen J, Wang X, Ma A, Wang QE, Liu B, Li L, et al. Deep transfer learning of cancer drug responses by integrating bulk and single-cell RNA-seq data. Nat Commun. 2022;13:6494. https://doi.org/10.1038/s41467-022-34277-7.

Acknowledgements

None.

Funding

This work was supported by the National Natural Science Foundation of China [grant numbers: 82060327 and 82270225]; the Science and Technology Foundation of Guizhou Province [grant numbers: Qian ke he ji chu-ZK 2021 and yi ban 454]; and the Qian Dong Nan Science and Technology Program [grant number: qdnkhJz [2023] 14].

Author information

Contributions

Jie Peng, Lushan Xiao, and Hongbo Zhu wrote the main manuscript text and prepared Figs. 1–6. All authors reviewed the manuscript.

Corresponding author

Correspondence to Jie Peng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval and consent to participate

This study (2023-LUNSHEN-02) was approved by the institutional review board of the Second Affiliated Hospital of Guizhou Medical University and was performed in accordance with the Declaration of Helsinki. Informed consent was obtained from all patients for tissue or blood use.

Consent for publication

Not applicable.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

12935_2023_3118_MOESM1_ESM.docx

Supplementary Material 1:

  • Supplementary Table 1. Characteristics of the patients who did not receive immunotherapy.

  • Supplementary Table 2. Characteristics of the patients who received immunotherapy.

  • Supplementary Table 3. Selected mutational genes associated with prognosis in patients who did or did not receive immunotherapy.

  • Supplementary Fig. 1. Training process for the deep learning survival model based on 45 somatic mutations for predicting overall survival in the MSK-MET cohort (training). KM, Kaplan–Meier.

  • Supplementary Fig. 2. Training process for the deep learning survival model based on 27 somatic mutations for predicting progression-free survival in the MIND cohort. KM, Kaplan–Meier.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Peng, J., Xiao, L., Zhu, H. et al. Determining the prognosis of Lung cancer from mutated genes using a deep learning survival model: a large multi-center study. Cancer Cell Int 23, 262 (2023). https://doi.org/10.1186/s12935-023-03118-y

  • DOI: https://doi.org/10.1186/s12935-023-03118-y

Keywords