Prognostic value of a five-lncRNA signature in esophageal squamous cell carcinoma

The aim of this study was to identify prognostic long non-coding RNAs (lncRNAs) and develop a multi-lncRNA signature for suvival prediction in esophageal squamous cell carcinoma (ESCC). The clinical and gene expression data from Gene Expression Omnibus database (GSE53624, n = 119) were obtianed as training set. A total of 98 paired ESCC tumor and normal tissues were detected by RNA sequencing and used as test set. Another 84 ESCC tissues were used for real-time quantitative PCR(qRT-PCR) and as an independent validation cohort. Survival analysis, Cox regression and Kaplan–Meier analysis were performed. We screened a prognostic marker of ESCC from the GSE53624 dataset and named it as the five-lncRNA signature including AC007179.1, MORF4L2-AS1, RP11-488I20.9, RP13-30A9.2, RP4-735C1.6, which could classify patients into high- and low-risk groups with significantly different survival(median survival: 1.75 years vs. 4.01 years, log rank P < 0.05). Then test dataset and validation dataset confirmed that the five-lncRNA signature can determine the prognosis of ESCC patients. Predictive independence of the prognostic marker was proved by multivariable Cox regression analyses in the three datasets (P < 0.05). In addition, the signature was found to be better than TNM stage in terms of prognosis. The five-lncRNA signature could be a good prognostic biomarker for ESCC patients and has important clinical value.


Background
Esophageal carcinoma is a malignant tumor of esophageal mucosa epithelium or gland. Globally, esophageal cancer ranks seventh in morbidity and sixth mortality among all cancers [1]. China is one of the countries with the highest incidence of esophageal cancer, accounting for approximately half of all esophageal cancer cases [2]. There were 477,900 new cases diagnosed in China in 2015 [3]. Among these Chinese esophageal cancer patients, esophageal squamous cell carcinoma (ESCC) patients account for nearly 90% [2]. In terms of mortality, esophageal cancer deaths in China account for half of global esophageal cancer deaths [4]. Therefore, ESCC is a major public health challenge in the world, especially in China. Using Chinese ESCC patients as research samples is of great significance in revealing the pathogenesis and prognostic factors of ESCC.
ESCC is an extremely aggressive malignancy. Along with the clinical symptoms of early esophageal squamous cell carcinoma are not obvious and can be easily ignored by patients, most ESCC patients are diagnosed in advanced stages. As a result, the prognosis of ESCC patients is extremely poor. The 5 year survival rate of ESCC patients in some less developed areas is less than 10% [5]. In the United States, where medical conditions are advanced, the 5 year survival rate is 18% [6]. In China, the 5 year relative survival of postoperative ESCC patients is 8%-41% [7,8]. Despite recent advances in diagnostic and therapeutic approaches, the life expectancy of ESCC patients has not improved significantly [3,9,10]. In addition, the current global diagnosis of ESCC depends on endoscopy and biopsy [11]. After confirming the diagnosis, TNM stage is used clinically to make prognostic judgment and guide treatment, but the shortcoming of TNM stage is that it cannot achieve individualized prediction and accurate evaluation [12,13]. Therefore, identification of a sensitive and effective prognostic marker of ESCC is in urgent need to evaluate disease progression and patient's overall outcome.
Next generation sequencing (NGS) and bioinformatics analysis tools provides great help for human-beings to understand tumors [14][15][16][17]. In the process of exploring tumor pathogenesis and prognostic factors through NGS and bioinformatics analysis, long non-coding RNAs (lncRNAs), defined as transcripts longer than 200 bp, have been discovered to play important roles. For example, a HOX transcript antisense RNA, lncRNA HOTAIR, was found to correlate with poor prognosis of lung cancer and promote tumor progression [18]. LncRNA REG1CP was shown to promote the development and progression of tumor [19]. For ESCC, several lncRNAs have been detected to associate with short survival, such as lncRNA H19 [20], lncRNA CASC9 [21] and lncRNA LUCAT1 [22]. Subsequently, prognostic multi-lncRNA signatures with high potential clinical application significance were constructed based on lncRNA expression profile data from public database such as Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) database. For instance, a three-lncRNA signature (ENST00000435885.1, XLOC_013014, ENST00000547963.1) was found that it can classify the ESCC patients into two groups with significantly different overall survival [23]. A seven-lncRNA (RP5-1172N10.2, RP11-579D7.4, RP11-89N17.4, LA16c-325D7.2, RP1-251M9.2, RP11-259O2.2, LINC00173) signature can predict overall survival of ESCC patients [24]. Although prognostic lncRNA signatures have been identified for ESCC, there are few studies that can verify the effectiveness of the predictive model in independent experimental data set and confirm the value of the prognostic lncRNA in ESCC tissues.
Here, we used our lncRNA expression data tested by RNA sequencing and followed up the clinical 5 year survival information of 98 ESCC patients, then, combined it with 119 ESCC public expression profiles from GEO and another independent 84 ESCC tissues for qPCR validation to construct a clinically valuable lncRNA signature that can accurately predict survival of ESCC patients.

Sample collection and preparation
LncRNA expression profile and corresponding clinical data of 119 ESCC cases (GSE53624) were obtained from the publicly available GEO database (https ://www.ncbi. nlm.nih.gov/geo/). As Anyang is a high-risk area of ESCC in China, we collected 98 postoperative ESCC tissues and paired non-tumor tissues from Anyang Tumor Hospital during 2014-2015, and organized relevant patient clinical information. Then we examined the protein coding gene (PCG) and lncRNA expression profile of the 98 pair ESCC tissues by next-generation sequencing (NGS, Hereinafter referred to as RNA-seq dataset). In addition, we collected an independent validation cohort including 84 postoperative ESCC patients from the same hospital and detected their lncRNA expression level using the qRT-PCR. Detailed clinicopathological characteristics of all these ESCC patients of the ESCCs in this study was shown in Table 1. Tumor-node-metastasis (TNM) classification of the International Union against Cancer, 7th edition was used to categorize. Documentation of informed consent was obtained through the institutional review board. The study was approved by the Anyang Tumor Hospital Ethical Committee.

RNA isolation and next generation RNA sequencing analysis
After TRIZOL lysis and purification, total RNA was isolated by the miRNeasy Mini Kit (QIAGEN) with DNase digestion step. A total amount of 5 ug RNA per sample was used as input material for the RNA sample preparation. Firstly, ribosomal RNA was removed Relative quantification of lncRNA expression was normalized by the 2 −ΔCt method, and GAPDH was used for normalization with the corresponding primers (Additional file 1. Table S1). All reactions were carried out in triplicate by StepOnePlus ™ Real-Time PCR System (Applied Biosystems) [25][26][27].

Construction of multi-lncRNA prognostic signature
GSE53624 and RNA-seq datasets were analyzed respectively by Cox analysis and Kaplan-Meier analysis to identify lncRNAs associated with overall survival (OS) of ESCC patients. We selected the prognostic lncRNAs using Cox p & log rank P < 0.05 in 2 datasets. Model was estimated as follows [25,26,28]: where N is prognostic lncRNA number, ExpLncRNA i represents the lncRNA expression value, and CoefCox i is the Cox regression coefficient of lncRNA. We plotted ROC curves [16] and calculated their area under the curve (AUC) values to screen out the prognostic signature with the largest AUC value in the GSE53624 set [28]. The whole construction process was shown in Additional file 2. Figure S1. R program (Version 3.5.1) was used to perform the above analyses, including packages called pROC, TimeROC and survival from Bioconductor (https ://www.bioco nduct or.org/). Subpath-wayMiner was used to find the selected prognostic genes related pathways (https ://cran.r-proje ct.org/web/packa ges/Subpa thway Miner /), which provides more flexibility in annotating gene sets and identifying the involved pathways (entire pathways and sub-pathways) [29].

Screening prognostic lncRNAs in ESCC
There were a total of 217 samples with lncRNA expression profiles including GSE53624 (n = 119) and RNA-seq (n = 98) dataset in this study. From the above two datasets, we found a total of 6253 lncRNAs were expressed in ESCC tissues. We then performed univariate cox analysis and Kaplan-Meier analysis to analyze the relationship between the lncRNA expression and ESCC OS. Based on the two ESCC profiling datasets and the corresponding clinical follow-up information, univariate cox analyses identified 368 lncRNAs in the GSE53624 and 290 lncRNAs in the RNA-seq dataset which were significantly associated with ESCC OS (Cox P < 0.05) and Kaplan-Meier analyses identified 386 survival associated lncRNAs in the GSE53624 and 312 survival associated lncRNAs in the RNA-seq dataset (log rank P < 0.05). Through comparing the above four groups of survival related lncRNA data, we found 19 identical lncRNAs significantly correlated with OS in two datasets (COX P < 0.05, log rank P < 0.05) (Fig. 1a). Subsequently, we found that 10 of the 19 prognostic-related lncRNAs showed a consistent risk trend in the four groups, displaying a risk or protective role in ESCC (Additional file 3. Table S2).

Constructing prognostic multi-lncRNA signatures
We used 119 ESCC data from public GSE53624 as the training group, 98 RNA-seq data as test group and 84 qPCR data as validation group. Using above selected 10 prognostic lncRNAs, we constructed 2 10 -1 = 1023 signatures in the training dataset. In order to obtain a signature with the strongest prognostic ability, we performed ROC analyses in the training dataset (Additional file 4. Table S3). We screened the 5-lncRNA combination with the largest AUC value, namely the prognostic signature containing AC007179.1, MORF4L2-AS1, RP11-488I20.9, RP13-30A9.2, RP4-735C1.6 ( Fig. 1b, Table 2). The lncRNA risk score was calculated as follows: As shown in Fig. 1c, the AUC of the screened lncRNA signature was 0.71.

Predictive power of the lncRNA signature for patients with ESCC
After calculated risk scores of ESCC patients in the GSE53624 dataset, the median risk classified ESCC patients into high-and low-risk groups (n = 59/60). Kaplan-Meier analysis found the survival of ESCC patients in the low-risk group was significantly longer than those in the high-risk group (median survival: 1.75 years vs. 4.01 years, log-rank test P < 0.001; Fig. 2a).
The 3 year survival rate for patients in the high-risk group was only 23.73%, while that of patients in the lowrisk group reached as high as 66.67%. Then we evaluated the predictive ability of the five-lncRNA signature in ESCC RNA-seq dataset (n = 98), which was treated as the test set. The median risk score of patients in the test dataset was calculated and used to divide patients into high-and low-risk groups. Figure 2b showed the Kaplan-Meier analysis result. The prognosis of ESCC patients in the low-risk group was significantly better than patients in the high-risk group (Median survival: 2.44 years vs. 3.97 years, log-rank test P = 0.014; Fig. 2b). The 3 year survival rate for the patients of highrisk group was only 40.82%, while that of the low-risk group was 61.22%.
In addition, we performed qPCR experiment and obtained the five lncRNAs expression level in 84 ESCC tissues. Through calculated risk scores and used the median risk score as the cutoff value, we obtained the high score group and low score group. Kaplan-Meier analysis discovered the survival difference between two risk groups. The survival of ESCC patients with lowrisk scores was still significantly greater than patients with high-risk scores. (Median survival: 2.82 years vs. 4.08 years, log-rank test P = 0.011; Fig. 2c). Figure 3 displayed the survival time, risk score and lncRNA expression level of each patient in above two  ESCC lncRNA expression profile datasets, including GSE53624 and RNA-seq datasets. As shown, patients with the higher expression of AC007179.1, MORF4L2-AS1 and RP4-735C1.6, the higher the risk were with the more deaths. On the contrary, patients with lower-risk scores tended to have higher expression level of protective lncRNAs (RP13-30A9.2, RP11-488I20.9) and have a lower mortality rate.

The five-lncRNA signature can independently predict the survival of ESCC patients
The Chi-square test found the lncRNA signature was not associated with clinicopathological factors, including age, sex and pTNM stage in the training, test and qPCR validation groups (  Table 4).

The prediction performance of the five-lncRNA signature and TNM stage
TNM stage is currently used as the main indicator for prognosis evaluation of ESCC. Our study confirmed that TNM stage and the five-lncRNA signature were related to the prognosis of patients. Therefore, we performed stratification analysis to explore the predictive power of the five-lncRNA signature in patients with known TNM stage. We first divided all the ESCC patients involved in this study (n = 301) into 2 groups, one group was patients with TNM low (I + II) stage and the other group was patients with high TNM stage (III + IV). Then we used the lncRNA signature to conduct stratification analyses of above two groups. Kaplan-Meier results showed that the lncRNA signature could further separate the patients with TNM low (I + II) stage (Fig. 4a) or patients with high TNM stage (III + IV) (Fig. 4b) into two subgroups with significantly different survival(log-rank test P < 0.001).
To analyze the predictive performance advantages of the five-lncRNA signature, TimeROC analyses were performed in the three datasets (n = 301).

Functional prediction of the five-lncRNAs signature
In order to predict function of the five-lncRNAs signature, we performed Pearson analysis to obtain the genes implicated in the correlation of these 5-lncRNA signature in the GSE53624 and RNAseq datasets, respectively. Then we got 988 co-expression genes in total (Pearson coefficient > 0.4/ < -0.4, P < 0.001, Fig. 5a). Subpath-wayMiner suggested these 988 genes were significantly enriched in 73 different KEGG pathways (P < 0.05, Additional file 5. Table S4), especially involved in p53 signaling pathway, ErbB signaling pathway, Regulation of actin cytoskeleton, MAPK signaling pathway, PPAR signaling pathway VEGF signaling pathway, Pathways in cancer and Toll-like receptor signaling pathway (Fig. 5b), which were ranked in the top 20 and were closely related to the cancer development.

Discussion
Esophageal squamous cell carcinoma is a rapidly progressive disease faced with many difficulties, such as difficulty in early diagnosis, low 5 year survival rate, and lack of effective prognostic markers. Recently, lncRNA has been reported to be involved in the occurrence and progression of tumors [30,31]. However, the expression characteristics and roles of lncRNAs in ESCC are still fairly elusive. Thus, this paper revealed the survival related lncRNAs expression level in ESCC and constructed a prognostic five-lncRNA signature by collected public ESCC lncRNA expression data. Then, we measured the lncRNA expression of 98 ESCC patients by RNA seq and test the lncRNA expression of 84 ESCC tissues for validating the predictive performance of the five-lncRNA signature. Meanwhile, the 98 ESCC lncRNA expression profiles provide scientists with reference data to study the association of ESCC and lncRNAs. Despite numerous research have reported some ESCC lncRNA models for prognostic markers [24], PCR validation was rarely performed in the independent validation group and some lncRNAs in the prediction model might not yet exist in ESCC tissue. In this study, the predictive model was constructed through the analyzing two sets of high-throughput sequencing data, and lncRNAs of the five-lncRNA signature were shown to be present in collected ESCC samples by qPCR. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ++ + + + + + + + + + + + + ++ + + + ++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +++ + + +  with long survival (Cox coefficient < 0), suggesting the two lncRNAs were protective lncRNAs for ESCC. The significant correlation of the five lncRNAs with ESCC prognosis was found in two data sets, emphasizing the important clinical research value of the five prognostic lncRNAs in ESCC. However, the function of prognostic lncRNAs in ESCC have not been reported so far and biological roles of the five lncRNAs in ESCC progression should be investigated in further experimental studies. TNM stage is a commonly used tumor classification standard in clinical practice and a recognized prognostic marker [32,33]. However, TNM stage remains flawed in prognostic assessment. We found that the prognostic predictive power of signature is better than TNM stage, suggesting that the strong prognostic ability of the five-lncRNA signature. Consistent with some scholars' results that combined TNM classification with molecular marker can predict outcome of ESCC patients more accurately [34], we found the prognostic prediction of the combination of signature with TNM stage was the best, indicating the signature combined with TNM stageg is useful for prognosis evaluation.

Conclusion
We identified a prognostic five-lncRNA signature from a large cohort of ESCC patients. The signature could predict the survival of patients with ESCC based on lncRNA expression profile and have strong clinical value.