A nomogram based on 9-lncRNAs signature for improving prognostic prediction of clear cell renal cell carcinoma

Background Abnormal expressions of long noncoding RNAs (lncRNAs) are very common in clear cell renal cell carcinoma (ccRCC), and some of these have been reported to be highly correlated with prognosis of ccRCC patients. Methods “edgeR” AND “DEseq” R packages were used to explore differentially expressed genes (DEGs) between normal and tumor tissues of ccRCC samples from The Cancer Genome Atlas (TCGA). Univariable Cox survival analysis, robust likelihood-based survival model and multivariable Cox regression analysis were used to identify prognostic lncRNAs and construct lncRNAs signature. Finally, a graphic nomogram based on the lncRNAs signature was developed to predict 1-, 3- and 5-year survival probability of ccRCC patients by using rms R package. Results 8413 DEGs including 2740 lncRNAs and 4530 mRNAs were identified between normal and tumor tissues. 395 lncRNAs were found to be associated with prognosis of ccRCC patients (P < 0.05). Among these 395 prognostic lncRNAs, 9 key prognostic lncRNAs (RP13-463N16.6, CTD-2201E18.5, RP11-430G17.3, AC005785.2, RP11-2E11.9, TFAP2A-AS1, RP11-133F8.2, RP11-297L17.2 and RP11-348J24.2) were identified by using robust likelihood-based survival model. A 9-lncRNAs signature was constructed by using estimated regression coefficients of the 9 key prognostic lncRNAs. Results of χ2-test or Fisher’s exact test indicated that the 9-lncRNAs signature was significantly associated with clinicopathological characteristics such as tumor grade, T stage, N stage, M stage, TNM stage and survival outcome of ccRCC patients. Multivariate analysis showed that the 9-lncRNAs signature, age and M stage were independent prognostic factors. Finally, a graphic nomogram based on the lncRNAs signature, age and M stage was developed to predict 1-, 3- and 5-year survival probability of ccRCC patients by using rms R package. Conclusions A 9-lncRNAs signature associated with prognosis of ccRCC patients was constructed and a promising prognostic nomogram based on the 9-lncRNAs signature was developed for 1-, 3- and 5-year OS prediction of ccRCC patients in this study.

as external tumor factors such as age, type of disease presentation, performance status, nuclear grading, and microscopic tumor necrosis were ignored [4].
Recently, several novel systems have been developed for prognostic prediction of ccRCC patients. For example, Stage, Size, Grade, and Necrosis (SSIGN) Score system, which was firstly reported in 2002 for outcome prediction of ccRCC patients, has been proved to have better predictive ability than TNM staging system [5]. Another staging system consisted of tumor grade, N stage and patients' performance status could accurately distinguish RCC patients with different survival possibility [6]. However, these systems included parameters depended on subjective judgement of professional pathologist, and they made prediction easily even influenced by inter-observer variabilities. Therefore, more concise and practical tools are urged for improving prognostic prediction of ccRCC patients.
Recent years, emerging evidences have found that long noncoding RNAs (lncRNAs) play important roles in regulation of mRNA transcription and protein translation [7,8]. Abnormal expressions of lncRNAs are frequent biological phenomena in tumor and closely associated with prognosis of tumor patients. For example, overexpression of lncRNA urothelial carcinoma associated 1 (UCA1), which could promote aggressiveness of cancer cell, is associated with prognosis of multiple tumor such as bladder cancer, colorectal cancer, gastric cancer and so on [9][10][11][12]. LncRNA metastasis-associated lung adenocarcinoma transcript 1 (MALAT1) was first discovered in lung cancer, and up-regulation of MALAT1 is negatively associated with survival time of lung cancer patients [13]. Up-regulation of MALAT1 was also found in RCC patients and could promote proliferation and invasion of RCC [14]. Another lncRNA titled metastatic renal cell carcinoma-associated transcript 1 (MRCCAT1) participated in activating p38-MAPK signaling by inhibiting NPR3 and could promote metastasis of ccRCC, and its upregulation was associated with poorer outcome of ccRCC patients [15]. In summary, lncRNAs are important participants of various biological process and can be important biomarkers source of prognostic prediction and tumor targeted therapy.
In this study, a 9-lncRNAs signature associated with prognosis of ccRCC was identified by mining RNA-Seq data from The Cancer Genome Atlas (TCGA, https :// tcga-data.nci.nih.gov/tcga/). Then, a nomogram based on this 9-lncRNAs signature was developed for improving prognostic prediction of ccRCC patients, and it will be a useful diagnostic tool for ccRCC patients in the future.

Data source and reprocessing
RNA-Seq data of ccRCC patients together with the corresponding clinicopathological data was obtained from TCGA. Ensembl IDs were annotated in the form of gene symbols and biotypes based on GENCODE project gene annotation file (version 22, GRCh38). Then, reads per kilobase per million mapped reads (RPKM) were transformed into transcripts per kilobase of exon model per million mapped reads (TPM) for data standardization. Because of huge numerical span of TPM values, the gene expressions of each gene were then presented in the form of log2(TPM + 1) and integrated into one matrix table.

Identification of differentially expressed genes (DEGs)
In order to compare expression differences of genes between normal and tumor tissue, P value and fold change (FC) of each gene was generated by using "edgeR" and "DEseq" R packages, and genes with P value < 0.05 and | log 2 FC| > 1 were defined as DEGs [16,17]. However, expression levels of some DEGs were very low, these DEGs were further filtered out referring to criterion of previous research [18]. Finally, abundantly differentially expressed LncRNAs and mRNAs were separated from the remaining DEGs, and all patients were randomly divided into training group and testing group for the following analysis.

Selection of prognostic lncRNAs
Then, univariable Cox survival analysis was employed to explore relationships between overall survival (OS) and the differentially expressed LncRNAs in training group. Parameters including hazard rate (HR) and P value were generated by using survival package in R environment. The lncRNAs with P value < 0.05 were selected as prognostic lncRNAs for the following screening.

Identification of key prognostic lncRNAs
However, it was no suitable for establishment of risk score formula because of large number of prognostic lncRNAs. Then, a Robust likelihood-based survival model was used to identify the key prognostic lncRNAs of ccRCC patients [19]. The detail procedure was as followings: 1. All samples were randomly divided into two sets: the training set (2/3*N samples) and the validation set (1/3*N samples). Fit a lncRNA to the training set of samples and obtain the parameter estimate for the lncRNA. Then we evaluate log likelihoods with the parameter estimate and the validation set of samples. The evaluation was repeated for every lncRNA. 2. After 10 repetitions of the above procedure, we obtained 10 log likelihoods for each lncRNAs. The best lncRNA with the largest mean log likelihood was selected. Subsequently, the next best lncRNA were searched by evaluating every two-lncRNA model and an optimal one was selected with the largest mean log likelihood.
3. This forward lncRNA selection procedure was continued until fitting is impossible, and a series of models were got. Akaike's information criterions (AICs) for all the candidate models were computed and an optimal model with the smallest AIC was selected.
After repeating procedure for 1000 times, key prognostic lncRNAs were finally selected out from the prognostic lncRNAs.

Risk score formula establishment
Multivariable Cox regression analysis was used to generate estimated regression coefficients of key prognostic lncRNAs in the training group, and then a lncRNAs signature consisted of these lncRNAs was constructed by using respective coefficient. According to the risk score formula, risk score of each patient was calculated and the optimal risk score was determined by using survminer R package to cut patients of training group into low-risk group and high-risk group. Then, OS differences between low-risk group and high-risk group were compared by using Kaplan-Meier method. In order to test the sensitivity and specificity of the risk score formula, time-dependent dynamic receiver operating characteristic (ROC) curve area under the curve (AUC) values (1-10 years) were generated by using survival ROC R package [20]. Similarly, the formula was further applied in the testing group and training + testing group to validate its stability.

Clustering analysis of key prognostic lncRNAs
According to the HRs of key prognostic lncRNAs, lncRNA associated with better prognosis (HR < 1) was defined as protective lncRNA and lncRNA associated with worse prognosis (HR > 1) was defined as risk lncRNA. A patient would get one risk factor if the expression of risk lncRNA in this patient > median expression of risk lncRNA or expression of the protective lncRNA in this patient < median expression of protective lncRNA. Then the number of risk factors was counted for every patient in training + testing set. Survival analysis was performed by using different number of risk factors as cut-off values.

Prediction ability of the lncRNAs signature for localized ccRCC and advanced ccRCC
As we known, prognosis of ccRCC patients is different between localized ccRCC (stage I and II) and advanced ccRCC (stage III and IV). So, survival analysis was performed between localized ccRCC patients and advanced ccRCC patients by using Kaplan-Meier method, respectively. Meanwhile, ROC AUC value of 1-10-years survival was calculated to testing sensitivity and specificity of the lncRNAs signature.

Drug response prediction
Because not all advanced ccRCC patients were sensitive to radiotherapy and chemotherapy, we used the lncRNAs signature to perform drug sensitivity prediction by R package "pRRophetic". Common antitumor drugs of ccRCC such as axitinib, cisplatin, gemcitabine, pazopanib, sorafenib and temsirolimus were selected from the pharmacogenomics database "The Genomics of Drug Sensitivity in Cancer" (GDSC) (https ://www.cance rrxge ne.org/) [21,22]. Halfmaximal inhibitory concentration (IC50) of TCGA samples were estimated by ridge regression against the GDSC training set [23]. Tenfold cross-validation was used to evaluated prediction accuracy of estimated IC50. Mann-Whitney-Wilcoxon Test was used to test whether IC50 distributions of high risk group and low risk group were identical.

Weighted gene co-expression analysis (WGCNA) and gene enrichment analysis
In order to explore relationships between lncRNAs signature and biologic functions of ccRCC, weighted gene coexpression network analysis (WGCNA) was employed to construct the gene co-expression modules among differentially expressed mRNAs [24]. The modules with the maximal absolute module significance associated with lncRNAs signature were selected out. Then, gene enrichment analysis of genes in the most lncRNAs signature related module was performed by using cluster Profiler R package [25].
Relationship between risk score and clinicopathological characteristics χ 2 -Test or Fisher's exact test was employed to explore relationships between lncRNAs signature and clinicopathological characteristics such as age, tumor grade, TNM stage and so on. Because clinicopathological characteristics such as TNM stage are highly associated with prognosis of ccRCC patients, univariable and multivariable Cox regression analysis were performed to test whether the lncRNAs signature was independent of clinicopathological characteristics.

Nomogram construction based on lncRNAs signature
Finally, a nomogram consisted of independent prognostic factors was constructed by employing rms R package.  Separating capacity of the nomogram was tested by Harrell's concordance index (C-index), and the calibration curves of the nomogram were constructed to test consistency between 1-, 3-and 5-year survival probability prediction based on the nomogram and actual observation.

Data acquisition and DGEs identification
In total, 60,483 genes of 516 patient samples were reannotation by using version 22 GENCODE project gene annotation file. After comparing expression level of genes between 72 normal tissue and 72 ccRCC, 8413 DGEs including 2740 lncRNAs and 4530 mRNAs were obtained (Fig. 1). After removing lncRNAs of low expression, 1062 abundantly expressed lncRNAs were finally selected out. Then, the 516 samples were randomly divided into a training group (258 samples) and a testing group (258 samples) for following analysis.

Identification of key prognostic lncRNAs
395 prognostic lncRNAs (P < 0.05) were identified by univariable Cox survival analysis in training group, and top 20 lncRNAs with least P value were presented in Table 1.

Establishment of 9-lncRNAs risk score formula
Based on the estimated regression coefficients of the 9 lncRNAs, a risk score formula was finally developed. The risk score of each patient was calculated referring to the following formula: The distribution of the risk score, survival status along with survival time of ccRCC patients and relative expression of the 9 key prognostic lncRNAs were shown in    (Fig. 3a), testing group (Fig. 3b) and training group + testing group (Fig. 3c)].

Survival analysis and time-dependent dynamic ROC
OS comparison between patients with high risk score and patients with low risk score was performed by using optimal risk score as cut-off value. The result suggested that patients with higher risk score have shorter OS than patients with low risk score in training group (HR = 4.92, P < 0.001) (Fig. 4a), testing group (HR = 3.43, P < 0.001) (Fig. 4b) and training group + testing group (HR = 3.95, P < 0.001) (Fig. 4c). ROC analysis indicated that the 9-lncRNAs signature had perfect sensitivity and specificity for prognostic prediction of ccRCC patients with AUCs of 1-10 years OS > 0.5 (Fig. 4d-f ).

Clustering analysis of key prognostic lncRNAs
In order to test the stability of 9-lncRNAs signature, patients of training group + testing group were clustered by using different cut-off values (≥ 1, ≥ 2, ≥ 3, ≥ 4, ≥ 5, ≥ 6 , ≥ 7, ≥ 8 and ≥ 9) according to the number of risk factors. As shown in Fig. 5, Kaplan-Meier curves of 9 clusters were all with HR > 1 with significant P values < 0.05, and it suggested that patients with more risk factors would have a poorer prognosis.

The 9-lncRNAs signature have good prediction ability for both localized ccRCC and advanced ccRCC
Kaplan-Meier curves of localized ccRCC and advanced ccRCC were showed in Fig. 6 by using optimal risk score as cut-off value. Higher risk score was closely associated with poorer prognosis among localized ccRCC patients (Fig. 6a) and advanced ccRCC patients (Fig. 6b), and AUCs of 1-10 years OS in two stages were all above 0.5. Interestingly, the 9-lncRNAs signature had better prediction ability for long term OS (> 4 year) in advanced ccRCC and had better prediction ability for short term OS (< 4 year) in localized ccRCC (Fig. 6c).

Advanced ccRCC patients with low risk are more sensitive to gemcitabine and sorafenib
Using median risk score as cut-off value, we observed a significantly different estimated IC50 for gemcitabine and sorafenib between patients with low risk score and patients with high risk score. As shown in Fig. 7, estimated IC50 of patients with low risk were lower than that of patients with high risk for gemcitabine (P = 0.003) and sorafenib (P = 0.04). However, drug sensitivity of advanced ccRCC patients for axitinib, cisplatin, pazopanib, and temsirolimus between high risk and low risk has no significant difference.

Identification of the 9-lncRNAs signature associated biological pathways
As depicted above, the 9-lncRNAs signature had a strong discriminatory power for prognosis of ccRCC patients, therefore this signature might be closely associated with biological pathways of ccRCC. 4530 differently expressed mRNAs were used to construct 14 similar expression modules by average linkage clustering. The relevance with P value between each module and 9-lncRNAs signature was listed in every module (Fig. 8). The most negative related module (black module, R = − 0.63, P = 2*E−56) and positive related module (red module, R = 0.36, P = 1*E−16) were selected out for gene enrichment analysis. Genes in black module were mainly enriched in molecular transport associated pathways such as small molecule catabolic process, organic anion transport, organic acid transport, carboxylic acid transport and organic acid catabolic process, and genes in black module were mainly enriched in cell division associated pathways such as nuclear division, organelle fission, mitotic nuclear division, chromosome segregation, mitotic sister chromatid segregation.

The 9-lncRNA signature is an independent prognostic factor in ccRCC
By χ 2 -test or Fisher's exact test, we found that the 9-lncR-NAs signature was significantly associated with tumor grade, T stage, N stage, M stage, TNM stage and survival status of ccRCC patients (Table 3). Then, relationships between prognosis and clinicopathological characteristics were analyzed by using Cox proportional hazard regression model (Table 4). Univariate analysis showed that 9-lncRNAs signature, age, T stage, N stage, M stage, TNM stage and grade were significantly correlated with OS. While multivariate analysis showed that only 9-lncR-NAs signature, age and M stage remained significantly associated with OS. The results indicated that 9-lncR-NAs signature was a fine prognostic predictor which was independent of TNM staging system.

Nomogram based on 9-lncRNA signature for prognostic prediction of ccRCC patient
A graphic nomogram including the lncRNAs signature, age and M stage was constructed to predict 1-, 3-and 5-year survival probability of ccRCC patients by using rms R package (Fig. 9a). The C-index of the nomogram Fig. 6 Survival analysis of localized ccRCC patients (a) and advanced ccRCC patients (b), and time-dependent dynamic ROCs (c) for testing sensitivity and specificity of 9-lncRNA signature was up to 0.79 (95% confidence interval 0.75-0.82), and it meant that the nomogram had a good ability to discriminate patients of poor prognosis from patients of favor prognosis. Meanwhile, the calibration plot showed that 1-, 3-and 5-year survival probability predicted by the nomogram had optimal agreement with actual observation (Fig. 9b).

Discussion
As we known, ccRCC is a complex tumor caused by intricate genetic and molecular alterations, and some of these alterations are closely associated with the prognosis of ccRCC patients [26]. However, TNM staging system failed to consider these genetic alterations of ccRCC, and it made TNM staging system not perfect for accurate prognostic prediction of ccRCC patients [5]. In this study, we constructed a 9-lncRNAs prognostic signature (RP13-463N16.6, CTD-2201E18.5, RP11-430G17.3, AC005785.2, RP11-2E11.9, TFAP2A-AS1, RP11-133F8.2, RP11-297L17.2 and RP11-348J24.2) by using robust likelihood-based survival model [19]. Meanwhile, χ 2 -test or Fisher's exact test found the 9-lncRNAs signature was highly related to tumor grade, T stage, N stage, M stage, TNM stage and survival status of ccRCC patients. And multivariate analysis revealed that the 9-lncRNAs signature, age and M stage were independent prognostic indicators for ccRCC patients. Interestingly, we also found the lncRNA signature could predict therapeutic response of two drugs (gemcitabine and sorafenib), and the signature would provide the reference for guiding clinical use of anti-tumor drugs. Finally, a concise nomogram consisted of the 9-lncRNAs signature, age and M stage was developed for prognostic prediction of ccRCC patients.
In this study, we identified 9 key prognostic lncRNAs of ccRCC patients from TCGA, however, no study had reported about specific biological mechanism of these lncRNAs except TFAP2A-AS1. In a previous study, TFAP2A-AS1 was reported as a tumor suppressor which was associated with better prognosis of breast cancer. The study found that TFAP2A-AS1 acted as miRNA sponge for miR-933 which could degrade smad2 mRNA, and could inhibit proliferation and invasion of breast cancer cell [27]. However, in our study, survival analysis of TFAP2A-AS1 showed that high expression of TFAP2A-AS1 was related to poor prognosis of ccRCC patients. Because of varying lncRNAs' biofunction and tumor heterogeneity, lncRNA can alter biological behaviors of different tumors toward different directions [28][29][30]. So, Fig. 7 Estimated half-maximal inhibitory concentration (IC50) of high risk group and low risk group for antitumor durgs: axitinib, cisplatin, gemcitabine, pazopanib, sorafenib and temsirolimus it is not surprising that TFAP2A-AS1 has opposite prognostic effect on breast cancer and ccRCC patients. So far, no study has reported about functions of other 8 lncR-NAs, and further study in exploring function of these prognostic lncRNAs in ccRCC are needed in the future.
WGCNA is a data exploratory tool which can identify relationships between external characteristics and high throughput data such as gene microarray, RNA-seq, proteomics data and so on [24]. In order to make clear relationships between the 9-lncRNAs signature and molecular biological mechanism of ccRCC, WGCNA was performed to seek gene modules associated with the 9-lncRNAs signature. Two gene modules, which were most associated with the 9-lncRNAs signature, were identified, and gene enrichment analysis were performed to explore functions of two modules. Interestingly, genes in black module (negatively correlated) mainly enriched in pathways of acid and anion transport and genes in red module (positively correlated) mainly enriched in pathways of cell division and proliferation. As we known, normal tissue of kidney could transport useless metabolite out of our body and maintain water electrolyte balance of our body [31]. So, biological pathways in tumor with low risk score is more like that in normal tissue than that in tumor with high risk score, meanwhile, tumor with high risk score has stronger proliferative ability than tumor with low risk score. In summary, the 9-lncRNAs signature is useful to evaluate differentiated degree of tumor, with low risk score indicating well-differentiated and high risk score indicating poor-differentiated.
lncRNA signatures are novel prognostic systems which made prediction of clinic outcome simpler and more cost-saving, lncRNAs could be fast detected by polymerase chain reaction (PCR) by specific primers. So far, several lncRNAs score systems have been identified for outcome prediction of tumors such as gastric cancer [32][33][34][35], lung cancer [36][37][38], hepatocellular carcinoma [39][40][41] and so on, and these signatures provide promising biomarkers for prognostic prediction and therapeutic targets for tumor therapy. lncRNAs signatures have also been developed for RCC patients, for example, a 6-lncRNA prognostic signature was developed based on RNA-seq data from TCGA and could precisely predict survival for patients among three RCC subtypes: ccRCC, papillary renal cell carcinoma and chromophobe renal cell carcinoma [42]. Another lncRNA signature named RCClnc4 scores was proved to have higher accuracy for assessing risk of localized ccRCC patients than the TNM staging system and SSIGN score [43]. However, as we known, clinical characteristics such as age, pathological stage, tumor size and tumor grade can also affect prognosis of tumor patients. It would be more accurate for prognostic prediction of RCC patients if these lncRNAs score systems could include these clinical characteristics.
Nomograms are widely used prognostic tools which can generate an individual probability for tumor patients according to corresponding clinical parameters. Nomograms can integrate diverse prognostic variables regardless of continuous variables or discontinuous variables and are user-friendly for clinician judgment [44]. In this study, we got a 9-lncRNAs signature which was highly associated with prognosis of ccRCC patients and independent of TNM staging Table 3 Relationship between clinicopathological characteristics and risk score calculated by using the 9-lncRNAs signature a Median expression of risk score was used as cut-off value to cut the patients into high risk group and low risk group

Factor
Risk score a P value  system. And a prognostic nomogram including 9-lncR-NAs signature and clinical characteristics such as age and distant metastases of tumor was developed for prognostic prediction. These clinical characteristics are not influenced by inter-observer variabilities and can be got easily by medical history inquiry, image and pathological examination. However, our study still has some limitations. First, the nomogram was created based on one cohort obtained from TCGA, and it would be better if validated by other cohorts. Second, some potential clinical characteristics such as blood biochemistry and nutritional status were ignored. Third, LncRNAs are dynamic in body and this characteristic may made the signature not stable, multiple site biopsies would be helpful to improve stability of lncRNA signature. Despite these limitations, this is the first nomogram which was based on lncRNAs signature and provided a new methodology of developing prognostic score system for ccRCC patients. This nomogram can easily separate patients with poor prognosis from patients with good prognosis by performing PCR. And clinicians can develop more individualized treatment regimens for patients with different prognosis, this will make treatment more targeted and save more public health resources. Meanwhile, this nomogram consists of objective indicators which prevent prognostic prediction from inter-observer variabilities of different pathologists.