Identifying a ten-microRNA signature as a superior prognosis biomarker in colon adenocarcinoma

Background Increasing studies have suggested that aberrant expression of microRNAs might play essential roles in the progression of cancers. In this study, we sought to construct a high-specific and superior microRNAs signature to improve the survival prediction of colon adenocarcinoma (COAD) patients. Methods The genome-wide miRNAs, mRNA and lncRNA expression profiles and corresponding clinical information of COAD were collected from the TCGA database. Differential expression analysis, Kaplan–Meier curve and time-dependent ROC curve were calculated and performed using R software and GraphPad Prism7. Univariate and multivariate Cox analysis was performed to evaluate the prognostic ability of signature. Functional enrichment analysis was analyzed using STRING database. Results We identified ten prognosis-related microRNAs, including seven risky factors (hsa-miR-197, hsa-miR-32, hsa-miR-887, hsa-miR-3199-2, hsa-miR-4999, hsa-miR-561, hsa-miR-210) and three protective factors (hsa-miR-3917, hsa-miR-3189, hsa-miR-6854). The Kaplan–Meier survival analysis showed that the patients with high risk score had shorter overall survival (OS) in test series. And the similar results were observed in both validation and entire series. The time-dependent ROC curve suggested this signature have high accuracy of OS for COAD. The Multivariate Cox regression analysis and stratification analysis suggested that the ten-microRNA signature was an independent factor after being adjusted with other clinical characteristics. In addition, we also found microRNA signature have higher AUC than other signature. Furthermore, we identified some miRNA-target genes that affect lymphatic metastasis and invasion of COAD patients. Conclusion In this study, we established a ten-microRNA signature as a potentially reliable and independent biomarker for survival prediction of COAD patients.


Background
Colon adenocarcinoma (COAD), the fourth most commonly malignant cancer, has been the fifth leading cause of cancer-related death diseases in worldwide. It was estimated that nearly 101,420 new COAD cases were diagnosed and 27,640 deaths in the United States in 2019 [1]. Despite diagnostic methods and comprehensive treatment have been developed during the past few years, the overall 5-year survival rate of COAD patients is still unsatisfactory and the underlying molecular mechanisms of COAD progression are still elusive [2]. Although several biomarkers for COAD have been undergoing or tested in clinical trials, such as carcinoembryofnic antigen (CEA) [3] and so on, many more potential reliable and valuable biomarkers are imperative to be detected and constructed to improve the prognosis of patients with COAD.
MicroRNAs (miRNAs) are a large family of post-transcriptional regulators that are approximately 21 nucleotides in length and control many developmental and cellular processes in eukaryotic organisms [4]. More recently, increasing studies have demonstrated that microRNAs play essential roles in the progression of cancers. Some microRNAs were associated with disease prognosis and clinical outcome, suggesting microRNAs could be one of the best candidates as potential biomarkers for cancers [5,6].
In addition, compared with a single biomarker, integrating multiple signature model would fundamentally improve the precise of prognostic value [7], and multigene-expression signatures have been reported to predict prognostic in various cancers [8,9]. Therefore, searching a panel of microRNA signature might have predictive and prognostic value in patients with COAD.
In the present study, we established an effective micro-RNAs-based signature to predict the prognosis of COAD taking advantage of TCGA database. And we verified the predictive power of this signature for COAD in validation and entire series. More significantly, we verified our microRNA signature perform better than other signature. Finally, we identified some miRNA-target genes and signaling pathways that promote the progression of COAD patients.

COAD patients' information collection
The microRNAs, mRNA and lncRNA expression profiles of 441 COAD tissues and eight adjacent tissue samples, and their corresponding clinical information were download from The Cancer Genome Atlas of the National Cancer Institute (TCGA, http://cance rgeno me.nih.gov). The downloaded clinical information summarized in Table 1.

microRNAs selection and signature building
First, we performed differentially expressed microRNAs analysis between 441 tumor tissues and eight adjacent tissues. Then, the 441 COAD cases in TCGA database were randomly divided into a test series (N = 300) and a validation series (N = 141). Next, those differentially expressed microRNAs were analyzed using the univariate and multivariate Cox proportional hazards regression analysis in the test series. Based on the expression and coefficient of microRNAs, an optimal microRNAs prognostic signature was established and validated in both the validation and entire series. The microRNAbased risk score formula was constructed as follows: In this formula, n represents the number of microR-NAs, Coei indicates the coefficient of every microRNA in the result of multivariate Cox regression analysis, and EVi represents the expression level of the every microRNA.

mRNA and lncRNA signature building
Differentially expressed mRNA and lncRNA were selected to carry out univariate and multivariate Cox analysis. Then, we established mRNA signature and lncRNA signature based on the expression and coefficient of mRNAs and lncRNAs respectively. The risk score formula as follows: In this formula, n represents the number of mRNAs or lncRNAs, Coei indicates the coefficient of every mRNA or lncRNA in the result of multivariate Cox regression analysis, and EVi represents the expression level of the every mRNA or lncRNA.

Target gene prediction and functional enrichment analysis
The target genes of these prognostic miRNAs was predicted using Targetscan (http://www.targe tscan .org/) and miRDB (http://www.mirdb .org/). The overlapping target genes in these two databases were considered as miRNA-target genes and used for further analysis. Functional enrichment analysis including cellular component (CC), molecular function (MF), biological process (BP) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed by STRING database (https ://strin g-db.org/cgi/input .pl) and visualized using GraphPad Prism7 (GraphPad Software Inc., La Jolla, CA).

Statistical analysis
Differentially expressed microRNAs, mRNA and lncRNA between the COAD patients and corresponding adjacent tissue samples were calculated by the 'edgeR' package. Volcano plot was calculated and depicted by "volcano" R package. Venn diagram was drew using online website (http://bioin forma tics.psb.ugent .be/webto ols/venn/). The OS were analyzed by Kaplan-Meier survival curve analysis and calculated by the log-rank test. The time-dependent receiver operating curve (ROC) was performed to assess the sensitivity and specificity of the signature prognosis prediction.

Screening ten microRNAs as potential prognostic markers for COAD
We identified 358 significantly differentially expressed microRNAs between 443 COAD tissues and 8 adjacent tissues (|log 2 FC| > 1, P < 0.05), including 223 upregulated and 135 downregulated microRNAs. Further, Univariate Cox regression analysis was performed to find out microRNAs related to patients' prognosis in test series (N = 300). The result showed that 29 microRNAs were significantly related to the overall survival of patients with COAD (All P < 0.05, Table 2). To improve the precise of prognostic effect of microRNAs, those 29 candidate microRNAs were further analysis by multivariate Cox analysis in the test series. Finally, a total of 10 microRNAs were filtered as the candidate factors obviously associated with the prognosis of COAD (All P < 0.05, Table 3). The differentially expression level of these ten microR-NAs in paired COAD patients tissue were shown Additional file 1: Figure S1a, and the microRNAs expression level in 441 COAD patients and eight adjacent normal tissues were shown in Additional file 1: Figure S1b. These results showed that six of these ten microRNAs were significantly higher expression in COAD cancer, and four of them were significantly lower expression in COAD cancer (All P < 0.05).

Identifying a ten-microRNA signature as potential prognostic indicator of COAD
To optimize the predictive microRNAs profile, the filtered ten microRNAs were used to construct a predictive microRNAs signature. Based on the expression levels of these ten microRNAs and their coefficient assessed by multivariate Cox analysis, a novel risk score formula was established as follows: Risk score = (0.5059*hsa-miR-197) + (0.3729*hsa-miR-32) + (− 0.3905*hsa-miR-3917) + (0.3766*hsa-miR-887) + (0.5924*hsa-miR-3199-2) + (0.4152*hsa-miR-4999) + (− 0.3412*hsa-miR-3189) + (− 0.3851*hsa-miR-6854) + (0.3264*hsa-miR-561) + (0.2422*hsa-miR-210). Among these microRNAs, the coefficients of those seven microRNAs (hsa-miR-197, hsa-miR-32, hsa-miR-887, hsa-miR-3199-2, hsa-miR-4999, hsa-miR-561 and hsa-miR-210) were positive, manifesting their COAD driving effect, while the coefficients of other three microRNAs (hsa-miR-3917, hsa-miR-3189 and hsa-miR-6854) were negative, indicating their COAD protecting effect. Based on the prognostic model formula, we calculated and ranked the risk score of each patient. The distribution of risk score, survival status of patients and the expression profiles of ten microRNAs in the test series were shown in Fig. 1a. The results showed that the patients with high risk score have shorter survival time than those with low risk score. The expression levels of seven risky microRNAs were higher in the patients with high risk scores, while the expression levels of three protective microRNAs were higher in the patients with low risk scores. And similar results were also observed in both validation (N = 141) and entire series (N = 441, Fig. 1b, c).
Then, in order to explore the prognostic value of the ten-microRNA signature, a total of 300 patients in test series were separated into high-risk score group (N = 150) and low-risk score group (N = 150). The results showed that the patients in high-risk group have an obviously shorter OS and poor prognosis than the patients in low-risk group (P < 0.0001, Fig. 2d). And similar results were also observed in both validation (P = 0.00617) and entire series (P < 0.0001, Fig. 1e, f ). Then, the time-dependent ROC curve results from three series showed that the prognostic accuracy of the ten-microRNA signature was 0.822, 0.760 and 0.820 at 1 year; 0.857, 0.775 and 0.838 at 3 year; 0.863, 0.797 and 0.855 at 5 year; 0.845, 0.772 and 0.831 at 7 year, respectively. (Fig. 1g-i). These results suggested that the ten-microRNA signature had high specificity and sensitivity and could predict the prognosis of patients with COAD effectively. Taken together, this ten-microRNA signature is a meaningful potential biomarker to predict prognosis of COAD patients.

The ten-microRNA signature is a prognostic indicator independent to other clinical characteristics
To further confirm the predictive effect of the ten-microRNA signature and other clinical characteristic on the prognosis, the univariate and multivariate Cox analysis were performed in entire series. The result of univariate Cox analysis showed that the ten-microRNA signature and some clinical characteristics (including age, TNM stage, T stage, N stage, M stage, Cancer status and Lymphatic invasion) were obviously correlated with the prognosis of COAD ( Fig. 2a and Table 4).  Furthermore, the multivariate Cox analysis showed that the ten-microRNA signature is still an independent prognostic factor (HR = 2.981, 95%CI 1.575-5.642, P = 0.001) ( Fig. 2b and Table 4). The other conventional clinical characteristics, such as age (HR = 2.040, 95%CI 1.132-3.677, P = 0.018) and Cancer status (HR = 3.126, Fig. 1 Identification of ten-microRNA signature associated with prognosis of patients. Risk score distribution, patients' status and heatmap of ten microRNAs expression in test series (a), validation series (b) and entire series (c). Kaplan-Meier analysis of the ten-microRNA signature in test series (d), validation series (e) and entire series (f). Time-dependent ROC analysis of the sensitivity and specificity of the ten-microRNA signature in test series (g), validation series (h) and entire series (i) 95%CI 1.491-6.552, P = 0.003), were also proven to be independent factors associated with the overall survival ( Fig. 2b and Table 4). To further verify prognostic ability of clinical characteristics, we performed Kaplan-Meier analysis. The results were consistent with univariate Cox analysis results and showed that age (P = 0.0116), TNM stage (P < 0.0001), T stage (P = 0.0032), N stage (P < 0.0001), M stage (P < 0.0001), cancer status (P < 0.0001) and Lymphatic invasion (P = 0.0004) were associated with prognosis of COAD patients (Additional file 2: Figure S2a-i).
Furthermore, stratified analysis were performed to further evaluate whether the ten-microRNA signature exhibit predictive effect within same clinical characteristics. We stratified patients into different group based on age (< 68 or > = 68), TNM stage (I + II or III + IV), N stage (N0 + N1 or N2), M stage (M0 or M1), Cancer status (tumor free or with tumor) and Lymphatic invasion (yes or no). These results showed that the ten-microRNA signature can still separate patients into high-risk or low-risk group, and patients in high-risk group have shorter OS and poor prognosis than those in low-risk group (Fig. 3a-l). Taken together, these results demonstrated that this ten-microRNA signature was independent risk factors for survival prediction of COAD patients and could stratify patients from different group into subtypes with different prognosis.

The ten-microRNA signature perform better in survival prediction than other signature
To compare the predictive effect between miRNA signature and other signature, we first performed differentially mRNA and lncRNA expression analysis between patients and adjacent tissues. Then, those different   (Fig. 4a, b) and time-dependent ROC curve with mRNA signature and lncRNA signature (Fig. 4d,  e). These results showed both the mRNA signature and lncRNA signature exhibited their prognostic value as biomarker for COAD and have high specificity and sensitivity for OS of COAD. In addition, we performed KM plot curve and time-dependent ROC curve with other microRNA signature in previous study (Fig. 4c, f ). Next, we compare predictive performance between our micro-RNA signature and other signature. The Kaplan-Meier curve analysis result showed that patients with our low risk score have significant better survival than those with other low risk score. The time-dependent ROC analysis results showed that our microRNA signature have obvious higher AUC than other signature in 3 year and 5 year OS. Taken together, our ten-microRNA signature is a superior indicator for prognosis of COAD patients.

High risk score associates with advanced TNM stage, lymphatic metastasis and invasion of COAD patients
We further investigated if there was a relationship between the ten-microRNA signature and clinical pathological characteristics. The result showed that TNM stage (P = 0.009), N stage (P = 0.003) and Lymphatic invasion (P = 0.001) were significantly related to the ten-micro-RNA signature and patients with high risk score were mainly enriched in stage III + IV, N2 stage and Lymphatic invasion (Table 5). To visualize the relationship between the ten-microRNA signature and other clinical characteristic, we ranked patients according to their risk score, and found obviously asymmetric distribution of the TNM stage, N stage and Lymphatic invasion (Fig. 5). The result showed that these clinical characteristics, advanced TNM stage (III + IV stage), higher N stage (N2 stage) and Lymphatic invasion were mainly enriched in the higher risk section (Fig. 5a). We further compared the risk score of patients separated by these clinical characteristics. These results showed that the risk score signature was higher in TNM stage III and IV, compared with TNM stage I and II (Fig. 5b). Contrast to the lower N stage, the risk score signature was higher in N2 stage (Fig. 5c). Further, the risk score signature was mainly higher in Lymphatic invasion (Fig. 5d). However, the other clinical characteristic showed no significant correlation with the ten-microRNA signature ( Fig. 5a and Table 5). Taken together, these results indicated that high risk score could predict advanced TNM stage, lymphatic metastasis and invasion of COAD patients.

Incorporation of risk score into N stage and lymphatic invasion better predicts prognosis of COAD patients
Since many studies have shown that combined biomarkers are able to improve the prognostic accuracy than a single marker. And our risk score were positively associated with N stage and lymphatic invasion. Hence, we further investigated whether the incorporation of risk score into N stage and lymphatic invasion could better predict the prognosis for OS of COAD patients. The time-dependent ROC curve analysis results showed that AUC was 0.879, 0.859 and 0.870 at 3 year; 0.889, 0.883 and 0.892 at 5 year for OS with combined risk score and N stage, combined risk score and lymphatic invasion and combined risk score, N stage and lymphatic invasion (Fig. 6). These results demonstrated that combining risk score and N stage and lymphatic invasion could better predict prognosis of COAD patients.

Identifying miRNA-target genes associated with lymphatic metastasis and invasion and poor prognosis of COAD patients
To explore genes that affect lymphatic metastasis and invasion of COAD patients, we first performed differentially mRNA expression analysis between N0 + N1 stage and N2 stage, non-lymphatic invasion and lymphatic invasion respectively (Fig. 7a, b). In addition, we also investigated the target genes of the ten microRNAs using two independent miRNA target gene prediction websites: Targetscan and miRDB. Target genes overlapping in the two websites were considered as miRNA-target genes and there are 3248 target genes (Fig. 7c). Then, Venn diagram showed that in these target genes, there are 58 mRNAs were upregulated in N stage, 10 mRNAs were upregulated in lymphatic invasion and 10 mRNAs were upregulated in both N stage and lymphatic invasion. Furthermore, the Kaplan-Meier curve was performed to analyze the prognostic value of these miRNA-target genes using ualcan website. The results showed that there are 12 mRNAs were associated with prognosis of COAD patients, where KRTAP3-1, SLC35F3 and SLITRK4 were upregulated in both N stage and lymphatic invasion, while ATP2B2, CILP, ELOVL2, ERBB4, PCDH9, RBM20, SPTBN4, SYT6 and TMEM132E were upregulated in N stage. Taken together, these miRNA-target genes might affect the prognosis of COAD patients through promote lymphatic metastasis and invasion of patients.

Identifying biological pathways that the ten-microRNA signature promote lymphatic metastasis and invasion of COAD patients
We put 78 miRNA-target genes upregulated in N stage or lymphatic invasion into STRING to explore potentially biological signaling pathways correlated with the ten-microRNA signature. As shown in Fig. 8, the GO analysis results showed that these target genes were significantly enriched in the Wnt signaling pathway. The KEGG analysis result showed that these target genes were significantly associated with Hippo signaling pathway, MAPK signaling pathway, PI3K-Akt signaling pathway, mTOR signaling pathway, Wnt signaling pathway and so on. Hence, we speculated that these target genes promote progression of COAD patients mainly through Wnt signaling pathway.

Discussion
COAD is part of colorectal cancer which is a highly heterogeneity of disease with molecular complexity [10]. This feature makes traditional clinical predictive indicators such as stage and so on not enough to predict survival for patients. Although plenty of prognosis biomarkers of COAD has been reported [2,11], most of them are lacked accuracy and have not been validated in clinical practice. Therefore, novel biomarkers are urgently needed to provide more accurate survival prediction for COAD patients. During the past years, microRNAs are regarded as novel disease biomarkers because of the universality and stability in cancer [12]. Therefore, in this study, we identified a ten-microRNA signature to better predict the prognosis of COAD patients. Among those selected microRNAs, seven of them (hsa-miR-32, hsa-miR-197, hsa-miR-210, hsa-miR-3189, hsa-miR-3917, hsa-miR-4999, and hsa-miR-6854) were previous reported to have critical roles in colorectal carcinoma [13][14][15][16]. Three of them (hsa-miR-3189, hsa-miR-3917 and hsa-miR-6854) appear to be protective factors for colon cancer consistent with previous study and four of them (hsa-miR-32, hsa-miR-197, hsa-miR-210 and hsa-miR-4999) display to be risky factors for colon cancer also consistent with previous study. The consistency with the reported microRNAs indicated that the selected ways in our study were reasonable. The left three selected microRNAs (hsa-miR-887, hsa-miR-3199-2 and hsa-miR-561) have not been identified to be correlated with the prognosis of colon cancer, but these microRNAs have been reported to be correlated with the development, progression, and prognosis of other cancers. For instance, has-miR-887 was reported to be associated with the metastasis and progression of prostate cancer [17]. has-miR-3199-2 was found to be a prognostic biomarker for papillary renal cell carcinoma [18], and hsa-miR-561 was reported to inhibit gastric cancer cell proliferation and invasion by downregulating c-Myc expression [19]. These results indicated that these microRNAs might also play an important role in colon cancer. In addition, among our screened microRNAs, miR-32, miR-4999, miR-3189, miR-6854, miR-561 and miR-210 were significantly upregulated in COAD patients (Additional File 1: Figure  S1a, b), suggesting these microRNAs might exhibit diagnostic value in COAD.
By applying this microRNA signature to the patients, a significantly risk stratification for patients' outcome was observed between survival curves of patients with highrisk or low-risk scores. Patients in the high-risk group have a significantly shorter survival and poor prognosis than those in the low-risk group. These results suggested that prognostic value of the microRNA signature is robust and reliable for survival prediction in ccRCC patients. Besides, univariate and multivariate Cox analysis results demonstrated that our microRNA signature is an independent risk factor for prognosis of COAD patients.
At present, although several biomarkers including secretory proteins, mRNA and lncRNA signatures have been used for diagnosis and prognosis in a variety of cancers, they still have themselves limits respectively [20][21][22]. Generally, compared with protein biomarkers, RNA biomarkers are more sensitivity and specificity, whose cost are lower. These results provide a good idea for us to research early diagnosis and prognosis monitoring of patients from the perspective of RNA. Regrettably, mRNA and lncRNA is easily degraded during extraction. However, miRNA as an alternative biomarker have a promising clinical value in diagnosis and prognosis monitoring of patients. More importantly, our microRNA signature have higher accuracy for OS than mRNA, linRNA and other miRNA signature. Therefore, our study provide a novel biomarker with superiority to predict the prognosis of COAD patients.
Tumor metastasis is the chief cause of death in the vast majority of cancer patients including COAD and is indicative for poor prognosis [23,24]. However, it is strongly correlated with initial tissue invasion at the primary tumor site [25]. Therefore, it is need to point out that novel therapeutic strategies still need to explore to Fig. 7 Kaplan-Meier analysis of miRNA-target genes associated with N stage and lymphatic invasion of patients. a Volcano plot of differentially expressed genes in COAD patients with N0 + N1 stage vs N2 stage. b Volcano plot of differentially expressed genes in COAD patients with non-lymphatic invasion vs lymphatic invasion. c The intersection of miRNA target genes predicted using Targetscan database and miRDB database. d The intersection of miRNA target genes, upregulated in N stage and upregulated in lymphatic invasion. e-g Kaplan-Meier analysis of miRNA-target genes upregulated both in N stage and lymphatic invasion. h-p Kaplan-Meier analysis of miRNA-target genes upregulated in N stage avoid metastasis. In our study, high risk score indicated advanced TNM stage, higher N stage and lymphatic invasion, suggesting our signature could affect the progression of COAD patients. Furthermore, we identified some miRNA-target genes that promote lymphatic metastasis and invasion and associate with survival of patients. These target genes might exhibit a helpful indicator for lymphatic metastasis and invasion and prognosis of COAD patients. Among these target genes, ERBB4 has been reported to be associated with colon cancer metastasis, which was consistent with our results [26]. The left of target genes might be novel potential therapeutic targets for patients with lympghatic metastasis and invasion.
Wnt family genes play essential roles in human tumorigenesis. The Wnt signaling pathway could involve in cell proliferation, migration and fate during embryonic development. In adulthood, Wnt signaling pathway exhibit a critical role in regulating homeostasis and self-renewal of tissues. Particularly, in the intestinal epithelium, Wnt signaling pathway promotes proliferation and/or differentiation of stem cells in the intestinal crypts. This is why Wnt signaling pathway play an important role in colon carcinoma [27]. In our study, the GO and KEGG analysis were mainly enriched in Wnt signaling pathway. Hence, we believed that these miRNA-target genes might affect the progression of COAD patients through Wnt signaling pathway. In addition, a growing number of studies suggest that combined molecular biomarkers with clinical characteristics are able to improve the prognostic accuracy for patients than a single biomarker. Therefore, we incorporated our signature with N stage and lymphatic invasion, which resulted in improved predictive accuracy in OS of COAD patients compared to using signature alone. Thus, our microRNA signature could serve as a help indicator to predict prognosis of patients, especially when combined with N stage and lymphatic invasion.
However, there are also some limitations in this study. First, the prognostic signature of microRNA-based expression were identified by reasonable statistical approaches, but the results was only verified in TCGA database and not verified in clinical practice. Then, although we have some clinical characteristics information in this analysis, several important factors including alcohol consumption, food style, smoking history, and treatment information (surgery, chemotherapy, and radiotherapy) were not available in TCGA database, and we could not control those factors that might cause biases in our analysis. Final, The large scale studies are needed to exam our signature before the ten-microRNA signature can be applied in the clinical practice.

Conclusions
In this study, we established a ten-microRNA signature as a novel superior and independent indicator to accurately predict prognosis of COAD patients. Furthermore, we identified some miRNA-target genes that affect lymphatic metastasis and invasion and prognosis of COAD patients. These miRNA-target genes were able to be novel therapeutic therapy for patients with lymphatic metastasis and invasion. Finally, we found Wnt signaling pathway might be the significant mechanism in affecting the progression of COAD.