Integrative analysis of ceRNA network reveals functional lncRNAs associated with independent recurrent prognosis in colon adenocarcinoma

Background Long non-coding RNAs (lncRNAs), acting as competing endogenous RNA (ceRNA) have been reported to regulate the expression of targeted genes by sponging miRNA in colon adenocarcinoma (COAD). Methods However, their potential implications for recurrence free survival prognosis and functional roles remains largely unclear in COAD. In this study, we downloaded the TCGA dataset (training dataset) and GSE39582 (validation dataset) of COAD patients with prognostic information. Results A total of 411 differentially expressed genes (DElncRNAs: 12 downregulated and 43 upregulated), 18 DE miRNAs (9 downregulated and 9 upregulated) and 338 DEmRNAs (113 downregulated and 225 upregulated) were identified in recurrence samples compared with non-recurrence samples with the thresholds of FDR < 0.05 and |log2FC|> 0.263. Based on six signature lncRNAs (LINC00899, LINC01503, PRKAG2-AS1, RAD21-AS1, SRRM2-AS1 and USP30-AS1), the risk score (RS) system was constructed. Two prognostic clinical features, including pathologic stage and RS model status were screened for building the nomogram survival model. Moreover, a recurrent-specific ceRNA network was successfully constructed with 2 signature lncRNAs, 4 miRNAs and 113 mRNAs. Furthermore, we further manifested that SRRM2-AS1 predicted a poor prognosis in COAD patients. Furthermore, knockdown of SRRM2-AS1 significantly suppressed cell proliferation, migration, invasion and EMT markers in HT-29 and SW1116 cells. Conclusion These identified novel lncRNA signature and ceRNA network associated with recurrence prognosis might provide promising therapeutic targets for COAD patients. Supplementary Information The online version contains supplementary material available at 10.1186/s12935-021-02069-6.

identification of valuable molecular markers associated with recurrence may guide early prediction and treatment of COAD patients.
Long non-coding RNAs (lncRNAs) is known as a class of RNA molecules with over 200 nucleotides in length and have no evident open reading frames without non-protein coding ability [9]. It has been certified that lncRNAs are dysregulated in the progression of various cancers and play vital roles in gene regulation and carcinogenesis, including proliferation, migration and genomic stability [10,11]. Accumulating evidence has confirmed the oncogenic or tumor suppressive role of lncRNAs in colon cancer development. For instance, lncRNA CASC15 promotes colon cancer cell proliferation and metastasis by regulating the miR-4310/ LGR5/Wnt/β-catenin signaling pathway [12]. LncRNA B3GALT5-AS1 suppressed colon cancer liver metastasis via its binding on miR-203 promoter and the repression of miR-203 [13]. Recently, several studies based on the development of bioinformatics have developed lncRNArelated signatures for predicting prognosis of colon cancer. As reported by Fu et al. [14], a seven-lncRNA signature associated with prognosis of COAD was identified and validated by different cohorts. Lin et al. [15] developed an immune-related nine-lncRNA signature predictive of overall survival in colon cancer. In addition, Zhou et al. [16] identified ten prognostic autophagy-related lncRNAs, which made up an autophagy-related lncRNA signature as therapeutic targets for the COAD patients. Nevertheless, the research focused on functional lncR-NAs associated with independent recurrent prognosis still remain relatively little.
A key regulatory mechanism for lncRNAs is the competitive endogenous RNA (ceRNA) hypothesis which bind to microRNA (miRNAs) through miRNA response elements (MREs), thereby regulating miRNAs-induced gene silencing [17]. Based on ceRNA theory, Wu et al. [18] revealed lncRNA MALAT1 may serve as a competing endogenous lncRNA (ceRNA) to mediate HMGB1 by sponging miR-129-5p in colon cancer. Liu et al. [19] demonstrated that SNHG17 serves as competing endogenous RNA (ceRNA) for miR-375 to regulate CBX3 expression in COAD. We thus believe lncRNAs functions as ceRNAs deserve further exploration in exploring the molecular mechanism underlying COAD recurrent prognosis.
In present research based on TCGA and GEO database, we analyzed the differentially expressed RNAs (DERs), including DElncRNAs, DEmiRNAs and DEmRNAs between recurrence and non-recurrence COAD samples. By screening independent recurrence prognosis-related lncRNAs, we constructed risk score system and nomogram survival model. Moreover, we established ceRNA regulatory network associated with lncRNA signature and conducted enrichment analysis to elucidate the interactions and valid potential crosstalk between RNAs.

Microarray datasets
Gene expression profiles and corresponding clinical data of COAD patients (Platform: Illumina HiSeq 2000 RNA Sequencing) were downloaded from The Cancer Genome Atlas (TCGA, http:// cance rgeno me. nih. gov/) on April 20, 2019, including 465 tumor tissues and 85 normal tissues. After matching with 461 miRNA expression profiles downloaded at the same time (Illumina Hiseq 2000 RNA Sequencing), a total of 363 COAD samples with recurrence information (78 recurrence and 285 non-recurrence) constituted the training dataset. For validation propose, we searched microarray data from Gene Expression Omnibus (https:// www. ncbi. nlm. nih. gov/ geo/) database with the following criteria: 1) Tumor solid tissue samples from patients with COAD; 2) The sample size more than 200; 3) The COAD tumor samples had clinical information on the prognosis of recurrence. Finally, microarray data of GSE39582 including 536 COAD patients with recurrence information were obtained under the platform GPL570 Affymetrix Human Genome U133 Plus 2.0 Array.

Analysis of differential expressed RNAs (DERs)
According to downloaded RefSeq ID information from the training set and the validation set, the lncRNAs and mRNAs of the two sets were annotated based on HUGO Gene Nomenclature Committee (HGNC) [20] (http:// www. genen ames. org/), the database which records information of 4120 lncRNAs and 19,198 protein-coding genes. Then, differentially expressed DERs, including DElncRNAs, DEmiRNAs and DEmRNAs were screened between recurrence and non-recurrence specimens using the limma package (software version 3.34.7) of R [21]. False discovery rate (FDR) < 0.05 and |log 2 fold change (FC)|> 0.263 (FC > 1.2) were set as the cutoff for significance. Two-way hierarchical clustering analysis based on Centered Pearson Correlation Algorithm [22] was carried out for these identified DERs by pheatmap Version 1.0.8 in R3.4.1 language [23].

Construction and validation of prognostic predictive model
Using survival package Version 2.41-1 in R3.4.1 language [24], these DElncRNAs were subjected to univariable and multivariable Cox regression proportional hazards regression analysis to select the independent risk DElncRNAs and obtain corresponding coefficients in training dataset with log-rank p value < 0.05. Next, least absolute shrinkage and selection operator (LASSO) Cox regression model [25] in penalized package Version 0.9.50 of R3.4.1 language [26] was used to screen independent risk signature lncRNAs (The optimized parameter "lambda" in the screening model is obtained by the cross-validation likelihood (CVL) cycle calculation of 1000 times). By linearly combining the expression value of selected signature lncRNAs weighted by their coefficients, a risk-score (RS) formula was constructed as following: RS = ∑β lncRNA × Exp lncRNA , where β lncRNA indicates the coefficient and Exp lncRNA indicates the expression level of signature lncRNA. The RS of every patient from the TCGA and GEO cohorts were calculated based on the signature. The subjects in each dataset were classified into a high-risk group and low-risk group with the median score as cut-off value. We used the Kaplan-Meier method with log-rank test in R3.4.1 survival package Version 2.41-1 [24] to perform recurrence free survival (RFS) analysis for each set. The receiver operating characteristic (ROC) curve and the area under the curve (AUC) were drawn using R package "survival ROC" and utilized to validate the prediction model.

Construction of nomogram survival model for independent prognostic factor
We first screened the independent prognostic factors in the training dataset with the significance threshold of log-rank p < 0.05 by performing the univariate and multivariate regression analysis in the survival package of R3.4.1 (version 2.41-1). Afterwards, the identified independent prognostic factors were combined with the predicted risk information in the prediction prognosis model to construct a nomogram 3-year or 5-year survival rate model using R3.4.1 rms package Version 5.1-2 (https:// cran.r-proje ct. org/ web/ packa ges/ rms/ index. html) [27,28]. Finally, we compared the actual and predicted probabilities of 3-year RFS and 5-year RFS using calibration plots.

Construction of ceRNA regulatory network
We first used DIANA-LncBasev2 database (http:// carol ina. imis. athena-innov ation. gr/ diana_ tools/ web/ index. php?r= lncba sev2% 2Find ex-exper iment al) [29] to predict the interactions between six-lncRNA signature and DEmiRNAs. Secondly, starBase version 2.0 database (http:// starb ase. sysu. edu. cn/) [30] was searched to predict the corresponding target mRNAs of selected DEmiR-NAs. Meanwhile, the regulatory relationships included in at least one of five databases (TargetScan, PicTar, RNA22, Pita and MIRANDA) were selected as target mRNAs. Then, the DEmRNAs were corresponded into the target mRNAs. Only the regulatory pairs of DElncRNAs and DEmiRNAs, DEmiRNAs and DEmRNAs had opposite expressions and therefore included in the present study.
Finally, the selected interaction of DEmiRNAs and DEm-RNAs and of DElncRNAs and DEmiRNAs were integrated to construct the ceRNA regulatory network using Cytoscape 3.6.1 visualization software (https:// cytos cape. org/) [31].

Function enrichment analysis
To better understand the function of DERs in ceRNA network, we performed Kyoto Encyclopedia of Genes and Genomes (KEGG) functional enrichment analyses based on gene set enrichment analysis (GSEA: http:// softw are. broad insti tute. org/ gsea/ index. jsp) [32]. With the nominal p < 0.05 as the cut-off criteria, we chose signature lncR-NAs with significant enrichment and displayed gene sets enrichment plots.

Clinical tissues
Total 60 paired tumor tissues and adjacent normal tissues from COAD patients were collected from Harbin Medical University Cancer Hospital (Heilongjiang, China) after two pathologists independently diagnosed the pathological features of the tumor tissues. Some basic clinicopathological characteristics of COAD patients, including age, gender and TNM stage, as well as follow-up information were recorded. None of patients received any neoadjuvant chemotherapy or radiotherapy before operation. The study was approved by the Ethics Committee of the Harbin Medical University Cancer Hospital with signed written informed consent by all subjects.

Cell culture and transfection
Four COAD cell lines (HT-29, DLD-1, SW1116 and RKO) and normal human colon epithelial cell line (FHC) were purchased from American Type Culture Collection (ATCC, Manassas, VA, USA). All cell lines were cultured in DMEM medium (Gibco, Carlsbad, CA) with 10% fetal bovine serum (FBS, Gibco) in a humidified atmosphere of 37 °C containing 5% CO 2 .

Cell proliferation assay
Transfected cells at a density of 5,000 cells per well were seeded in a 96-well plate and 10 μl Cell Counting Kit-8 reagent (Dojindo Molecular Technologies, Inc.) was added to each well. After incubated at different times (0, 24, 48 and 72 h), the absorbance at 450 nm was measured to represent cell proliferation status.

Cell migration and invasion assays
For cell migration assay, 200 μL serum-free medium of transfected cells (8 × 10 4 ) was placed in the upper inserts of transwell chamber (8 μm pore size, Corning, Corning, NY, USA). Meanwhile, 700 μL of medium containing 10% FBS was added into the lower inserts of transwell as a chemical attractant. After 24 h incubation, the migratory cells in lower chamber were fixed in methanol and stained with 0.1% crystal violet, followed by cell counting in five randomly selected fields under the microscope. For cell invasion assay, the procedure of invasion assay was similar with that of migration assay, except for the transwell chamber precoated with Matrigel (BD Biosciences).

Western blot analysis
Protein from cell lines were extracted using 1 × cell lysis buffer (Promega, Madison, WI, USA). The SDS-PAGE electrophoresis and immunoblotting were performed as previously reported [33] with antibodies specific for E-cadherin, Vimentin, Snail, glyceraldehyde-3-phosphate dehydrogenase (GAPDH; Santa Cruz Biotechnology, Santa Cruz, CA, USA).

Statistical analysis
The statistical analysis software performed was Graph-Pad Prism 6.0 (GraphPad Software, Inc., USA). The chisquare test was used to assess the relationship between SRRM2-AS1 expression and clinicopathological features of COAD patients. Overall survival was analyzed by the Kaplan-Meier method and compared by the log-rank test. The significance of various variables for survival data was evaluated by univariate and multivariate Cox regression models. Quantitative data were expressed as mean ± SD and analyzed for significant difference in two groups by Student's t test. All data with p values less than 0.05 were recognized as statistically significant.

Identification of DERs
Following data annotation, we obtained 549 miRNAs, 12,008 mRNAs and 827 lncRNAs overlapped by the TCGA set and the validation set. Based on the screening criteria of |log 2 FC|> 0.263 and FDR < 0.05, DERs, including 18 DEmiRNAs (9 up-regulated and 9 downregulated), 338 DEmRNAs (225 up-regulated and 113 down-regulated) and 55 DElncRNAs (43 up-regulated and 12 down-regulated) were identified in recurrence samples compared with non-recurrence samples (Additional file 1: Table S1). Volcano plot and bidirectional hierarchical clustering heatmap were described to the DEmiRNAs (Fig. 1A), DEmRNAs and DElncRNAs (Fig. 1B), which clearly indicated the samples tend to cluster in two distinct directions.

Construction and validation of 6 lncRNA-based prognostic signature
For the training set, univariate Cox proportional hazards regression analyses revealed 42 DElncRNAs significantly correlated with overall survival among the 50 differentially expressed lncRNAs. Stepwise multivariable Cox proportional hazards regression analyses identified 7 independent risk DElncRNAs. These 7 DElncRNAs were entered into LASSO-based Cox-PH model. Under the parameter (lambda value is 0.8139 and the maximum value of CVL is -459.9254) (Fig. 2A), a total of 6 signature lncRNAs were found to be significantly and independently related to prognosis ( Table 1). The gene prognostic coefficient was shown in Fig. 2B. We then constructed a prognostic signature based on the expression levels of these 6 signature lncRNAs and their coefficients derived from the multivariable Cox model. The RS of each patient in the training and validation datasets was calculated using the formula: RS = (1.0109537) × Exp LINC00899 + (0.6383124) × Exp LINC01503 + (-0.499666) × Exp PRKAG2-AS1 + (-1.977238) × Exp RAD2 1-AS1 + (2.0163898) × Exp SRRM2-AS1 + (-0.386197) × Exp U SP30-AS1 . According to the median risk score, the training and validation datasets were divided into a highrisk group and a low-risk group. Kaplan-Meier analysis (Fig. 3) revealed that the high-risk group had a significantly poorer RFS prognosis than that of the low-risk group in training dataset (log-rank p = 1.062e-05) and validation dataset (log-rank p = 1.824e-02). To evaluate the performance of the 6-lncRNA signature for predicting the prognosis of COAD patients, the ROC curve and the area under the ROC curve (AUC) were drawn. Meanwhile, the area under the AUC for the 6-lncRNA signature was 0.972 in training dataset and 0.914 in validation dataset, which indicated good performance.

Building nomogram based on prognostic clinical factors and the six-lncRNA signature
Using univariate and multivariate regression analysis, we found that pathologic stage and RS model status were independent prognostic factors in the training dataset, as summarized in Table 2 and Additional file 2: Figure S1. Combining expression risk score based on the pathologic stage and RS model status, we constructed a nomogram to improve predictive accuracy (Fig. 4A). As shown in calibration plots (Fig. 4B), the predicted 3-year and 5-year RFS was consistent with the actual 3-year and 5-year RFS.

Construction of ceRNA regulatory network
To explore the targeting relationship of the DERs, we focused on the interaction of 18 DEmiRNAs with 6-lncRNA signature and DEmRNAs. Firstly, we explored the regulatory loops with lncRNA-miRNA in the DIANA-LncBasev2 database and found that 2 of 6 specific DElncRNAs might target 4 of 18 specific DEmiR-NAs. Subsequently, we screened 137 connection pairs between 4 DEmiRNAs and selected DEmRNAs predicted by (TargetScan, PicTar, RNA22, Pita and MIRANDA) (Additional file 3: Table S2). Finally, on account of the regulatory pairs of lncRNA signature-DEmiRNA and DEmiRNA-DEmRNA, we constructed the lncRNA-miRNA-mRNA ceRNA network using Cytoscape 3.6.1 software. In total, 2 lncRNAs, 4 miRNAs, and 113 mRNAs were included in the ceRNA regulatory network, containing 117 nodes and 137 edges (Fig. 5).

Function enrichment analysis
According to the GSEA-based KEGG signaling pathway enrichment analysis in DERs from ceRNA network, we screened the KEGG pathways that were significantly correlated with two signature lncRNAs using NOM P value less than 0.05 as the threshold. As shown in enrichment plots, two KEGG signaling pathways (PPAR_SIGN-ALING_PATHWAY and CYTOKINE_CYTOKINE_ RECEPTOR_INTERACTION) were screened to be significantly associated with PRKAG2-AS1 and SRRM2-AS1 (Additional file 4: Figure S2).

SRRM2-AS1 was upregulated in COAD and high SRRM2-AS1 level predicted worse prognosis
We initially validated the expression of PRKAG2-AS1 and SRRM2-AS1 in 60 pairs of tumor tissues and adjacent tissues derived from COAD patients. The results from quantitative RT-PCR showed that PRKAG2-AS1 (Fig. 6A) was downregulated, while SRRM2-AS1 (Fig. 6B) was upregulated in tumor tissues compared with adjacent tissues, which represented the same results by analyzing the TCGA-COAD dataset. Considering the relatively higher fold change in SRRM2-A1 expression, SRRM2-AS1 was selected for further analysis. By dividing all patients into high and low expression group with the median value of SRRM2-AS1 as a cutoff value, we found high level of SRRM2-AS1 was significantly associated with tumor size, lymph node metastasis and TNM stage (Table 3). Kaplan-Meier analysis revealed that patients with low-level SRRM2-AS1 had better overall survival than those with high-level SRRM2-AS1 (Fig. 6C, log-rank  test; p = 0.002). More importantly, high SRRM2-AS1 expression was identified as an independent unfavorable prognostic factor in COAD patients through univariate and multivariate Cox regression analysis (Table 4). These results indicated that SRRM2-AS1 may be involved in the malignant progression of COAD.

Knockdown of SRRM2-AS1 suppressed the COAD cell proliferation, migration and invasion
To further explore the biological function of SRRM2-AS1 on COAD in vitro, we first determined the expression of SRRM2-AS1 in several COAD cell lines. As shown in Fig. 7A, primary COAD cell lines (HT-29, DLD-1, SW1116 and RKO) expressed higher SRRM2-AS1 levels compared with normal human colon epithelial cell line (FHC). Next, si-SRRM2-AS1 was transfected into HT-29 and SW1116 cells, which significantly suppressed the expression of SRRM2-AS1 (Fig. 7B). Knockdown of SRRM2-AS1 significantly inhibited the proliferation ability of HT-29 and SW1116 cells (Fig. 7C). In addition, the migratory (Fig. 7D) and invasive (Fig. 7E) capacities of HT-29 and SW1116 cells were also repressed after si-SRRM2-AS1 transfection, in comparison with si-NC transfection. These results indicated that SRRM2-AS1 can promote the growth and metastasis of COAD in vitro.

Exploration on the molecular mechanism underlying SRRM2-AS1 knockdown on COAD cells
According to the predicted ceRNA regulatory network, we selected several miRNAs and mRNA targets of SRRM2-AS1 to analyze their expression levels under SRRM2-AS1 knockdown in COAD cells. The results  showed that knockdown of SRRM2-AS1 significantly upregulated the expression levels of miR-6514 (Fig. 8A) and miR-1275 (Fig. 8B), while downregulated STRADA mRNA level (Fig. 8C) in both HT-29 and SW1116 cells.
In addition, we measured the protein levels of EMT markers. As shown in Fig. 8D, SRRM2-AS1 knockdown increased E-cadherin expression, while decreased the protein expression of Vimentin and Snail in HT-29 and SW1116 cells.

Discussion
With the development of high-throughput sequencing technology, increasing amounts of sequencing data have been used for studies of cancer diagnosis, therapy, and prognosis [34]. Here, we downloaded RNA-seq data and relevant clinical data related to COAD from TCGA database and obtained 363 COAD samples with recurrence information. A total of 411 DERs, including 18 DEmiR-NAs (9 up-regulated and 9 down-regulated), 338 DEm-RNAs (225 up-regulated and 113 down-regulated) and 55 DElncRNAs (43 up-regulated and 12 down-regulated) were identified in recurrence samples compared with non-recurrence samples. Subsequently, the correlations between DElncRNAs and RFS prognosis were identified to establish a RS model for predicting COAD recurrent prognosis. Accordingly, a 6-lncRNA signature risk prediction model (LINC00899, LINC01503, PRKAG2-AS1, RAD21-AS1, SRRM2-AS1 and USP30-AS1) was By searching published articles on these six-lncRNA signature in tumor development, we found that LINC00899 is elevated in the serum and bone marrow of acute myeloid leukemia (AML) patients and high serum LINC00899 expression was an independent prognostic marker of poor outcome [35]. Dong et al. [36] further demonstrated that LINC00899 promoted cell proliferation and inhibited apoptosis in AML cells. LINC01503 expression level was significantly up-regulated, correlated with poor prognosis and conferred oncogenic functions in glioma [37], hepatocellular carcinoma [38], gastric cancer [39] and especially colorectal cancer [40]. Silencing of PRKAG2-AS1 alleviated castration-resistant prostate cancer (CRPC) tumor growth, showing repression of androgen receptor (AR) and AR variant expression [41]. In addition, RAD21-AS1 [42], SRRM2-AS1 [43] and USP30-AS1 [44,45] have been reported to be survival prognosis in various cancers. These suggested that identified 6-lncRNA signature might be involved in the recurrence prognosis of COAD.
On the basis of the identified lncRNA signature, DEmiRNAs and DEmRNAs, we constructed lncRNA-miRNA-mRNA ceRNA network of COAD, of which 2 lncRNAs, 4 miRNAs, and 113 mRNAs were included. Notably, SRRM2-AS1 binds to the target miR-1275, miR-6514 and miR-3130, while PRKAG2-AS1 binds to the target miR-105 (Additional file 4: Fig. S2). These data indicated that PRKAG2-AS1 and SRRM2-AS1 might play roles in the recurrence prognosis of COAD by regulating their corresponding target miRNAs. Except for miR-6514 and miR-3130, accumulating evidence has revealed the functional roles of miR-1275 [46,47] and miR-105 [48,49] in the development of tumorigenesis. The function enrichment analysis identified the PPAR_SIGNALING_ PATHWAY and CYTOKINE_CYTOKINE_RECEP-TOR_INTERACTION significantly associated with PRKAG2-AS1 and SRRM2-AS1, which might be the recurrence prognosis of COAD. Consistent with our findings, Jansson et al. [50] reported that peroxisome proliferator activated receptor gamma (PPAR gamma) protein levels are elevated, possibly through interaction with beta-catenin and T cell transcription factor-4 in colon cancer cell lines. CYTOKINE_CYTOKINE_RECEP-TOR_INTERACTION pathway has been reported to promote tumor progression in models of colorectal cancer, which predicts unfavorable outcomes in colon cancer patients [51,52].
Moreover, we harvested 60 pairs of COAD tissues and adjacent tissues. After a series of clinical analysis, we confirmed that high SRRM2-AS1 expression was identified as an independent unfavorable prognostic factor in COAD patients. Functional experiments further manifested that knockdown of SRRM2-AS1 suppressed the cell proliferation, migration and invasion in two COAD cell lines (HT-29 and SW1116). Moreover, SRRM2-AS1 knockdown suppressed the epithelial-to-mesenchymal transition process, as reflected by increased E-cadherin expression and decreased Vimentin and Snail expression levels. Similar to our study design, HOTAIR depletion could reduce cellular motility, invasiveness and EMT in human tumor cells [53]. Furthermore, silencing of SRRM2-AS exerts suppressive effects on angiogenesis in nasopharyngeal carcinoma [54]. As our best knowledge, SRRM2-AS1 has not been explored on its biological function in tumor cells, except for a recent study by Yang et al. [43] who also identified SRRM2-AS1 as an independent prognosis-associated lncRNA for predicting the recurrence of COAD patients. Anyway, our experiment validation is far from adequate, including lacking of deeper functional analysis on these miRNAs, their interactions and in vivo animal experiments and deeper molecular mechanism exploration.

Conclusion
In summary, we identified differentially expressed gene associated with the recurrent prognosis of COAD patients and constructed a 6 lncRNA prognostic model to predict prognosis of patients. The prognostic model presented a good performance in 3-and 5-year RFS prediction. Importantly, we have successfully constructed a lncRNA-associated ceRNA network, which provides novel lncRNAs as candidate potential therapeutic targets for associated with independent recurrent prognosis in COAD.