Identification of prognostic factors for intrahepatic cholangiocarcinoma based on long non-coding RNAs-associated ceRNA network

Background Accumulating evidence has highlighted the important roles of long non-coding RNAs (lncRNAs) acting as competing endogenous RNAs (ceRNAs) in tumor biology. However, the roles of cancer long non coding RNAs (lncRNAs) in lncRNA-related ceRNA network of intrahepatic cholangiocarcinoma (ICC) remain enigmatic. The current study aims to identify prognostic factors in the lncRNA-related ceRNA network of ICC. Methods The transcriptome sequencing data of the lncRNAs, messenger RNA (mRNA) and microRNA (miR) were downloaded from SRA, and TCGA database respectively. Differentially expressed lncRNAs (DElncRNAs), DEmiRs and DEmRNAs were identified and adopted to construct lncRNA-miR-mRNA ceRNA network. ICC-associated DEmRNAs were adopted to construct the PPI network. The expression of the top 6 genes in the hub module was validated in mRNA transcriptome sequencing data and ICC-related gene expression dataset GSE45001, followed by GO and KEGG pathway enrichment analysis. The potential function of genes in hub module in the ceRNA network was finally subjected to GSEA. Results Sixty coexpression DEmRNAs were identified in the ceRNA network. The 6 top genes in the hub module consists of FOS, HGF, IGF2, FOXO1, NTF3 (downregulated), and IGF1R (upregulated) in ICC tissues compared to normal adjacent tissues, followed by the successful construction of lncRNA-miR-hub network with 86 ceRNA modules. FOS, IGF2 or IGF1R might regulate the development of ICC by targeting “N-glycan biosynthesis”, “α-linolenic acid metabolism” or “type II diabetes” pathways respectively. Conclusion These results show FOS, IGF2 and IGF1R serve as prognostic factors of ICC patients. underlying mechanisms of FOS and N-glycan biosynthesis

likely to cause locoregional extension of ICC, resulting in dismal prognosis [5]. Identifying prognostic factors for ICC is therefore critical to effective treatments for ICC.
The methylation of DLEC1 cilia and flagella associated protein is engaged in a favorable clinical outcome and prognosis in patients with small duct ICC [6]. However, CD90 expression contributed to lymph node metastasis and thereby dismal prognosis in human ICC [7]. Expression of VEGFR-3 in ICC also promotes the angiogenesis of lymph, thereby resulting in unfavorable prognosis [8]. Long noncoding RNAs (lncRNAs) are a multiple class of RNAs engaging in various biological processes [9].
LncRNAs also serve as both oncogene and suppression, making them target for tumorigenesis [10].
Competing endogenous RNA (ceRNA) is a newly-emerged regulatory network on a large scale across the transcriptome with expansion to the genetically functional information in the human genome and exerts pivotal functions on cancer [11]. Interestingly, ceRNAs indicate a novel modulation on lncRNAs and genes that exert pivotal functions on cancer pathogenesis by binding microRNAs (miRs) in cholangiocarcinoma [12].
In the current study, we adopted the transcriptome sequencing data of the lncRNAs, messenger RNA (mRNA) and microRNA (miR) from Sequence Read Archive (SRA), and Cancer Genome Atlas (TCGA) database respectively. Differentially expressed lncRNAs (DElncRNAs), DEmiRs and DEmRNAs were identified and adopted to construct lncRNA-miR-mRNA ceRNA network. ICC-associated DEmRNAs were adopted to construct the PPI network. The expression of identified the top 6 genes in the hub module by PPI network, followed by GO and KEGG pathway enrichment analysis. The potential function of genes in hub module in the ceRNA network was finally subjected to Gene Set Enrichment Analysis (GSEA). Thus, the purpose of this study was to identify genetic alterations that underlie the prognostic factors of ICC.

Data collection and preprocessing
The original transcriptome sequencing data of the lncRNAs from human ICC and adjacent normal tissues were retrieved until September 2019 from SRA database (https://www.ncbi.nlm.nih.gov/sra/) and then subjected to SRP126672 dataset. SRP126672 was composed of 30 ICC tissues and 27 4 adjacent normal tissues. Fastqc and trimmomatic applications were adopted to control quality and filter the original sequencing data. LncRNAs were quantified based on the Genome Research Project of ENCyclopedia of DNA Elements (GENCODE) (GRCh37) catalog (http://www.gencodegenes.org/). The RNA transcriptome sequencing data for ICC were downloaded from the Cancer Genome Atlas (TCGA) database. MiR sequencing and mRNA sequencing data were downloaded using a data transfer management tool (provided by GDC Apps) (https://tcga-data.nci.nih.gov/). Among them, miR sequencing data and mRNA sequencing data consisted of 33 ICC tissues and 8 normal tissues respectively. In addition, the gene expression GSE26566 dataset for ICC downloaded from the Gene Expression Omnibus (GEO) database included 10 samples of ICC and 10 non-tumor liver samples to test the expression level of the hub gene.

Identification of DEGs
The original data of ICC tissues and adjacent normal tissues RNA-seq were corrected, normalized and their expression was calculated. Then, differentially expressed lncRNAs (DElncRNAs) were screened by the DESeq2 package. The standard is |log 2 (fold change [FC])| > 2 and adjusted p < 0.05. In addition, the edgeR software package was used to screen differentially expressed miRs (DEmiRs) and mRNAs (DEmRNAs) with thresholds |log 2 (fold change [FC])|> 2 and FDR < 0.01.
Construction of lncRNA-miR-mRNA ceRNA network The ceRNA network was constructed based on the DElncRNAs, DEmiRs and DEmRNAs. Briefly, following the identification of DElncRNAs, DEmiRs and DEmRNAs, the miRcode (http://www.mircode.org/) database was adopted to predict the markedly downregulated lncRNAs targeted by upregulated miRs and markedly elevated lncRNAs targeted by repressed miRs in ICC.
DEGs with correct trend and targeted relationships serve as candidate genes. Next, the TargetScan (http://www.targetscan.org/), miRDB (http://www.mirdb.org/) and miRTarBase (http://mirtarbase.mbc.nctu.edu.tw/php/index. php) were adopted to predict markedly downregulated mRNAs targeted by upregulated miRs and markedly elevated mRNAs targeted by repressed miRs in ICC. The predicted mRNAs were selected by intersection among these databases. DEGs with correct trend and targeted relationships serve as candidate genes. The predicted lncRNA-miR and miR-mRNA 5 was combined to construct the lncRNA-miR-mRNA ceRNA network. Finally, cytoscape v3.6.1 software was adopted to visualize and map the network.

GO and KEGG pathway enrichment analysis
To clarify the potential biological processes of DEmRNAs related to the ceRNA network in the development of ICC, the DAVID database (https://david.ncifcrf.gov/) was used to perform GO enrichment analysis of DEmRNAs by setting the default parameters. The GO function enriched by p < 0.05 was considered significant among transcriptome sequencing data. In order to understand the potential pathways of DEmRNAs involved in the ceRNA network, the KOBAS database (http://kobas.cbi.pku.edu.cn/index.php) was used to perform KEGG pathway enrichment analysis on DEmRNAs and the significance of KEGG pathway was evaluated at p < 0.001.

Construction of the protein-protein interaction (PPI) network and module analysis
The interaction between DEGs identifies key genes in modules involved in the development of ICC with a combined score of > 0.4 for a PPI network as the threshold. Then, the PPI information of DEmRNAs was obtained from STRING database (http://www.string-db.org/) and a PPI network was built using Cytoscape. Finally, the first six key genes in the hub module were obtained from the PPI network using the MCC network topology using the cytoHubba plug-in in Cytoscape.

Validation of key genes in the hub module
The gene expression dataset GSE45001 of ICC from the Gene Expression Omnibus (GEO) database was used to verify the expression of key genes in the hub module. The gene expression dataset GSE26566 was also used hub gene function prediction.

Prediction of gene function in the hub module
To better elucidate the potential function of the key gene in hub modules in the ceRNA network, a GSEA was performed. Briefly, according to the median expression of key genes in the RNA sequencing data, 33 ICC samples from the TCGA database were divided into the high expression group and low expression group. The reference gene set is the annotated c2.cp.kegg.v6.2.symbols.gmt gene set in the Molecular Signature Database (MSigDB). The critical criterion is p < 0.05.

Results
Identification of DEGs between ICC tissues and adjacent normal tissues 6 To identify DEmRNAs, DEmiRs and DElncRNAs in ICC, the SRP126672 was composed of 30 ICC tissues and 27 adjacent normal tissues downloaded from SRA database, followed by Fastqc and trimmomatic applications and then DElncRNAs were screened using the DESeq2 package. A total of 912 DElncRNAs were obtained, with 524 upregulated and 388 downregulated DElncRNAs (Fig. 1A,B). The different expression of DElncRNAs could help distinguish ICC tissues and 27 adjacent normal tissues determined by principal component analysis (Fig. 1C). DEmiR and DEmRNA sequencing data (counts) of ICC and adjacent normal tissues were obtained from the TCGA database, followed by the edgeR package to perform differential analysis. Finally, 66 DEmiRs (38 upregulated and 28 downregulated) In addition, 16457, 7052, and 2081 target mRNAs were predicted from the databases TargetScan, miRDB, and miRTarBase respectively. Then, 10 downregulated candidate DEmiRs were intersected with 2364 downregulated DEmRNAs, yielding 43 shared genes ( Fig. 2A), and 2 downregulated candidate DEmiRs were intersected with 3158 upregulated DEmRNAs, yielding 17 share genes ( Fig. 2B). Next, the shared genes serve as candidate genes and generated 136 miR-mRNA pairs including 43 significantly down-regulated and 17 significantly up-regulated DEmRNAs. A total of 37 DElncRNAs, 12 DEmiRs, and 60 DEmRNAs were used to construct the ceRNA network. Based on these lncRNA-miR pairs and miR-mRNA pairs, a ceRNA network consisting of 37 lncRNA nodes, 12 miR nodes, and 60 mRNA nodes in ICC was constructed (Fig. 2C).
GO and KEGG pathway enrichment analysis on DEmRNAs 7 To identify the biological functions and pathways of the 60 DEmRNAs in the ceRNA network, we used the DAVID database for GO function enrichment analysis and the KOBAS database for KEGG pathway enrichment analysis. GO enrichment analysis showed that DEmRNAs related to biological processes were mainly enriched in GO terms including the insulin receptor signaling pathway, positive regulation of cell proliferation, cell respiration and other items (p < 0.05). DEmRNAs related to cellular component were most closely related to mitochondrial inner membrane (p < 0.05). DEmRNAs related to molecular function were mainly enriched in GO terms of chemoattactant activity, growth factor activity, and transcriptional coactivator binding (p < 0.05) (Fig. 3A). KEGG analysis showed that DEmRNAs were significantly enriched in pathway such as cancer pathways, calcium signaling pathways, cytokin-cytokine receptor interaction, MAPK singling pathway and proteoglycans in cancer (p < 0.001) (Fig. 3B). Coherently, 60 significantly different DEmRNAs exert pivotal role in the occurrence and development of ICC.
Identification of 6 genes in the hub module in PPI network PPI network was constructed based on DEmRNAs, involving 60 nodes and 41 edges (Fig. 4A). To identify the first six genes in the hub module in the PPI network, the MCC network topology in the cytoHubba plug-in was used to determine the intimacy of DEmRNAs. Fos proto-oncogene, AP-1 transcription factor subunit (FOS), insulin like growth factor 1 receptor (IGF1R), hepatocyte growth factor (HGF), insulin like growth factor 2 (IGF2), forkhead box O1 (FOXO1), and neurotrophin 3 (NTF3) in the hub module were obtained (Fig. 4B). Finally, the lncRNA-miR-hub gene subnetwork was constructed (Fig. 4C), including 86 ceRNA regulatory modules.

Validation of FOS, HGF, IGF2, FOXO1, NTF3 and IGF1R expression in ICC
To better validate our analysis, the expression levels of 6 genes in the hub module were in 33 ICC tissues and 8 adjacent normal tissues. Concordant with our previous analysis, results (Fig. 5A) demonstrated that FOS, HGF, IGF2, FOXO1, and NTF3 remarkably downregulated while IGF1R potently rose in ICC tissues (logFC |> 2, FDR < 0.01) compared to adjacent normal tissues. FOS, IGF2, and IGF1R were randomly selected and subjected to the gene expression dataset GSE45001 of ICC, and results exhibited the same trend (Fig. 5B-D). Taken together, these results validated our prediction.
GSEA prediction of biological pathways related to FOS, IGF2, and IGF1R 8 To better identify the biological pathways related to FOS, IGF2, and IGF1R, we classified 33 IHC samples from the TCGA database into the high expression group and low expression group. The reference gene set is the annotated c2.cp.kegg.v6.2.symbols.gmt gene set in the MSigDB. Results revealed that the ICC tissues in the high expression group of FOS, IGF2 or IGF1R were markedly enriched in "N-glycan biosynthesis", "α-linolenic acid metabolism" or "type II diabetes" respectively (all p < 0.01) (Fig. 6A-C), suggesting the regulatory roles of FOS, IGF2, and IGF1R in ICC may be exerted by these pathways.

Discussion
The present study is the preliminary research of the prognostic factors for ICC patients. To identify and gain more insight into the molecules involved in the prognosis of ICC, we analyzed the transcriptome sequencing data of the lncRNAs, mRNA and microRNA miR to construct a lncRNA-miR-mRNA ceRNA network, where which identified 60 coexpression DEmRNAs associated with ICC. Main findings in the current study displayed that the expression of FOS, HGF, IGF2, FOXO1, NTF3 were downregulated, but IGF1R expression was upregulated in ICC tissues compared with normal adjacent tissues. In addition, FOS, IGF2 or IGF1R might regulate the development of ICC by targeting "N-glycan biosynthesis", "α-linolenic acid metabolism" or "type II diabetes" pathways respectively. FOS proteins were characterized by a leucine zipper and a basic region with a helix-turn-helix motif that binds DNA, and serve as an oncogene and transcription factors by binding DNA sequences [13].
C-terminally truncated FOS mutant has been previously reported to be expressed in epithelioid hemangioma [14]. The expression of FOS has been proposed to be positively related to c-Myb expression in colorectal cancer cells, while c-Myb expression is repressed in colorectal cancer tissues, suggesting that expression of FOS is also downregulated in colorectal cancer tissues [15], which is concordant with the current study. Meanwhile, the elevation of FOS-like antigen 1 expression is positively correlated with progression of perihilar cholangiocarcinoma [16]. N-glycan could determine the thermodynamic stability of proteins and exerts pivotal functions on various key biological processes [17]. Interestingly, the repressed genes in gemcitabine resistance-related of ICC belong to the N-glycan biosynthesis pathway [18], though the targeting relationship of FOS and N-glycan 9 biosynthesis pathway has been scarcely documented.
IGFs (IGF1 and IGF2) expedite glucose metabolism with their availability modulated by IGF binding proteins, and function as be prognostic factors for type 1 diabetes [19]. The induction of IGF2 is also partially involved with the proliferation and survival of rhabdomyosarcoma cells [20]. Interestingly, IGF2 has been reported to be methylated in ICC compared to extrahepatic cholangiocarcinoma [21].
IGF-1 has been revealed to be associated with α-linoleic acid in and exert function on porcine primary hepatocytes [22]. α-linolenic acid is a high unsaturated fatty acid content from tree peony species (woody oil crops) [23]. α-linolenic acid metabolism pathway is also engaged in liver cirrhosis [24].
IGF1 maintains the phenotype of tumor and survival of transformed murine pheochromocytoma cells [25]. The inhibition of IR/IGF1R reduced epithelial-mesenchymal transition and cancer stem cell-like traits in resistant cells of cholangiocarcinoma [26]. The elevated expression of IGF1R was observed in tumor necrosis factor-related apoptosis-inducing ligand (TRAIL)-resistant gastric cancer cells, enhancing TRAIL resistance in gastric cancer cells [27]. As genes encoding proteins related to insulin receptor, IGF1R could stimulate renal cancer cells [28]. Type II diabetes mellitus could reduce the risk of cholangiocarcinoma in patients with biliary tract diseases [29].

Conclusion
In conclusion, our research identified several novel genetic alterations and pathway associated with prognosis of ICC, and better understood the potential role of miRs, lncRNAs and mRNAs in the ICC development via bioinformatic analysis. Based on the ceRNA network, we also found that FOS, IGF2 or IGF1R might target "N-glycan biosynthesis", "α-linolenic acid metabolism" or "type II diabetes" pathways respectively, thereby modulating the development of ICC by targeting, which provided information for understanding molecules involved in the prognosis of ICC, and even contributed to the treatment for ICC. However, further studies with selective candidate genes and larger sample size are required to define their role in ICC. Additionally, more studies need to be performed to examine the underlying mechanisms of FOS and N-glycan biosynthesis pathway.
Abbreviations ceRNAs: competing endogenous RNAs; lncRNAs: long non-coding RNAs; ICC: intrahepatic   The four colored ovals represent the prediction results of the TargetScan, miRDB, and miRTarBase databases, and 3158 significantly up-regulated DE mRNAs obtained by differential analysis respectively. The middle part represents the intersection of the four sets of data. C, Construction of ceRNA network of lncRNA-miR-mRNA in ICC. The network consists of 37 lncRNA nodes, 12 miR nodes, and 60 mRNAs. The triangle represents lncRNA, the circle represents miR, and the square represents mRNA. Nodes highlighted in red and blue indicate up and down, respectively.    The biological pathways associated with FOS, IGF2, and IGF1R were predicted in ICC tissues using GSEA. A, The biological pathway enriched in the FOS high expression group. B, The biological pathway enriched in the IGF2 high expression group. C, The biological pathway enriched in the IGF1R high expression group.