- Primary research
- Open Access
Identification of long non-coding RNA signatures in triple-negative breast cancer
Cancer Cell International volume 18, Article number: 103 (2018)
Triple-negative breast cancer (TNBC) is a particular breast cancer subtype with poor prognosis due to its aggressive biological behavior and lack of targets for therapy. This study aimed to explore the expression profile and potential function of lncRNAs in TNBC through bioinformatic methods.
Two microarrays of TNBC were obtained from the Gene Expression Omnibus database. Differentially expressed lncRNAs and mRNAs were screened out, and the expression of the top lncRNAs and the overlapping lncRNAs was validated using data from The Cancer Genome Atlas database. The co-expression analysis of lncRNAs and mRNAs was conducted using R software, and functional enrichment analysis was performed with Metascape. Kaplan–Meier Plotter was used for survival analysis.
A total of 1034 dysregulated lncRNAs were found in the two microarrays, 8 of which overlapped between them. Of the dysregulated lncRNAs, 537 were significantly correlated with 451 protein-coding genes (PCGs). The co-expressed PCGs were mainly enriched in terms including cell division, cell cycle, and protein/DNA binding, and were involved in pathways in cancer and other pathways such as the PI3K-Akt, MAPK, ErbB and p53 signaling pathways. Hub-genes in the co-expression network were identified, and 7 of them were associated with relapse-free survival of TNBC (MAGI2-AS3: HR = 0.51; GGTA1P: HR = 0.54; NAP1L2: HR = 0.59; CRABP2: HR = 0.41; SYNPO2: HR = 0.50; MKI67: HR = 2.23; COL4A6: HR = 1.91; all P < 0.05).
Numerous lncRNAs were dysregulated in TNBC, and many of them are possibly involved in cancer biology. Several of these lncRNAs were associated with TNBC prognosis and may be promising biomarkers.
Breast cancer (BC) is the most common type of cancer and the leading cause of cancer death among women worldwide. Triple-negative breast cancer (TNBC) is a particular subtype of breast cancer, characterized by poor prognosis because of its aggressive biological behavior and lack of molecular targets for therapy. It is defined by the absence of estrogen receptor (ER) and progesterone receptor (PR) expression and the lack of human epidermal growth factor receptor 2 (HER2) amplification. Treatment options for TNBC are very limited owing to the lack of decisive therapeutic targets. Hence, it is necessary to explore new targeted approaches to improve the outcomes of TNBC.
In recent years, long non-coding RNAs (lncRNAs) have drawn increasing attention because of their functions in human diseases, including cancers. LncRNAs are a class of RNA transcripts longer than 200 nucleotides that do not encode proteins. They are involved in diverse biological processes such as cell proliferation, differentiation, chromosome remodeling, epigenetic modulation, and transcriptional and posttranscriptional modification. Studies have revealed that lncRNAs play an important role in cancer biology and that the expression level or mutation of specific lncRNA genes is implicated in the development and progression of cancer. Moreover, a large number of lncRNAs are deregulated in multiple tumors including breast cancer, making them possible diagnostic and prognostic biomarkers or potential therapeutic targets for cancer [5, 6]. Several lncRNAs have been reported to regulate TNBC progression. For instance, lncRNA LINP1 is overexpressed and enhances double-strand DNA break repair in TNBC, and blocking LINP1 increases the sensitivity of BC cells to radiotherapy. LINK-A facilitates the activation of BRK kinase and thereby activates normoxic HIF1α signaling in TNBC, promoting breast cancer glycolysis reprogramming and tumorigenesis. With the development of RNA sequencing, genomic technologies and computational techniques, more and more lncRNAs have been discovered. However, studies of lncRNAs in TNBC remain few, and the expression profile, functions and mechanisms of lncRNAs in TNBC remain to be extensively explored [9, 10]. Thus, we mined and analyzed data from several databases, hoping to highlight signatures of lncRNAs in TNBC and provide a foundation for further studies.
Acquisition and analysis of microarray data
Two lncRNA microarray datasets (GSE60689 and GSE64790) of TNBC were obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). Raw data were reprocessed with the online tool GEO2R. The annotation of lncRNAs was taken directly from the additional files of the microarrays or queried from the LNCipedia database (http://www.lncipedia.org, version 5.0). If a lncRNA had an official gene symbol according to HGNC, that symbol was used as its name. Otherwise, the accession number or gene ID was used. We then screened out significantly differentially expressed lncRNAs for further analysis using the criteria |lgFC| ≥ 2.0 and P value ≤ 0.05. The overlapping lncRNAs were identified with an online Venn diagram tool (http://bioinformatics.psb.ugent.be/webtools/Venn/). The software Morpheus (https://software.broadinstitute.org/morpheus/) was used to draw heatmaps.
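The screening and overlap steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the probe names and values are made up, the fold-change column is used exactly as exported, and requiring the same direction of change in both datasets is an assumption rather than a rule stated in the text.

```python
# Hypothetical per-lncRNA results shaped like a differential-expression export:
# (name, log fold change, P value). All values are invented for illustration.
dataset_a = [
    ("LNC_A", 3.1, 0.001),
    ("LNC_B", -2.4, 0.02),
    ("LNC_C", 1.2, 0.001),   # fails the fold-change threshold
    ("LNC_D", 2.5, 0.20),    # fails the P-value threshold
]
dataset_b = [
    ("LNC_A", 2.2, 0.01),
    ("LNC_B", -2.0, 0.05),   # sits exactly on both thresholds, so it is kept
    ("LNC_E", -3.0, 0.001),
]


def select_degs(rows, min_abs_logfc=2.0, max_p=0.05):
    """Return {name: 'up' | 'down'} for rows passing both thresholds."""
    selected = {}
    for name, logfc, p_value in rows:
        if abs(logfc) >= min_abs_logfc and p_value <= max_p:
            selected[name] = "up" if logfc > 0 else "down"
    return selected


def overlap(first, second):
    """Names selected in both datasets with the same direction (assumption)."""
    return {name: d for name, d in first.items() if second.get(name) == d}


degs_a = select_degs(dataset_a)
degs_b = select_degs(dataset_b)
shared = overlap(degs_a, degs_b)
print(shared)
```

With real data, each list would be read from the table exported for the corresponding dataset, and the `shared` dictionary plays the role of the Venn diagram intersection.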
Validation of lncRNA genes expression
The breast invasive carcinoma (BRCA) RNAseq dataset from The Cancer Genome Atlas (TCGA) database was used to validate the expression profile of the top 10 lncRNAs and the overlapping lncRNAs in the two microarrays. Data were downloaded through the Atlas of ncRNA in cancer (TANRIC) database (http://ibl.mdanderson.org/tanric/_design/basic/index.html). The lncRNA expression level was quantified as the log2RPKM value. A t test or t′ test was used to examine the difference between the TNBC and normal groups.
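As a rough illustration of the group comparison, the sketch below computes an unequal-variance t statistic and its approximate degrees of freedom. It assumes that the t′ test refers to the unequal-variance (Welch) form of the t test; the expression values are invented and stand in for log2RPKM values of one lncRNA.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of group x
    vy = variance(y) / len(y)  # squared standard error of group y
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Invented log2RPKM values for one lncRNA (not taken from TCGA).
tumor = [2.0, 4.0, 6.0, 8.0]
normal = [1.0, 2.0, 3.0]

t_stat, dof = welch_t(tumor, normal)
print(round(t_stat, 3), round(dof, 3))
```

The P value would then be read from a t distribution with `dof` degrees of freedom; that lookup is left out here to keep the sketch free of external libraries.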
Co-expression analysis of lncRNAs with mRNAs
The microarray GSE64790 also profiled the expression of mRNAs in TNBC. Therefore, we selected differentially expressed mRNAs with the same criteria as for lncRNAs and then conducted co-expression analysis of the differentially expressed lncRNAs and mRNAs using R software. Pearson’s correlation coefficients between the lncRNA genes and mRNA genes were calculated from the expression matrix. P < 0.01 was the cut-off value used to define significant correlations. The co-expression network was constructed with the software Cytoscape (version 3.5.1), and hub-genes in the network were selected according to their rank by degree.
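The logic of the co-expression step can be sketched as follows. This is not the R and Cytoscape workflow itself: it swaps the P < 0.01 cut-off for a threshold on |r| (for a fixed number of samples the two are equivalent, but the value 0.9 used here is an arbitrary assumption), and all gene names and expression values are hypothetical.

```python
import math
from collections import Counter


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def build_edges(lnc_expr, mrna_expr, min_abs_r):
    """Keep lncRNA-mRNA pairs whose |r| reaches the threshold."""
    edges = []
    for lnc, x in lnc_expr.items():
        for gene, y in mrna_expr.items():
            r = pearson_r(x, y)
            if abs(r) >= min_abs_r:
                edges.append((lnc, gene, r))
    return edges


def rank_by_degree(edges):
    """Count connections per node and sort from most to least connected."""
    degree = Counter()
    for lnc, gene, _ in edges:
        degree[lnc] += 1
        degree[gene] += 1
    return degree.most_common()


# Hypothetical expression values across five samples.
lnc_expr = {"LNC_1": [1, 2, 3, 4, 5], "LNC_2": [5, 3, 4, 1, 2]}
mrna_expr = {
    "GENE_A": [2, 4, 6, 8, 10],
    "GENE_B": [5, 4, 3, 2, 1],
    "GENE_C": [1, 3, 2, 5, 4],
}

edges = build_edges(lnc_expr, mrna_expr, min_abs_r=0.9)
positive = [e for e in edges if e[2] > 0]
negative = [e for e in edges if e[2] < 0]
ranking = rank_by_degree(edges)
print(ranking)
```

Splitting the edges by the sign of r mirrors the distinction between positive and negative connections in the network, and the head of `ranking` corresponds to the hub-genes.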
Functional enrichment analysis for DEGs
We mined BC-associated genes reported in the literature through PALM-IST (http://www.hpppi.iicb.res.in/ctm/index.html). The genes shared by the BC-associated gene set and the co-expressed differentially expressed gene (DEG) set were screened out for functional enrichment analysis, which was performed with Metascape (http://metascape.org). Gene Ontology (GO) terms for the biological process (BP), cellular component (CC) and molecular function (MF) categories as well as Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were enriched. Only terms with P value < 0.01 and at least 3 enriched genes were considered significant. All the resultant terms were then grouped into clusters based on their similarities. The most enriched term within a cluster was chosen to represent the cluster.
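A common way to score term enrichment is the upper tail of the hypergeometric distribution. The sketch below applies it together with the two filters stated above (P < 0.01 and at least 3 enriched genes). Whether this matches the internal computation of Metascape exactly is an assumption, and the background size, term names and counts are hypothetical.

```python
from math import comb


def hypergeom_tail(population, annotated, drawn, hits):
    """P(X >= hits) when `drawn` genes are sampled without replacement from
    `population` genes, of which `annotated` carry the term."""
    total = comb(population, drawn)
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, drawn - k)
        for k in range(hits, upper + 1)
    )
    return favourable / total


# Small case that can be checked by hand: (6*6 + 4*1) / 120 = 1/3.
small_p = hypergeom_tail(population=10, annotated=4, drawn=3, hits=2)

# Hypothetical terms: name -> (genes annotated in background, hits in gene list).
BACKGROUND = 20000  # assumed background size
LIST_SIZE = 100     # assumed size of the input gene list
terms = {"TERM_X": (200, 12), "TERM_Y": (200, 2), "TERM_Z": (2000, 10)}

significant = [
    name
    for name, (annotated, hits) in terms.items()
    if hits >= 3 and hypergeom_tail(BACKGROUND, annotated, LIST_SIZE, hits) < 0.01
]
print(small_p, significant)
```

TERM_Y is dropped by the gene-count filter regardless of its P value, and TERM_Z is dropped because its hit count equals what chance alone would give.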
DEG-based survival analysis
The online survival analysis tool Kaplan–Meier Plotter (http://kmplot.com/analysis/) was used to assess the prognostic value of the significant DEGs from our analysis in TNBC. Patients were divided into high and low expression groups according to the median expression level of the corresponding gene. The log-rank test was used to examine the significance of the difference between the two groups, and the hazard ratio (HR) was calculated to evaluate the association of gene expression with survival.
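The median split and the log-rank comparison can be illustrated with a minimal sketch. It is not the implementation used by Kaplan–Meier Plotter: the sample names, expression values and follow-up times are invented, assigning values equal to the median to the low group is an assumption, and the hazard ratio itself (usually taken from a Cox model) is not computed.

```python
from statistics import median


def median_split(expression):
    """Label each sample 'high' or 'low' relative to the median expression."""
    cut = median(expression.values())
    return {s: ("high" if v > cut else "low") for s, v in expression.items()}


def logrank_chi2(times, events, groups):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times: follow-up time per patient; events: 1 = relapse, 0 = censored;
    groups: 'high' or 'low' per patient.
    """
    observed = expected = var = 0.0
    records = list(zip(times, events, groups))
    for t in sorted({ti for ti, e, _ in records if e}):
        at_risk = [(ti, e, g) for ti, e, g in records if ti >= t]
        n = len(at_risk)
        n_high = sum(1 for _, _, g in at_risk if g == "high")
        d = sum(1 for ti, e, _ in at_risk if ti == t and e)
        d_high = sum(1 for ti, e, g in at_risk if ti == t and e and g == "high")
        observed += d_high
        expected += d * n_high / n
        if n > 1:
            var += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    return (observed - expected) ** 2 / var


# Invented data: four patients, all of whom relapsed.
expression = {"s1": 9.1, "s2": 8.4, "s3": 3.2, "s4": 2.7}
follow_up = {"s1": 1, "s2": 2, "s3": 3, "s4": 4}
relapsed = {"s1": 1, "s2": 1, "s3": 1, "s4": 1}

labels = median_split(expression)
samples = sorted(expression)
chi2 = logrank_chi2(
    [follow_up[s] for s in samples],
    [relapsed[s] for s in samples],
    [labels[s] for s in samples],
)
print(labels, round(chi2, 4))
```

The statistic is compared against a chi-square distribution with one degree of freedom to obtain the log-rank P value.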
Dysregulated lncRNAs in TNBC
Through analysis of the microarray data, 432 up-regulated lncRNAs and 602 down-regulated lncRNAs were identified according to our criteria (Additional file 1: Table S1). Eight lncRNAs (2 up-regulated and 6 down-regulated) overlapped between the two microarrays (Fig. 1 and Table 1). The 50 top differentially expressed lncRNAs in the two microarrays are shown in Fig. 2. Of the top 10 lncRNAs in the two microarrays, 8 were found in the TANRIC database. Their expression was validated in 119 TNBC samples and 105 normal controls with data from the BRCA RNAseq dataset of TCGA. As shown in Fig. 3, four lncRNAs were increased while the other four were decreased in TNBC compared with normal tissues (P < 0.01). The expression of all these lncRNAs was consistent with the microarray results except for two (RP11-356O9.1 and RP11-369C8.1).
Co-expression network and hub-genes
The co-expression analysis showed that the expression of 537 lncRNAs and 451 protein-coding genes (PCGs) was significantly correlated. A co-expression network of these DEGs was constructed based on their correlation coefficients. The network is very large, comprising 1259 nodes and approximately 40 thousand edges, including 25,203 positive connections and 14,449 negative connections (Fig. 4a). The 50 top hub-genes were selected and visualized (Fig. 4b). Among these hub-genes, 17 were lncRNA genes and 33 were PCGs.
Functional characterization of DEGs
In total, 1037 terms were enriched, including 826 BP terms, 62 CC terms, 89 MF terms and 60 KEGG pathways (Additional file 2: Table S2). The DEGs are mainly involved in biological processes such as cell division, chromosome segregation and cell cycle, and have molecular functions such as protein and DNA binding, protein kinase activity, and receptor ligand or regulator activity. Most of the enriched pathways are cancer-related. A total of 25 DEGs are involved in pathways in cancer, which is the most enriched pathway. Other pathways include focal adhesion, breast cancer and cell cycle, as well as the PI3K-Akt, MAPK, ErbB and p53 signaling pathways. The top 20 clusters of significantly enriched terms are shown in Fig. 5.
Prognostic value of hub-genes in TNBC
To explore the prognostic value of the significantly dysregulated lncRNAs, we analyzed the association between the expression level of the 50 top hub-genes in the co-expression network and the survival of TNBC patients. Seven DEGs (2 lncRNA genes and 5 PCGs) were found to be associated with relapse-free survival (RFS) of TNBC (P < 0.05, Fig. 6). Patients with elevated levels of MAGI2-AS3 and GGTA1P tended to have better RFS (HR = 0.51 and 0.54, respectively). In addition, high expression of NAP1L2, CRABP2 and SYNPO2 was beneficial for RFS (HR = 0.59, 0.41 and 0.50, respectively) in TNBC patients, whereas increased levels of MKI67 and COL4A6 were a risk factor for RFS (HR = 2.23 and 1.91, respectively).
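The reading of these hazard ratios can be made explicit with a short sketch. The HR values are the ones reported above; the function and label names are illustrative, and the interpretation assumes that each HR compares the high-expression group with the low-expression group.

```python
# Hazard ratios for relapse-free survival as reported in the text
# (high versus low expression, by assumption).
reported_hr = {
    "MAGI2-AS3": 0.51,
    "GGTA1P": 0.54,
    "NAP1L2": 0.59,
    "CRABP2": 0.41,
    "SYNPO2": 0.50,
    "MKI67": 2.23,
    "COL4A6": 1.91,
}


def direction(hr):
    """HR < 1: high expression goes with fewer relapses; HR > 1: with more."""
    if hr < 1:
        return "protective"
    if hr > 1:
        return "risk"
    return "neutral"


by_direction = {}
for gene, hr in reported_hr.items():
    by_direction.setdefault(direction(hr), []).append(gene)

print(by_direction)
```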
TNBCs account for approximately 15% to 20% of all diagnosed breast cancer cases and are more prevalent in younger women (age < 40 years). TNBC is a complex and heterogeneous disease, and the outcomes of patients are worse than those of patients with other subtypes. Only 30–45% of TNBC patients achieve a pathological complete response (pCR) and survival rates similar to those of other types of breast cancer. The poor prognosis of TNBC is mainly due to the lack of effective targets for treatment. Therefore, it is crucial to find new therapeutic targets to improve TNBC prognosis.
LncRNAs play an important role in carcinogenesis. Many lncRNAs are dysregulated in tumors, and they are promising diagnostic biomarkers and potential therapeutic targets for cancers [17,18,19]. In this study, we identified a number of TNBC-associated lncRNA genes through bioinformatic methods. Most of them are novel lncRNAs, many of which do not even have an official name. Except for MEG3, none of the lncRNAs that overlapped between the two microarrays has been studied in BC, so they are good targets for future research. There are also some lncRNAs that have been extensively studied previously. For example, the most dysregulated lncRNA gene, BCAR4, has been found to be overexpressed in breast tumor tissue in previous studies and was associated with poor survival of breast cancer patients [20, 21]. Furthermore, it has been shown that BCAR4 can promote breast cancer cell migration and invasion through the noncanonical hedgehog signaling pathway. MEG3 is a tumor suppressor lncRNA gene whose expression is decreased in multiple tumors including lung cancer, gastric cancer, hepatocellular carcinoma and glioma. In breast cancer, it can inhibit cell proliferation, invasion and angiogenesis by sponging microRNAs and regulating signal transduction pathways such as the AKT and TGF-β pathways [23, 24]. H19 is also one of the major lncRNA genes in cancer, but whether it is oncogenic or tumor-suppressive has long been controversial. H19 plays a role in tumor initiation and progression; the mechanisms, however, vary among cancer types [25, 26]. In breast cancer, H19 is involved in tumor growth and metastasis through interaction with proteins and microRNAs. The mechanisms of lncRNA regulation in TNBC have not yet been clarified. Previous studies have shown that lncRNAs can be regulated by some important signaling pathways. For example, LINP1 expression is activated by EGF signaling and repressed by the p53 pathway in TNBC.
The expression level of lncRNAs can also be altered by epigenetic modification. For example, the promoter-associated CpG island of LOC554202 was hypermethylated, leading to the down-regulation of LOC554202 in TNBC cells. In addition, the expression of a lncRNA can be regulated by its degradation rate or transcription rate.
Some similar studies have been published previously [30,31,32], but they mined data from only one microarray and did not carry out in-depth analysis. Our study analyzed two datasets and carried the analysis further. We also identified numerous abnormally expressed PCGs in TNBC. By establishing a gene co-expression network, we found the PCGs whose expression profiles are correlated with those of lncRNA genes. Many of these PCGs were enriched in biological processes and pathways that are important for tumorigenesis and cancer progression. The 50 top genes ranked by degree in the network were selected as hub-genes. Among them, low expression of three genes (NAP1L2, CRABP2 and SYNPO2) and high expression of two genes (MKI67 and COL4A6) were associated with poor RFS of TNBC patients. The products of the hub PCGs mainly function as protein-binding molecules and are involved in important biological processes and signaling pathways in cancer (CDK1, MKI67, CENPF, COL4A6, DACH1, etc.). As they are highly correlated with lncRNA genes, they may be the targets through which the TNBC-associated lncRNAs influence the onset and progression of TNBC.
In summary, numerous lncRNAs were dysregulated in TNBC, and many of them are possibly involved in cancer development. The specific functions of these lncRNAs need further exploration. Nevertheless, our study provides a comprehensive view of lncRNA signatures in TNBC and suggests that they play an important role. These dysregulated lncRNAs may be promising biomarkers for diagnosis or prognosis and potential targets for therapy. We hope these findings draw more attention to lncRNAs in TNBC research and provide directions for future studies.
TNBC: triple-negative breast cancer
lncRNA: long non-coding RNA
DEG: differentially expressed gene
GEO: Gene Expression Omnibus
TCGA: The Cancer Genome Atlas
HER2: human epidermal growth factor receptor 2
Torre LA, Bray F, Siegel RL, Ferlay J, Lortet-Tieulent J, Jemal A. Global cancer statistics, 2012. CA Cancer J Clin. 2015;65(2):87–108.
Bianchini G, Balko JM, Mayer IA, Sanders ME, Gianni L. Triple-negative breast cancer: challenges and opportunities of a heterogeneous disease. Nat Rev Clin Oncol. 2016;13(11):674–90.
Bosch A, Eroles P, Zaragoza R, Vina JR, Lluch A. Triple-negative breast cancer: molecular features, pathogenesis, treatment and current lines of research. Cancer Treat Rev. 2010;36(3):206–15.
Quinn JJ, Chang HY. Unique features of long non-coding RNA biogenesis and function. Nat Rev Genet. 2016;17(1):47–62.
Zhang R, Xia LQ, Lu WW, Zhang J, Zhu JS. LncRNAs and cancer. Oncol Lett. 2016;12(2):1233–9.
Malih S, Saidijam M, Malih N. A brief review on long noncoding RNAs: a new paradigm in breast cancer pathogenesis, diagnosis and therapy. Tumour Biol. 2016;37(2):1479–85.
Zhang Y, He Q, Hu Z, Feng Y, Fan L, Tang Z, Yuan J, Shan W, Li C, Hu X, et al. Long noncoding RNA LINP1 regulates repair of DNA double-strand breaks in triple-negative breast cancer. Nat Struct Mol Biol. 2016;23(6):522–30.
Lin A, Li C, Xing Z, Hu Q, Liang K, Han L, Wang C, Hawke DH. The LINK-A lncRNA activates normoxic HIF1alpha signalling in triple-negative breast cancer. Nat Cell Biol. 2016;18(2):213–24.
Kapusta A, Feschotte C. Volatile evolution of long noncoding RNA repertoires: mechanisms and biological implications. Trends Genet. 2014;30(10):439–52.
Wang J, Ye C, Xiong H, Shen Y, Lu Y, Zhou J, Wang L. Dysregulation of long non-coding RNA in breast cancer: an overview of mechanism and clinical implication. Oncotarget. 2017;8(3):5508–22.
Li J, Han L, Roebuck P, Diao L, Liu L, Yuan Y, Weinstein JN, Liang H. TANRIC: an interactive open platform to explore the function of lncRNAs in cancer. Can Res. 2015;75(18):3728–37.
Chin CH, Chen SH, Wu HH, Ho CW, Ko MT, Lin CY. cytoHubba: identifying hub objects and sub-networks from complex interactome. BMC Syst Biol. 2014;8(Suppl 4):S11.
Tripathi S, Pohl MO, Zhou Y, Rodriguez-Frandsen A, Wang G, Stein DA, Moulton HM, DeJesus P, Che J, Mulder LC, et al. Meta- and orthogonal integration of influenza “OMICs” data defines a role for UBR4 in virus budding. Cell Host Microbe. 2015;18(6):723–35.
Gyorffy B, Lanczky A, Eklund AC, Denkert C, Budczies J, Li Q, Szallasi Z. An online survival analysis tool to rapidly assess the effect of 22,277 genes on breast cancer prognosis using microarray data of 1,809 patients. Breast Cancer Res Treat. 2010;123(3):725–31.
Foulkes WD, Smith IE, Reis-Filho JS. Triple-negative breast cancer. N Engl J Med. 2010;363(20):1938–48.
Kalimutho M, Parsons K, Mittal D, Lopez JA, Srihari S, Khanna KK. Targeted therapies for triple-negative breast cancer: combating a stubborn disease. Trends Pharmacol Sci. 2015;36(12):822–46.
Batista PJ, Chang HY. Long noncoding RNAs: cellular address codes in development and disease. Cell. 2013;152(6):1298–307.
Beermann J, Piccoli MT, Viereck J, Thum T. Non-coding RNAs in development and disease: background, mechanisms, and therapeutic approaches. Physiol Rev. 2016;96(4):1297–325.
Prensner JR, Chinnaiyan AM. The emergence of lncRNAs in cancer biology. Cancer Discov. 2011;1(5):391–407.
Godinho MF, Sieuwerts AM, Look MP, Meijer D, Foekens JA, Dorssers LC, van Agthoven T. Relevance of BCAR4 in tamoxifen resistance and tumour aggressiveness of human breast cancer. Br J Cancer. 2010;103(8):1284–91.
Xing Z, Lin A, Li C, Liang K, Wang S, Liu Y, Park PK, Qin L, Wei Y, Hawke DH, et al. lncRNA directs cooperative epigenetic regulation downstream of chemokine signals. Cell. 2014;159(5):1110–25.
He Y, Luo Y, Liang B, Ye L, Lu G, He W. Potential applications of MEG3 in cancer diagnosis and prognosis. Oncotarget. 2017;8(42):73282–95.
Mondal T, Subhash S, Vaid R, Enroth S, Uday S, Reinius B, Mitra S, Mohammed A, James AR, Hoberg E, et al. MEG3 long noncoding RNA regulates the TGF-beta pathway genes through formation of RNA-DNA triplex structures. Nat Commun. 2015;6:7743.
Zhang W, Shi S, Jiang J, Li X, Lu H, Ren F. LncRNA MEG3 inhibits cell epithelial-mesenchymal transition by sponging miR-421 targeting E-cadherin in breast cancer. Biomed Pharmacother. 2017;91:312–9.
Raveh E, Matouk IJ, Gilon M, Hochberg A. The H19 Long non-coding RNA in cancer initiation, progression and metastasis—a proposed unifying theory. Mol Cancer. 2015;14:184.
Yoshimura H, Matsuda Y, Yamamoto M, Kamiya S, Ishiwata T. Expression and role of long non-coding RNA H19 in carcinogenesis. Front Biosci (Landmark edition). 2018;23:614–25.
Collette J, Le Bourhis X, Adriaenssens E. Regulation of human breast cancer by the long non-coding RNA H19. Int J Mol Sci. 2017;18:11.
Augoff K, McCue B, Plow EF, Sossey-Alaoui K. miR-31 and its host gene lncRNA LOC554202 are regulated by promoter hypermethylation in triple-negative breast cancer. Mol Cancer. 2012;11:5.
Kong X, Liu W, Kong Y. Roles and expression profiles of long non-coding RNAs in triple-negative breast cancers. J Cell Mol Med. 2018;22(1):390–4.
Shen X, Xie B, Ma Z, Yu W, Wang W, Xu D, Yan X, Chen B, Yu L, Li J, et al. Identification of novel long non-coding RNAs in triple-negative breast cancer. Oncotarget. 2015;6(25):21730–9.
Chen C, Li Z, Yang Y, Xiang T, Song W, Liu S. Microarray expression profiling of dysregulated long non-coding RNAs in triple-negative breast cancer. Cancer Biol Ther. 2015;16(6):856–65.
Wang L, Shen X, Xie B, Ma Z, Chen X, Cao F. Transcriptional profiling of differentially expressed long non-coding RNAs in breast cancer. Genom Data. 2015;6:214–6.
TT and ZJD conceived and designed the study. ZQG, MW, SL, KL, PTY, NL, and YW collected and processed data. TT, ZQG and RHH analyzed data. MW, YZ, PX, YJD and DLS prepared tables and figures. TT drafted the manuscript. ZJD and GF revised the manuscript. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Availability of data and materials
All data generated and analyzed during this study are included in this published article and its additional files.
Consent for publication
Ethics approval and consent to participate
This work was supported by National Natural Science Foundation, People’s Republic of China (No. 81471670), and the Key research and development plan, Shaanxi Province, People’s Republic of China (2017ZDXM-SF-066).