The catalogue of somatic mutations
31,958 somatic mutations comprise 30,216 single-nucleotide variants (SNVs) and 1742 small insertions or deletions (indels). The SNVs included 7818 silent, 20,311 missense, 1248 nonsense, 713 splice-site, 119 RNA, 71 translation start site and 24 nonstop mutations, 1259 indels caused reading frame shifts, 404 indels were in frame mutations. Over 67.46% (21,559/31,958) of variants were non-synonymous mutations (Fig. 1a). C>T/G>A, T>C/A>G and T>A/A>T accounted for 54.18, 17.38 and 13.05% of the variant types in the non-CpG sites, 6.90, 1.06 and 0.98% of variant types in the CpG islands respectively. Therefore, C>T/G>A, T>C/A>G and T>A/A>T were the three predominant transitions in glioma (Fig. 1b).
Cancer driver genes and pathways in glioma
Overall, there were 15, 68, 221 and 445 driver genes determined by OncodriveCLUST, OncodriveFM, icages and drgap respectively (Additional file 1: Table S1). Dendrix reported 11,814 genes mutated in at least one patient. We performed Dendrix analyses for sets of size ranging from 2 to 4. When k = 2, the pair IDH1 and PTEN was sampled 95.3% (953/1000) of the times. When k ≥ 3, the triple IDH1, Unknown and PTEN, IDH1, EGFR and PTEN were sampled 69.2% (692/1000) and 30.7% (307/1000) of the times respectively. Therefore, IDH1, PTEN and EGFR showed mutational exclusivity in glioma (Additional file 1: Table S1). EGFR is the only common gene detected by all five tools (Fig. 2a), suggesting that EGFR plays a pivotal role in the tumorigenesis of glioma. Of 685 driver genes, the majority of driver genes are low-frequency mutated genes in glioma, with an average mutation rate of 1.39% (Fig. 2b). IDH1, TP53, ATRX, TTN, PTEN, EGFR and MUC16 were known recurrently mutated genes and showed mutation rates of 40.97, 39.93, 23.78, 22.92, 17.88,16.32 and 15.10% across all samples (Fig. 2c). Besides the list of driver genes, Drgap also reported 215 driver pathways in glioma, such as Hedgehog signaling pathway, mTOR signaling pathway, glioma and acute myeloid leukemia (Additional file 1: Table S2).
The IDH1 gene encoding isocitrate dehydrogenase 1 (IDH1) catalyzes the oxidative carboxylation of isocitrate to a-ketoglutarate, causing production of NADPH in the citric acid (Krebs) cycle [22]. IDH1 is frequently mutated in the majority of low grade gliomas and secondary high grade gliomas. IDH1 mutations change the function of the enzymes, increase DNA methylation and correlate with improved prognosis in glioma [23]. 576 glioma patients were classified into IDH1-mutated (236 patients) and non-IDH1-mutated groups (340 patients) based on the occurrence of IDH1 mutation. Then we applied OncodriveCLUST, OncodriveFM, icages, drgap and Dendrix to detect driver genes and pathways in the two distinct groups. The number of driver genes detected by the five tools was largely different between IDH1-mutated and non-IDH1-mutated groups. There were only 2, 5, 46, 8 and 1 overlapping driver genes predicted by OncodriveCLUST, OncodriveFM, icages, drgap and Dendrix respectively (Additional file 1: Figure S1, Tables S3, S4). Moreover, a set of driver genes were predicted to be IDH1-associated, such as IDH1, NOTCH1, FUBP1, ARID1A, KAT6B and ARID2, while others were IDH1-independent, such as EGFR, PIK3CA, BRAF, RB1 and PTGS2 (Additional file 1: Tables S3, S4, Figure S1). However, the majority of driver pathways (196/209, 93.78%) unraveled by drgap in glioma patients with IDH1 mutations overlapped with those (196/214, 91.59%) in glioma patients without IDH1 mutations. All the results suggest that IDH1 and non-IDH1 mutated glioma subtypes are caused by mutational disruption of different genes but similar pathways.
GO term and KEGG pathway enrichment analyses
The enrichment of GO terms and KEGG pathways was analyzed for 685 driver genes on the home page of STRING. GO enrichment analysis indicated driver genes were significantly enriched in 1750 biological processes (FDR < 0.05). The main GO biological process terms showed a wide spectrum of functional processes ranging from cellular developmental process, cell differentiation, programmed cell death, apoptotic process to regulation of metabolic processes (Additional file 1: Table S5). STRING also revealed 145 KEGG pathways significantly enriched for driver genes, such as glioma, melanoma, thyroid cancer, pancreatic cancer, renal cell carcinoma, bladder cancer, colorectal cancer, non-small cell lung cancer, endometrial cancer, prostate cancer, acute myeloid leukemia, MAPK signaling pathway, mTOR signaling pathway and cell cycle (FDR < 0.05, Additional file 1: Table S6).
Expression profiling in glioma
In order to analyze the driver gene expression profile in glioma, RNA-seq data of 75 glioma and 17 normal brain tissues were obtained from Gill’s study. Overall, we found 428 differentially expressed driver genes, including 330 up-regulated and 98 down-regulated genes (Additional file 1: Figure S2). Next, we built co-expression networks based on the expression correlation of differentially expressed driver genes in cancer tissues and normal brain tissues respectively. The genes which have high degrees possess intensive interactions with other genes and may act as key genes in the co-expression network. As shown Fig. 3, FSTL5, HCN1, TMEM132D, TRHDE, KRT222, CACNA1B, CDH9, SLC22A9, GABRA1 and GABRB2 showed the strongest expression correlation with other genes in glioma tissues (Additional file 1: Table S7), while, PRKAR2B, CCND2, C1orf173, WBSCR17, STXBP5L, PRKCE, KIF3A, GRAMD1B, SLC1A6 and ADCY1 were the hub genes in the co-expression network of normal brain tissues (Fig. 4; Additional file 1: Table S8).
Protein–protein interaction network analysis
In addition to co-expression analysis on driver genes at the mRNA level, we also wanted to know the interactions of driver genes in glioma at the protein level. To this aim, we applied STRING to construct a protein–protein interaction network using 685 driver genes. A high degree protein regulates or is regulated by many other proteins, suggesting an important role in the network of interactions. SRC, ST6GAL2, PIK3CA, PIK3R1, CREBBP, TP53, SOS1, EGFR, EGR1 and EIF1AX are at the core of the protein–protein interaction network (Total STRING score > 29.20, Additional file 1: Table S9, Figure S3). They are responsible for regulation of programmed cell death, protein metabolic process, apoptotic process, EGFR signaling pathway and signaling transduction, suggesting they may play key roles in glioma [24].
Copy number variations in glioma
We also obtained focal CNVs of 52 glioma samples at broad institute. Significant focal gains and deletions (q < 0.25) were found in 29 samples (29/52, 55.77%) at 5 loci (3 amplifications and 2 deletions). Among them, deletions at 9p21.3, 7p11.2 and amplification at 1q32.1 were the most frequent CNVs in glioma, with occurrence rates of 32.69% (17/52), 26.92% (14/52) and 13.46% (7/52) respectively (Additional file 1: Figure S4). 10 driver genes were involved in CNVs, including known tumor suppressors and oncogenes, such as EGFR (amplification, 7p11.2) and MET (amplification, 12q14.1). Many driver genes were also found to be implicated in the CNVs, including PIP4K2C (amplification, 12q14.1), REN (amplification, 1q32.1), PIK3C2B (amplification, 1q32.1), CDKN2B and CDKN2A (deletion, 9p21.3), COL6A3 and NEU2 (deletion, 2q37.3) and HDAC4 (deletion, 2q37.3).
Survival analyses in glioma
We acquired RNAseq and clinical outcome data from TCGA to evaluate whether the expression of driver genes is associated to survival of glioma patients. Overall, Kaplan–Meier survival analyses showed that the expression of 268 driver genes was significantly related to clinical outcomes of glioma patients. The high expression of 162 driver genes was negatively correlated with survival rates of glioma patients, such as SAMD9L, SAMD9, VAV3, FLNA, KDELC2, BRCA2, MAP3K1, BRCA1, LAMA2 and PDGFD (Additional file 1: Table S10). By contrast, 106 driver genes showed positive correlation with patients’ prognosis, such as BMP2, CSMD3, SMOC1, FAM110B, SLC1A6, GABRB3, BAG1, SNAP91, CALN1 and MAPK9 (Additional file 1: Table S10). 133 driver genes were up-regulated and associated to poor prognoses in glioma patients, such as NOTCH2, STAT3, IDH1, ARID1A and MSH6 (Fig. 5a). On the contrary, 43 driver genes were down-regulated expression and related to favorable clinical outcomes in glioma patients, such as PIK3R1, FLT3, PIK3R1 and RUNX1T1 (Fig. 5b). These driver genes might be potential prognostic biomarkers for glioma patients in the future.