Identi � cation of potential biomarkers and candidate small molecule drugs in glioblastoma

Background and aims Glioblastoma (GBM) is a common and aggressive primary brain tumor, and the prognosis for GBM patients remains poor. This study aimed to identify the key genes associated with the development of GBM and provide new diagnostic and therapies for GBM. Methods Three microarray datasets (GSE111260, GSE103227, and GSE104267) were selected from Gene Expression Omnibus (GEO) database for integrated analysis. The differential expressed genes (DEGs) between GBM and normal tissues were identified. Then, prognosis-related DEGs were screened by survival analysis, followed by functional enrichment analysis. The protein–protein interaction (PPI) network was constructed to explore the hub genes associated with GBM. The mRNA and protein expression levels of hub genes were respectively validated in silico using The Cancer Genome Atlas (TCGA) and Human Protein Atlas (HPA) databases. Subsequently, the small molecule drugs of GBM were predicted by using Connectivity Map (CMAP) database. Results A total of 78 prognosis-related DEGs were identified, of which10 hub genes with higher degree were obtained by PPI analysis. The mRNA expression and protein expression levels of CETN2, MKI67, ARL13B, and SETDB1 were overexpressed in GBM tissues, while the expression levels of CALN1, ELAVL3, ADCY3, SYN2, SLC12A5, and SOD1 were down-regulated in GBM tissues. Additionally, these genes were significantly associated with the prognosis of GBM. We eventually predicted the 10 most vital small molecule drugs, which potentially imitate or reverse GBM carcinogenic status. Cycloserine and 11-deoxy-16,16-dimethylprostaglandin E2 might be considered as potential therapeutic drugs of GBM. Conclusions Our study provided 10 key genes for diagnosis, prognosis, and therapy for GBM. These findings might contribute to a better comprehension of molecular mechanisms of GBM development, and provide new perspective for further GBM research. However, specific regulatory mechanism of these genes needed further elaboration.

be potential targets for GBM treatment [4,5]. Additionally, drugs like triple-drug therapy (bevacizumab, irinotecan, and temozolomide) had bene t effect on recurrent GBM [6]. However, different studies often yield diverse results and the molecular mechanism of GBM pathogenesis has not been entirely elucidated. Thus, it is desperately required to explore novel biomarkers and small drug molecules.
Currently, the microarray gene expression research has been performed to uncover the molecular mechanism of various cancers. The mRNA data are collected from two databases, including Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA). The GEO database can be applied to identify the differentially expressed genes (DEGs), explore molecular signal and its correlation, and analyze gene regulation network [7]. However, due to the limited samples, the analysis results of a single microarray dataset may be biased and unreliable. Hence, integrated analysis of multiple datasets can improve the accuracy and reliability of the results, thus obtain a comprehensive discovery of DEGs in tumors.
In the present study, three microarray datasets related GBM were selected for further study, the raw data of mRNA pro le were downloaded from GEO database and integrated bioinformatics analysis of three data sets was conducted. The overlapping DEGs were identi ed by the intersection of three datasets.
Then, the DEGs associated with GBM prognosis were screened using TCGA database. Functional enrichment analysis was performed to understand the biological functions of these DEGs. We also established a protein-protein interaction (PPI) network to screen hub genes. Thereafter, the mRNA and protein expression level of hub genes were respectively veri ed by using UALCAN online tool and Human Protein Atlas (HPA) database. Finally, the small molecule drugs of GBM were explored by connectivity map (CMAP) database. The ow chart of this study protocol is shown in Figure 1.

Methods And Materials
Microarray data To screen the DEGs between GBM samples and normal samples, three microarray datasets (GSE111260, GSE103227, and GSE104267) were downloaded from the GEO database (http://www.ncbi.nlm.nih.gov/geo) [8]. Arraystar human lncRNA microarray V3 (Probe Name Version) (GSE103227), and GPL22448 Phalanx Human lncRNA OneArray v1_mRNA platform (GSE104267), respectively. In total, 81 GBM samples and 11 normal samples were included in our analysis, of which 67 GBM samples and three control samples in GSE111260, ve GBM and ve normal tissues in GSE103227, and nine GBM and three control samples in GSE104267.
Data pre-processing The preprocessing of raw data was performed using the limma package of the R software (Version 3.34.9, http://www.bioconductor.org/packages/release/bioc/html/limma.html) [9], and then data normalization of samples from each expression pro les was conducted by using robust multi-array average (RMA) method [10,11], including background adjustment, quantile normalization, and log2 conversion. Afterwards, the probes were annotated with the platform annotation le, the probes that did not matched the gene (gene symbol) were removed; in addition, for the multiple probes that mapped to the same gene, their average values were calculated as the nal expression value.
DEGs screening and Meta-analysis DEGs between GBM and control sample in the three datasets were respectively screened by using the Limma package. P < 0.05 and |log Fold change (FC)| >1 were considered as the criteria for DEGs.
The intersection of DEGs from three microarray datasets was conducted by NetworkAnalyst 3.0 database [12] (https://www.networkanalyst.ca/NetworkAnalyst/uploads/MetaLoadView.xhtml), which could compare and analyze of DEGs generated from different studies via various statistical methods. In this study, three statistical methods, including Fisher's method, Fixed effects models, and Vote Counting, were applied to integrate multiple data sets. DEGs with P < 0.001 (both Fisher's method and Fixed effects models) and vote counts ≥ 2 were considered as shared DEGs. Meanwhile, the ComBat function of the R package sva [12] was utilized to eliminate heterogeneity between these three datasets. Additionally, the DEGs screened by the three integration methods were also analyzed by VENN, and the genes that existed in at least two methods were selected as the focus of the further analysis.

VENN analysis
The DEGs obtained from each dataset were analyzed by VENN analysis to observe the up-or downregulated of the genes. Additionally, the DEGs screened by the three integration methods were also analyzed by VENN, and the DEGs that existed in at least two methods were selected as the focus of the further analysis.

Survival analysis
Both the mRNA-seq data and clinical information of GBM patients were acquired from TCGA genomic data commons (GDC) (https://xenabrowser.net/) portal [13]. According to the shared DEGs identi ed from the integrated analysis, the samples with no overall survival (OS) time (or less than one mouth) and the DEGs with median expression level less than 0 were removed. Afterwards, the remaining samples were divided into high expression group and low expression group according to the median expression levels of genes. Survival analysis was performed using Kaplan-Meier and the log-rank statistical test. P < 0.05 was regarded as statistically signi cant threshold. Functional enrichment analysis of prognostic related DEGs To investigate the biological functions and pathways involved in these prognostic related DEGs, the Gene Ontology (GO) terms and pathway analysis were performed by using metascape database (http://metascape.org) [14]. P < 0.01 and count > 3 were set as the threshold of signi cantly enriched terms. In order to further explore the relationship between the statistically enriched terms, the kappa-statistical similarities of these terms were calculated, and the overlapping or related terms were identi ed to perform functional network clustering. According to the gene similarity enriched in each term (similarity of >0.3), the interaction relationship of the term was obtained. Subsequently, the functional enrichment network was constructed.

The PPI network construction
The prognostic related DEGs were mapped in Search Tool for the Retrieval of Interacting Genes (STRING, version: 11.0, https://string-db.org) database [15] to recognize their potential interaction relationships from protein level. The species was Homo sapiens and the con dent interaction score more than 0.15 (low con dent) was set as signi cant interaction. The PPI network was visualized using Cytoscape software (version: 3.6.1, http://www.cytoscape.org/) [16]. In addition, the degree of each protein node was calculated and nodes with degree ≥ 10 were selected as hub genes.

Veri cation of hub genes
We used the online software UALCAN (http://ualcan.path.uab.edu/index.html) [17] to verify the hub genes identi ed from the PPI network. The candidate hub genes were submitted to the UALCAN database and the TCGA data were applied to validate the relationship between the genes expression and the prognosis of GBM.

Gene mutation analysis
The cBio Cancer Genomics Portal could analyze the molecular data obtained from cancer tissues and cytology, to recognize and understand heredity, epigenetics, and gene expression. Thus, we used the CBiocancer genomics portal (https://www.cbioportal.org/) [18] to analyze the genetic mutations of the key genes between samples.
The HPAs, composed of tissue atlas, cell atlas, and pathology atlas, were provided the data of transcriptomics and proteomics in speci c human tissues. In this study, the protein level of hub genes in GBM tissues and compared normal tissues was investigated by using HPA database [19].

Identi cation of candidate small molecule drugs for GBM
The CMAP database, composed of 7056 gene expression pro les induced by 1309 small molecules, is widely applied to explore the potential unknown roles of existing drugs on diseases [20]. First, the prognosis related genes were classi ed into up-regulated and down-regulated groups. Then, these genes from two groups were uploaded into CMAP database to obtain the potential small drug molecules, and P < 0.05 was regarded as the cut-off criteria. Finally, the enrichment scores (-1 to +1) that could assess the similarity between genes and drugs were calculated. Speci cally, enrichment score > 0 indicated the molecules had potential synergistic effects to GBM, suggesting they were able to imitate the biological status of GBM cell; while enrichment score < 0 revealed molecules had potential antagonistic effects, indicating they could reverse the GBM carcinogenic status and could serves as therapeutic drugs.

Identi cation of DEGs from GEO datasets analysis
The raw data from three gene expression pro les (GSE103227, GSE104267, and GSE111260) were downloaded from NCBI GEO database. The detailed information of these three datasets is listed in Table  1. DEGs between GBM samples and normal samples were screened from three studies, and then visualized by volcano plots and Principal Components Analysis (PCA) (Figure 2A and 2B). Afterwards, the number of DEGs obtained from three datasets is shown in Supplementary Table 1. Furthermore, the Venn diagrams showed 24 overlapping DEGs among the three datasets, including 18 up-regulated genes ( Figure 2Ca) and 6 down-regulated genes ( Figure 2Cb). Meta-analysis of three GEO datasets By employing three statistical methods, a total of 5801, 640, and 2368 DEGs were identi ed by Fisher's method, Fixed effects models, and Vote counting, respectively. Additionally, 613 shared genes were obtained by all three statistical methods and 2357 DEGs were existed in at least two methods ( Figure  2D).

Survival analysis of DEGs
In order to clarify the relationship between gene expression and GBM prognosis, we used K-M and log rank test for survival analysis. The clinical data of 167 patients with GBM was downloaded from TCGA, and the overall survival analysis of 2357 DEGs was performed. Finally, we obtained 78 DEGs were signi cantly connected with the prognosis of GBM (Supplementary Table 2). GO enrichment and KEGG pathway analysis of prognosis related genes According to the result mentioned above, the functional enrichment analysis of 78 prognosis-related genes was conducted. Three categories of GO enrichment analysis were performed, including biological process (BP), cellular component (CC), and molecular function (MF). The results indicated that these genes were mainly associated with GO_BP terms such as behavior, sensory organ morphogenesis, and chromosome separation. As for GO_CC terms, genes were primarily enriched in histone deacetylase complex, neuron to neuron synapse, transferase complex, and axon. For MFs, DEGs were particularly related to DNA-binding transcription repressor activity, RNA polymerase II-speci c, actin lament binding, and chromatin binding. Additionally, the KEGG pathway analysis revealed that these genes were signi cantly involved in longevity regulating pathway, bile secretion, insulin secretion, and thyroid hormone signaling pathway (Figure 3A and Supplementary Table 3). Furthermore, all terms were grouped into clusters based on the similarities, and a total of 13 clusters of signi cantly enriched terms were obtained ( Figure 3B), among these, sensory organ morphogenesis was the most enriched term.
The mRNA level and mutation state of hub genes By analyzing the expression of the hub genes in the TCGA GBM data, we observed that the expression levels of 10 hub genes were consistent with the results of microarray datasets, including MKI67, ARL13B, SETDB1, ELAVL3, ADCY3, SOD1, CALN1, SYN2, and SLC12A5. Notably, compared with normal samples, the expression level of MKI67, ARL13B, and SETDB1 was signi cantly up-regulated in GBM samples, while ELAVL3, ADCY3, SOD1, CALN1, SYN2, and SLC12A5 were markedly down-regulated ( Figure 5 and Table 3). In addition, we also display the K-M curves of hub genes in Supplementary Figure 1. Results showed that CETN2, MKI67, ARL13B, and SETDB1 with lower expression level were related to a signi cantly longer survival time; meanwhile, high expression of CALN1, ELAVL3, ADCY3, SYN2, ARL13B, SLC12A5, and SOD1 were associated with better overall survival among patients with GBM. The results of prognosis were consistent with the mRNA expression levels of hub genes.
Furthermore, the hub gene mutations in GBM were tested using cBioPortal. The MKI67, SLC12A5, and SOD1 exhibited higher mutation frequencies, and the proportion of them was 2.2, 0.7, and 0.2%, respectively (Supplementary Figure 2A). Meanwhile, approximately 3% of GBM clinical cases showed signi cant alterations in the 10 hub genes (Supplementary Figure 2B).

Immunohistochemical Analysis
Apart from investigating the mRNA level of hub genes, the protein expression levels were also explored using the HPA database. Because the immunohistochemical information of SYN2 was not existed in HPA, we have displayed nine pairs of staining results in Figure 6. The protein level of MKI67 and ARL13B was undetected in normal tissues, while the level of these genes was medium and high in the GBM tissues, respectively. The protein level of CETN2 was low in normal samples, while the level of it was high in GBM samples. Additionally, the medium protein level of SETDB1 was observed in normal tissues, whereas the high protein level was revealed in GBM tissues. Meanwhile, the protein level of CALN1 was medium in normal samples, while was low in the GBM samples. SOD1 moderately expressed in normal tissues but undetectable in GBM tissues, and ELAVL3 lowly expressed in normal tissues but undetectable in GBM tissues. ADCY3 was low expressed in normal and GBM tissues, whereas SLC12A5 was undetectable in normal and GBM samples. Thus, CETN2, MKI67, ARL13B, SETDB1, and CALN1 might be potential biomarkers for screening high-risk patients with GBM.

Analysis of GBM-related small molecular drugs
To identify candidate small molecular drugs targeting the gene expression of GBM, all the prognosisrelated DEGs were divided into up-regulated and down-regulated groups, which were submitted to the CMAP database. A total of 98 small molecular drugs that closely related to the biological status of GBM were obtained, of which 45 drugs might play potential synergies role in the development of GBM (enrichment score> 0), while 53 drugs might serve repress role in the GBM progression (enrichment score< 0). The top 10 vital small molecule drugs were selected (Figure 7). Among these drugs, cycloserine (enrichment score =-0.844) and 11-deoxy-16,16-dimethylprostaglandin E2 (enrichment score =-0.835) showed highly signi cant negative correlation and had potential to reverse the carcinoma status of GBM.
These identi ed small molecule drugs with enrichment scores <0 could reverse the abnormal gene expression and serve as potential drugs for GBM treatment.

Discussion
Although signi cant breakthrough in GBM treatment programs, including surgery, molecular therapy, and drug treatment, the prognosis for GBM patients remains poor and unchanged over the last 30 years [21]. Therefore, revealing the etiology and molecular mechanism of GBM might play important role in the diagnosis and treatment of tumor. In this study, bioinformatics analysis was used to screen the potential hub genes associated with GBM. By integration analysis of three GEO datasets of GBM, 613 overlapping DEGs were identi ed, among these, 78 DEGs were signi cantly associated with the OS of GBM. The GO analysis showed that these DEGs was mainly enriched in trans-synaptic signaling; and the KEGG pathways enrichment analysis indicated that DEGs were signi cantly involved in longevity regulating pathway. PPI analysis revealed that CETN2, MKI67, ARL13B, SETDB1, CALN1, ELAVL3, ADCY3, SYN2, SLC12A5, and SOD1 with high degree of connectivity were selected as hub genes. For CETN2, MKI67, ARL13B, and SETDB1, patients with high expression experienced a worse OS, while high expression of CALN1, ELAVL3, ADCY3, SYN2, ARL13B, SLC12A5, and SOD1 were associated with better overall survival among patients with GBM. To validate the results of bioinformatics analysis, we evaluated the mRNA and protein expression levels of hub genes by using TCGA and HPA databases. The results showed the same gene expression trend as observed in the GEO database, which further con rmed the accuracy of our ndings. Specially, CETN2, MKI67, ARL13B, SETDB1, and CALN1 might be potential biomarkers for screening high-risk patients with GBM. Furthermore, the small molecular drugs analysis showed that cycloserine and 11-deoxy-16,16-dimethylprostaglandin E2 might as potential therapeutic drugs for GBM.
In this study, GO analysis revealed that trans-synaptic signaling was the signi cantly enriched term for DEGs, which was consistent with previous study [22]. During the process of synaptogenesis, the glycans could modulate trans-synaptic signaling [23]; interestingly, glycans served important role in cancer progression and treatment. Glycosylation resulted in a variety of functional changes in glycoproteins, including adhesion molecules and cell surface receptors, such as e-cadherin and integrin. Notably, these changes conferred distinctive phenotypic characteristics connected with cancer cells [24]. Bassoy et al. observed that the sensitivity of glioma cells to cytotoxic lymphocytes might increase with the decrease of glycan surface expression [25]. Besides, metabotropic glutamate receptors, which involved in synaptic signaling, also participated in the transformation of multiple cancer types, such as GBM, breast cancer, and melanoma skin cancer [26]. These ndings suggested that trans-synaptic signaling might play a vital role in the pathogenesis of GBM.
PPI analysis showed that CETN2 was a hub gene, as well as the mRNA and protein expression levels of it were over-expressed in the GBM tissues. CETN2 was a member of the calcium-binding protein family, and caltractin played a fundamental role in structure and function of the microtubule-organizing center [27]. In addition, CETN2 are also involved in nucleotide excision repair that was linked with the risk of cancer [28]. Accumulating evidences demonstrated that CETN2 was identi ed in various types of cancers. Huan et al. revealed that CETN2 was associated with invasive ductal carcinoma of the breast, and might be potential biomarker for breast cancer [29]. It was reported that the down-regulated of CETN2 might have tumor suppressive function in bladder cancer [30]. Similarly, we found that the low expression level of CETN2 was signi cantly related to better survival of GBM patients. However, no studies have reported the potential mechanism of CETN2 in the initiation and progression of GBM. Hence, the mechanism of how CETN2 contributed to the GBM still need further research.
Meanwhile, we also found MKI67 was closely related to the prognosis of GBM. The protein encoded by MKI67 was necessary for cellular proliferation. Hou et al. showed that the down-regulated of MKI67 could suppress cell growth in the hepatocellular carcinoma cell [31]. Laible et al. indicated that MKI67 was the biomarker of breast cancer [32]. Meanwhile, MKI67 was corrected with nuclear features and associated with the survival of GBM [33,34]. In this study, MKI67 was up-regulated in GBM tissues, and GBM patients with a low MKI67 expression level displayed longer survival. Györffy et al. showed MKI67 was a prognostic factor in breast carcinoma [35]. Taken together, we speculated that MKI67 played vital roles in GBM progression and might serve as a molecular target for GBM treatment.
ARL13B was also involved in the GBM development, and the protein level of ARL13B was higher in tumor samples than normal samples. Casalou et al. con rmed that breast cancer was promoted by ARL13B, which was connected with cancer cell migration and invasion [36]. Another gene, SETDB1 regulated histone methylation, gene silencing, and transcriptional repression [37]. SETDB1 mediated Akt methylation promoted its k63-linked ubiquitination and activation, leading to tumorigenesis [38], and the oncogenic role of SETDB1 had been reported in GBM [39], which was supported our ndings. In addition, CALN1 was another hub gene in PPI analysis. CALN1 encoded a protein with high similarity to the calcium-binding proteins of the calmodulin family. CALN1 might in uence the invasion and migration of osteosarcoma cell line, meanwhile, it was also associated with the survival of osteosarcoma [40]. We found the high expression level of CALN1 was related to the poor prognosis of GBM. ELAVL3 was one of neuronal-speci c RNA-binding proteins (Hu antigens), which was recognized by anti-hu antibody in the serum of patients with paraneoplastic encephalomyelitis and sensory neuropathy [41]. Cauderay revealed that the expression of ELAVL3 was increased in the GBM tissues [42]. Unfortunately, we found ELAVL3 was down-regulated in the tumor samples, this discrepancy required further study. ADCY3 could catalyze the formation of cyclic adenosine monophosphate [43]. Hong et al. indicated that ADCY3 was overexpressed in the gastric cancer tissues, and promoted cell proliferation, migration, and invasion [44].
In this study, we found the high expression level of ADCY3 with worse overall survival. Additionally, we observed SLC12A5 was closely involved in the development of GBM. Verhaak et al. found that SLC12A5 was a common biomarker of GBM [45], which was consistent with our results. Meanwhile, we also observed that SOD1 was closely relevant to the prognosis of GBM. The protein encoded by SOD1 bound copper and zinc ions, and SOD1 was responsible for destroying superoxide free radicals in the body. Kato et al. demonstrated that the expression level of SOD1 was signi cantly changed in the GBM [46]. Gao et al. indicated that GBM with low expression level of SOD1 had better response to radiotherapy [47]. In the present study, patients with low expression of SOD1 experienced a better prognosis. Taken together, these genes served vital role in the development of GBM. However, our study was performed based on the bioinformatics analysis, further experimental studies must be conducted to understand the potential effect of key genes in the GBM pathogenesis.
Based on the small molecular drugs analysis, we determined a set of small molecule drugs that had potential to reverse the abnormal gene expression changes of GBM. Among these, cycloserine and 11deoxy-16,16-dimethylprostaglandin E2 showed a highly signi cant negative correlation and might serve as potential drugs for GBM treatment. Cycloserine is a cyclic analog to D-alanine, which can target alanine racemase and d-alanine ligase, thereby preventing the formation of bacterial cell walls [48]. Recently, cycloserine has been widely used in tuberculosis treatment, but no research has focused on the potential role of this small molecule in GBM. In addition, previous study revealed that 11-deoxy-16,16-dimethylprostaglandin E2 (DDM-PGE2) protected proximal renal tubular epithelial cells from cell damage induced by potent nephrotoxicity by exerting anti-oxidative stress [49]. Meanwhile, it also protected protects against oncotic cell death which induced by H(2)O(2) and iodoacetamide [50]. Similarly, the relationship between DDM-PGE2 and GBM was also not investigated. Given the emergence of these small molecules drugs in silico, further studies that explore the potential effects of them on GBM are imperative and will contribute to the study on new therapeutic drugs for GBM.
Despite studies devoted to investigate the molecular mechanisms of GBM development, integrated studies based on multiple datasets are rare. In the present study, 10 hub genes were identi ed for the rst time in GBM by integrated bioinformatics analysis; meanwhile, the mRNA and protein expression levels of them were veri ed by using TCGA and HPA databases. Importantly, we also screened the putative therapeutic agents for GBM. This analysis comprehensively analyzed the pathogenesis of GBM, which provided certain guiding signi cance for the diagnosis and treatment of this disease. Although the clinical value of these genes and drugs in GBM has not been reported in previous study, the importance of them should not be underestimated.

Conclusions
In summary, with the integrated bioinformatics analysis of three GBM-related gene expression pro les, we identi ed 10 key genes connected with pathogenesis and prognosis of GBM. These hub genes might serve as novel diagnostic and treatment biomarkers of GBM, which might conduct to elucidate the molecular mechanism of the occurrence and progression of GBM. Additionally, a series of small molecule drugs which could reverse the abnormal gene expression of GBM were identi ed. Our work may provide powerful evidence for the genomic individualized treatment of GBM.   A ow chart of this study protocol.  The PPI network of survival related DEGs. The color depth of nodes represents the corrected P-value. The size of nodes represents the number of genes involved.

Figure 5
The expression level of hub genes according to the TCGA database. Blue box indicates normal tissue, and red box indicates GBM tissue.