Combined analysis and validation for DNA methylation and gene expression profiles associated with prostate cancer

Background Prostate cancer (PCa) is a malignancy cause of cancer deaths and frequently diagnosed in male. This study aimed to identify tumor suppressor genes, hub genes and their pathways by combined bioinformatics analysis. Methods A combined analysis method was used for two types of microarray datasets (DNA methylation and gene expression profiles) from the Gene Expression Omnibus (GEO). Differentially methylated genes (DMGs) were identified by the R package minfi and differentially expressed genes (DEGs) were screened out via the R package limma. A total of 4451 DMGs and 1509 DEGs, identified with nine overlaps between DMGs, DEGs and tumor suppressor genes, were screened for candidate tumor suppressor genes. All these nine candidate tumor suppressor genes were validated by TCGA (The Cancer Genome Atlas) database and Oncomine database. And then, the gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes pathway (KEGG) enrichment analyses were performed by DAVID (Database for Annotation, Visualization and Integrated Discovery) database. Protein–protein interaction (PPI) network was constructed by STRING and visualized in Cytoscape. At last, Kaplan–Meier analysis was performed to validate these genes. Results The candidate tumor suppressor genes were IKZF1, PPM1A, FBP1, SMCHD1, ALPL, CASP5, PYHIN1, DAPK1 and CASP8. By validation in TCGA database, PPM1A, DAPK1, FBP1, PYHIN1, ALPL and SMCHD1 were significant. The hub genes were FGFR1, FGF13 and CCND1. These hub genes were identified from the PPI network, and sub-networks revealed by these genes were involved in significant pathways. Conclusion In summary, the study indicated that the combined analysis for identifying target genes with PCa by bioinformatics tools promote our understanding of the molecular mechanisms and underlying the development of PCa. And the hub genes might serve as molecular targets and diagnostic biomarkers for precise diagnosis and treatment of PCa. Electronic supplementary material The online version of this article (10.1186/s12935-019-0753-x) contains supplementary material, which is available to authorized users.


Background
DNA methylation, one of the most important epigenetic factors, has been studied extensively over several decades, and its influence in a variety of human diseases, most malignancy tumor, diagnostic biomarkers and therapeutic targets, has been firmly measured and evaluated [1][2][3][4][5][6]. Recently epigenetic study has identified that DNA methylation is an important biological mechanism for tumor occur and development [5]. In CpG islands, aberrant methylation could influence the functions of tumor suppressor genes by altering their expression levels. CpG islands are located in or near promoter regions of the genome, aberrant methylation genes in CpG islands are often hypermethylated and may cause silencing of tumor suppressor genes [7]. Variations of DNA methylation exist at CpG islands, including gene hypermethylationlow regulation and hypomethylation-high regulation.
Several studies that have done to investigate DNA methylation in gene body has positively correlated with

Open Access
Cancer Cell International *Correspondence: dengshixiong1963@163.com 1 Laboratory of Forensic Medicine and Biomedical Informatics, Chongqing Medical University, Chongqing 400016, People's Republic of China Full list of author information is available at the end of the article gene expression by increasing transcription activity [8]. This may be caused by blocking the intragenic promoter activity or affecting the methylation status of repetitive sequence within the transcription unit [9]. So DNA methylation in gene body may be an interesting additional therapy target for cancer diagnosing and treatment.
Prostate cancer (PCa) is the second most frequently diagnosed male-specific malignancy tumor in western countries. According to the World Health Organization's International Agency for Research on Cancer, 1.1 million men were diagnosed with prostate cancer worldwide in 2012, accounting for 15% of all cancer diagnosed in men [6]. PCa is considered as a heterogeneous disease [10]. Accumulating evidence has also demonstrated that multiple genes and cellular pathways participate together in the occurrence and development of PCa. Tumor led by epigenetic mutation through cells may grow and reproduce uncontrollably [11]. Hypomethylation of CpG can cause chromosome instability [12]. Illumina Infinium 450 k microarray and DNA microarray have utilized to investigate DNA methylation and gene expression in molecular mechanism, biological process, molecular diagnosis, tumor molecular, biomarker and drug targets discovery [13][14][15].
Many gene expression profiling analysis and aberrant methylation analysis were introduced for differentially expressed genes (DEGs) and differentially methylated genes (DMGs) [16]. However, separated analysis of DEGs and DMGs are limited [17,18]. So it is necessary to make jointly analyze for both gene expression profiling microarray and gene methylation profiling microarray in PCa. In this study, gene methylation profiling datasets and gene expression profiling datasets were analyzed by bioinformatics tools for screening the DMGs and DEGs. Later, the overlapping of hypermethylation genes, downregulated genes and tumor suppressor genes were used to identify the candidate tumor suppressor genes. For validating these candidate tumor suppressor genes, TCGA database was used to identify the CpG islands and Oncomine database was used to validate the analyzing result. At last, the biological functions and pathways analyzing were discussed for the molecular mechanism.

Datasets
In this study, the gene methylation profiling datasets and gene expression profiling datasets were downloaded from Gene Expression Omnibus (https ://www.ncbi.nlm.nih. gov/geo) database. All these gene methylation profiling datasets were based on GPL13534 platform (Illumina HumanMethylation450 BeadChip). For the gene expression profiling datasets were based on GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array). For each dataset, only the samples associated with PCa were selected (Table 1).

Data preprocessing and analyzing
R package affy was used to explore oligonucleotide array analysis by the robust multiarray average (RMA) algorithm [19]. While R package methylumiIlluminaHuman-Methylation450kmanifest, limma, minfi, watermelon and IlluminaHumanMethylation450kanno.ilmn12.hg19 were used to analyze the gene methylation profiling data and gene expression profiling data in order to identify DMGs and DEGs. And the R package limma in RStudio 1.1.453 was installed to identify genes that were differentially expressed between normal and tumor samples. The Benjamini and Hochberg (BH) procedure were obtained to control the False Discovery Rate (FDR) [20]. Then the log 2 -fold change (log 2 FC) was calculated. The adjusted P value < 0.05 and |log2FC| > 2.0 were considered as the cutoff value for DMGs and DEGs screening.

Functional and pathway enrichment analysis of DMGs and DEGs
In order to analyze the DMGs and DEGs for the functional enrichment, GO enrichment and KEGG pathway analysis were performed using DAVID (https :// david .ncifc rf.gov). In this paper, DMGs and DEGs were uploaded to online analysis tool to systematically investigate biological meanings behind these genes [21]. Gene ontology analysis (GO) is a common useful method for annotating genes for identifying biological process (BP), cellular component (CC) and molecular function (MF) [22,23]. At the same time, Kyoto Encyclopedia of Genomes (http://www.genom e.jp) pathway enrichment  19:50 analysis was conducted for candidate genes and prostate specific antigen (PSA) [24].

Protein-protein interaction (PPI) network construction and module analysis
STRING database (https ://strin g-db.org) was used for protein-protein interaction (PPI) analysis in order to investigate the molecular mechanisms. STRING database (version 10.5) covers 9,643,763 proteins from 2031 organisms. While network analysis is a useful method for uncovering all kinds of protein-protein interactions networks. It can measure networks by nodes, edges, degrees and network structures, so it can help us to identify hub genes and key protein community. Then, PPI networks were constructed by Cytoscape software (http://www.cytos cape.org). Cytoscape is an open source software platform for visualizing molecular interaction networks, biological pathways and integrating these networks with annotations, gene expression profiles and other state data. A plugin named cytoHubba was introduced to screen hub genes of PPI network in Cytoscape. CytoHubba can provide 12 topological analysis methods including Betweenness, BottleNeck, Closeness, Clustering Coefficient, Degree, DMNC, EcCentricity, EPC, MCC, MNC, Radiality and Stress based on shortest paths [25].

Normalization of gene methylation profiling and gene expression profiling
In this study, the five gene expression profiling dataset ( Fig. 1) and four gene methylation profiling dataset ( Fig. 2) were separately analyzed by R package affy and online GEO2R (https ://www.ncbi.nlm.nih.gov/geo/geo2r /) for screening DEGs and DMGs. All the gene expression profiling microarray chips were based on affymetrix GPL 570 platform. On affymetrix arrays, genes are represented by one or more probe sets, which are short oligonucleotides covering distinct sections of the gene synthesised in place through photolithography [26].
While on Illumina BeadChips arrays, 50 base pairs Infinium methylation probes synthesised by bisulfite conversion of unmethylated cytosines, which are randomly dispersed over the array [27]. This random allocation means that each probes was represented a random number of times on each array.
Microarrays and bead chips technologies depend on a complicated set of reagents and hardware, along with highly trained personnel, to produce accurate measurements. Both biological and non-biological factors will affect the results during the experiment when a series of complicated set of reagents and hardware varied [28]. So batch effects in different microarrays and bead chips should be focused after high-throughput experiments. Batch effects may occur at different laboratories, seasons and days. In order to eliminate batch effect problem, the surrogate variable analysis was conducted to reduce the batch effects depend on R package sva (Fig. 3). The sva package contains functions for removing batch effects and other unwanted variation in high-throughput experiment. The value distribution of methylation profiling data and the normalization of gene methylation profiling data were shown in Fig. 4.

Identification of DEGs and DMGs in PCa
After data normalizing and removing batch effects, total of 1331 DEGs were identified, using P < 0.05 and log 2 -fold change |log2FC| > 2.0 criteria, of which, 938 up-regulated genes and 393 down-regulated genes (Fig. 5). For DMGs, total of 3261 DMGs were identified according P < 0.05, log 2 -fold change |log2FC| > 2.0 and fwer < 0.5 criteria, of which, 2699 hypermethylation genes and 562 hypomethylation genes. The result has been shown as the volcano plot in Fig. 6.

Aberrantly methylated-differentially expressed genes in PCa
Then, totally 62 hypermethylation-low genes were obtained by overlapping 2699 hypermethylation genes and 393 down-regulated genes in Fig. 7. And 41 hypomethylation-high genes were obtained by overlapping 362 hypomethylation genes and 938 up-regulated genes in Fig. 7.

Integrating dataset for screening candidate tumor suppressor genes
Then, totally 9 candidate tumor suppressor genes were obtained by overlapping 2699 hypermethylation genes, 393 down-regulated genes and 1217 human tumor suppressor genes [30] in Fig. 8.
The heat map of nine candidate tumor suppressor genes (IKZF1, PPM1A, FBP1, SMCHD1, ALPL, CASP5, PYHIN1, DAPK1 and CASP8) was shown as Fig. 9. Although these tumor suppressor genes were hypermethylation-low genes, these genes were not significant in some samples for TCGA database.

Validation of candidate tumor suppressor genes in TCGA database
To further investigate the candidate tumor suppressor genes, TCGA database was used to validate these results. The outcome was shown as Fig. 10. The candidate tumor suppressor genes were separately significant in tumor group and normal group except IKZF1, CASP5 and CASP8, which significantly indicate that PPM1A, DAPK1 and FBP1 were hypermethylation in tumor samples, at the same time, PYH1N1, ALPL and SMCHD1 were hypermethylation in normal samples. And then, in order to further confirm these candidate tumor suppressor genes the MethPrimer [31] and cpgplot software were used to predict the CpG islands in Figs. 11 and 12. At the same time, pan-cancer analysis also shows that PPM1A, DAPK1, FBP1, PYHIN1, ALPL and SMCHD1 have significant amplification in PCa.

Integrated Oncomine database and survival analysis for candidate tumor suppressor genes
To confirm the candidate tumor suppressor genes expression between tumor and normal tissues in multiple cancers, the Oncomine database was performed to analyze the different expression. Using P < 0.01 and |log2FC| > 1.5 criteria, a total of 455, 455, 444, 398, 341, 407, 342, 453 and 398 unique analyses for IKZF1, PPM1A, FBP1, SMCHD1, ALPL, CASP5, PYHIN1, DAPK1 and CASP8 were shown in Fig. 13. In 80 studies, FBP1 was ranked by the top 10% of gene rank indicating significant statistical differences, 6 of which revealed higher expression level in tumor than normal tissues. For higher expression of CASP5, there was only one dataset listed. Up-regulated KAPK1 was founded in cancers based on sixty-one significant analyses.
To further confirm the candidate tumor suppressor genes, Kaplan-Meier analysis was performed in Fig. 14. All the low expression values of tumor suppressor genes are all significantly associated with poor prognosis while a high expression of tumor suppressor genes are associated with good prognosis.

Gene ontology enrichment analysis and KEGG pathway analysis
In order to uncover these genes affected by altered DNA methylation status, all hypermethylation-low expression genes and hypomethylation-high expression genes were uploaded to DAVID database to identify overrepresented BP categories and KEGG pathways. The top 10 significant GO enrichments of biological processes were illustrated in Table 3.
KEGG pathways analysis were also significantly enriched in PCa genes in Table 4. And the relationships between these pathways were shown in Fig. 16. Furthermore, GSEA (Gene Set Enrichment Analysis) was conducted to verify KEGG enrichment analysis in Fig. 16. All of these pathways were significantly enriched in Pathway in cancer, Wnt signaling pathway, Pancreatic cancer and Melanoma (Fig. 17). And most of these genes were upregulated, which show a credible knowledge of PCa.

PPI network construction and module analysis
All hypermethylation-low expressed genes and hypomethylation-high expressed genes were uploaded to the STRING database for constructing PPI network. Moreover, all these nodes and edges were analyzed using Cytoscape plug-ins called cytoHubba. For hypermethylation-low expressed genes, PPI network was shown in Fig. 18a and top modules were displayed in Fig. 18b. Hub genes were CASP1, STAT4 and IRF7.
For hypomethylation-high expressed genes, PPI network was shown in Fig. 19a and top modules were displayed in Fig. 19b. Hub genes were FGFR1, FGF13 and CCND1. Biological process of significant hub genes were demonstrated in Table 5.

Interlink between candidate genes with prostate specific antigen (PSA)
Prostate specific antigen (PAS) as a biomarker was discovered for diagnosing being and malignant prostate disease in 1960 [34]. Later, cumulative research has uncovered the antigens in the prostate and semen [35][36][37][38][39][40][41][42][43]. Prostate specific antigen is a protein expressed by multiple non-prostatic tissues in men and women. To obtain the relationship between candidate genes with PSA, a combined bioinformatics tools were utilized. Firstly, the Universal Protein Resource (UniProt) was used to get the PAS amino acid sequence (Additional file 1). Secondly, the PAS amino acid sequence and candidate genes were uploaded to BlastKOALA (https :// www.kegg.jp/blast koala /) for genome annotation. Finally, all the annotated genes were uploaded to KEGG PATH-WAY Database (https ://www.kegg.jp/kegg/pathw ay.html) for mapping pathway in Fig. 20. CASP8, CCND1, DAPK1 and PSA are involved in pathways in cancer. CASP8 (Caspase-8) paly essential role in apoptosis [44]. Apoptosis is regulated by CASP8 [45]. Some clinical study has confirmed that overexpression of CCND1 (Cyclin D1) is a common biomarker for treatment [46] and being ignored to cisplatin resistance in prostate cancer [47,48]. Death associated protein kinase 1 (DAPK1) plays a critical role  in apoptosis. The methylation of DAPK1 has interlink with cancer.

Discussion
Since microarray and high-throughput sequencing can provide expression levels of thousands of genes in human genome simultaneously, it has been widely used to predict the potential therapeutic targets for PCa. In this study, we conducted a combined analysis of two types of microarray chips (DNA methylation and gene expression profile datasets) of PCa for uncovering the epigenetic and genetic mechanisms in PCa using bioinformatics analysis tools. By overlapping DEGs, DMGs and TSGs (tumor suppressor genes), we identified some candidate tumor suppressor genes that can provide new ideas for diagnosis, therapy and biomarker studies in PCa. In order to better understand the molecular mechanism of candidate tumor suppressor genes and hub genes, GO, KEGG pathway and PPI analysis were further performed. The results demonstrated that these hypermethylationlow expressed genes were enriched in cytokine-mediated signaling pathway, benzene-containing compound Fig. 7 Venn diagram [29] for aberrantly methylated-differentially expressed genes by overlapping gene expression datasets and gene methylation datasets (a hypermethylation and down-regulated genes; b hypomethylation and up-regulated genes) Fig. 8 Venn diagram [29] for candidate tumor suppressor genes by overlapping hypermethylation genes, down-regulated genes and tumor suppressor genes metabolic process, kynurenine metabolic process, positive regulation of cysteine-type endopeptidase activity involved in apoptotic process, indolalkylamine catabolic process, response to vitamin, cellular response to mechanical stimulus, tryptophan catabolic process, tryptophan metabolic process and positive regulation of transcription, DNA-templated.
This indicated that cytokines is associated with prostate cancer. Some evidence shows the role of cytokines in prostate carcinogenesis. Cytokines are widely recognized as crucial factors in cancer development. In the progress of cancer development, cytokines were released by other immune cells. And cytokines enhance therapeutic resistances through EMT activation in tumor microenvironment [49]. Kynurenine has relationship with immune escape of tumor cells [50]. Srekumar et al. found that Kynurenine was associated with prostate cancer (PCa) progression [51]. More and more experimental researches have indicated that vitamin D has the effects of anti-prostate tumor [52,53]. Like vitamin D, vitamin C was also natural product that has the property of antioxidant [54]. So vitamin C also plays an important role for antitumor due to characterize of antioxidant [55].
Furthermore, the enriched KEGG pathways of hypermethylation-low expressed genes have significant enrichment in NOD-like receptor signaling pathway, Pentose phosphate pathway, Viral carcinogenesis, Tryptophan metabolism, Measles_Homo sapiens, Hepatitis B, Legionellosis, Cytosolic DNA-sensing pathway and Influenza A and Inflammatory bowel disease (IBD). Cumulative evidence revealed that the pentose phosphate pathway (PPP) is a metabolic pathway, parallel to glycolysis that generates NADPH, nucleotides and nucleic acids [56]. Heritable factors and environmental factors may cause prostate cancer together. Recent research focus on the role of viral infections in prostate cancer [57]. Cytosolic DNA sensing is associated to the secretion of cytokines [58]. Cytosolic DNA sensing mediates robust antimicrobial. Cancer cells often acquire genetic or epigenetic alterations [59,60]. Cytosolic DNA sensing is important for tumor control. In cytosolic DNA sensing, transcriptional and post-translational signaling modules enable the release of immunomodulatory cytokines [58].
After constructing PPI network for hypomethylation-high expression genes, a novel Cytoscape plugin called cytoHubba was introduced for ranking nodes in a  [23]. The hub genes appeared to be FGFR1, FGF13 AND CCND1.
Type 1 fibroblast growth factor receptor binding (FGFR1) was an important factor in tumor initiation and progression in prostate cancer. The activation of FGFR1 regulate the EMT in cancer progression [61]. Amplification of FGFR1 has been well studied [62,63]. And epithelial mesenchymal-transition (EMT) may play a crucial Fig. 11 Pan-cancer analysis of PCa-related candidate tumor suppressor genes in normal samples role during in tumor metastasis and progression [64]. Recent studies have indicated that inducing EMT can affect tumor microenvironment [65].
Fibroblast growth factor 13 (FGF13) is overexpressed in several types of cancer [66,67]. FGF13 inhibit ribosomal RNA synthesis, and may be an interplay with p53 involving a nucleolus-dependent mechanism [68]. It is possible that FGF13 may similarly signal to p53 by the same underlying molecular mechanism. FGF13 expression would activate p53, whereas the up-regulation of miR-504 would antagonize such an effect. It will be important to explore this issue directly in future studies [69]. Although the augmented FGF13 expression in tumors is unlikely to be a cancer driver, it is not merely a passenger, because it allows the cancer cells to cope with undesirable side effects of oncogene activation [68]. So FGF 13 may be reviewed as a cancer switch [70]. Cycline D1 (CCND1) plays a significant role in cell cycle. The over-expression of CCND1 in human tumors has been indicated as proto-oncogenes [71][72][73][74][75][76][77].
For tumor suppressor genes, IKAROS family zinc finger 1 (IKZF1) is a key regulator factor that enhanced immune infiltrate recruitment and tumor sensitivity in several tumors. Overexpression of IKZF1 can activate autoimmune susceptibility via infiltrating NKG2D + , CD8 + T cells [78]. Protein phosphatase, Mg2+/Mn2+ dependent 1A (PPM1A) is a phosphatase that has been the function of dephosphorylating TGF-β-activated P-Smad2/3, p38 and regulating several tumor-related signaling pathways [79,80]. As a phosphatase PPM1A plays a significant role in cell cycle progression, cell proliferation, and apoptosis [81][82][83]. Previous study indicated that metastatic prostate cancer had lower PPM1A expression compared with primary tumor [84]. Overexpression of PPM1A has been reported to activate the expression of tumor suppressor gene TP53 [82] and increased PPM1A expression inhibited the activity of NF-kB in promoting prostate cancer  Fig. 13 The landscape of the candidate tumor suppressor genes in human cancers (the more intense red indicates over-expression; the more intense blue indicates under-expressed gene.) invasion and metastasis [85]. Fructose-bisphosphatase 1 (FBP1) palys negative regulation roles in glycolysis and affects some process of survival, proliferation and metastasis in tumor cells [86,87]. Some studies also showed that overexpressed FBP1 in prostate cancers can be as a tumor biomarker [88,89]. Structural maintenance of chromosomes flexible hinge domain containing 1 (SMCHD1) is a chromatin protein associated with epigenetic modifier [90]. Alkaline phosphatase (ALPL) is a non-specific hydrolase and plays crucial role in regulating phosphate metabolism involved in cell cycle, growth, apoptosis and signal transduction pathways [91]. The abnormal expression of ALP can be a biomarker for prostate cancer [92]. The expression of caspase-5 (CACP5) is very low in many normal tissues, while its expression plays an important role in cell apoptosis [93]. Pyrin and HIN domain family member 1 (PYHIN1) has the function of tumor suppression [94]. Death associated protein kinase 1 (DAPK1) is a kinase regulated neuronal apoptosis by calcium-calmodulin. So far, a large number of studies have demonstrated that DAPK1 acts as a positive mediator interact with several proteins, such as DAPK1-p53 signaling, DAPK1-tau signaling, and DAPK1-DANGER signaling [95][96][97][98]. Caspase-8 (CASP8) acts as an environment sensor in the control of cell death. CASP8 induced a wide range of biological process, such as the control of apoptosis and necroptosis [99].
All the genes and pathways in this investigation are all based on bioinformatics methods. So no clinical samples were used to validate the data. But a shot literature review was introduced to confirm out findings.

Conclusions
In this study, a joint bioinformatics analysis method was used for indicating candidate tumor suppressor genes and pathways in PCa by combined gene methylation microarrays and gene expression microarray, this may provide a set of useful targets for future investigation into the molecular mechanisms and biomarkers. Candidate tumor suppressor genes were IKZF1, PPM1A, FBP1, SMCHD1, ALPL, CASP5, PYHIN1, DAPK1 and CASP8. Hub genes were FGFR1, FGF13 and CCND1 that may contribute to the finding of molecular mechanisms underlying the initiation and  Regulation of cardiac muscle cell action potential involved in regulation of contraction 1.35E−03