Identification of biomarkers of chromophobe renal cell carcinoma by weighted gene co-expression network analysis

Background Chromophobe renal cell carcinoma (ChRCC) is the second common subtype of non-clear cell renal cell carcinoma (nccRCC), which accounting for 4–5% of renal cell carcinoma (RCC). However, there is no effective bio-marker to predict clinical outcomes of this malignant disease. Bioinformatic methods may provide a feasible potential to solve this problem. Methods In this study, differentially expressed genes (DEGs) of ChRCC samples on The Cancer Genome Atlas database were filtered out to construct co-expression modules by weighted gene co-expression network analysis and the key module were identified by calculating module-trait correlations. Functional analysis was performed on the key module and candidate hub genes were screened out by co-expression and MCODE analysis. Afterwards, real hub genes were filter out in an independent dataset GSE15641 and validated by survival analysis. Results Overall 2215 DEGs were screened out to construct eight co-expression modules. Brown module was identified as the key module for the highest correlations with pathologic stage, neoplasm status and survival status. 29 candidate hub genes were identified. GO and KEGG analysis demonstrated most candidate genes were enriched in mitotic cell cycle. Three real hub genes (SKA1, ERCC6L, GTSE-1) were selected out after mapping candidate genes to GSE15641 and two of them (SKA1, ERCC6L) were significantly related to overall survivals of ChRCC patients. Conclusions In summary, our findings identified molecular markers correlated with progression and prognosis of ChRCC, which might provide new implications for improving risk evaluation, therapeutic intervention, and prognosis prediction in ChRCC patients. Electronic supplementary material The online version of this article (10.1186/s12935-018-0703-z) contains supplementary material, which is available to authorized users.

WGCNA is a method commonly used to explore the complex relationships between genes and phenotypes. This method is able to transform gene expression data into co-expression modules and provide insights into signaling networks that may be responsible for phenotypic traits of interests [5][6][7]. WGCNA is widely used in various biological processes, such as cancer, neuroscience and genetic data analysis, which is quite helpful for the identification of potential biomarkers or therapeutic targets [8][9][10][11]. Not only can it analyses mRNA level of tumor samples, but also work on microRNA or lncRNA datasets of neoplasms to find candidate biomarkers for prognosis and treatment [12,13].
In this study, WGCNA method was firstly used to analyze clinic traits and gene expression data of ChRCC samples provided by TCGA database to identify key genes associated with tumor prognosis and progression. Our findings may be very beneficial to assess malignant potential of ChRCC and offer therapeutic methods to this neoplasm.

Data sources and data preprocessing
Gene expression data and patient clinic traits of ChRCC were downloaded from The Cancer Genome Atlas (TCGA) database (https ://cance rgeno me.nih.gov/). Annotation information of microarray was used to match probes with corresponding genes. The average value was calculated out for those genes corresponding to more than one probes, while probes matched with more than one gene were eliminated.

Screening for differentially expressed genes
The "DEseq2" R package was used to screen DEGs between ChRCC samples and paired normal tissues in the expression data. The DEGs threshold was set at a |log-2FoldChange| > 0.585 and adj.P.value < 0.05. After DEGs were screen out, flashClust tool package in R language was used to perform cluster analysis of ChRCC samples.

Co-expression network construction and module analysis
Firstly, expression data profile of DEGs was tested by to check if they were the good samples and good genes. Afterwards, power value was screened out by WGCNA [14] algorithm which is implemented in the R software package (http://www.r-proje ct.org/). Scale independence and average connectivity degree of modules with different power value was tested by gradient method (the power value ranging from 1 to 20). The appropriate power value was determined when the degree of independence was above 0.85 and average connectivity degree is relatively higher [8]. Once the power value was determined, the scale-free gene co-expression networks were constructed by WGCNA algorithm. Besides, the corresponding genes information of each module was extracted out. The minimum number of genes in each module was set as 50 for the high reliability of the results.

Interaction analysis of co-expression modules of ChRCC
After co-expression modules were identified by WGCNA algorithm. Heatmap was painted to describe the strength of the interactions between different modules by heatmap function embedded in R language.

Construct module-trait relationships of ChRCC and key module identification
The correlation between module eigengenes (MEs) and phenotype (clinic traits) was used to evaluated module-trait associations. MEs were considered as the major component in the principal component analysis for each gene module. We calculated the correlation between MEs and clinical trait to identify the relevant module. Gene significance (GS), which was defined as the log10 transformation of the P value (GS = lgP) in the linear regression between gene expression and clinical information, was calculated to evaluate correlation strength. Modules with the highest correlation coefficients among all modules were usually considered as the key module and selected for further analysis.

Functional enrichment analysis of the key module
The information of genes in key modules was upload to Enrichr online database to perform GO and KEGG pathway analysis [15,16]. Analysis results were extracted out under the condition of P < 0.05 after correction [17]. The top 10 GO terms were visualized if there were more than ten terms, so as to KEGG pathways.

Selection of candidate hub genes
Hub genes, with highly intramodular connectivity, have been shown to be functionally significant. In this study, candidate hub genes were defined by module connectivity, measured by module membership (MM) > 0.8 and clinical trait relationship, measured by significance of the Pearson's correlation (GS. Pathologic stage > 0.2) [18]. Furthermore, we uploaded all genes in the hub module to the STRING database, choosing confidence > 0.4 to construct protein-protein interaction (PPI). After setting degree cut-off = 5, node score cut-off = 0.2, k-core = 2, and max. depth = 100, the most significant sub-module was selected by using plug-in MCODE [9]. Genes both in co-expression network and MCODE sub-module were regarded as candidate hub genes for further analysis.

Hub genes identification and validation
Another independent dataset GSE15641 containing six ChRCC samples and 23 normal tissues was extracted and performed on GEO2R analysis (https :// www.ncbi.nlm.nih.gov/geo/). |log2FoldChange| > 1 and P.Value < 0.05 as criterion, DEGs were identified. Candidate hub genes were mapped to DEGs in GSE15641 to find the real hub genes. For further validation, real hub genes were upload to online TCGA database to perform survival analysis (http://gepia .cance r-pku.cn/). ChRCC patients were divided into two groups according to median expression of each hub gene (high vs. low) and Kaplan-Meier survival analysis was conducted.

Statistical analyses
Two-tailed Student's t-test was used for significance of differences between groups. Statistical analyses were performed with GraphPad Prism 6.02. Statistical significance was set at probability values of P < 0.05.

DEGs screening
The gene expression profile of 66 ChRCC samples were download from TCGA database. A total of 2215 DEGs (1741 up-regulated and 384 down-regulated) between ChRCC samples and normal tissues were screened out under the threshold of |log2FoldChange| > 0.585 and adj.P.Value < 0.05. These 2215 DEGs were then selected for subsequent analysis.

Construction of co-expression modules of ChRCC
All DEGs were included for constructing co-expression modules by WGCNA algorithm. The "flashClust tool" package was used to perform the cluster and trait analysis and results were shown in (Fig. 1). First of all, the appropriate power value was screened out (Additional Fig. 1 Clustering dendrogram of 66 tumor samples and heatmaps of clinical traits. The clustering was based on the expression data of DEGs between ChRCC samples and non-tumor samples. The color intensity in heatmaps was proportional to older age, higher pathological stage. In neoplasm and survival status, white means patient neoplasm free or live, while red means patient with tumor or dead file 1: Fig S1). When the power value was equal to 5, scale free networks were constructed with independence degree up to 0.85 and the relatively higher average connectivity (mean connectivity = 6.04). As a result, eight distinct gene co-expression modules were identified by the appropriate power value (5) in ChRCC after excluding the grey module. Constructed modules painted with different colors and cluster trees of DEGs were shown in (Fig. 2). Gene numbers and module names were shown in Table 1. Interactions between eight co-expression modules were subsequently analyzed. The heatmap demonstrates the Topological Overlap Matrix (TOM) among all genes in the analysis. Light color represents low overlap and darker red color represents higher overlap. As a result, each module showed independent validation to each other (Fig. 3).

Identification of key modules corresponding to clinic traits
The correlations between module eigengene and clinic traits were shown in (Fig. 4). The brown module was selected as key module for taking up top three highest correlations with progression and survival of ChRCC (R 2 = 0.48 and P = 5e −05 with pathologic stage, R 2 = 0.41 and P = 6e −04 with neoplasm status, R 2 = 0.52 and P = 8e −06 with living status). Afterwards, we plotted scatter plots of GS vs. MM (module membership) in the brown modules with clinic traits respectively (Additional file 2: Fig S2).

Functional enrichment analysis of genes in the key module
To obtain further insight into the function of genes in the hub module, all genes in the brown module were uploaded to Enrichr online database to conduct GO and KEGG pathway analysis. According to P-value of each term, top 10 biological process and KEGG pathways were extracted out (Additional file 3: Table S1 and Additional file 4: Table S2) and visualized. GO analysis results  showed genes in brown module significantly enriched in mitotic cell cycle transition, mitotic spindle assembly, mitotic spindle organization, regulation of cell cycle process, etc. KEGG pathway analysis revealed cell cycle, oocyte meiosis, progesterone-mediated oocyte maturation pathways were significantly enriched (Fig. 5). All these results implied that dysfunctional mitotic cell cycle may contribute to tumorigeneses of ChRCC.

Selection of candidate hub genes
Under the condition of MM > 0.8 and GS. Pathologic stage > 0.2, 39 genes in brown module were taken out (Fig. 6a). PPI network were constructed under the cutoff of confidence > 0.4, 32 genes were filtered out after MCODE process (Fig. 6b). 29 candidate genes identified both in co-expression network and MCODE sub-module were show in Table 2.

Identification and validation of hub genes
1794 DEGs (884 up-regulated, 910 down-regulate) were identified in GSE15641. All candidate genes were mapped to DEGs in GSE15641, four genes (SKA1, ERCC6L, POLQ, GTSE1) were found to be differentially expressed both in TCGA and GSE15641 datasets. While expressions of POLQ in two datasets were opposite (upregulated in TCGA, down-regulated in GSE15641). Thus, SKA1, ERCC6L and GTSE1 were regarded as the real hub genes in ChRCC by WGCNA analysis. Expressions of real hub genes between ChRCC samples and normal tissues were shown in (Fig. 7). Survival analysis of these genes showed over-expressions of SKA1 and ERCC6L are significantly related to shorter overall survival time (Fig. 8).

Discussion
ChRCC is an uncommon, but not rare, malignant disease representing ~ 5% of histologic spectrum of cancer arising from kidney [19]. However, patients with ChRCC have a relatively low risk of tumor progression and cancer-specific death [20], the clinical outcomes of target therapies between ChRCC and ccRCC were not significantly different in metastatic disease [21]. To the best of our knowledge, there are no effective treatments for patients with metastatic ChRCC or any preventions for tumor recurrence. Thus, it's important to get a better understanding of molecular mechanism of ChRCC and identify potential biomarkers to evaluate the biological behavior of this malignancy. WGCNA has many prominent advantages over other methods since the analysis explores associations between co-expression modules and clinic traits and the results had much higher reliability and biological significance [22]. In this study, a total of eight co-expression modules were constructed by the 2215 DEGs from the 66 human ChRCC samples by WGCNA method, which was used to detect the relationship between ChRCC transcriptome and clinic traits. We calculated the correlations between co-expression modules and pathologic stage, neoplasm status and living status. Brown module with the highest correlation with clinic traits were regarded as the key module to explore primary cause for disease progression. For further analysis, genes in brown module with high module membership (> 0.8) and significance of correlations with pathologic stage (> 0.2) were filtered out. Meanwhile, MCODE analysis was performed on brown module to find a panel of genes with high connective degrees [9]. 29 candidate hub genes were identified by overlapping results of co-expression analysis and MCODE analysis. GO and KEGG analysis showed that dysregulation of cell cycle may be the underlying mechanism of tumorigeneses of ChRCC.
29 candidate hub genes were subsequently validated in GSE15641 dataset, four candidate genes (SKA1, ERCC6L, POLQ, GTSE1) differentially expressed both in TCGA and GSE15641 database were considered as real hub genes. After excluding POLQ for conflicting expression level in two databases, two of three hub genes (SKA1, ERCC6L) were found to be significantly correlated with shorter overall survival time under their over expressions.
SKA1 was a microtubule-binding subcomplex of the outer kinetochore which is essential for proper chromosome segregation. It played an important role on tumorigenesis in multiple malignancies [23,24]. SKA1 over-expression led to cancerization in human prostate epithelial cells via the induction of centriole over-duplication [25]. Depletion of SKA1 inhibited cell proliferation in gastric cancer by blocking cell cycle in S phase [26]. What's more, SKA1 was proved to be an oncogene in ccRCC which could be down-regulated by antitumor miR-10a-5p transfection [27]. ERCC6L gene, also named PLK1-interacting checkpoint helicase (PICH), was a member of the SNF2 protein family (SWI/SNF catalytic subunit SNF2). It was a mitotic target and substrate of polo-like kinases (PLKs) which regulates multiple processes in mammalian cell mitosis [28]. Downregulation of ERCC6L decreased cell viability in RCC cell lines by blocking mitogen-activated protein kinase (MAPK) signaling pathway and interactions with protein PLK1 [29]. Protein encoded by GTSE-1 modulated cell migrations in an EB1-dependent manner. Up-regulation of GTSE1 expression could be associated with increased invasive potential in breast cancer [30]. Depletion of GTSE-1 enhanced mitotic centromere-associated kinesin (MCAK) activity in mitotic cells, leading to chromosomal instability (CIN) which was presented in most solid tumors [31]. Silencing GTSE-1 expression inhibits proliferation and invasion of hepatocellular carcinoma cells [32]. Based on many studies about three hub genes, we could find their important roles in tumorigenesis and metastasis. Targeting these biomarkers and other candidate hub genes participating in cell cycles could provide therapeutic potentials for ChRCC. Some limitations exiting in our studies should be mentioned. The most vital genes out of 29 candidate hub genes couldn't been filtered out under restrictions of the bioinformatics methods. Multi-central and large sample studies on ChRCC expression profiles were needed for validation of our findings. However, this couldn't be fulfilled due to limited online databases for ChRCC.

Conclusions
Our study used weighted gene co-expression analysis to construct a gene co-expression network, identify and validate the key module and hub genes associated with the progression and prognosis of ChRCC. Functional analysis of hub genes reveal mitotic cell cycle dysregulation could be the underlying mechanism of ChRCC. Three real hub genes including SKA1, ERCC6L and GTSE-1 were identified. SKA1 and ERCC6L were validated in association with poor prognosis of ChRCC. However, further molecular biological experiments are needed to confirm the function of these biomarkers in ChRCC.