Identification of methylation-driven genes related to the prognosis of papillary renal cell carcinoma: a study based on The Cancer Genome Atlas

Background Aberrant DNA methylation patterns are involved in the pathogenesis of papillary renal cell carcinoma (pRCC). This study aimed to investigate the potential of methylation-driven genes as biomarkers in determining the prognosis of pRCC by bioinformatics analysis. Methods DNA methylation and transcriptome profiling data were downloaded from The Cancer Genome Atlas database. Methylation-driven genes (MDGs) were obtained using MethylMix R package. A Cox regression model was used to screen for pRCC prognosis-related MDGs, and a linear risk model based on MDG methylation profiles was constructed. A combined methylation and gene expression survival analysis was performed to further explore the prognostic value of MDGs independently. Results A total of 31 MDGs were obtained. Univariate and multivariate Cox regression analysis identified eight genes (CASP1, CD68, HOXD3, HHLA2, HOXD9, HOXA10-AS, TMEM71, and PLA2G16), which were used to construct a predictive model associated with overall survival in pRCC patients. Combined DNA methylation and gene expression survival analysis revealed that C19orf33, GGT6, GIPC2, HHLA2, HOXD3, HSD17B14, PLA2G16, and TMEM71 were significantly associated with patients’ survival. Conclusion Through the analysis of MDGs in pRCC, this study identified potential biomarkers for precision treatment and prognosis prediction, and provided the basis for future research into the molecular mechanism of pRCC.


Background
Renal cell carcinoma (RCC), a common malignant tumor in the urinary system, accounts for 2-3% of human cancers [1]. Papillary renal cell carcinoma (pRCC) represents the second most common pathological subtype of RCC, accounting for 18.5% [2,3]. According to morphological characteristics, pRCC is usually divided into two main subtypes: type 1 and type 2 [4]. Studies have proved that type 2 pRCC tumors are more aggressive and have a worse prognosis than clear cell renal cell carcinoma (ccRCC) [5]. Compared with ccRCC patients, patients with metastatic pRCC usually have a worse prognosis [6]. Currently, pathological and clinical grading and staging still act as the main prognostic factor [7]. In recent years, with the development of molecular biology technology and bioinformatics, new biomarkers have the potential to be used as prognostic factors for different types of cancer, including pRCC. For instance, some studies have found

Open Access
Cancer Cell International *Correspondence: zryhhuang@163.com; 15120012660@163.com † Zeyu Liu and Yuxiang Wan contributed equally to the work 1 Third Affiliated Hospital, Beijing University of Chinese Medicine, Beijing 100029, China Full list of author information is available at the end of the article that CYP4A11 expression is a potential poor prognostic factor for RCC [8], while others have detected long noncoding RNAs that might serve as prognostic biomarkers for pRCC [9], and some alternative splicing events also correlate with pRCC prognosis [10]. Therefore, finding new molecular markers to predict the clinical outcome of pRCC is of great significance for understanding pRCC's molecular pathological characteristics, prognosis, and treatment.
Epigenetic modification plays an essential role in the development of human cancer [11]. Methylation, one of the most important epigenetic modifications, is involved in many cellular processes, including cell differentiation, genome stability, and gene imprinting [12], and it intimately relates with tumorigenesis [13]. Changes in DNA methylation can provide important evidence for the early diagnosis and prognosis of cancer, and offer new ideas for further clinical applications. Some previous studies have explored the relationships between methylationdriven genes and the prognosis of certain cancers, such as esophageal squamous cell carcinoma [14], lung adenocarcinoma [15], and bladder cancer [16], but currently, there is no such report for pRCC.
The Cancer Genome Atlas (TCGA) is an open-source database on cancer genes [17], from which researchers around the world can obtain and analyze relevant data. MethylMix, an algorithm based on R programming [18,19], can identify disease-specific hypermethylated and hypomethylated genes. In this study, we extracted methylation and RNA expression data of pRCC patients from the TCGA database, obtained the methylation-driven genes related to pRCC using the MethylMix algorithm, and then constructed a Cox survival prediction model to evaluate the prognosis of pRCC, aiming at finding independent prognostic biomarkers, and exploring the correlation between abnormal DNA methylation and pRCC genome level to provide the scientific basis for personalized diagnosis and treatment.

Data acquisition and analysis
We downloaded the methylation data and transcriptomics data from KIRPs in the TCGA database. Methylation data were obtained from 276 tumor samples and 45 adjacent non-tumor samples from the Illumina Human Methylation 450 k platform; transcriptome profile data were from 289 tumor samples and 32 adjacent nontumor samples without isoform expression and miRNA expression quantification. First, we normalized the downloaded data and analyzed the differences using the limma package of R software [20] to obtain differentially expressed genes (DEGs) and differentially methylated genes (DMGs), and then used the MethylMix package for integrated analysis. MethylMix is a statistical package of R software that can find methylation-driven genes by integrating DNA methylation data and RNA expression data. Using the MethylMix algorithm, we calculated the correlation between the gene methylation level and gene expression and obtained the methylation-driven genes after constructing a β-mixed model with a filtration of|logFC| > 0.5, P < 0.05, and Cor < − 0.3. TCGA data are publicly available without permission from the ethics committee.

Functional enrichment, pathway analysis, and genetic alteration analysis
After obtaining the methylation-driver genes, to further understand the biological functions of these selected genes, we conducted Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis [21] using the clusterprofiler package of R software [22]. GO analysis includes the molecular function (MF), biological process (BP), and cellular component (CC) [23]. We set P < 0.05 as the cutoff and visualized the results using Goplot [24]. The cBio Cancer Genomics Portal (cBioPortal) is an essential online platform for analyzing cancer genomics data [25]. We used cBioPortal to investigate the genetic alterations of methylation-driven genes in pRCC patients (TCGA, PanCancer Atlas).

Prognostic model construction and survival analysis
To analyze the relationship between methylation-driven genes and prognosis, we conducted a survival analysis based on these methylation-driven genes via the survival package after combining the clinical data and prognosis of pRCC in TCGA. Only patients with complete survival information were included. We screened the prognosis-related methylation-driven genes with P < 0.05 and constructed a predictive model by multivariate Cox regression analysis. The risk score of each patient was calculated based on the model. The patients were separated into high-risk and low-risk groups after obtaining the median risk score, and the time-dependent receiver operating characteristic (ROC) curve was used for testing. The Kaplan-Meier survival curve was used to evaluate the overall survival rate of patients in high and low-risk groups. The log-rank test was used to determine whether the overall survival rate differed between the high-risk group and the low-risk group. P < 0.05 was considered statistically significant. In addition, we performed joint survival analysis of the methylation-driven gene level and gene expression level in pRCC patients to further identify key genes related to prognosis, and obtain the patient's joint survival curve via the survival R package.

Data acquisition and analysis
In the present study, the methylation data analyzed were collected from 321 samples, including 276 tumor samples and 45 non-tumor samples. Gene expression data were obtained from 321 samples, including 289 tumor samples and 32 non-tumor samples. Both DEGs and DMGs were analyzed using the limma package of R software (Additional file 1). The data were then merged using the MethylMix package. Altogether, 31 methylation-driven genes were obtained ( Table 1, Fig. 1). Figure 1 shows that four methylation-driven genes (ADAM28, AHNAK2, GIPC2, and GYPC) have significant negative correlations between methylation and gene expression levels.

Functional enrichment, pathway and genetic alteration analysis of methylation-driven genes
GO enrichment analysis revealed that methylationdriven genes were mainly enriched on multiple BP, MF, and CC terms (Fig. 2, Table 2). According to the P value from low to high, the top BP terms were cytolysis, the killing of cells of other organism, disruption of cells of other organism, and demethylation. In MF, the top terms were steroid dehydrogenase activity, anion transmembrane transporter activity, and oxidoreductase activity. In CC, Z disc, I band, and blood microparticle were the topranking terms. Pathway enrichment analysis showed that genes were closely related to the taurine and hypotaurine metabolism pathway and Salmonella infection pathway ( Table 3). The cBioPortal tool was used to analyze 31 methylated genes for genetic alterations. As shown  (Table 4). Among these genes, the methylation levels of CASP1, CD68, and HOXD3 negatively correlated with a risk score, and the remaining five positively correlated with a risk score. The Kaplan-Meier survival curve indicated a statistical difference in survival between the high and low-risk groups (Fig. 4). The ROC curve in a risk scoring model was used to evaluate its predictive performance. The area under the curve (AUC) of the prognostic risk assessment model for the eight methylation-driven genes was 0.835 at 3 years of overall survival (Fig. 5), suggesting the model is able to predict the prognostic risk in pRCC patients. Meanwhile, we also recorded patient's risk score, survival status, and methylation of eight genes (Fig. 6). After adjusting parameters, including age, gender, and pathological stage, the risk score was found to be an independent  predictor using multivariate Cox regression analysis (Fig. 7). Joint survival analysis of gene expression and methylation showed that methylation and expression of C19orf33, GGT6, GIPC2, HHLA2, HOXD3, HSD17B14, PLA2G16, and TMEM71 were significantly correlated with the prognosis of pRCC patients (Fig. 8).
Hypermethylation and low expression of HOXD3 and HSD17B14 indicate high survival, while hypermethylation and low expression of C19orf33, GGT6, GIPC2, HHLA2, PLA2G16, and TMEM71 suggests low survival.

Discussion
The molecular mechanism of pRCC tumorigenesis remains unclear. In-depth research on the molecular pathogenesis of pRCC, as well as early detection of prognostic markers and specific driven genes for the disease, is of great significance for improving patient prognosis and developing new drugs. Epigenetics refers to the phenomenon of heritable and reversible changes in gene expression caused by non-DNA sequence changes. Increasing studies show that epigenetic modification closely relates to the occurrence and development of cancer [26]. Because epigenetic changes are reversible, they have more potential as new therapeutic targets than mutations in the DNA sequence. Multiple studies demonstrate the correlation between epigenetic changes and tumor prognosis, which indicates that they may be a factor in predicting tumor prognosis. For instance, Zhang et al. [27] found that the reduced expression of ZNF671 caused by hypermethylation is associated with poor prognosis in multiple solid tumors. Gao et al. [28] established a predictive risk model assessing the prognosis of patients with lung squamous cell carcinoma by studying the abnormal methylation sites of key genes with a poor prognosis. Therefore, it is necessary to explore the relationship between pRCC-related epigenetic changes and the patient's prognosis.
In this study, we found DEGs and DMGs between tumor samples and adjacent non-tumor samples, and further identified prognostic biomarkers associated with methylation-driven genes. We analyzed methylation and gene expression using the MethylMix package and detected 31 methylation driven-genes. GO analysis indicates that the methylation-driven genes in pRCC are mainly enriched in steroid dehydrogenase activity, anion transmembrane transporter activity, and oxidoreductase activity in MF. In CC, these genes are enriched on the Z disc and I band. In addition, in BP, enrichment is primarily seen in the regulation of cytolysis and demethylation. Pathway enrichment shows that methylation-driven genes closely relate to taurine and hypotaurine metabolism. These terms reflect gene-to-gene interactions at the functional level and indicate that gene dysfunction may be due to abnormal DNA methylation in different samples.
To further investigate the relationship between methylation-driven genes and patients with pRCC, we analyzed the relationship between abnormal DNA methylation and patient's survival via the survival R package. Eight candidate genes were identified from 31 methylation-driven genes and a predictive risk model was constructed. Using this risk model, we successfully divided the pRCC samples into high-risk and low-risk groups. As shown in Fig. 4, the 4-year survival rate in the highsurvival group was nearly 90%, while that of the lowsurvival group was about 70%. Survival after 8 years in the high-survival group was about 70%, while survival in the low-survival group was only about 40%. Survival analysis showed a significant difference in overall survival between the two groups. The results show that a risk model consisting of eight methylation-driven genes can effectively predict the prognosis of patients with pRCC. Furthermore, the results also showed that the AUC of the ROC curve of eight gene signatures predicting 3-year survival was 0.835. Multivariate Cox analysis showed that the model's risk score might be used as an independent prognostic factor for pRCC. We found good performance in the pRCC patients' survival prediction using the risk assessment model constructed by the eight gene signatures, but further research is necessary to validate these findings. These methylation-driven genes can serve as effective biomarkers or drug targets for early diagnosis and prognosis of pRCC patients. However, the influence of abnormal methylation data on patient survival is not comprehensive. Therefore, we combined methylationdriven genes and corresponding gene expression data with patient survival for an integrated analysis. The results found that C19orf33, GGT6, GIPC2, HHLA2, HOXD3, HSD17B14, PLA2G16, and TMEM71 significantly correlated with prognosis. Hypermethylation and low expression of HOXD3 and HSD17B14 led to high survival, while hypermethylation and low expression of C19orf33, GGT6, GIPC2, HHLA2, PLA2G16, and TMEM71 led to low survival.
HHLA2 is involved in the regulation of T cells [29,30]. Byers et al. [31] found that HHLA2 expression is downregulated or deleted in pancreatic cancer tissues, which may contribute to the immune escape of pancreatic cancer. In a colorectal cancer study [32], HHLA2 expression in cancer tissues was higher than in adjacent tissues, and the expression level was significantly associated with the depth of invasion and CD8 + T cell infiltration status and could predict high mortality rates. This may be related to the specificity of different tumors. Our study found that the hypermethylation of HHLA2 leads to a poor prognosis. We speculate that Fig. 3 The genetic alteration of 31 genes in pRCC patients using the cBioPortal database the low expression caused by HHLA2 hypermethylation mediates the immune escape of pRCC. HOXD3 mainly plays a role in regulating cell proliferation and differentiation in the body [33,34]. Other studies observed that HOXD3 overexpression serves as an independent risk factor for poor prognosis of breast cancer [35]. Joint survival analysis showed that overexpression of HOXD3 was associated with poor prognosis in patients with pRCC. This may occur because HOXD3 can regulate the downstream transfer of related molecules and adhesion molecules, such as integrin β3, urokinase plasminogen activator, and matrix metalloproteinases, and subsequently induce tumor angiogenesis, regulate tumor cell apoptosis, and enhance tumor cell invasion and metastasis [36]. PLA2G16 is a tumor suppressor gene that can induce programmed cell death [37] and inhibit cell migration and invasion [38]. Jarrard et al. [39] observed that PLA2G16 methylation in urine and prostate tissues can detect the presence of prostate cancer, indicating that downregulation of the PLA2G16 gene may play an important role in multifocal prostate carcinogenesis. In our study, we observed that a high level of PLA2G16 gene methylation was associated with low survival in patients with pRCC.
A study by Wang et al. [40] showed that TMEM71 acts as an oncogene in glioblastoma multiforme and is associated with an immune response. Joint survival analysis observed that hypermethylation and low expression of TMEM71 were associated with poor prognosis in patients with pRCC. CASP1 belongs to the cysteinyl aspartate-specific proteases family and plays an important role in the processes of inflammatory response [41] and cell pyroptosis [42]. In the risk model, the methylation level of CASP1 was negatively correlated with the risk score, indicating that the high expression of CASP1 was a high-risk factor, which might be related to CASP1's activation of IL-1β, inducing inflammation and immunosuppression to promote tumor growth and metastasis [43]. CD68 is a widely used macrophage marker that can be used to assess the extent of macrophage infiltration in tumor tissue [44]. The risk model showed that the expression level of CD68 was positively correlated with the risk score, which was consistent with the current literature reports [45], suggesting that tumor-associated macrophages might be involved in tumor progression in pRCC.
Currently, there are no studies on the role of HOXA10-AS in pRCC. Some studies have shown that high expression of HOXA10-AS in lung adenocarcinoma [46] and glioma [47] can promote tumor progression. According to the risk model, hypermethylation and low expression of HOXA10-AS can increase patient survival risk, which may be related to the heterogeneity of different   [48] found that promoter methylation of HOXD9 was significantly higher in thymic carcinoma than in thymoma and the thymus, and relapse-free survival was significantly worse in tumors with a higher DNA methylation of HOXD9 in all of the thymic epithelial tumors. Similar to the results of this study, hypermethylation of HOXD9 was positively correlated with high risk score. However, the specific role of HOXD9 methylation in tumor progression needs to be further studied. These genes may become new targets for treating and improving the prognosis of patients with pRCC.
Currently, few studies on pRCC abnormal methylation genes have been reported. Compared with previous studies, we conducted a more comprehensive analysis of the methylation-driven genes in pRCC using the Methyl-Mix algorithm. In addition, we analyzed the correlation between abnormal methylation sites and gene expression, providing accurate targets for further experimental verification. Our research subjects were retrieved from the TCGA database, which is an important tool for analyzing prognostic biomarkers. Although we have investigated the relationship between epigenetic changes and pRCC, the prediction of gene signatures for prognosis still needs to be verified by molecular biology experiments based on clinical samples in the future, and its specific molecular mechanism in pRCC requires further testing.

Conclusion
Using the transcriptome profile data and genomic methylation data of pRCC patients in the TCGA database, this study obtained 31 pRCC-related methylation-driven genes and constructed a prognostic survival model with eight methylation-driven genes (CASP1, CD68, HOXD3, HHLA2, HOXD9, HOXA10-AS, TMEM71, and PLA2G16). These eight genes may be involved in various processes related to pRCC, such as immune escape, inflammatory response, apoptosis and cell migration. Additionally, these genes have the potential to be markers or drug targets for early diagnosis and protocol