Identification of an immune gene signature with prognostic value for patients with uterine corpus endometrial carcinoma

Background: Uterine corpus endometrial carcinoma (UCEC) is a very common gynecological malignancy with a poor prognosis in the late stage. Therefore, the purpose of this study was to determine an immune-related gene signature that predicts the patients’ OS for UCEC. Methods: Based on TCGA, ImmPort, and Cistrome databases, the differential immune genes were screened and the TFs regulatory network was constructed. Functional enrichment and pathway analysis of differential immune genes were carried out. Prognostic value of 410 immune genes was analyzed by Cox regression analysis, a prognostic model was constructed, ROC curves were used to verify the accuracy of the model, and independent prognostic analysis was performed. Finally, the immune cell content was obtained by TIMER, and the correlation with the immune gene expression was evaluated by univariate Cox regression analysis. Results: It was found that the immune cell microenvironment and PI3K-Akt, MARK signaling pathways were involved in the development of UCEC. Based on the established prognostic model, ten-gene prognosis signature (PDIA3, LTA, PSMC4, TNF, SBDS, HDGF, HTR3E, NR3C1, PGR, CBLC) for UCEC prognostic prediction were finally identified, and our study has shown that risk-score can be a powerful prognostic factor for UCEC, independent of other clinical factors. The levels of B cells and neutrophils may be significantly correlated with the patient's risk score. Conclusions: Our studies showed that the ten-gene prognosis signature had important clinical value for the prognosis of UCEC, which was helpful for individualized treatment and provided a new target for tumor immunotherapy.


Background
Uterine corpus endometrial carcinoma (UCEC) is one of the most common malignant tumors in women, and the incidence rate is about 20%-30% of female reproductive system tumors, and the trend is younger [1]. Despite recent advances in treatment, morbidity and mortality are increasing year by year due to the lack of early diagnosis and predictive biomarkers [2]. The Grade, Stage, and TNM staging of UCEC are closely related to prognosis. However, even for those patients with the same staging still have completely different clinical outcomes, which means that the clinical prognostic information provided by traditional clinical-pathological staging is insufficient. Therefore, finding new 3 predictive markers is the key to improving the prognosis and prolonging the overall survival of patients with UCEC.
Recent studies have found that stromal cells in the tumor microenvironment (TME) play an important influence in tumor proliferation, invasion, and metastasis, and are closely related to the prognosis of the disease [3,4]. Host immune responses with multiple immune cell infiltrations are one of the main participants in TME [5,6]. Studies have suggested that UCEC may be associated with long-term inflammatory stimuli, suggesting that the endometrium and menstrual cycle are essentially a chronic inflammatory process in which immune cells are involved [7,8]. The research on the effect of cytokines secreted by immune cells on the survival outcome of UCEC patients has made great progress [9][10][11]. However, there is no systematic prediction of overall survival or response to immunotherapy in UCEC patients based on immune-related genes.
At present, with the rapid development of whole-genome sequencing, microarray analysis and wholegenome sequencing help to screen tumor-related prognostic genes, which are crucial for the selection of personalized cancer treatment decisions. Among them, the open-access of the Cancer Genome Atlas (TCGA) database makes larger cohort research possible [12]. Our study was based on RNA-seq gene expression data from the TCGA and a list of immune-related genes downloaded from the Immunology Database and Analysis Portal (ImmPort) database to develop prognostic immunomarkers and construct prognostic predictions model. Subsequently, we assessed whether these immune genes were associated with survival outcomes and clinical traits in a subgroup of UCEC patients. Finally, based on the abundance of six immune infiltrates from the Tumor Immune Estimation Resource (TIMER) database, we sought to explore the relationship between risk score for UCEC patients and the immune cell infiltration.

Methods
Download and organize data to obtain differential immune genes and transcription factors UCEC RNA-seq gene expression profiles and clinical data were downloaded from TCGA (https://portal.gdc.cancer.gov), a total of 575 patients were finally enrolled including 552 cancer tissue samples and 23 paracancerous tissue samples. Fig.1 shows the flow diagram of this study. 4 Differential expression gene analysis was performed using the "limma" package, corrected P < 0.05 and | logFC | ≥ 1 were considered statistically significant, and heatmaps and volcano plots were plotted using R version 3.5. 3 [13]. Immune-related genes and tumor-associated transcription factors were obtained from the IMMPORT website (http://www.immport.org/) and the Cistrome website (http://cistrome.org/) [12,14], and extracted differential immune genes and differential transcription factors from the DEGs, respectively draw their heatmaps and volcano. Then, the R software "clusterProfiler, org.Hs.eg.db, plot, ggplot2" was used to perform GO (Gene Ontology) functional enrichment analysis and KEGG (Kyoto Encyclopedia of Genes and Genome) pathway analysis for differential immune genes, with P-value<0.05 and q-value<0.05 as cut-off values.

Construction of a regulatory network between TFs and prognostic-associated immune genes
The differential immune genes were combined with survival time, and the prognosis-related immune genes were analyzed by cox univariate analysis, and the forest map was drawn with significance filtering standard P-value < 0.01. Then, correlation analysis was performed with differential TFs, with | cor |> 0.4 and P-value < 0.001 as the filtering criteria, and Cytoscape 3.7.1 version was used to visualize the regulatory network of TFs.

Construct a prognostic model
The prognostic prediction model was constructed by multivariate Cox regression model and the most prognostic gene signature were extracted by prognosis-related immune genes, and the "survivalROC" package was used to plot the ROC curve to assess the sensitivity and specificity of the prognostic model prediction.
The risk score for each patient was constructed as follows: (see formula in the Supplementary files) "n" is the number of prognostic genes, "exp i " is the expression value of the gene i, and "coef i " is the gene i coefficient in multivariate Cox regression analysis.
The patients were divided into high and low-risk groups according to the median risk value. The "survminer" and "survival" packages analyzed and visualized whether there was a difference in survival between the two groups, and used draws a risk curve with "pheatmap" R package. Finally, combined with clinical data for independent prognostic analysis, including single-factor independent prognostic analysis and multi-factor independent prognostic analysis, which used to assess the prognostic value of immune-gene signature and clinical parameters, and to study whether the predictive power of immune-genes signature is independent of other clinical parameters.

Correlation analysis of clinical parameters
To assess the association between immune genes in the prognostic model and clinical parameters, patients were divided into two subgroups (e.g. patients were divided into ≤50 subgroups and >50 subgroups by age) using univariate Cox regression analysis, and the clinically relevant immune genes were screened and mapped (P-value<0.05) by "beeswarm" R package.

Correlation analysis between immune cell content and immune gene expression
The abundance of six immune infiltrates (B cells, CD4=+ T cells, CD8+ T cells, Neutrophils, Macrophages, and Dendritic cells) were obtained from the TIMER official website (https://cistrome.shinyaoos.io/timer/) to analyze whether the risk score of UCEC was related to the above six immune cells [15].

Statistical Analysis
Survival analysis was performed using the Kaplan-Meier curve, the log-rank test was used to determine the statistical difference. The area under the ROC curve was used to analyze the prediction accuracy of the prognosis gene. The effects of clinical traits on OS were assessed by univariate Cox regression analysis and multivariate Cox regression analysis. HR and 95% CI were generated using the Cox proportional hazards model. Univariate Cox regression analysis was used to evaluate the correlation of immune cells. Significance with defined as P-value < 0.05, and cor-value > 0 was positively correlated, and vice versa.

Identification of DEGs in UCEC
In this study, the TCGA database of UCEC RNA-seq gene expression profiles was analyzed by "limma" package, and DGEs screening was carried out according to corrected P-value < 0.05 and |logFC| > 1.
Finally, we identified 6,268 DEGs,410 candidate prognostic immune genes and 100 differential TFs. In Fig.3, the distribution of three groups of differential genes was shown respectively.

Fig.2 and
We performed an enrichment analysis of differential immune genes to elucidate the biological function of UCEC. As shown in
To assess the correlation between 21 prognosis-related immune genes and TFs, univariate Cox regression analysis was performed with | cor |>0.4 and P-value<0.001 as filter criteria to construct a TFs regulatory network. According to the regulatory network diagram in

The prognostic model of immune gene signature
To establish a prognostic prediction model that can predict the prognosis of patients with UCEC, we used Cox regression analysis to identify ten-gene signature for UCEC prognosis predictions (PDIA3, LTA, PSMC4, TNF, SBDS, HDGF, HTR3E, NR3C1, PGR, CBLC) ( Table 1). In the prognostic model, the risk score for each patient was calculated.
All included patients were divided into a high-risk group (n=271) and a low-risk group (n=271) according to the median risk score (0.8756) and used as the cut-off point. The results showed that the overall survival of patients in the high-risk group was shorter than that in the low-risk group (P-value = 5.502e−09), and the 5-year OS rate of the high-risk and low-risk groups were 63.1% and 89.9%, respectively. Besides, the 9-year OS rate for the high-risk group is 34.6% and 78.7% for the low-risk group (Fig.6A). As shown in Fig.6B, the AUC was 0.756, confirming that our prognostic model was generally accurate and effective. Sorted according to each patient risk score, the patient's risk score, patient survival status, and gene expression were shown in Fig.6C. Therefore, these results indicated the accuracy of prognostic prediction-values of these ten-gene signature.
To assess the prognostic model risk score as an independent prognostic factor for patient survival, univariate Cox regression analysis and multivariate Cox regression analysis were used. The results showed that the P-values were all less than 0.05, so the risk score derived from this prognostic model can be independent of other clinical traits as an independent prognostic factor. The univariate Cox regression analysis showed that age (P-value=0.002, hazard ratio=1.035) and grade (P-value<0.001, hazard ratio=2.595) were significantly associated with prognosis, and the prognosis of patients was worse with the increase of age and grade. (Fig.7)

Comprehensive Analysis of clinical parameters and immune cell
Univariate Cox regression analysis was used to assess the correlation between immune genes involved in the prognostic model and clinical traits. Patients were divided into two groups based on clinical traits: Group 1 (Age <= 50 and Age > 50) and Group 2 (G1 & G2 and G3). The results showed that HDGF, PGR, PSMC4, and TNF were significantly associated with age, and the expression of HDGF, PSMC4, and TNF increased with age. HDGF, NR3C1, PDIA3, PGR, PSMC4, SBDS, and TNF were 8 significantly associated with the grade, and with the increase of grade, the expression levels of HDGF, NR3C1, PSMC4, SBDS, and TNF also increased (Fig. 8).
Finally, the Cox regression analysis was used to analyze the correlation between the risk score of UCEC patients and the abundance of six immune infiltrations. The results showed B cells (p=3.408e −10, cor=0.265) and Neutrophils (p=0.011, Cor = 0.109) was significantly correlated with the patient's risk score, and were positively correlated (Fig.9).

Discussion
Numerous reports have suggested that differential genes may be associated with various aspects of tumors, including tumorigenesis and prognosis. [16][17][18] The vast majority of genes that predict tumor prognosis are limited by certain factors, such as insufficient samples. In this study, a large sample size of TCGA genome-wide expression data was used to develop a 10-immune-gene prognostic signature for UCEC patients. This prognostic model may provide potential biomarkers for prognosis, response to immunotherapy.
To better understand the mechanism of occurrence and development of UCEC, GO function annotation and KEGG pathway analysis was performed on differential immune genes. The results showed that these genes were enriched in the immune cell response of the extracellular matrix. More and more studies have found that TME affects tumorigenesis, proliferation, angiogenesis, and apoptosis [19]. Menstrual cycle changes can be regarded as a chronic periodic inflammatory process, which can cause various immune cells to be recruited, such as neutrophils, mast cells, and phagocytes [7]. Previous studies have found that TAM (tumor-associated macrophages) play an important role in the development of UCEC, which can promote tumor cell proliferation and differentiation, matrix remodeling, angiogenesis and recruitment of mononuclear macrophages.
Moreover, the increase in the number of cells indicates a higher degree of malignancy and a poor prognosis [20]. Other inflammatory cells, such as neutrophils and mast cells, play an important role in tumorigenesis, although they account for different proportions. Further KEGG pathway analysis of UCEC may be associated with PI3K-Akt, MAPK, Ras, and JAK-STAT signaling pathways. AKT is a direct substrate downstream of the PI3K-AKT signaling pathway. After phosphorylation of AKT, it can 9 increase the motility of tumor cells, participate in cell proliferation and inhibit apoptosis. Studies have shown that the PI3K-AKT signaling pathway is activated in UCEC, which also can promote the development of tumors [21]. A large number of studies have shown that a variety of factors can activate MAPK, Ras, and JAK-STAT signaling pathways to mediate the proliferation, infiltration and other biological behaviors to promote the occurrence and progression of UCEC [22][23][24].
In this study, the TFs-related regulatory network showed that BIRC5 was positively regulated by multiple TFs, and BIRC5 was a high-risk gene. BIRC5, also known as survivin, is a member of the family of apoptosis-inhibiting genes and has a remarkable effect in inhibiting apoptosis, mainly by directly inhibiting the activity of caspase to inhibit apoptosis [25]. Many studies have shown that upregulation of BIRC5 leads to the development and progression of many malignant tumors in humans [26]. A western blot analysis showed that BIRC5 was overexpressed in more than 90% of UCEC [27]. Another study found that BIRC5 was overexpressed more frequently in recurrent UCEC than in primary tumors [28]. Some studies have shown that BIRC5 and CENPA are up-regulated in triple-negative breast cancer and are associated with poor prognosis of breast cancer [29]. It is noteworthy that activated E2F1/ FOXM1/ LMNB1 can up-regulate the expression of BIRC5 to inhibit apoptosis [30][31][32].
In this study, we developed a ten-gene prognosis signature (PDIA3, LTA, PSMC4, TNF, SBDS, HDGF, HTR3E, NR3C1, PGR, CBLC) for overall survival prediction of UCEC. In addition to HDGF (hepatomaderived growth factor) and PDIA3 (Protein disulfide-isomerase A3), the remaining 8 genes have not been well validated in gynecologic oncology, especially the UCEC. HDGF is a heparin-binding growth factor that is closely related to the differentiation, growth, and division of various tissues. It was involved in the occurrence and development of malignant tumors, can promote the proliferation and differentiation of tumor cells, and enhance the metastatic ability of tumor cells through EMT [33,34].
Studies found that HDGF is an independent risk factor for the prognosis of liver cancer, gastric cancer, cholangiocarcinoma and non-small cell lung cancer [35]. HDGF also has abnormalities in endometrial cancer. It is found that the higher the FIGO stage, the higher the expression rate of HDGF, which is a potential adverse factor for the progress and prognosis of UCEC [36]. PDIA3 also known as ERp57/GRP58, is involved in malignant transformation of cells through STAT3 and Wnt pathways in recent years, which is closely related to the occurrence and development of various tumors [37].
Interestingly, PDIA3 can enhance the proliferation and invasion ability of cervical and ovarian cancer cells in the field of gynecologic oncology and can be used as a sensitive marker to reflect tumor prognosis [38,39]. In our study, these two immune genes were also found to be important DEGs (Pvalue <0.0001), suggesting that they may play an important role in the development and progression of UCEC. It is worth noting that the overall survival rate of the high-risk group was lower than that of the low-risk group, with 63.1% and 89.9% overall five-year survival rates in the high-risk and low-risk groups, respectively. Moreover, the AUC analysis showed that the combination of these ten immune genes has prognostic value for UCEC patients. Independent prognostic analysis further demonstrated that the survival rate predicted by this prognostic model was not related to other clinical traits and could be used as an independent prognostic factor. Therefore, it may help to improve prognosis and provide effective treatment.
Clinical prognostic factors for UCEC have been demonstrated to include age, stage, and body weight [40]. Our study also confirmed that age and grade were associated with OS of UCEC and were high-risk factors. Clinical prognostic factors for endometrial cancer have been demonstrated to include age, stage, and body weight. Our study also confirmed that age and grade are associated with prognosis of endometrial cancer and are a high-risk factor. Further clinical correlation analysis showed that HDGF and PSMC4 were significantly associated with age and grade, and were positively correlated, which may be related to the fact that the up-regulation of these genes could promote the development of tumor [41,42]. In terms of survival prediction, the current staging system is far from accurate at the individual level. And age is not a survival indicator for cancer, because older people are less likely to receive adjuvant therapy [43]. Therefore, risk scores may be used as a more reliable prognostic factor for UCEC compared to age and stage.
At present, more and more scholars believed that there were a large number of immune cells and related inflammatory factors in the UCEC interstitial, which was an important part of the tumor inflammatory microenvironment and had obvious influence on the biological behavior of UCEC [44]. Therefore, based on TIMER immune cell infiltration abundance, we analyzed the association between UCEC risk-score and immune cells. We found that B cells and neutrophils were significantly associated with the patient's risk-score and were positively correlated. Wikberg et al. showed that neutrophils of the innate immune system play a major role in acute inflammation and also play a role in anti-tumor immune responses [45]. Although neutrophil infiltration was closely related to other immune cell infiltration, studies have found that neutrophil infiltration may have additional prognostic value in various cancers. In the case of chronic inflammation, neutrophils persist in tissues, and this persistence was associated with cancer progression. It has been found that elevated numbers of neutrophils in many human cancers or a higher Neutrophil/Lymphocyte Ratio (NLR) were associated with poor prognosis, which may be related to neutrophils secreting matrix metalloproteinase-9 to stimulate the angiogenic activity of cancer cells [46,47]. Different proportions of infiltrating B cells were included in human solid tumors. Although the search for immune-related factors associated with a cancer diagnosis, prognosis, and survival has been largely limited to T cell responses, the latest reports suggested that B cells may also be critical for the prognosis of cancer. The data reported by the Schimdit team confirmed that the B cell marker was the strongest prognostic factor in breast cancer and other human tumors, given the immunoglobulin kappa chain (IGKC) secreted by plasma cells [48,49]. Nielsen et al. found that patients with higher levels of ovarian cancer had higher survival rates when CD20+ B cells were increased [50]. Therefore, as the risk-score increases, the levels of these two immune cells may increase and participate in immune escape or suppression together.
These findings have important clinical value for UCEC, but there were still several limitations in our study. First, the TCGA database of UCEC only contained clinical traits of age and grade and lacked clinical data related to stage and TMN. Second, most of the ten-gene prognosis signature and immune cells were rarely reported in UCEC, and more prospective studies were needed to further validate the intrinsic relevance of these genes to the prognosis of patients with UCEC.

Conclusions
In summary, based on TCGA, ImmPort, and Cistrome databases, 410 candidate prognostic immune genes were screened and a TFs regulatory network was constructed. These differential immune genes were involved in the development of UCEC through immune cells, and tumors are closely related to PI3K-Akt, MAPK, Ras, and JAK-STAT signaling pathways. A prognostic model was established to identify ten-gene prognostic signature (PDIA3, LTA, PSMC4, TNF, SBDS, HDGF, HTR3E, NR3C1, PGR, CBLC), and studies have shown that risk-score can be used as a strong prognostic factor for UCEC. In terms of immune cells, B cells and neutrophils were significantly associated with patient risk-score. In future studies, large-scale prospective studies should be used to evaluate our findings. Table   Due to technical limitations the Table is available as a download in the Supplementary Files. Table 1 The prognostic model of prognosis-related 10 immune genes      Clinical correlation analysis of the ten-gene prognosis signature with age and grade.

Declarations
26 Figure 9 Association between the risk score of the ten-gene prognosis signature and the abundance of 6 immune infiltrates.

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download. Table.tif