Tumor mutation burden in connection with immune-related survival in uterine corpus endometrial carcinoma

Background UCEC is the most common gynecological malignancy in many countries, and its mechanism of occurrence and development is related to tumor mutation burden (TMB) and immune cell infiltration. Therefore, it is necessary to systematically explore the TMB-related gene profile in immune cells to improve the prognosis of UCEC. Methods We integrated TMB-related genes with basic clinical information of UCEC patients based on TCGA dataset. Differentially expressed genes (DEGs) were selected through differential expression screening, PPI, and enrichment analysis. Additionally, we analyzed the components of immune cell infiltration of the DEGs to obtain the differential immunity-related genes. A single factor and multifactor Cox regression analyses were conducted to establish new prognostic indicators of OS and DFS based on TMB-related immune genes. To further study the correlation between survival and immune cell infiltration, a Cox model based on these immune infiltration compositions was built. Using the clinical variables, we established nomograms for OS and DFS. Results 393 DEGs were significantly associated with clinical outcomes and the immune component in patients with UCEC. Gene Ontology (GO) and Kyoto Encyclopedia of Genes, Genomes (KEGG) pathway and protein-protein interaction network (PPI) analyses revealed the role of these genes and information on related pathways. Then, two prognostic models were established based on the differential immune genes for OS (GFAP and MX2) and DFS (MX2, GFAP, IGHM, FGF20, and TRAV21). In DFS, the differential immune genes were related to CD4+ T cell, CD8+ T cell, macrophage, and neutrophil (all P < 0.05). B cell and CD8+ T cell were independent prognostic factors from among the immune cell elements in UCEC. Finally, the risk scores of these models were combined with the clinical elements-based nomogram models, and the AUC values were all over 0.7. Conclusions Our results identified several clinically significant differential immune genes and established relevant prognostic models, providing a basis for the molecular analysis of TMB and immune cells in UCEC, and identified potential prognostic and immune-related genes for UCEC. We added clinical related conditions for further analysis to confirm the identity of the genes and clinical elements-based models.


Introduction
survival rate of stage I UCEC exceeds 90%, while stage IV can only reach 20% [2]. Traditional treatment methods mainly include surgery, radiotherapy, chemotherapy, and hormone therapy. For patients with metastases, surgery, and radiotherapy cannot achieve a satisfactory treatment level. These patients are treated using chemotherapy or hormone therapy, which also causes more significant damage to normal human cells and results in recurrence. Most importantly, options are often limited after traditional first-line treatment for advanced patients with metastases or recurrence, which has prompted many scholars to develop new treatment methods.
In recent years, immunotherapy has become an effective method for treating advanced cancer [3]. For example, typical clinical trials include lung cancer [4,5] and melanoma [5][6][7]. Among these immunotherapies, the most important one is Immune Checkpoint Inhibitor (ICI), and its main inhibitory points include Cytotoxic T-lymphocyte antigen 4 (CTLA-4) and Programed cell death 1 receptor (PD-1) [7,8]. More specifically, ICI blocks the checkpoint activity of T cells, which increases the sensitivity of immune cells to the recognition of cancer cells and triggers the immune system to attack and destroy cancer cells. Tumor mutation burden (TMB) refers to the total number of mutations per mega base in tumor tissue. There is increasing evidence that the burden of tumor mutations is related to immunotherapy in cancer patients [9]. Although most inhibitors are rarely clinically approved for humans, their immunotherapy has shown great potential in refractory tumors. Immunotherapy research has also been conducted on targeted therapy for UCEC. The level of TMB and microsatellite instability can predict whether patients with UCEC may benefit from immunotherapy [10,11]. Uterine carcinosarcoma is a rare and aggressive histological variant of UCEC with a poor prognosis [10]. Bhangoo et al. [11] found that inactivation of DNA polymerase ɛ (POLE) mutation is associated with a high TMB and ICI in carcinosarcoma. In summary, the study of TMB on immune invasiveness of UCEC requires further research. Therefore, we aimed to study the relationship between the clinical prognosis of UCEC and the role of TMB and immunity.

Data collection
Clinical data on a UCEC cohort, including age, stage, grade, histological type, together with transcriptome data, were obtained from the TCGA data portal (https ://porta l.gdc.cance r.gov/). Meanwhile, we downloaded the processed "mask somatic mutation" data using the VarScan algorithm in the TCGA database. The R package "maftools function" [12] was adopted to describe mutated genes in the UCEC samples. Since each dataset was retrieved from published reports, it was verified that written informed consent had been obtained.

TMB score and clinical characteristics
We used Perl script (JAVA8 platform) to remove silent mutations from the samples while calculating the number of base mutations and then corrected it to the number of base mutations per million bases, which was the TMB value. The samples were divided into a low group and high group based on the median value of TMB. Further, we used Kaplan-Meier statistics to analyze survival differences between these two groups. The Wilcoxon ranksum test was used to evaluate the correlation between TMB and various clinical variables. Furthermore, X-tile software (Yale University, New Haven, Connecticut, USA) was utilized to select the optimal threshold value for age. We analyzed overall survival (OS) and diseasefree survival (DFS) to determine the prognostic value of the two TMB groups of patients with UCEC.

Differentially expressed genes and enrichment analysis
The Limma package of R software was adopted to predict differentially expressed genes (DEGs) between the high-TMB group and low-TMB group. Genes that exhibited a |Fold Change|of > 1 with an adjusted P-value of < 0.05 were regarded as significant DEGs. Then, the "pheatmap" package of R software was used to assess the effect of the difference by constructing a heatmap plot. Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) enrichment analyses were carried out to identify the primary biological attributes for the genes selected above using the "enrichplot, " "clusterProfiler, " "ggplot2" and "org.Hs.eg.db" functions of R software. Biological processes, cellular components, and molecular functions are the three primary functional components of GO enrichment analysis. KEGG analysis can identify the neighbor pathways of the DEGs.

Search tool for the retrieval of interacting genes database (STRING)
STRING (https ://strin g-db.org/) is a database of known and predicted protein-protein interactions, and its latest version, 11.0, includes 24,584,628 proteins from 5090 organisms [13]. This tool was applied to explore the physical and functional associations between the DEGs and conduct the protein-protein interaction network (PPI) analysis of the DEGs. The results of the PPI analysis provided the number of adjacent nodes per gene, which was obtained through a histogram created using R software.

Gain differential immune genes and survival analysis
CIBERSOR is a general calculation method used to quantify cellular components from many tissue gene expression profiles (GEP), which could accurately estimate the immune composition of tumor biopsy [14]. We applied the immune cell expression profiles to the "CIB-ERSORT" algorithm to determine immune cell content in the UECE sample. The vioplot package was used to determine the expression differences of various immune cells in the high-TMB and low-TMB groups. After intersecting the immune cell-related genes derived from the immune database (https ://www.immpo rt.org/) and the DEGs, the differential immune genes were obtained using the "VennDiagram" package. Kaplan-Meier survival analysis was performed to assess the survival of the immune-related genes.

Establishment of differential immune genes-based prognostic index models
First, differential immune genes were extracted to be further study using a Cox single-factor and multifactor regression analysis for OS and DFS, which was performed using R software. The statistical indicators that were used were: hazard ratio (HR), 95% confidence interval (CIs), and log-rank P-value (a P-value of < 0.05 was considered statistically significant). The receiver operating characteristic (ROC) curve of each model was constructed. Furthermore, patients were divided into high-risk and low-risk groups based on the median risk score.

TIMER database
TIMER web server (https ://cistr ome.shiny apps.io/timer /) is a comprehensive resource for the systematical analysis of six immune infiltrates, including B cells, CD4+ T cells, CD8+ T cells, Neutrophils, Macrophages, and Dendritic cells, across diverse cancer types [15]. Moreover, the survival module in the TIMER database was also used to draw Kaplan-Meier plots for the immune infiltrates and hub immune genes to visualize survival differences. The P-value of the log-rank test used to compare the survival curves of the two groups is shown in each plot.

Outputs of the immune elements-based models
Surv (UCEC) ~ B_cell + CD8_Tcell + CD4_Tcell + Macrophage + Neutrophil + Dendritic was the formula of Cox's regression model. This model was fitted using the "coxph" function of the R software package, "survival. " HR is presented as lower and upper 95% confidential intervals, and the threshold p-value was set at 0.05.

Construction of nomogram models for OS and DFS
The information on clinical characteristics obtained from public datasets and two risk groups were collected and divided into a modeling group and a verification group based on a 7:3 ratio. The "rms" package of R software was used to construct the nomogram and calibration plots.
ROC curves were also plotted to determine the sensitivity and specificity of the nomogram models.

Statistical analysis
R 3.6.3 (https ://www.r-proje ct.org/) software was used for all statistical analyses. Additionally, the R package "survival" was utilized to conduct the univariate Cox regression analysis to examine the relationship between OS and DFS with gene expression. At the same time, a model was constructed based on the multivariate Cox regression analysis. X-tile software was used to group patient data based on age, with 69 years of age for OS and 59 years of age for DFS used the critical values (Figs. 3,4). Then, nomogram plots were conducted for both clinical parameters and risk groups. The area under the curve (AUC) of the ROC curve was computed using the ROC function of the "survival" package in R software. A P value of < 0.05 was considered to indicate statistical significance.

Analyze mutation data in UCEC
The somatic mutation data of 375 UCEC patients were downloaded from the TCGA database, and these materials were visualized using the "maftools" R package. Diagrams a-c in Fig. 1 provides a summary of the types of mutations in the samples, in which missense mutations occupied the leading position (Fig. 1a), C>T was the most common single nucleotide variant (SNV) (Fig. 1c). Only one mutation type, Single Nucleotide Polymorphism (SNP), was found in UCEC (Fig. 1b). The mutation value and median of each sample are shown in Fig. 1d, e. Furthermore, we showed the top 10 mutated genes in the samples, which included TTN (35%), MUC16 (22%), PTEN (46%), RYR2 (20%), CSMD3 (20%), PIK3CA (46%), TP53 (31%), ARID1A (23%), CHD4 (19%) and CTNNB1 (22%) (Fig. 1f ). The waterfall chart shown in Fig. 2a demonstrates the specific mutation types and proportions of the mutant genes in each sample. Figure 2b shows the connection between two mutant genes, in which the green color indicates a positive correlation, while a brown color indicates a negative correlation.

Relationship between TMB and clinical features
The TMB of UCEC cases was calculated to determine its clinical significance. For OS, initially, we found that 69 years of age was the best age cut-off value using x-tile software, and the median value of TMB was used as the optimal cut-off value to divide these samples into two groups: low and high TMB groups. It could be detected that the low TMB group showed a more unsatisfactory survival outcome compared to the high TMB group, based on values in the log-rank test and Kaplan-Meier curve (P = 0.025, Fig. 3c). The high TMB group was closely related to the grade 3 or 4 (P = 0.011, Fig. 3e) endometroid pathological type (P < 0.001, Fig. 3f ) and stage I or II (P = 0.029, Fig. 3g), while there was no significant difference between TMB and age (P = 0.62, Fig. 3d). For DFS, Fig. 4 shows that 59 years of age was the optimal threshold and that the association between TMB and survival were similar to the results for OS. Additionally, expression levels of TMB decreased due when age was ≤ 59, the grade was 3-4, and endometrioid histological type (all P < 0.05), and there were no significant associations between TMB and stage (P = 0.11).

Enrichment analyses and the PPI of genes in the different TMB group
We found 393 differentially expressed genes (DEGs) between the high TMB and low TMB groups, and the top 19 DEGs are shown in Table 1. The heatmap constructed shows the expression levels of the DEGs (Fig. 5a). The Venn diagram was created to show 1713 immune-related genes and 393 DEGs to obtain 98 differential immune genes (Fig. 5b). After that, the KEGG pathway and GO term enrichment analyses of these DEGs were conducted. Figure 6a and Table 2 show that the GO terms primarily enriched for biological processes included humoral immune response and lymphocyte-mediated immunity. Simultaneously, the Immunoglobulin complex, external side of the plasma membrane, and immunoglobulin complex were the cellular components that were mainly enriched.
Regarding molecular functions, most genes were enriched for antigen binding, immunoglobulin receptor binding, and receptor-ligand activity. In the KEGG pathway enrichment analysis of the DEGs, these genes showed notable associations with breast cancer, basal  Table 3). String was utilized to confirm the relationship between DEGs and then construct networks describing their interactions (Fig. 7a). The most common neighbors in the PPI diagram were APOA1 and APOB (Fig. 7b).

Association between differential immune gene expressions and immune infiltration
The violin plot shows the difference between various immune cell types in the high and low TMB groups. As suggested by the results, T cells CD8 (P < 0.001), T cells CD4 memory resting (P = 0.006), T cells CD4 memory activated (P < 0.001), T cells follicular helper (P < 0.001), T cells regulatory (P = 0.031), NK cells resting (P = 0.042), Macrophages M0 (P = 0.002), Macrophages M1 (P < 0.001), Macrophages M2 (P = 0.018) and Dendritic cells activated (P = 0.014) showed significant differences between the high and low TMB groups (Fig. 8). Differential immune genes showed significant associations between their expression patterns, and the OS and DFS for UCEC patients were used to conduct a univariate Cox regression analysis. Eventually, we found GFAP (HR 1.023, 95%CI 1.011,1.036, P < 0.001) and MX2 (HR 1.132, 95%CI 1.062,1.207, P < 0.001) to be related to overall survival after multivariate Cox regression analysis was used to remove genes that were not independent indicators of prognosis (Fig. 9a). It could be detected that high expression of GFAP and MX2 were related to poorer OS in endometrial cancer patients, based on the values of the long-rank test and Kaplan-Meier curve (Fig. 9b, c). MX2 (HR 1.130, 95%CI 1.058, 1.207, P < 0.001), GFAP (HR 1.025, 95%CI 1.012, 1.038, P < 0.001), IGHM (HR 0.999, 95%CI 0.998, 1.000, P = 0.039), FGF20 (HR 0.939, 95%CI 0.883, 0.998, P = 0.041) and TRAV21 (HR 0.611, 95%CI 0.407, 0.917, P = 0.017) were analyzed for DFS (Fig. 10a). Regarding DFS, UECE patients with a higher expression level of MX2 and GFAP had a worse prognosis, while patients with higher mRNA levels of IGHM, FGF20, and TRAV21 showed a favorable outcome ( Fig. 8c-h). These genes and clinical factors, including age, grade, stage, histological type, which are related to prognosis, were selected for model construction. The prognosis index formula multiplied the gene expression level and clinical characteristics in each case with the corresponding Cox regression coefficient. All selected genes were classified into either a high-risk or low-risk group based on the median prognosis index value, and survival curves were plotted. Moreover, Kaplan-Meier plots were drawn to analyze differences in survival time between both groups, indicating that cases in the low-risk group were significantly associated with increased survival probability than those in the high-risk group, for both OS and DFS (Figs. 9d and 10h). As shown in Figs. 8e and 9b, the

Association between risk score and immune infiltrates
The relationship between risk score and different types of immune cells were obtained from the Timer website using R software. The risk score of DFS showed a negative correlation with CD4+ T cell, CD8+ T cell, macrophage, and neutrophil (all P < 0.05, Fig. 11a). However, the risk score of OS was not associated with immune cells (all P < 0.05, Fig. 11b).

High B cell and CD8 + T cell infiltration indicates a better outcome
To understand the association between immune infiltrating cells and survival in UCEC, we used the Cox regression equation to calculate the expression levels of CD4+ T cell (HR 0.001, 95%CI 0, 0.052, P = 0.005), CD8+ T cell (HR 0.001, 95%CI 0, 0.052, P = 0.001) and Neutrophil (HR 2314.933, 95%CI 1.836, 2919028.574, P = 0.033), which had declined due to cancer progression (Table 4). Additionally, Kaplan-Meier plots were applied to determine that UECE patients with a higher expression level of B cell and CD8+ T cell had a better prognosis (Fig. 12).

Nomogram of the differential immune genes and clinical variables for OS and DFS
The basic clinical characteristics of the patients are shown in Table 5 for OS and in Table 6 for DFS. Through univariate and multivariate Cox analyses of the modeling group and the entire cohort of patients, we found that the risk score (especially for DFS) may be an independent risk factor for UCEC patients (Tables 7, 8 , 9, 10). Additionally, we used risk group, age, grade, stage, and histological type to construct nomogram models for OS (Fig. 13a) and DFS (Fig. 14a). To verify the predictive value of the models, 3-year and 5-year calibration charts were drawn for the modeling group and the verification group for OS (Fig. 13b-e) and DFS (Fig. 14b-e), which suggested the two models produced consistent results. ROC curves were constructed, and their AUC areas were greater than 0.7, indicating that these two models showed a high level of diagnostic accuracy.
In contrast, group testing found that patients with high TMB showed an excellent response to immunotherapy [23], indicating that a higher TMB is expected to improve the efficacy of immune checkpoint inhibitors in cancers. Then, we used the CIBERSORT algorithm to calculate the proportion of immune cell infiltration in each patient sample. Additionally, the two TMB groups were strongly associated with specific immune infiltrating cells, which further proved that TMB was associated with the immune response. At the same time, high TMB was related to immunotherapy, and this phenomenon may involve immune infiltrating cells.
Using the median value of TMB as the critical value, we identified 393 DEGs between the low and high expression groups of TMB. Functional enrichment analyses of those DEGs provided an understanding of their biological roles. In the GO analysis, immune cells and receptors were found to be associated with the DEGs. Sonoda et al. [24] suggested that RCAS1 was a ligand for immune cell receptors, which was significantly associated with the stage of UCEC, the degree of myometrial invasion, positive peritoneal cytology, and overall survival. In the KEGG analysis, they were associated with various tumor pathways or genes, such as basal cell, breast, and gastric carcinoma. This implied that they had specific tumor pathways in common. For example, there were some similarities between UCEC and breast cancer at the molecular level. IDO1 was involved in the anti-tumor immune process of both tumors and was related to TMB [18]. After the protein network map of the DEGs was constructed, we detected the core expression levels of these genes in the PPI network, which may play an essential role in UCEC. APOA1 could promote the increase of macrophage infiltration, decrease TMB and metastasis, and improve the survival rate, similar to its effects in colorectal cancer [25,26]. APOB caused a high mutation burden of cancer genes and tumorigenesis [27].
We utilized differential immune genes to establish Cox risk models for OS and DFS. Two genes (GFAP and MX2) were identified in the OS model, while five genes (GFAP, MX2, FGF20, IGHM, and TRAV21) were identified in the DFS model. In this study, we further explored the role of the above-mentioned differential immune genes and immune infiltration in the high and low TMB groups in UCEC. These genes were the core differential immune genes, and the risk scores obtained from the Cox multivariable analysis were grouped. For DFS, the risk score was related to some immune cells.
The higher the risk score for DFS, the lower the level of CD4+ T cell, CD8+ T cell, macrophage, and neutrophil infiltration. Additionally, through the survival analysis, we found that high-risk scores indicated better survival. MX2 is a viral interferon, which is the key to block human immunodeficiency virus type 1 [28]. GFAP can delay the development of type 1 diabetes by regulating T cell differentiation [29]. Inhibition of the FGFR family of genes can prevent the development of tumors by blocking paracrine signaling, which was related to immune escape in the tumor microenvironment [30].
We found that decreases in the expression of B cell and CD8+ T cell showed an apparent association with a poorer prognosis. In breast cancer, the presence of CD8+ T cells decreased the risk of breast cancer death by about 20% [31]. CD8+ T cells induced prolonged survival for patients with various types of tumors, including liver cancer [32] and rectal cancer [33]. Previous research [34] has proven that CD8+ T lymphocytes are an independent risk factor and that UCEC patients with high expression levels show better survival rates, which is consistent with our results. The role of B cells on tumors is unclear. B cells promote squamous cell carcinoma (SCC) development by depositing immunoglobulin-containing immune complexes [35]. In our study, high B cell expression was associated with a good outcome. Although the role of B cells in tumors is uncertain, the prognostic value of B-cell gene expression signatures in tumors has been demonstrated [36]. From the above the mechanisms that B cell and CD8+ T cell were linked to immunosuppression in tumor microenvironment are incompletely clear. Advanced research implied that these immunotherapy are potentially related to the expression of PD1 and CTLA4 from CD8+ T cell.
Finally, the risk stratification in the models mentioned above and other clinical factors was used to conduct single-factor and multifactor Cox analyses to construct the corresponding Nomogram model. The ROC curve showed that the Nomogram model was more reliable than the other models. The nomogram could comprehensively evaluate the survival rate of patients with genetic and clinical factors, and they were more intuitive and performed well. Furthermore, the risk score in DFS showed a significantly greater impact on diagnosis than in OS. At present, an increasing number of genes The violin diagram depicts the expression analysis of multiple immune cells in the high and low TMB groups. The green color is used for the high TMB group, and the red color is used for the low TMB group have been used as models to predict the prognosis and improve the prognosis of UCEC [37].
However, this study also contains certain limitations: (i) One limitation of this study is that it was a retrospective study. The relationship between differential TMB-related immune genes and immune cell infiltration still needs to be confirmed using primary experimental evidence. For example, we can analyse the differential immune cell infiltration expression between TMB High and TMB Low goup of EC cells and further explore the expressed level of immune checkpoint inhibitors such as PD-1 or CTL-4. (ii) The lack of many clinical samples to verify the prognostic effect of TMB and its potential relationship with immune infiltration. Therefore, the relationship between the occurrence and development of EC needs to be further confirmed using many more studies. Due to cost and technological limitations, the application of polygenic models is restricted.

Conclusions
High TMB is related to prolonged survival and may promote immune infiltration. The immune gene models related to TMB levels were established. Clinical factors related to the models were determined to evaluate the prognosis of UCEC further and provide a