Systematic profiling of alternative splicing in Helicobacter pylori-negative gastric cancer and their clinical significance

Background Alternative splicing (AS) may cause structural and functional variations in the protein to promote the proliferation of tumor cells. However, there is no comprehensive analysis of the clinical significance of AS in Helicobacter pylori-negative gastric cancer (HP− GC). Methods The clinical, gene expression profile data and AS events of 138 HP− GC patients were obtained from the database named the cancer genome atlas. Differently expressed AS (DEAS) events were determined by a comparison of the PSI values between HP− GC samples and adjacent normal samples. Unsupervised clustering analysis, proportional regression and Kaplan–Meier analysis were used to explore the association between clinical data and immune features and to establish two nomograms about the prognosis of HP− GC. Finally, splicing networks were constructed using Cytoscape. Results A total of 48141 AS events and 1041 DEAS events were found in HP− GC. Various functions and pathways of DEAS events parent genes were enriched, such as cell-substrate junction, cell leading edge, focal adhension, and AMPK signaling. Seven overall survival (OS)-related and seven disease-free survival (DFS)-related AS events were used to construct the prognostic signatures. Based on the independent prognostic factors, two nomograms were established and showed excellent performance. Then, splicing regulatory networks among the correlations suggested that splicing factors were significantly associated with prognostic DEASs. Finally, the unsupervised clustering analysis revealed that DEAS-based clusters were associated with clinical characteristics, tumor microenvironment, tumor mutation burden, and immune features. Conclusion Seven OS-related and seven DFS-related AS events have been found to be correlated with the prognosis of HP− GC and can be used as prognostic factors to establish an effective nomogram.


Background
Gastric cancer (GC) is the fifth most common malignancy and the third leading cause of cancer-related death worldwide [1]. Among many carcinogens of GC, Helicobacter pylori (HP) infection is one of the most important factors [2]. And according to The proportion of HP-positive gastric cancer (HP+ GC) is significantly higher than that of HP-negative gastric cancer (HP − GC) patients, and over 90% of patients with GC have been infected with HP [3]. Several studies showed significant differences in histology and prognosis between HP+GC patients and HP − GC patients [4][5][6]. Marrelli et al [5] concluded that HP − GC had a more advanced stage and a more advanced pT classification than HP+ GC. And this result was consistent with the conclusion of another study, which illustrated that HP − GC patients are in stage IV more and have worse overall survival (OS) compared to HP+ GC patients [6]. Although the prognostic biomarkers of HP+ GC have been widely studied previously [7][8][9], the related research is few in HP − GC. Therefore, it is necessary for us to explore the prognostic factors of HP − GC patients.
In recent years, alternative splicing (AS) has received widespread attention for its abnormal forms may cause structural and functional variations in the protein, therefore it could benefit growth and survival for the tumors [10,11]. Many studies have clarified the role of AS events as prognostic factors in cancers, such as hepatocellular carcinoma [12] and colorectal cancer [13]. However, the effect of AS events on the prognosis of patients with HP − GC is still unknown. In addition, as the relationship between GC and immune mechanisms is revealed [14] and AS events can also affect tumor immunity [15], we are encouraged to further understand the correlation between AS events and immune features in HP − GC.
In this study, AS events data from the TCGA SpliceSeq data portal were used to identify the HP − GC -related AS events, and we comprehensively analyze the prognostic potential of AS events on HP − GC patients. Furthermore, we investigated the biological and immunological functions associated with these AS events to explore their relevant mechanisms.

Data acquisition and selection
The clinical data of HP − GC patients and the corresponding gene expression profile data were obtained from the cancer genome atlas (TCGA). And the inclusion criteria were as follows:(1) Histological diagnosis of HP − GC; (2) With complete data of RNA sequencing; (3) With complete clinical data, including gender, age, and AJCC TNM staging. In addition, patients with OS less than 30 days were excluded from the present study. Meanwhile, data of seven types AS events were collected from the TCGA SpliceSeq database [16], including alternate acceptor site (AA), alternate promoter (AP), alternate donor site (AD), alternate terminator (AT), exon skip (ES), mutually exclusive exons (ME), and retained intron (RI). The percent spliced in (PSI) value, which ranges from 0 to 1, was used to quantify the AS events. To generate reliable AS events, the filter condition was set (percentage of samples with PSI values ≥ 75, an average of PSI values ≥ 0.05). Finally, the Upset plot generated by the UpSetR package was used to illustrate the interactive sets between seven types of AS events [17].

Identification of tumor-associated AS events and enrichment analysis
Directly comparing the biomarkers at different pathological state to screen hub biomarkers has been widely performed to determine tumor-associated biomarkers in past researches [13,15]. Therefore, to determine the differential expression AS (DEAS) events, a comparison of the PSI values was made between the HP − GC samples and the adjacent normal samples. AS events with adjusted P < 0.05 and |LogFC|>1.5 were defined as DEAS events. The parent genes of identified DEAS events were then incorporated into the enrichment analysis, including Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways with metascape [18].

Construction of the prediction model based on the DEAS
To further understand the role of DEAS events in HP − GC patients, 138 patients with OS data were enrolled to study the OS-related DEAS events, and 116 patients with DFS data were selected to study the DFS-related DEAS events. Afterward, a univariate Cox proportional hazard model was performed to identify the prognostic DEAS events, including OS-related and DFS-related AS events. Then, the least absolute shrinkage and selection operator (LASSO) regression analysis was used to avoid overfitting [19]. After selecting the optimal OS-related and DFS-related AS events in the LASSO analysis, we further established two prognostic signatures based on the multivariate Cox proportional hazard model.
The risk scores of each patient were determined by the following formula: β was defined as the regression coefficient.
The optimal cut-off value of the risk score was identified by the X-tile software [20]. Then, all patients were stratified into the low-risk group, moderate-risk group, and high-risk group. In order to compare the OS and DFS between the three groups, Kaplan-Meier survival analysis with log-rank test was performed. In addition, CIBERSORT package was used to quantify the 22 immune cells in all tumor samples, and samples with CIBERSORT P < 0.05 were enrolled in the analysis between risk score and immune cells. The correlation between risk score and immune cells was determined by the Spearman correlation test.

Development of nomogram based on the DEAS events and clinicopathological data
To establish the nomogram for HP − GC patients, the prognostic predictors were determined by univariate Cox proportional hazard model. Then, the predictors with a P-value less than 0.1 in the univariate analysis were selected into the multivariate Cox analysis, and the independent prognostic predictors were determined. Afterwards, two nomograms of OS and DFS were established by the "rms" package and C-index was selected to show the discrimination [21]. Moreover, calibration curves were generated to show the calibration of the nomogram. Finally, to investigate the clinical value of nomograms, the decision curve analysis (DCA) was performed [22,23].

Construction of splicing related network
The splicing factors (SFs) data were downloaded from the SpliceAid2 database. After that, the expression of 71 SFs was obtained from the TCGA data portal. Then, the differential expression analysis between tumor samples and normal samples was performed to determine the tumorrelated SF. The Spearman test was performed to explore the relationship between tumor-related SFs and DEASs. It was considered as a significant association when r is more than 0.5 and p is less than 0.05. In addition, the interaction networks of DEASs and SFs were constructed using Cytoscape (3.7.2).

Evaluation of correlation between DEAS events and clinical data, tumor microenvironment, tumor mutation burden, and immune features
After DEASs were identified, the classification of HP − GC cohort was performed by the unsupervised consensus approach by the "Consensus Cluster Plus" package [24,25]. According to the results, the optimal number of clusters could be identified and patients were divided into several clusters. Meanwhile, the ESTIMATE algorithm was performed to quantify the tumor microenvironment, including immune score and stromal score. Then, Wilcoxon test analysis was performed to compare the 22 types of immune cells, immune score and stromal score between clusters. Besides, the associations between clusters and clinicopathological variables, including AJCC TNM staging, histologic grade, age, and gender were also analyzed.

Overview of AS events and identification of HP − GC -related AS events
Based on the criteria, 138 HP − GC patients were included in our research. The characteristics and clinical data of the 138 patients were shown in Table 1. 31804 AS events were identified in 138 patients, including 2791 AA-type AS events containing 2084 genes, 2394 AD-type AS events involving 1767 genes, 6281 AP-type AS events including 3602 genes, 5530 AT-type AS events involving 3157 genes, 12524 ES-type AS events containing 5506 genes, 146 ME-type AS events involving 141 genes, 2138 RI-type of variable splicing event containing 1478 genes (Fig. 1a). Then, an Upset plot was generated to show the interactive sets of seven types of AS events. There are more than one AS events in most of the parent genes, and some single gene could have up to five distinct splicing patterns (Fig. 1b). For example, the blue line in the figure includes two points (AD and ES) means that there are 267 genes with only AD and ES.
To identify the HP − GC-specific AS events, differences of AS events between 138 primary HP − GC tissue and seven adjacent normal tissues were compared to identify the DEASs profiling. Totally, 1041 DEASs were determined, which include 429 APs, 329 ESs, 139 ATs, 47 RIs, 47 ADs, 45 AAs and 5 MEs (Fig. 1c, d, Additional file 1: Table S1). Although most of the events in the HP − GC

Enrichment and interaction analysis of DEAS
AS events could affect the protein function by various mechanisms [26]. Therefore, we should further understand the function of DEAS events in HP − GC by studying the molecular mechanisms and pathways involved in the parent genes of DEAS events. The results of GO analysis were illustrated in Fig. 1e, which showed that specific GO categories were significantly related to HP − GC, like cell-substrate junction, cell leading edge, actin binding, and actin filament-based process. In addition, some KEGG pathways that related to HP − GC development were enriched (Fig. 1f ), including focal adhension, AMPK signaling, hypertrophic cardiomyopathy, and rap1 signaling pathway. In a word, the enrichment analysis suggested corresponding genes of DEASs play   Table S3). Then, the LASSO regression was used to select the significant OS-related and DFS-related DEAS events (Fig. 2a-d), and 11 OS-related DEAS events and 13 DFS-related DEAS events were determined (Additional file 4: Table S4 and Additional file 5: Table S5). After the LASSO analysis, the significant DEAS events identified were analyzed by the multivariate Cox proportional hazard model and  , and HP − GC patients were stratified into low-, middle − and high-risk groups through the X-tile software. The distribution of prognostic outcomes in these three risk stratifications was visualized in Fig. 3a, b. As shown in Fig. 3c, d, compared with the low − risk group, the high-risk group had a significantly higher incidence of deaths and shorter survival time of patients. In addition, the K-M survival curves and the log-rank test showed that both of the prediction models had excellent performance for predicting the prognosis of HP − GC patients in these three groups (Fig. 3e, f ).
The tumor immune mechanisms play essential roles in the progress of various cancers [27]. Thus, we evaluated the association between 22 types of immune cells and risk scores of OS and DFS to explore the correlation between immune cells and DEAS-based prognostic value in HP − GC (Fig. 4a, c). It is showed that plasma cells (r = 0.45) and monocytes (r = 0.23) are associated with the risk score of OS, while monocytes (r = 0.41) and mast cells activated (r = 0.30) are related with a risk score of DFS (Fig. 4b, d).

Development of AS-clinicopathologic nomogram
Nomogram is a tool used for the prediction of patients' prognosis. The Cox analysis of clinicopathologic characteristics showed that in addition to the risk score was one of the independent factors for OS and DFS in HP − GC cohort, age and N stage were OS-related and DFS-related factors, respectively (Tables 4, 5). Then, two nomograms were established for predicting the OS (Fig. 5a) and DFS (Fig. 5d) in HP − GC patients. And the plot of the ASclinicopathologic nomograms to predicting the possibility of survival at 1-, 2-, and 3-years showed a strong consistency between the nomogram-predicted outcome and actual outcome (Fig. 5b, e). The C-index for OS  nomogram was 0.762 (95% CI 0.729 to 0.795), which was higher than risk score (0.740, 95% CI 0.708 to 0.772) and age (0.552, 95% CI 0.519-0.585). And the C-index for DFS nomogram was 0.783 (95% CI 0.750 to 0.816), which was higher than risk score (0.767, 95% CI 0.738 to 0.796) and T stage (0.554, 95% CI 0.518-0.590). The DCA of nomograms for 1-, 2-, and 3-years was also established (Fig. 5c, f ). It is worth noting that the decision curves for 1 year, 2 years, and 3 years all showed higher credibility, which suggested the nomogram based on AS events and clinical data is effective for predicting the survival probability and prognosis of patients.

Construction of AS-SFs network
SFs are elements for regulating AS events. Compared with the normal tissues, alterations of SFs in tumors promote differential splicing patterns, which lead to an increase of pro-tumorigenic isoforms [28]. Therefore, it's important for us to understand whether DEAS events are regulated by specific key SFs in HP − GC. In order to understand this issue, we firstly identified six tumorrelated SFs through differential analysis (Additional file 6: Table S6). Then, we analyzed the relationship between the expression of these SFs and prognostic DEAS events, including OS-related DEAS events and DFS-related DEAS events. Two splicing regulatory networks were shown in Fig. 6. Each SF was significantly correlated with more than one OS-related and DFS-related DEAS event, reflecting the intricately cooperative and competitive association between SFs and AS events [29].

AS-based clusters associated with clinical data, tumor microenvironment, tumor mutation burden and immune features
Targeting at the individual level of cancer patients, the various expressions of each DEAS reflected the prognosis of some patients and can predict different clinical outcomes [15]. Therefore, we further identified the different AS patterns by unsupervised analysis based on the DEASs. By using the Elbow method and Gap statistic method to determine the optimal number of clusters, we finally determined the two clusters of samples: C1 (n = 26, 18.8%) and C2 (n = 112, 81.2%) (Fig. 7a).
In order to expound the clinicopathologic characteristics of the DEAS clusters, the association of clusters with clinal status was firstly explored and showed in a heat map (Fig. 7e). It revealed that the distribution of T stage and grade in HP − GC samples between two clusters was not random. Then, by using the ESTIMATE algorithm, we calculated immune and stromal scores to perform  Fig. 7b, c, the stromal score of C1 is significantly higher than that of C2. In addition, the result showed in Fig. 7d suggested that C2 had a significantly higher tumor mutation burden compared to C1. Meanwhile, by comparing the composition of 22 types of immune cells between the C1 (n = 17) and C2 (n=54), we found that there were significant differences between the two clusters in plasma cells, T cells CD4 memory resting, T cells regulatory (Tregs), NK cells resting, monocytes, mast cells resting and neutrophils (Fig. 7f ).

Discussion
GC is one of the cancers with the highest morbidity and mortality [30]. Finding effective prognostic and diagnostic markers for GC has become the focus of future research. However, because of the limitations of traditional gene identification methods, it has not brought obvious advantages to disease diagnosis and clinical results. Recently, with the development of gene sequencing technology, AS events have been widely studied and has shown some progression [31]. The biochemical mechanisms of AS events are complex and remain poorly understood to a large extent, but its importance for gene regulation cannot be ignored [32]. Abnormality of AS events have been shown to be associated with the occurrence, development and distant metastasis in many cancers [33]. Compared with a previous study reporting the relationship between GC and AS events [34], this study   focused on the value of AS events in HP − GC and visual display patients' prognosis by nomograms. In addition, the correlation between DEASs and tumor microenvironment, immune features and tumor mutation burden was studied to provide the basis for HP − GC immune and molecular mechanisms.
In the present study 1041 DEAS events from 930 genes were identified in HP − GC. In the results, a series of molecular mechanisms and pathways were significantly enriched in these genes through GO and KEGG enrichment analyses. Several of the enriched molecular mechanisms have been shown to promote tumorigenesis. For example, GTPase activity has a promoting effect on the metastasis and invasion of prostate cancer cells [35]. AMP-activated protein kinase (AMPK) can regulate cellular energy metabolism, and stimulate ATP generation [36]. Activated AMPK promotes glycolysis and enhances cellular differentiation of tumors [37]. These analysis results provide a basis for HP − GC's molecular mechanisms of AS events and provide a basic theory for subsequent experimental verification. In addition, to explore the tumor-related mechanisms of total parent genes in DEAS events, the parent genes of independent prognostic AS events have been discovered. The gene of BNIP3 in the AS we found can promote hypoxic survival and autophagy of cancer cells [38]. And autoantibodies against proteins translated by the RELL1 gene are considered as underlying biomarkers for detecting early-stage breast cancer [39].
In order to further understand the prognostic value of AS events in HP − GC, the seven OS-related and seven DFS-related prognostic AS events were included in the calculation of risk score to establish a prediction model. In previous studies, the median was often used as a cutoff value to divided patients into high-risk and low-risk groups. For the accuracy of the results, X-tile, a new and useful tool for bioinformatic analysis, was chosen to confirm the cut-point of the risk score. The survival status of three groups suggested that AS events can affect patient survival time and prognosis of HP − GC patients. Besides studying the relationship between AS events and prognosis, we also include clinicopathological data of HP − GC patients into this research to describe the prognosis more comprehensively.
According to the discovery of the important role of immune mechanisms in the GC progression in a previous study [14], the immune environment is considered as a key determinant of GC. Macrophages, neutrophils, dendritic cells, and immune cells of various T cell lineages are the major components of the tumor microenvironment and are involved in many processes of tumorigenesis and growth [40]. Besides, previous studies showed that AS events may be associated with immune cell infiltration by regulating tumor-associated immune cytolytic activity [41]. Therefore, in the AS events we determined, correlation analysis on 22 types of immune cells and risk scores was also be performed. From the results of the correlation coefficient, we can conclude that there was a strong correlation between monocytes and both DFS (r = 0.41) and OS (r = 0.23). Similarly, Zhang et al. [42] found that infiltrating immune cells were associated with survival, therapeutic responses and prognosis of breast cancer patients and monocytes were decreased in cancer patients with higher-grade tumors. From the perspective of clusters to further explore the association between AS events with clinical data, tumor mutation burden and immune features. It is obvious that the T stage of C2 and tumor mutation burden is higher than that of C1, while the stromal score is lower. It indicated that the tumor  = 138). b, c Immune score and stromal score between AS-based clusters. d Tumor mutation burden between AS-based clusters. e Heat map of the DEAS events ordered by cluster, with annotations associated with each cluster. Chi square test was used. f Statistical differences in each type of immune cell between C1 and C2. AS alternative splicing, DEAS differentially expressed alternative splicing mutation burden may be positively associated with the T stage, while the stromal score may be negative with it. This conclusion was similar to the previous one that higher tumor mutation burden tends to achieve a prognosis and to promote the infiltrations of immune cells such as T cells and NK cells in bladder cancer [43]. Pan et al proved that the infiltration of immune cells is closely related to tumor progression and prognosis [44]. The reason that there is no difference in immune score between C1 and C2 in our study may be the small sample size. But in fact, the infiltration of immune cells being involved in the evolution of GC has been clarified [45,46]. From the comparison results of the two groups in 22 types of immune cells, it can be seen that there are differences in the composition ratios of some immune cells, indicating that immune features are significantly associated with AS events. Different gene splicing patterns can be affected by antigen stimulation to regulate immune cell activation thresholds and to maintain internal environment stability [47]. The type I membrane protein receptor carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1) differently exhibits significant AS events and is highly expressed in various types of immune and parenchymal cell [48]. It can act on immunity such as inhibiting natural killer cell-mediated cytotoxicity, regulate neutrophil and monocyte development and function [48]. Besides, it also regulates T cell activation and mediates tolerance [49]. There are various splicing forms of PyTEPs in Yesso scallo, of which involvement participates in the immune response through different response models [50]. Various studies have shown that AS events are related to the distribution of immune cells, which were consistent with the conclusions of this study.
SFs can recognize and combine pre-mRNA codonregulated genes, and then influence the selection of exons and the choice of splice sites to achieve the purpose of regulating AS events [51]. Therefore, we comprehensively analyzed SFs and its expression to elucidate the splicing mechanism of HP − GC. The abnormal expression conditions of epithelial splicing regulatory protein 1 (ESRP1) and polypyrimidine tract binding protein 1 (PTBP1) are closely related to the expression of AS events. ESRP1 is one of the earliest epithelial restriction RNA binding proteins discovered, which can regulate AS of multiple epithelial transcripts [52]. Establishing the network between OS-related AS events and tumor-specific SFs, this research found that ESRP1 was associated with several OS-AS events that were correlated with OS. Similarly, ESRP1 was considered avital SF that leads to the progression and metastasis in pancreatic and prostate cancer [53,54]. While in addition to its role in splicing, PTBP1 also participates in the regulation of other aspects of RNA metabolism. Previous studies have shown that PTBP1 suppresses cell viability and promotes apoptosis during lung tumorigenesis [55].
This study still has some limitations. Firstly, it was a retrospective study, of which predictive models were on the basis of public databases. So, data from other regions are not included. Second, owing to the incidence of the disease was relatively low and the sample size included was small, the predictive power of some results needs to be increased and experimentally verify the role of AS events in HP − GC by enlarging the sample size. Finally, there is less immune information, including only the situation of immune cells. In order to fully explore the immune mechanisms associated with HP − GC, more immune information should be included, such as immunohistochemistry results and immune checkpoint detection.

Conclusion
In summary, the present study showed the prognostic value of AS events which play important roles in HP − GC tumorigenesis. Besides, our research also suggested that some prognostic SFs may be related to potential mechanisms of the splicing forms by regulating AS events. More importantly, the risk − classification based on AS events is not only crucial for deciphering tumorigenesis mechanisms, but also reveals the correlation between molecular changes and immune characteristics, which can be used as underlying prognostic biomarkers and therapeutic targets for HP − GC patients. DCA: Decision curve analysis; AMPK: AMP-activated protein kinase; CEACAM1: The type I membrane protein receptor carcinoembryonic antigen-related cell adhesion molecule 1; ESRP1: Epithelial splicing regulatory protein 1; PTBP1: Polypyrimidine tract binding protein 1.