Identification of key genes involved in tumor immune cell infiltration and cetuximab resistance in colorectal cancer

Background The anti-epidermal growth factor receptor (EGFR) antibody introduces adaptable variations to the transcriptome and triggers tumor immune infiltration, resulting in colorectal cancer (CRC) treatment resistance. We intended to identify genes that play essential roles in cetuximab resistance and tumor immune cell infiltration. Methods A cetuximab-resistant CACO2 cellular model was established, and its transcriptome variations were detected by microarray. Meanwhile, public data from the Gene Expression Omnibus and The Cancer Genome Atlas (TCGA) database were downloaded. Integrated bioinformatics analysis was applied to detect differentially expressed genes (DEGs) between the cetuximab-resistant and the cetuximab-sensitive groups. Then, we investigated correlations between DEGs and immune cell infiltration. The DEGs from bioinformatics analysis were further validated in vitro and in clinical samples. Results We identified 732 upregulated and 1259 downregulated DEGs in the induced cellular model. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses, along with Gene Set Enrichment Analysis and Gene Set Variation Analysis, indicated the functions of the DEGs. Together with GSE59857 and GSE5841, 12 common DEGs (SATB-2, AKR1B10, ADH1A, ADH1C, MYB, ATP10B, CDX-2, FAR2, EPHB2, SLC26A3, ORP-1, VAV3) were identified and their predictive values of cetuximab treatment were validated in GSE56386. In online Genomics of Drug Sensitivity in Cancer (GDSC) database, nine of twelve DEGs were recognized in the protein-protein (PPI) network. Based on the transcriptome profiles of CRC samples in TCGA and using Tumor Immune Estimation Resource Version 2.0, we bioinformatically determined that SATB-2, ORP-1, MYB, and CDX-2 expressions were associated with intensive infiltration of B cell, CD4+ T cell, CD8+ T cell and macrophage, which was then validated the correlation in clinical samples by immunohistochemistry. We found that SATB-2, ORP-1, MYB, and CDX-2 were downregulated in vitro with cetuximab treatment. Clinically, patients with advanced CRC and high ORP-1 expression exhibited a longer progression-free survival time when they were treated with anti-EGFR therapy than those with low ORP-1 expression. Conclusions SATB-2, ORP-1, MYB, and CDX-2 were related to cetuximab sensitivity as well as enhanced tumor immune cell infiltration in patients with CRC.


Background
In the United States, colorectal cancer (CRC) is estimated to be the third-most frequently occurring malignant tumor and third leading cause of cancer-related mortality, leading to the deaths of about 53,200 people in 2020 [1]. The incidence and mortality rates in 2015 were 37,000 cases and 19,000 deaths, with a 25-to 28-months overall survival time for Chinese patients suffering advanced CRC [2]. Targeting epidermal growth factor receptor (EGFR), monoclonal antibodies cetuximab and panitumumab are effective therapeutics [3,4] that shrink tumors for resectability [5][6][7] or relieve symptoms from unresectable masses by repressing tumor growth [8][9][10].
Mutations in the RAS family suggest continuous activation of EGFR downstream signaling [11][12][13][14][15] and RAS genes (KRAS, NRAS) are used as biomarkers to predict poor response to anti-EGFR therapy [16,17]. However, tumor microenvironment re-plasticity, characterized by tumor immune cell infiltration and upregulation of immune checkpoint-related proteins, has recently been recognized as a novel explanation for drug resistance and treatment failure [18][19][20]. In particular, the interaction between tumor and immune microenvironment triggered by cetuximab or panitumumab in CRC accounts for immunogenic cell death and immune treatment resistance [21,22].
In this study, we firstly performed high-throughput screening of transcriptomic alterations before and after induction cetuximab administration in a colon cancer cell model (CACO2). Then, we conducted integrated bioinformatics analysis of the transcriptome variations using public datasets from the Gene Expression Omnibus (GEO) database to identify several core differentially expressed genes (DEGs) and investigated whether they were correlated with tumor immune cell infiltration. Finally, we validated the identified core DEGs in vitro and in clinical samples.

Real-time quantitative PCR after cetuximab treatment in vitro
Total cellular RNA of cell lines treated with cetuximab was extracted with TRIzol reagent (Invitrogen, Carlsbad, CA, USA) under the manufacturer's instructions. Recombinant DNase I (Takara Bio, Beijing, China) was used to remove potential genomic DNA contamination. cDNA was generated with the PrimerScript ™ RT master mix kit (Takara Bio, Dalian China). Real-time quantitative PCR analysis of the identified DEGs was performed using the TB Green Premix Ex Taq ™ kit (Takara Bio, Dalian China). All results were normalized to human GAPDH mRNA expression. The primers were listed in Additional file 1: Table S1. The relative threshold cycle (Ct) method was used to display the results.

Identification of differentially expressed genes (DEGs) in the cellular model
For the microarray screening of CACO2 cells treated with cetuximab, we visualized the principal component analysis (PCA) results using the R package ggbiplot (https ://githu b.com/vqv/ggbip lot) after converting the raw signals into gene expression levels. We set thresholds of p < 0.05 and | log2 (fold change) | > 1 to identify significant DEGs using the limma package and created heat maps using the heatmap package.

Gene Ontology (GO) term and Kyoto Encyclopedia of genes and genomes (KEGG) pathway enrichment analyses
For the enrichment analyses of the DEGs, a Metascape (https ://metas cape.org/) online analysis tool was used [24]. The parameters were set as the following: min overlap = 3, p-value cutoff = 0.01, and min enrichment = 1.5. Related terms were selected from the top 20, according to their p-values.

Gene sets Enrichment analysis (GSEA) and Gene set variation analysis (GSVA)
GSEA was used to screen the biological states and processes associated with significantly upregulated or downregulated genes in the resistance group, with H.all. v7.1.symbols.gmt as the reference set [25]. The number of permutations was set to 1000. A p-value < 0.05 was considered indicative of significant enrichment. A normalized enrichment score was established to evaluate the degree of enrichment. GSVA was used to describe the enrichment degree of a biological state or process in a sample [26]. Enrichment scores were compared, and those meeting the standards of |log 2 (fold change) | > 1 and p < 0.05 were considered significant results.

Construction of protein-protein interaction (PPI) network
The PPIs of the central DEGs were generated using the STRING online database (https ://strin g-db.org). To construct a PPI network, we retrieved cetuximab resistance candidates from the Genomics of Drug Sensitivity in Cancer (GDSC) database (https ://www.cance rrxge ne.org) [27], confirmed their interactions using the STRING database [28], and finally visualized the result using Cystoscape [29].

Prediction of tumor immune cell infiltration
To investigate the relationship between hub DEG expression and tumor immune cell infiltration, the transcriptome landscapes of colonic or rectal adenocarcinoma samples (n = 177) in The Cancer Genome Atlas (TCGA) database were analyzed using the online tool Tumor Immune Estimation Resource Version 2.0 (https ://cistr ome.shiny apps.io/timer /) [30,31].

Validation in clinical samples by immunohistochemistry staining
Patients with advanced CRC (n = 102) were diagnosed by pathological examination and administered systemic chemotherapeutics at Zhongshan Hospital of Fudan University. Fifty-two patients received therapy containing cetuximab. The progression-free survival time (PFS) was recorded to evaluate the efficacy of cetuximab. Patients signed informed consent documents, and the ethics committee of Zhongshan Hospital approved the study.
Tumor samples were fixed with 4% paraformaldehyde, embedded in paraffin, cut into sections of about 5 µm, and placed onto glass slides. After the samples were deparaffinized with xylene, hydrophilized, and unmasked, they were blocked with bovine serum albumin, immunostained with primary antibodies against SATB-2 (21307-1-AP, 1:100, Proteintech Group, Inc., Subsequently, the slides were stained with 3,3'-diaminobenzidine and counterstained with hematoxylin. Antigen-antibody complexes in the whole sample were detected using a panoramic slice scanner (3DHISTECH, Budapest, Hungary), recorded in a file, and viewed using CaseViewer 2.2 (3DHISTECH). To evaluate gene expression in tissues, the following formula was used to calculate the H-score using Quant Center 2.1 (3DHISTECH): H-SCORE = ∑ (PI × I) = (percentage of cells of weak intensity × 1) + (percentage of cells of moderate intensity × 2) + percentage of cells of strong intensity × 3), where PI is the proportion of the positive signal pixel area and I is the coloring intensity.

Statistical analysis
Most statistical analyses were completed using bioinformatic tools mentioned above in R (version 3.4.1). The Benjamini and Hochberg False Discovery Rate method was utilized to adjust the p-values in screening DEGs from GEO profiles. Fisher's exact test was employed to identify the significant GO terms and KEGG pathways. The correlation significance was examined by Spearman and Pearson correlation analyses. Differential expression levels of identified DEGs were assessed by a two-tailed Student's t-test. Kaplan -Meier survival curve and Log -rank test analysis was applied to investigated the predictive valued of identified DEGs for patients with CRC and receiving cetuximab contained therapy. The value of p < 0.050 was considered statistically significant.

Model establishment and screening of DEGs
The study design is shown in Fig. 1. After the cetuximab resistant cellular model was established successfully (Fig. 2a), preprocessing of the raw data revealed a uniform distribution of DEGs between CACO2-CS and CACO2-CR (Fig. 2b). The PCA results had acceptable reproducibility (Fig. 2c). A total of 1991 DEGs were identified, with 732 upregulated genes and 1259 downregulated genes in CACO2-CR. These DEGs are shown in a volcano plot (Fig. 2d) and a heat map (Fig. 2e) and are listed in Table 1. In GO analysis, the most enriched GO terms determined by the cellular component function (CC) were "extracellular matrix, " "apical plasma membrane" and "extracellular matrix component" (Fig. 2h). KEGG signaling analysis suggested that the DEGs were considerably enriched in "protein digestion and absorption, " "cytokine-cytokine receptor interaction, " "retinol metabolism" shown in Fig. 2i and Table 2.

Common DEGs and PPI network construction
We identified 708 ( Fig. 4a) and 298 ( Fig. 4b) DEGs in GSE5851 and GSE59857, respectively. After integrating the DEGs from our own screenings with those from the two GEO datasets, 12 common DEGs were identified (Fig. 4c). These DEGs did not represent any protein-level interactions (Fig. 4d). Then, we combined     recognized cetuximab resistance-related genes from the GDSC database (Table 5) with the 12 common DEGs and constructed a PPI network (Fig. 4e). The following six core DEGs were identified: SATB-2, ORP-1, MYB, CDX-2, SLC26A3, and EPHB2. Consequently, we selected the GSE56386 dataset to validate the variations in the expression levels of the 12 DEGs between cetuximab responders and non-responders in clinical terms. SATB-2, MYB, CDX-2, SLC26A3, and FAR2 were downregulated (Fig. 5a, c-e, g) and AKR1B10 was upregulated in the non-responder group (Fig. 5l). Although a trend of downregulation was observed in the other six genes (Fig. 5b, f, h-k), they exhibited no significant differences between two groups, possibly owing to the small sample size. These 12 genes had an optimal prediction accuracy in GSE56386. The receiver operating characteristic curves for the 12 genes are presented in Additional file 2: Fig. S1.

Core DEGs and tumor immune cell infiltration
We observed significant correlations between the expression levels of the core DEGs SATB-2 (Fig. 6a), ORP-1 (Fig. 6b), MYB (Fig. 6c), and CDX-2 (Fig. 6d) and tumor immune cell infiltration represented by the expression of B cell, CD4 + T cell, CD8 + T cell, and macrophage markers in TCGA. Immunohistochemical staining of a tissue microarray containing 102 CRC clinical samples (Fig. 7a) demonstrated that the expression of these four genes was positively associated with tumor-infiltrating immune cell markers such as CD4, CD8, CD19, and CD68 ( Fig. 7b; Table 6).

Exploring the expression of identified core DEGs in vitro
We conducted preliminary experiments in several CRC cell lines and detected that ORP-1, MYB, and CDX-2 were downregulated in NCIH508 wtRAS/RAF (a cell line sensitive to cetuximab), while SATB-2 expression decreased in CACO2 wtRAS/RAF (a cell line partially sensitive to cetuximab) and HT29 wtRAS/mtRAF (a cell line resistant to cetuximab) at the half-maximal inhibition concentration of cetuximab. A reduction in ORP-1 and CDX-2 expression was observed in most of these cell lines regardless of RAS and RAF status (Fig. 8a).

The roles of identified core DEGs in clinical samples with anti-EGFR therapy
The clinicopathological characteristics of the patients were recorded and are shown in Additional file 3: Table S2. Median follow-up time was 33.1 months (interquartile range, 17.2 to 52.5 months). Compared with patients with low ORP-1 expression, those with high ORP-1 expression in CRC experienced a significantly longer PFS time (median times were 9.0 months and 11.0 months, respectively, hazard rate [HR] = 1.901, p = 0.047) after administration of chemotherapeutics containing cetuximab (Fig. 8b).

Discussion
Alterations in genes that function in the epidermal growth factor (EGF) signaling pathway, such as increases in EGFR copy number, amplification of ERBB family, overexpression of IGF1 or VEGF, or novel mutations such as point mutations in RAS, BRAF, PI3KCA, or MEK in the EGFR extracellular domain or in the downstream pathway, result in EGFR antagonist resistance. Recently, disturbances in the tumor microenvironment caused by EGFR antibody have also been recognized as factors in treatment failure. Garvey et al. reported that cetuximab causes cancer-associated fibroblasts to secrete more EGF and reactivate mitogen-activated protein kinase signaling in para-CRC cells [32]. Critically, it is important to understand the interaction between tumor cells and the tumor microenvironment in anti-EGFR therapy resistance.
We conducted an integrated bioinformatic analysis to identify nine common DEGs between the cetuximab sensitive and resistant groups combining high-throughput data of the cetuximab resistant cell line model and data of cell lines and clinical samples from GEO profiles. The relationship between DEGs identified in our study and tumor immune cell infiltration was evaluated in TCGA and validated in clinical samples from our hospital. We found that four (SATB-2, ORP-1, MYB, and CDX-2) of nine DEGs were associated with infiltrated T cells, B cells, and macrophages in CRC. The decreasing trend of expression levels of ORP-1, MYB and CDX-2 under cetuximab pressure was observed in vitro (Additional file 4: Fig. S2), which was consistent with the expression changes in the established cetuximab resistant cell line (CACO2-CR). Moreover, patients with high expression levels of these genes, especially ORP-1, exhibited prolonged survival receiving anti-EGFR therapy.
SATB-2 encodes a DNA binding protein and mediates transcription regulation as well as chromatin remodeling. SATB-2 attenuates the activity of MEK5/ERK5 and suppresses tumorigenesis and metastasis [33]. The upregulation of SATB-2 via DNA demethylation of the promoter region and H3K4me3 increases TH1-type chemokine expression and immune cell density in CRC [34]. ORP-1, as a member of the oxysterol-binding protein family involve in human innate immune system, binds to phosphatidylinositol 3-phosphate by interacting with RAB7A and stabilizing GTP-RAB7A and regulates the MHC class I -mediated antigen processing and presentation pathway [35,36]. In addition, ORP-1 suppresses tumorigenesis via metabolism-associated pathway [37]. MYB is a protein-encoding oncogene that functions as a transcriptional activator. Paradoxically, patients with CRC and high MYB expression exhibit low incidence of distant metastases [38] and favorable clinical prognosis [39]. Millen et al. demonstrated that a high level of CD8 + (See figure on next page.) Fig. 4 Identification of common DEGs and construction of PPI network. a, b Identification of DEGs between the resistant (red) and sensitive (blue) groups from the GSE5851 and GSE59857 datasets, illustrated by a heat map. Each column is a sample and each row is a gene. c The venn diagram shows the intersection of DEGs among GSE5851, GSE59857 and CACO2-CR cellular model. d The Network diagram illustrates the interactions of common DEGs. e The PPI network including 12 DEGs and recognized cetuximab resistance-related genes from the Genomics of Drug Sensitivity in Cancer database. DEGs: Differentially expressed genes. PPI: protein-protein interaction network tumor immune infiltrating cells and a clinical history of longer relapse-free survival were related to high expression of MYB. An immunomodulatory effect conferred by MYB was also observed in CD8 + TILs in the murine CRC model [40]. CDX-2 encodes a regulator of intestinespecific genes involved in cell growth and differentiation. CDX-2 is downregulated in the invasive part of tumor tissues and is associated with tumor-stroma protein expression as well as inflammatory cytokine release in CRC [41]. Low levels of CDX-2 expression indicate a particularly poor survival prognosis, especially in patients with tumors that have a high stromal content [42]. Zanella et al. reported that EGFR antagonists inhibit colorectal tumor growth and simultaneously protect the tumor from inhibition by transcriptional regulation [43]. Woolston et al. supported this finding and revealed that cetuximab prompts a transformation from a mutational variation to a mesenchymal transition representative of tumor-associated fibroblast enrichment [20]. It seems that cetuximab evokes immune inflammation via antibody-dependent cell-mediated cytotoxicity but paradoxically weakens this effect via a form of IgG1 antibody-mediated immunogenic cell death [21] while upregulating immunosuppressive TGF-β expression in CRC [44]. In fact, in a stage Ib/II trial of combined cetuximab and pembrolizumab treatment for patients with advanced CRC, the combination treatment of cetuximab with pembrolizumab significantly increased the density of CD3 + , CD8 + , and CTLA-4 + lymphocytes and natural killer cells in tumors. In the peripheral blood, the overall density of CD4 + and CD8 + lymphocytes decrease, especially that of the PD1 + memory T cells [45,46]. In the process of acquiring secondary resistance, the tumor might experience the transition from the immune-inflamed phenotype to the immunedesert phenotype, which features by key target genes   High  x40  x10  Moderate  x40  x10  Low  x40  x10  High  x40  x10  Moderate  x40  x10 Low Tumor infiltrating immune cells markrs involved in tumor immune cell infiltration [47][48][49]. We believed that the core DEGs identified in this study, SATB-2, ORP-1, MYB, and CDX-2, might play critical roles in the transition. Nevertheless, there were some limitations lying in this study. As models of anti-EGFR antibody resistance are not easy to established, only one cellular model was established and screened in the study. More cetuximab resistant cellular models with wild-type RAS and RAF genotypes are required to be established for validation. The DEGs identified using in silico methods should be further validated in vitro and in vivo. The results drawn from the retrospective clinical cohort with a small sample size also need further verification in a larger population. The specific immune cell types involved in secondary anti-EGFR antibody resistance should be more clearly elucidated.

Conclusions
In summary, we distinguished cetuximab-induced DEGs associated with variations of tumor immune cell infiltration in CRC by establishing a cetuximab resistant cellular model and integrated bioinformatics analysis. The results suggested that transcriptomic alterations and immune landscape remodeling should receive additional scrutiny during anti-EGFR antibody treatment. Furthermore, immunotherapy could be considered in the early stages of cetuximab treatment rather than after resistance has already occurred.