Identication of an Immune-related Signature That Indicates the Dedifferentiation of Thyroid Cells

Background: Patients with well-differentiated thyroid carcinoma can achieve long-term survival after reasonable treatments, but there is no standard treatment mode for poorly or undifferentiated thyroid carcinoma and its prognosis is very poor. Immune cells, especially tumor-associated macrophages, account for a large proportion of the tumor microenvironment of anaplastic thyroid carcinomas (ATCs). However, whether immune-related genes can mediate the dedifferentiation of thyroid cells is unclear. Methods: We initially compared the differences of thyroid differentiation score, intration of immune cells and enriched pathways between ATCs and papillary thyroid carcionma (PTCs) or normal thyroid tissues in Gene Expression Omnibus database. Then, The Cancer Genome Atlas database was used to screen out the prognosis associated IRGs. A risk score was constructed and we next investigated its predictive value for differentiation by applying receiver operating characteristic (ROC) curves and correlation analyses. Kaplan-Meier curves were used to evaluated its prognostic value. We further explored the associations of the risk score with important immune checkpoint molecules, inltrating immune cells and response to immunotherapy. Results: Compared with PTCs or normal thyroid tissues, ATCs exhibited lower thyroid differentiation scores, higher inltration of most immune cells and higher activation of inammatory response. The risk score composed of MMP9 and SDC2 was signicantly increased in ATCs and low differentiated PTCs. Moreover, it showed favorable predictive value for differentiation and survival. Higher risk score displayed dedifferentiation status and a worse prognosis. Additionly, the risk score was positively correlated with immune checkpoint molecules PDL1, CTLA4, IDO1, HAVCR2 and inltration of multiple immune cells. Importantly, we found that samples with higher risk score tend to have a better response to immune checkpoint agents than lower ones. Conclusion: Our ndings indicate that the risk score may not only contribute to the judgement of differentiation and prognosis of thyroid cancer, but also help to the prediction of immune cell inltration and immune checkpoint inhibitor response. dichotomized into a low-risk score group and a high-risk score group. Correlation analyses were estimated by Pearson’s test in R program. Receiver operating characteristic (ROC) analyses were carried out to calculate the area under the curve (AUC) to assess the accuracy of the risk score for predicting dedifferentiation. Kaplan-Meier curves of the PFI were built, and the log-rank test was applied to determine the differences in SPSS software. Distributions of the response to immunotherapy in the low-risk score group versus the high-risk score group were evaluated with the chi-square test or two-sided Fisher's exact test.


Background
Thyroid carcinoma is a common endocrine malignancy. Among all the subtypes, the clinical behaviors of papillary thyroid carcinoma (PTC) and anaplastic thyroid carcinoma (ATC) are fully distinct. PTC accounts for approximately 80% of all thyroid carcinomas, and most patients are curable. In contrast, ATC, an undifferentiated tumor, is extremely rare and highly aggressive, with a dismal median survival of only 6 months. Conventional systemic treatments, including radioiodine therapy, radiotherapy and chemotherapy, have little effect on ATC, which makes it a signi cant clinical challenge [1,2].
To better evaluate the differentiation characteristics of thyroid carcinoma, a thyroid differentiation score (TDS) signature, composed of 16 thyroid metabolism genes, has been constructed and contributes to the pathological classi cation of patients [3]. An important hallmark of ATC is the de ciency of expression of thyroid differentiation markers, which partly leads to radioiodine therapy failure [4]. The most accepted hypothesis claims that ATC progresses from PTC through the accumulation of many genomic mutations [4,5]. However, Seoane et al [6] performed exome sequencing, and very few shared trunk alterations were observed between ATC and PTC. The strong genomic difference made them believe that ATC and PTC evolve independently at the early stage of tumor development. A major implication of the nding is that the clinical management of the two tumor types should be appropriate for their molecular characteristics, which improves the understanding of the undifferentiated feature of ATC. Recently, a large number of studies have revealed the essential role of the tumor microenvironment (TME) in cancer progression and response to immunotherapy [7,8]. Similarly, ATCs are reported to be greatly in ltrated by tumorassociated macrophages (TAMs) and have a high level of markers of the M2 macrophage phenotype, which plays an essential role in tumor metastasis [4,[9][10][11]. In addition, some immune checkpoint inhibitors have shown effects on ATC [12][13][14][15], which prompted us to investigate the role of immune cells or immune-related genes (IRGs) in the occurrence and progression of undifferentiated thyroid carcinoma.
In this study, we initially analyzed the difference in TDS and the in ltration of 28 immune cells between ATCs and PTCs or normal thyroid tissues. Then, efforts were concentrated on the exploration of IRGs in both the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases. A two-gene risk score signature was constructed and had predictive power for dedifferentiation and poor prognosis in PTCs. Additionally, signi cant correlations were found between the risk score and common immune checkpoints or in ltrating immune cells. Moreover, the risk score was found to have a certain predictive value for immunotherapy response.

Public Cohort datasets and preprocessing
We systematically searched for publicly available ATC transcriptome datasets. In total, four cohorts using the same array platform (Affymetrix Human Genome U133 Plus 2.0 Array) were gathered for this study: GSE33630, GSE29265, GSE65144 and GSE76039 [4,16,17]. The raw expression data were downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/) and normalized using the R package limma.
When a gene symbol corresponded to multiple probes, we selected the highest value as its expression level, and a probe was deleted when it was recorded with multiple genes. To enhance the reliability of the validation for screened genes, we merged the GEO datasets mentioned above, and batch effects were removed using the R package sva.
We also used transcriptome data and clinical information of PTCs from the TCGA database downloaded from UCSC Xena browser (https://xenabrowser.net/datapages/), and 505 patients with PTC were enrolled in our study. Considering the excellent survival outcome for most PTC patients, progression-free interval (PFI) was regarded as the preferable indicator of prognosis [18].
To investigate the treatment response to immunotherapy, a dataset of urothelial cancer patients received anti-PD-L1 therapy (IMvigor210 cohort) was acquired from the R package IMvigor210CoreBiologies, and a dataset of AB1-HA mesothelioma mice treated with anti-CTLA4 agents (GSE63557) was obtained [19,20].
To compare the differentiation level among different samples, a list of 16 TDS genes was obtained from a published study on PTC [3]. We summed the 16 genes for each sample to obtain the TDS and then separated PTCs from the TCGA dataset into a low-differentiated group and a high-differentiated group according to the median value.

Collection of immune related data
The stromal score and immune score for each sample were calculated by the ESTIMATE package in the R program [21]. Single sample gene set enrichment analysis (ssGSEA), as implemented in the R package GSVA, was conducted to assess the in ltration level of immune cells, and principal component analysis (PCA) was applied.
The lists of IRGs were collected from the ImmPort web portal (https://immport.org/shared/home), which contains a very large number of immunology data and resources. Overall, 1240 IRGs were present on our array and were further analyzed.

Function and pathway enrichment analysis
Gene annotation enrichment analysis was carried out by using the R package clusterPro ler. Gene set enrichment analysis (GSEA) was performed three times in our study: one was to compare the differences in IRGs between ATCs and normal tissues or PTCs in the GSE33630 cohort; the second was to identify differentiation-associated immune signaling pathways in the TCGA cohort; and the third was to analyze the differences between low-risk-score and high-risk-score subtypes in terms of the expression of broad hallmark gene sets in the combined GEO cohort [22]. We also identi ed the signaling pathways of deregulated IRGs of ATCs in the GSE33630 cohort by performing Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis.

Cell culture
All cell lines were cultured in DMEM supplemented with 10% fetal bovine serum (FBS) (Gibco). The normal thyroid follicular epithelial cell line Nthy-ori 3 − 1 and PTC cell line TPC-1, were obtained from the Institute of Medical Biology Chinese Academy of Medical Sciences (Kunming). The ATC cell lines CAL-62, DRO and 8305C, were provided by Sun Yat-sen University Cancer Center (Guangzhou). All cell lines were authenticated by short tandem repeat (STR) sequencing and con rmed to be negative for mycoplasma.

qRT-PCR
Total RNA was isolated from cells using the FastPure® Cell/Tissue Total RNA Isolation Kit (Vazyme). cDNA synthesis was carried out using HiScript® II Q RT SuperMix (Vazyme). qPCR was performed using ChamQ™ SYBR® qPCR Master Mix (Vazyme) on the Applied Biosystems 7500 system. The reaction procedure was set up according to the manufacturer's instructions. The sequences of the primers were as follows: GAPDH was used as a control, and the fold changes of target genes were calculated by the ΔΔCt method; the normalized gene expression was calculated using 2 −ΔΔCt .

Statistical analysis
The analysis of count data for differentially expressed IRGs was performed by R package limma in GEO cohorts and IMvigor210 cohort and by R package DESeq2 in TCGA cohort, respectively. The Benjamini-Hochberg method was applied to adjust the p-value based on the false discovery rate (FDR). The eBayes method in the R package was used to identify the differentially expressed genes (DEGs). FDR < 0.05 and fold change ≥ 2 were considered DEGs. The R packages VennDiagram and Heatmap were used to draw Venn diagrams and heatmaps. ggplot2 was used to generate volcano plots and other plots.
The Wilcoxon test (also called the Mann-Whitney U test) was applied to compare continuous variables between two groups. For comparisons among three groups, the Kruskal-Wallis test was used. To evaluate the association of clinicopathological characteristics or IRGs with PFI in the TCGA cohort, the univariate Cox proportional hazard model was used. A multivariate Cox regression model was further conducted to identify independent prognostic factors. Both the hazard ratio (HR) and the 95% con dence interval (CI) were calculated.
To investigate the role of selected IRGs, a two-gene risk score model was constructed and de ned as: risk score =∑(Coef i * Exp i ) [23]. Based on the median value, samples from different cohorts were dichotomized into a low-risk score group and a high-risk score group. Correlation analyses were estimated by Pearson's test in R program. Receiver operating characteristic (ROC) analyses were carried out to calculate the area under the curve (AUC) to assess the accuracy of the risk score for predicting dedifferentiation. Kaplan-Meier curves of the PFI were built, and the log-rank test was applied to determine the differences in SPSS software. Distributions of the response to immunotherapy in the lowrisk score group versus the high-risk score group were evaluated with the chi-square test or two-sided Fisher's exact test.
All graphic representations and statistical analyses were obtained by using R software (version 3.6.1), GraphPad Prism (version 7.0) and SPSS software (version 21.0). p < 0.05 was considered to be statistically signi cant for all methods.

Results
ATCs exhibit lower TDS values and higher in ltration levels of most immune cells than PTCs and normal thyroid tissues.
An overview of our study on thyroid dedifferentiation-associated IRGs is shown in Fig. 1. Loss of differentiation plays an important role in the progression of thyroid carcinoma. The TDS is a 16metabolic-gene signature and serves as a parameter of thyroid differentiation [3]. We initially compared the TDS levels among ATCs, PTCs and normal thyroid tissues in the GSE33630 cohort. As expected, ATCs had obviously lower TDS values than the control samples for all genes (Fig. 2a-b). It has been reported that the immune landscape was correlated with the TDS in thyroid carcinoma [24]. To evaluate the differences in immune-related signatures, the ESTIMATE algorithm in the R program was applied to analyze transcriptome data and showed that ATCs had higher stromal scores and immune scores than PTCs and normal tissues (Fig. 2c). Consistent with this result, higher in ltration of most immune cells was observed in ATCs than in control groups, as indicated by the R package GSVA (Fig. 2d). Moreover, ATCs were well discriminated from PTCs or normal tissues with the PCA algorithm (Fig. 2e).
The in ammatory response is obviously activated in ATCs.
To further explore the differences in IRGs between ATCs and PTCs or normal thyroid tissues in the GSE33630 cohort, the powerful web tool ImmPort was used; a total of 1240 IRGs were present on our array. The volcano plots of GSEA displayed the top ve pathways signi cantly activated in ATCs compared with normal tissues or PTCs, among which the in ammatory response was the most obvious (ATCs vs normal: normalized enrichment score (NES) = 2.16, p = 0.013; ATCs vs PTCs: NES = 2.15, p = 0.002) ( Fig. 3a-b). We then analyzed the expression data of 1240 IRGs and found 207 IRGs were commonly deregulated (126 upregulated and 81 downregulated IRGs; fold change ≥ 2 and p 0.05) in ATCs compared with both normal tissues and PTCs (Fig. 3c-d). GO and KEGG enrichment analysis of consistently deregulated IRGs indicated signi cant activation of immune-associated signaling pathways in ATCs (Fig. 3e-f). To further screen the deregulated IRGs, another GEO cohort GSE29265 was also included, and a total of 68 genes (36 upregulated IRGs and 32 downregulated IRGs) were screened out and validated with the two cohorts (Fig. 4a-c).
Identi cation of differentiation-associated and prognostic IRGs.
To further analyze the differentiation-associated IRGs, we initially enrolled 505 PTC samples with complete clinical annotations and transcriptome data from TCGA database, which were then divided into the low-differentiated (low TDS) group and the high-differentiated (high TDS) group by the median TDS value. The volcano plot based on GSEA of 1240 IRGs indicated that the two phenotypes displayed distinct activation of immune-related pathways. Compared with the high-differentiated group, interferon gamma response (NES = 2.15, p = 0.003), in ammatory response (NES = 2.10, p = 0.003) and interferon alpha response (NES = 1.90, p = 0.002) were signi cantly enriched in the low-differentiated group (Additional le 1: Figure S1). Considering that patients with ATC in GEO database lack clinical information, we thus focused on the low-differentiated group in the TCGA cohort to identify the IRGs relative to prognosis. First, clinical characteristics and 68 IRGs were enrolled in a univariate Cox analysis.
We then assessed the expression levels of MMP9 and SDC2 in a combined GEO cohort composed of 52 ATCs, 78 normal tissues and 69 PTCs from the same chip platform and in a TCGA cohort grouped by TDS level. As shown in Fig. 5a and 5b, MMP9 was signi cantly higher in ATCs and low-differentiated PTCs than in normal tissues and high-differentiated PTCs, whereas SDC2 displayed the opposite expression pro le, as its expression was relatively low. qPCR analyses further indicated that ATC cell lines CAL-62, DRO and 8305C possessed lower SDC2 levels; DRO and 8305C had higher MMP9 levels than PTC cell line TPC-1 and normal thyroid cell line Nthy-ori 3-1 (Fig. 5c). In addition, an immune-related risk score was constructed and was obviously higher in ATCs and low-differentiated PTCs than in normal tissues and high-differentiated PTCs (Fig. 5d).
We proceeded to analyze the ability of the risk score to predict dedifferentiation in the combined GEO cohort and TCGA cohort. Correlation analyses (Fig. 5e) showed that the risk score was negatively associated with TDS (r = -0.77, p < 0.001 in the combined GEO cohort; r = -0.46, p < 0.001 in the TCGA cohort). Consistent with this nding, ROC curves con rmed the robust predictive value in both cohorts, and the AUC values were 0.842 (p < 0.001) and 0.707 (p < 0.001) (Fig. 5f). Moreover, we estimated the prognostic value of a single gene and the two-gene risk score. Kaplan-Meier curves (Fig. 5g) indicated that PTC patients with high MMP9 (HR = 2.28; p = 0.005), low SDC2 (HR = 2.47; p = 0.002) or high risk score (HR = 3.05; p = 0.0003) had a signi cantly shorter PFI than other patients in the TCGA cohort, indicating the potential of the risk score to serve as a prognostic biomarker.
The two-gene risk score is closely correlated with immune-related signatures.
Immunotherapy has revolutionized the treatment of many patients with cancer. However, through exploration of the complicated mechanisms of cancer immunoediting, it was found that some negative costimulatory molecules and immunosuppressive cytokines could promote immune escape of tumor cells by inhibiting immune responses, especially antitumor T cells [25,26], which stimulated the appearance of immune checkpoint therapy. We wanted to explore the role of the two-gene risk score in immune-related signatures. First, in both the combined GEO cohort and TCGA cohort, a marked positive correlation was found between the risk score and important immune checkpoint molecules, including programmed cell death ligand 1 (PDL1; also known as CD274), cytotoxic T-lymphocyte-associated protein 4 (CTLA4), indoleamine 2,3-dioxygenase 1 (IDO1) and hepatitis A virus cellular receptor 2 (HAVCR2; also known as TIM3), as illustrated in Fig. 6a and 6b. We further analyzed whether the risk score was correlated with the in ltration of immune cells; ssGSEA was applied, and as shown in Fig. 6c, most immune cells displayed a higher level of in ltration in the high-risk score group than in the low-risk score group in the combined GEO cohort. Correlation analysis further con rmed the close relationship between risk score and the in ltration of most immune cell types (Fig. 6d).
With deep research on predictive markers for immunotherapy, emerging studies support the crucial role of the TME in the response to immune checkpoint inhibitors [8,27]. Considering the positive correlation of the risk score with the expression of common immune checkpoint molecules and the in ltration of immune cells, we further explored whether it helps to predict the e cacy of immunotherapy. At present, increasing researches have demonstrated that some immune checkpoint inhibitors, including the PD-1 inhibitors spartalizumab and pembrolizumab and the PD-L1 inhibitor nivolumab [2,[12][13][14][15], are effective in ATCs as a salvage therapy. However, no such cohort with complete prognosis data and expression pro les was obtained for ATCs. We therefore investigated the value of the two-gene risk score for predicting the response to immunotherapy in the IMvigor210 cohort and the GSE63557 cohort, a dataset of urothelial cancer patients who received an anti-PD-L1 agent and a dataset of AB1-HA mesothelioma tumor mice treated with anti-CTLA4 therapy [19,20]. We found that patients with high risk score tended to have a better response to immunotherapy than those with a low risk score in the IMvigor210 cohort (p = 0.003; Fig. 6e). The number of mice responding to anti-CTLA4 agents was higher in the GSE63557 cohort, but the difference was not statistically signi cant (p = 0.18; Fig. 6g). In addition, the risk score was higher in patients with complete response (CR) to anti-PD-L1 therapy than in patients with progressive disease (PD) (p < 0.05) (Fig. 6f). Similarly, the risk score was slightly increased in mice that responded to anti-CTLA4 therapy compared with those that did not respond (p < 0.05) (Fig. 6h).
Molecular basis of the immune-related risk score signature.
We further explored the differences between the high-risk score and low-risk score samples. GSEA was performed on the broad hallmark gene sets in the combined GEO cohort. The volcano plots displayed distinct enrichment of signaling pathways. As shown in the GSEA plots with high NES values, compared with the low-risk score group, epithelial-mesenchymal transition, TNFα signaling, and some common immune-related signaling pathways, including the IL-6/JAK/STAT3 pathway, interferon alpha response, interferon gamma response and in ammatory response, were signi cantly enriched in the high-risk score group (Additional le 2: Figure S2a-b).

Disscussion
Focusing on the mechanism of dedifferentiation promotes a better understanding of the high invasiveness of ATCs. Many previous studies have found that some molecules and pathways, such as STRN-ALK, P53, dendrogenin A, survivin and certain long noncoding RNAs, are involved in the dedifferentiation process of thyroid cells [28][29][30][31]. Correspondingly, by analyzing the available public databases of thyroid carcinoma, we found a small signature of IRGs consisting of MMP9 and SDC2, which can predict the differentiation of thyroid carcinoma.
In this study, we initially focused on the investigation of IRGs in ATCs in comparison with those in normal tissues and PTCs. Through joint analysis with the TCGA cohort, two genes, MMP9 and SDC2, were screened out. MMP9 is a well-known oncogenic gene, and the enzyme encoded by this gene is involved in multiple processes in tumor progression, such as cancer proliferation, angiogenesis and metastasis, in breast cancer, glioma and colorectal cancer [32][33][34]. Moreover, MMP9 has been reported to serve as an important in ammatory mediator in angiocardiopathy, infectious diseases and cancer [35][36][37][38]. Syndecan-2 protein, encoded by the SDC2 gene, is a type I transmembrane proteoglycan. The role of SDC2 depends on the type of tumor: it acts as an oncogene and contributes to disease progression in colon cancer, breast cancer and melanoma, while it serves as a metastasis suppressor in Lewis lung carcinoma [39][40][41]. Moreover, SDC2 has been found to be positively associated with the differentiation level and prognosis of neuroendocrine tumors, which is consistent with our ndings [42].
By constructing a risk model after univariate and multivariate Cox analysis, a signi cantly elevated risk score was observed in ATCs and low-differentiated PTCs. Notably, by performing ROC analysis, we demonstrated that the risk score could well distinguish the differentiation level in both the combined GEO cohort and the TCGA cohort. By applying Kaplan-Meier curves, we found that patients with high risk scores displayed a worse PFI than those with low risk scores. These ndings suggest that the risk score could serve as an indicator for dedifferentiation, which contributes to the risk strati cation and prediction of response to radioiodine therapy of thyroid carcinoma and acts as a powerful biomarker for prognostic prediction.
Currently, immune checkpoint inhibitors, such as anti-PD-L1 and anti-CTLA-4 antibodies, display durable responses for some solid tumors, including ATCs [12][13][14][15]. In this study, we discovered a positive correlation between the risk score and the expression of important immune checkpoints or the in ltration of multiple immune cell types. Interestingly, by analyzing two cohorts of patients with metastatic urothelial cancer who received anti-PD-L1 treatment (IMvigor210) and a mouse model treated with anti-CTLA-4 agent (GSE63557) [19,20], we surprisingly observed that samples with higher risk scores tended to have a better response to immune checkpoint therapy than samples with low risk scores, regardless of the weak statistical signi cance. This nding indicates that the risk score may not only contribute to the judgment of differentiation and prognosis of thyroid cancer but could also play a role in predicting the in ltration of immune cells in the TME and the response to immunotherapy.
However, the current study does have some limitations and drawbacks. First, due to the lack of survival information of ATC patients in GEO cohorts, the two risk-related genes were screened just from low-differentiated PTCs in the TCGA cohort; thus, these results cannot totally mirror the results of ATCs. However, the risk scores and associations with TDS or immune checkpoints were veri ed in both cohorts. Second, the sample size in our study, especially for ATCs, was small; thus, the results obtained warrant further investigation. Third, the mechanism of the immune-related risk score in dedifferentiation is still undiscovered. Next, we intend to carry out in vitro and in vivo experiments to verify our conclusions and further explore the mechanism of dedifferentiation mediated by the identi ed risk-related genes. To date, multiple antineoplastic drugs have been tried in ATCs [2], but there is no report on the e cacy of MMP9 inhibitors. Based on our ndings, we speculate that MMP9 inhibitors alone or in combination with immunotherapy may be effective in the management of ATCs, but this hypothesis requires further veri cation.

Conclusions
Our ndings indicate the potential value of an immune-related risk score composed of MMP9 and SDC2 for predicting dedifferentiation and survival in thyroid carcinoma. It may also help to predict the in ltration of immune cells and the response to immunotherapy. Thus, the risk score could be used as a powerful biomarker of differentiation and prognosis for patients with thyroid carcinoma. Additionally, the included risk-related genes could be further explored as new therapeutic targets for the treatment of ATCs.  Variables with statistical p < 0.05 are in bold.