An epithelial-mesenchymal transition-related 5-gene signature predicting the prognosis of hepatocellular carcinoma patients

Tumor metastasis is one of the leading reasons of the dismal prognosis of hepatocellular carcinoma (HCC). Epithelial-mesenchymal transition (EMT) is closely associated with tumor metastasis including HCC. The purpose of this study is to construct and validate an EMT-related gene signature for predicting the prognosis of HCC patients. Gene expression data of HCC patients was downloaded from The Cancer Genome Atlas (TCGA) database. Gene set enrichment analysis (GSEA) was performed to found the EMT-related gene sets which were obviously distinct between normal samples and paired HCC samples. Cox regression analysis was used to develop an EMT-related prognostic signature, and the performance of the signature was evaluated by Kaplan–Meier curves and time-dependent receiver operating characteristic (ROC) curves. A nomogram incorporating the independent predictors was established. Quantitative real-time polymerase chain reaction (qRT-PCR) was used to detect the expression levels of the hub genes in HCC cell lines, and the role of PDCD6 in the metastasis of HCC was determined by functional experiments. An EMT-related 5-gene signature (PDCD6, TCOF1, TRIM28, EZH2 and FAM83D) was constructed using univariate and multivariate Cox regression analysis. Based on the signature, the HCC patients were classified into high- and low-risk groups, and patients in high-risk group had a poor prognosis. Time-dependent ROC and Cox regression analyses suggested that the signature could predict HCC prognosis exactly and independently. The predictive capacity of the signature was also validated in two external cohorts. GSEA results showed that many cancer-related signaling pathways such as PI3K/Akt/mTOR pathway and TGF-β/SMAD pathway were enriched in high-risk group. The result of qRT-PCR revealed that PDCD6, TCOF1 and FAM83D were highly expressed in HCC cancer cells. Among them, PDCD6 were found to promote cell migration and invasion. The EMT-related 5-gene signature can serve as a promising prognostic biomarker for HCC patients and may provide a novel mechanism of HCC metastasis.

the most prevalent type of primary liver cancer, typically develops in the setting of cirrhosis, with chronic hepatitis B, chronic hepatitis C, alcohol abuse and non-alcoholic fatty liver disease [2]. Although many advanced management strategies have presented considerable effects on diagnosis of HCC, the overall survival (OS) of patients with HCC remains poor on account of the high metastasis rate [3]. Besides, approximately 70% of HCC patients receiving surgical resection or ablation have tumor recurrence within 5 years, which severely affects the prognosis of patients [4]. Therefore, identification of novel prognostic biomarkers is urgently needed for predicting clinical outcome of HCC patients and providing decision for surgery, pharmacological intervention and conservative treatments.
Epithelial-mesenchymal transition (EMT) is a process that epithelial cells lose polarized organization and obtain the abilities of migratory and invasive [5]. About 80% of human malignancies originate from epithelial tissues including lung, breast, colon, liver, etc. [6]. Neoplastic cells of these tumors at early stage retain the characteristics of epithelial cells such as lacking motility and failing to form continuous cell sheets. However, in advanced carcinoma, tumor cells present mesenchymal properties with highly aggressive, which are associated with metastatic dissemination [7,8]. Moreover, these mesenchymal features can reverse through mesenchymal-epithelial transition process, which make tumor cells restore the epithelial phenotype and enable them relocate into the metastasis sites [9]. Besides, EMT can also play a pivotal role in increasing the frequency of cancer stem cells which have capability to self-renew and produce more differentiated derivatives [10]. Therefore, EMT programs are closely associated with the development and progression of various malignancies including hepatocellular carcinoma. Recent study has reported that Thioredoxin domain-containing protein 12 promotes EMT through interaction with β-catenin and transcriptional activation of ZEB1, which contributes to HCC metastasis and indicates an adverse clinical outcome [11]. Currently, with the easy accessibility of many public databases, accumulating signatures or biomarkers has been developed to predict the prognosis of patients, whereas the EMTrelated prognostic model for HCC patients is still lacking.
In the present study, we constructed an EMT-related 5-gene signature using The Cancer Genome Atlas (TCGA) database to predict the prognosis of patients with HCC. The predictive effectiveness and accuracy of the signature was validated using the International Cancer Genome Consortium (ICGC) database and one microarray dataset (GSE76427) from the Gene Expression Omnibus (GEO). Next, based on the 5-gene signature and American Joint Committee on Cancer (AJCC) stage, we established a nomogram to evaluate clinical significance. Then, through gene set enrichment analysis (GSEA), we found that HCC patients with high risk score are more liable to progression and metastasis, which is in line with the characteristics of EMT. Finally, we detected the expression of the five genes and further explored the function of PDCD6 in HCC cells, and found that PDCD6 played a vital role in HCC metastasis.

Data acquisition
Raw gene expression data of HCC (n = 374) and normal samples (n = 50), and clinical information of the corresponding patients were obtained from the TCGA data portal (http://porta l.gdc.cance r.gov/proje cts). Stemness scores based on mRNA (RNAss) of TCGA pan-cancer data was downloaded from xena browser (https ://xenab rowse r.net/datap ages/). For clinical information, samples with any missing data or a follow-up time less than one month were excluded, and finally a total of 319 HCC samples were applied for subsequent study.
For two validation cohorts, LIRI-JP cohort was downloaded from ICGC portal (http://dcc.icgc.org/relea se/ curre nt/Proje cts) and a microarray dataset (GSE76427) was downloaded from the GEO database (https ://www. ncbi.nlm.nih.gov/geo/). According to the same screening criteria of TCGA cohort, a total of 232 samples from ICGC cohort and 94 samples from GSE76427 were selected for external validation. Detailed information of TCGA cohort and two validation cohorts were shown in Table 1. Due to all data were downloaded from the public database, it was not needed to acquire additional ethical approval for this study.

Construction and validation of the prognostic EMT-related gene signature
The 152 EMT-related genes of the 50 normal samples and paired tumor were comparatively analyzed to identify the significantly up-regulated genes in tumor samples (log2 fold change > 1 and FDR < 0.05). The 319-patient TCGA cohort was used as discovery cohort to construct a prognostic model. Univariate Cox regression analysis was conduct to identify up-regulated genes that were related to survival, and genes with P-value < 0.05 were used for subsequent multivariate Cox regression analysis. Finally, an EMT-related 5-gene signature was developed for prognosis prediction, and the risk score of each patient was calculated as the following formula: Where n is a number of prognostic genes, and Coef i indicates the estimated regression coefficient of the gene, and Exp i indicates the expression level of the gene. Patients were divided into high-risk or low-risk group based on the medium value of risk score.
To verify the predictive capacity of 5-gene signature, the 232 HCC patients from the ICGC database and 94 HCC patients from the GSE76427 were served as validation cohorts. With the medium score as the cut-off value, patients were also classified into high-or lowrisk group for survival analysis.

Development and assessment of a predictive nomogram
The nomogram containing independent prognostic factors was plotted utlizing the "rms" package in R software. The total score of each patient could be calculated using the nomogram, which could be applied to predict the 1-, 3-and 5-year survival rate of HCC patients. Calibration curves were utilized to evaluate the performance of the nomogram. In the calibration graph, nomogram-predicted survival and the observed clinical outcome are plotted on the x-axis and y-axis, respectively, and the 45° dotted line represents the optimal prediction.

RNA extraction and quantitative real-time polymerase chain reaction (qRT-PCR)
Total RNA was extracted from cell lines using Cell Total RNA Isolation kit (Foregene, Chengdu, China) following the manufacturer's instruction. RNA (1 μg) was reverse-transcribed to complementary DNA (cDNA) according to the protocol of the PrimeScript RT reagent kit (TaKaRa, Osaka, Japan). qRT-PCR was carried out on the iQ5 Real-Time PCR Detection System (Bio-Rad, Hercules, CA, USA) using the SYBR Green qPCR Supermixes (Bio-Rad). Primer sequences were listed in Table 2. Cycle threshold (Ct) of each gene was standardized to GAPDH using the 2 −ΔΔCT method. Each experiment was conducted in triplicate.

Wound healing assay
Cells were planted into 6-well plates for 24 h to reach 80-90% confluence. Lines were scratched across cells with a sterile pipette tip. The cells were washed by phosphate buffer saline (PBS) and were cultured in serumfree medium. After 0, 24 and 48 h of wounding, images were captured to measure the area of cell migration. The relative wound closure = (the gap area of the wound at 0 h − the gap area of the wound at 24 h or 48 h in si-PDCD6 group)/(the gap area of the wound at 0 h − the gap area of the wound at 24 h or 48 h in NC group). The experiment was conducted in triplicate.

Transwell invasion assay
The transwell chambers (8 μm pore size, Corning, NY, USA) coated with Matrigel (BD Biosciences, NJ, USA) were placed into a 24-well plate. A total of 1 × 10 5 cells/ well were seeded into the upper chamber containing 200 μl of serum-free medium, whereas 600 μl medium supplemented with 10% FBS was added to the lower chamber. Following incubation for 36-48 h (HepG2, 36 h; Hep3B, 48 h) at 37 °C incubator containing 5% CO 2, cells inside the chamber were wiped off. Then, cells on the bottom of the membrane were fixed in 4% paraformaldehyde and stained with 0.1% crystal violet. Images of 3 random fields in each chamber were captured under a microscope and the number of cells was counted.

Statistical analysis
Statistical analyses were performed with Rstudio (version 1.3.1093), SPSS 22.0 software (Armonk, NY), and Graph-Pad Prism 5.0 software (San Diego, CA, USA). Kaplan-Meier survival curve combined with log-rank test was employed to evaluate survival differences between highand low-risk groups. Univariate and multivariate Cox proportional hazard regression model were conducted  to analyze the hazard ratio of prognostic factors and to identify independent prognostic factors. The predictive accuracy of the prognostic model was reflected by timedependent receiver operating characteristic (ROC) curve and determined by area under curve (AUC). Associations between the EMT-related gene signature risk level and clinicopathological characteristics were analyzed by chisquare test. Comparisons between two groups were performed using two tailed Student's t test or paired t test.
Correlations among genes in EMT-related signature were determined by the Spearman rank correlation test. All results are presented as the mean ± standard deviation (SD). A P < 0.05 was considered statistically significant.

Identification of EMT-related genes in HCC
A detailed flow chart illustrating the procedure of our study is depicted in Fig. 1. GSEA was utilized to identify whether two EMT-related gene sets were strikingly distinct between normal samples and paired HCC samples. As the result shown, only GO_EPITHELIAL_TO_MES-ENCHYMAL_TRANSITION gene set was obviously enriched in HCC samples (Fig. 2a). Then, through differentially expressed gene analysis, we found that 42 EMT-related genes from GO_EPITHELIAL_TO_MES-ENCHYMAL_TRANSITION gene set were significantly up-regulated in HCC samples, which were selected for further analysis (Fig. 2b).

Development of the prognostic EMT-related gene signature
By inputting the 42 up-regulated genes identified above into the TCGA discovery cohort, a total of 8 EMT-related genes were shown to be significantly associated with OS using univariate Cox regression analysis. Then, multivariate Cox regression method was utilized for further screening. Finally, 5 genes, PDCD6, TCOF1, TRIM28, EZH2 and FAM83D, were identified as predictive biomarkers of prognosis. A prognostic model based on the 5 EMTrelated genes was established to calculate the risk score of each patient as follow: risk score = (0.0495 × expres- According to the median value of the signature, patients in TCGA cohort were divided into high-risk and low-risk groups. The rise in risk score was paralleled by increases in the expression of the 5 EMT-related genes and in the mortality of HCC patients (Fig. 3a). Likewise, the survival results showed that patients with a high-risk score had worse OS [hazard ratio (HR) = 2.441, 95% confidence interval (CI) 1.654-3.603, P < 0.001] and diseasespecific survival (DSS) (HR = 2.823, 95% CI 1.731-4.603, P < 0.001) than those with a low-risk score (Fig. 3b, c). Moreover, the time-dependent ROC curves demonstrated that the AUCs for 1-year, 2-year, and 3-year OS were 0.803, 0.721, and 0.7, which suggested favorable accuracy of the EMT-related signature in predicting the prognosis of HCC patients.

EMT-related gene signature as an independent indicator in predicting survival
Preliminarily, a cluster heat map of the 5 EMT-related genes was developed to present the association between the signature and clinicopathological characteristics. We found that the risk level was positively related to tumor grade and AJCC stage ( Fig. 4a and Additional file 1). To determine whether the EMT-related 5-gene signature could serve as an independent prognostic factor in HCC, we employed univariate and multivariate Cox regression analyses to compare the prognostic value of the signature with several clinicopathological factors including age, gender, tumor grade and AJCC stage. The results indicated that the EMT-related 5-gene signature (HR = 1.243, 95% CI 1.170-1.321, P < 0.001) and AJCC stage (HR = 2.450, 95% CI 1.642-3.654, P < 0.001) were the independent predictors of HCC (Fig. 4b).

Validation of the EMT-related gene signature
To validate the predictive capacity of the EMT-related 5-gene signature, two datasets (ICGC and GSE76427) were utilized as external validation cohorts. Patients in validation cohorts were also classified into high-and low-risk groups according to median value. In line with the performance in TCGA discovery cohort, we found that patients with high-risk score in ICGC and GSE76427 cohorts had shorter OS than those with low-risk score (HR = 3.340, 95% CI 1.824-6.117, P < 0.001; HR = 2.605, 95% CI 1.015-6.683, P = 0.046) (Fig. 6a, b). In addition, the AUCs for the 1-, 2-and 3-year OS were 0.739, 0.704 and 0.754 in ICGC cohort (Fig. 6c). Similarly, through Cox regression analyses, the EMT-related 5-gene signature could also act as an independent prognostic factor in ICGC cohort (HR = 1.320, 95% CI 1.162-1.500, P < 0.001) (Fig. 6d).

Establishment and assessment of a predictive nomogram
The independent prognostic factors of TCGA discovery cohort, namely the EMT-related 5-gene signature and AJCC stage, were utilized to build a nomogram to provide clinicians with a quantitative method to predict the probability of 1-, 3-and 5-year OS in HCC patients. The point of each factor indicating on the top scaleplate was added up to get a total point which can correspond to the 1-, 3-and 5-year survival rates in the below scaleplates (Fig. 7a). Additionally, calibration plots showed that in comparison to the ideal model, the nomogram presented good performance (Fig. 7b). Besides, the AUCs of nomogram at 1-, 3-and 5-year OS were 0.762, 0.724 and 0.676, respectively (Fig. 7c).

Identification of EMT-related gene signature associated biological pathways
GSEA was employed to explore the biological process and signaling pathway underlying the EMT-related 5-gene signature. The result demonstrated that the biological process of cell cycle and glycolysis and several classical cancer-related pathway including PI3K/Akt/ mTOR pathway, p53 pathway, Notch pathway, WNT/βcatenin pathway, TGF-β/SMAD pathway and MAPK pathway were significantly enriched in the high-risk group (Fig. 8a-e). Moreover, we found that the highrisk group had a higher stemness score based on mRNA expression which could reflect the tumor stemness than the low-risk group (Fig. 8f ).

Regulatory network and alteration of the 5 hub genes and their expression levels in HCC
GeneMANIA was utilized to construct a regulatory network to identify other potential genes that interact with the 5 EMT-related genes. The network contained 5 hub genes and 20 other genes (Fig. 9a), and their connections and weights were listed in Additional file 2. Then, we analyzed the correlation of the 5 hub genes in HCC and found that they were positively correlated with each other (Fig. 9b). The genetic alterations of 5 EMT-related genes were detected via cBioPortal database. The results revealed that the 5 genes were altered in 9% (32/353) of  queried samples. Among them, PDCD6 was the most frequently mutated genes (5%), followed by TRIM28 (1.7%), EZH2 (1.7%), TCOF1 (0.8%) and FAM83D (0.8%) (Fig. 9c). In addition, we also investigated the expression levels of 5 EMT-related genes in the normal samples and paired HCC samples based on TCGA cohort. As shown in Fig. 9d, the 5 genes were all highly expressed in the tumor samples. Besides, we employed qRT-PCR to detect their expression levels in normal liver cell line and three HCC cell lines, and found that PDCD6, TCOF1 and FAM83D were highly expressed in three HCC cell lines (Fig. 9e) (Additional file 3).

Knockdown of PDCD6 inhibits cell migration and invasion in HCC
Since the expression of PDCD6 changed most significantly in the three cancer cells and its function in HCC was still unclear, we selected it for further functional exploration. Two different PDCD6 siRNAs and a negative control were transfected into HepG2 and Hep3B cells. Their efficiency in knocking down PDCD6 was detected by western blot. As the result demonstrated, compared to si-PDCD6-1, si-PDCD6-2 performed better and was used for succeeding experiments (Fig. 10a). The effect of PDCD6 on cell migration was analyzed using wound healing assay. Compared to the NC group, the wound area in PDCD6-knockdown cells was much larger at 24 and 48 h after wounding, which indicated that knockdown of PDCD6 suppressed cell migration (Fig. 10b, c). Moreover, the result of matrigel invasion assay revealed that knockdown of PDCD6 significantly attenuated the invasion capacities of HepG2 and Hep3B cells (Fig. 10d, e). Besides, based on TCGA cohort, HCC patients with high PDCD6 expression had shorter OS (HR = 1.937, 95% CI 1.303-2.879, P = 0.0011) and DSS (HR = 2.262, 95% CI 1.371-3.731, P = 0.0014) than those with low PDCD6 expression (Fig. 10f, g).

Discussion
HCC is a fatal disease, and patients with HCC have dismal 5-year survival rate. Metastasis and recurrence are the leading reason of the poor prognosis of HCC patients Fig. 9 Analysis of the 5 EMT-related genes. a GeneMANIA establish a protein-protein interaction network based on these 5 genes. b Correlation between the 5 genes in TCGA dataset. c The proportion of genetic alteration of the 5 genes in 353 HCC samples from the cBioPortal database. d Expression of the 5 genes in the normal samples (n = 50) and paired HCC samples (n = 50) from TCGA dataset. e The qRT-PCR analysis of the expression levels of 5 genes in HL-7702, SK-Hep-1, HepG2 and Hep3B cell lines. Data are presented as the mean ± SD, n = 3, *P < 0.05, **P < 0.01 and ***P < 0.001 [1]. Previously, EMT has been identified as a prometastatic cellular event that promotes tumor cell invasion and metastasis [15]. Similarly, EMT also plays a critical role in HCC metastasis [11,16]. Therefore, the EMT-related genes may have great potential to become biomarkers of cancer progression and to predict the prognosis of HCC patients. In this study, an EMT-related 5-gene signature was constructed and presented a favorable performance in predicting the survival of patients with HCC. In addition, we noted that patients in high-risk group were involved in cancer-related pathways such as PI3K/Akt/mTOR pathway and TGF-β/SMAD pathway, which implied a higher risk of progression and metastasis. Moreover, we found that patients with high risk scores were positively associated with tumor stemness, which was in agreement with the conclusion that EMT could make tumor cells have Data are presented as the mean ± SD, n = 3, *P < 0.05, **P < 0.01 and ***P < 0.001. d, e Effects of PDCD6 on cell invasion evaluated by transwell assay. Data are presented as the mean ± SD, n = 3, *P < 0.05 and **P < 0.01. f Kaplan-Meier curve of OS between high PDCD6 expression group and low PDCD6 expression group. g Kaplan-Meier curve of DSS between high PDCD6 expression group and low PDCD6 expression group the characteristics of stem cells, and this feature also indicated that patients would have a poor prognosis [10].
The EMT-related 5-gene signature was consisted of TRIM28, EZH2, FAM83D, TCOF1 and PDCD6. TRIM28, a member of the tripartite motif-containing proteins (TRIM) family, is reported to act as an E3 ubiquitin ligase which can form MAGE-C2-TRIM28 and MAGE-A3/6-TRIM28 E3 ligase complexes in cancer to respectively target p53 and AMPK for degradation in a proteasomedependent way [17,18]. In HCC, MAGE-C2-TRIM28 complex was found to degrade FBP1, thereby enhancing Warburg effect and promoting cell growth in HCC [19]. EZH2 is a catalytic subunit of polycomb repressive complex 2 (PRC2) and exerts its functions via three kinds of mechanism including PRC2-dependent Lys-27 in histone 3 (H3K27me3) methylation, PRC2-dependent non-histone protein methylation, and PRC2-independent gene transactivation [20]. EZH2 was proved to be closely related to cancer initiation, progression, metastasis, and drug resistance [21,22]. Previous study found that EHZ2 could be silenced by miR-101, which sensitized HCC cells to chemotherapeutic therapy [23]. FAM83D is found to be overexpressed in various type of cancers such as non-small-lung cancer and colorectal cancer, and also plays a vital role in tumor initiation and progression [24,25]. Likewise, in HCC, FAM83D promotes cell proliferation and accelerates the G1 to S transition through activating MEK/ERK signaling pathway, which enhances the malignant behavior of tumor [26]. Although the function of TCOF1 is rarely reported in carcinoma, its expression level is high in HCC tissues and HCC cell lines according to our finding, which may provide foundation for the further investigation of the association between TCOF1 and tumorigenesis.
Programmed cell death protein 6 (PDCD6), which is also known as apoptosis-linked gene-2 (ALG-2), was reported to encode a calcium-binding protein and exert its physiological functions via Ca 2+ -dependent interaction with its downstream proteins [27]. PDCD6 has been involved in various physiological processes including signal transduction, cell death and cell division [28]. Meanwhile, PDCD6 is also implicated in the development of cancer, but its role is controversial in different types of tumors. Overexpression of PDCD6 was proved to promote the progression of colorectal cancer via cooperating with C-Raf and activating Raf/MEK/ERK signaling pathway [29]. Similarly, PDCD6 was overexpressed in metastatic ovarian cancer cells and was able to promote cell migration and invasion [30]. Whereas, low expression levels of PDCD6 was found to be associated with a poor prognosis in gastric cancer [31]. In HCC, previous study has shown that PDCD6 is highly expressed in cancerous tissues, but its function is still unclear [32]. Our study found that the expression level of PDCD6 was upregulated in three liver cancer cells compared to normal liver cell. Then, we are the first to explore the function of PDCD6 in HCC, and reveal that knockdown of PDCD6 inhibits cell migration and invasion in HepG2 and Hep3B cells. Furthermore, HCC patients with high PDCD6 expression predicted a dismal prognosis. Therefore, PDCD6 may play a vital role in HCC metastasis and could be an underlying prognostic biomarker and therapeutic target for HCC.
Previous studies have established various gene signatures to predict the prognosis of patients with HCC such as metabolism-related gene signature, glycolysisrelated gene signature and so on [33,34]. Compared with these signatures, the EMT-related 5-gene signature we constructed has higher AUC values for OS, which suggests more excellent performance of our model in predicting prognosis of HCC patients. Nevertheless, our study also has some limitations. Firstly, this prognostic signature is developed using retrospective analysis based on public datasets, thus, the results need to be further confirmed in prospective trails. Secondly, the sample capacity of our validation cohorts is insufficient, especially GSE76427, which affects the accuracy of model prediction to some extent. Finally, the underlying mechanism of PDCD6 promoting HCC cells migration and invasion needs further exploration.

Conclusion
In summary, we established a novel EMT-related 5-gene signature associated with the prognosis of HCC patients. Validation in two external cohorts (ICGC and GSE76427) further verified the prognostic value of the signature. Moreover, the signature was identified as an independent factor of HCC, and then constructed a nomogram combining with another independent factor to effectively predict the prognosis of HCC patients. Furthermore, knockdown of PDCD6, one of the 5 EMT-related genes, inhibits HCC cells migration and invasion. These results indicate that the EMT-related 5-gene signature can serve as a novel prognostic biomarker for HCC patients and may provide a new mechanism of HCC metastasis.