A robust twelve-gene signature for prognosis prediction of hepatocellular carcinoma

Background The prognosis of hepatocellular carcinoma (HCC) patients remains poor. Identifying prognostic markers to stratify HCC patients might help to improve their outcomes. Methods Six gene expression profiles (GSE121248, GSE84402, GSE65372, GSE51401, GSE45267 and GSE14520) were obtained for differentially expressed genes (DEGs) analysis between HCC tissues and non-tumor tissues. To identify the prognostic genes and establish risk score model, univariable Cox regression survival analysis and Lasso-penalized Cox regression analysis were performed based on the integrated DEGs by robust rank aggregation method. Then Kaplan–Meier and time-dependent receiver operating characteristic (ROC) curves were generated to validate the prognostic performance of risk score in training datasets and validation datasets. Multivariable Cox regression analysis was used to identify independent prognostic factors in liver cancer. A prognostic nomogram was constructed based on The Cancer Genome Atlas (TCGA) dataset. Finally, the correlation between DNA methylation and prognosis-related genes was analyzed. Results A twelve-gene signature including SPP1, KIF20A, HMMR, TPX2, LAPTM4B, TTK, MAGEA6, ANX10, LECT2, CYP2C9, RDH16 and LCAT was identified, and risk score was calculated by corresponding coefficients. The risk score model showed a strong diagnosis performance to distinguish HCC from normal samples. The HCC patients were stratified into high-risk and low-risk group based on the cutoff value of risk score. The Kaplan–Meier survival curves revealed significantly favorable overall survival in groups with lower risk score (P < 0.0001). Time-dependent ROC analysis showed well prognostic performance of the twelve-gene signature, which was comparable or superior to AJCC stage at predicting 1-, 3-, and 5-year overall survival. In addition, the twelve-gene signature was independent with other clinical factors and performed better in predicting overall survival after combining with age and AJCC stage by nomogram. Moreover, most of the prognostic twelve genes were negatively correlated with DNA methylation in HCC tissues, which SPP1 and LCAT were identified as the DNA methylation-driven genes. Conclusions We identified a twelve-gene signature as a robust marker with great potential for clinical application in risk stratification and overall survival prediction in HCC patients.

approximately 841,000 new cases of HCC and 782,000 deaths in 2018. The incidence and mortality continue to increase for both sexes. Surgical resection is the most effective treatment to cure early stage HCC, but most patients are first diagnosed at an advanced stage which missed the best time for surgical treatment. Although chemotherapy, radiotherapy, liver transplantation and other potentially curative treatment have achieved a variety of therapeutic effects and prolonged survival period, the prognosis of HCC remains poor due to the high rate of recurrence and intrahepatic spread [2]. Those highrisk HCC patients with potentially poor outcomes must be monitored and adopt timely and effective treatments to prolong survival and improve quality of life [3]. Therefore, it is an urgent need for effective prediction markers to accurately assess the prognosis of HCC patients.
Prognostic models based on parameters including clinical baseline characteristics to molecular biomarkers for HCC have been constructed in several previous studies [4]. A prognostic score by using positive tumor markers (alpha-fetoprotein (AFP), fucosylated AFP and des-gamma-carboxy prothrombin) showed a useful predictive prognostic value in HCC patients treated with transcatheter arterial chemoembolization [5]. But the characteristic of tumor markers, depended on tumor burden, limits their value in diagnosing early stage tumors. With the developments in gene chips and highthroughput sequencing, gene signature based on mRNA expression levels have shown great potential in predicting HCC prognosis. The abnormal expression levels of single gene such as SEC62 [6], SHP-1 [7], RING1 [8], AGBL2 [9] have been reported to be independent prognostic factors for HCC patients. Moreover, a risk-coefficient model based on a multigene mRNA expression signature has been identified to be an independent prognostic factor for overall survival (OS) and could stratify patients into high-and low-risk group with significantly different OS [10][11][12]. These gene signatures could be used for the preclinical and clinical treatment for HCC patients. However, additional gene signatures are needed for accurate prognosis of HCC because of complexity and heterogeneity of this disease.
In the current study, six sets of differentially expressed genes (DEGs) from different Gene Expression Omnibus (GEO) datasets were integrated to identify overlapping DEGs. Functional annotation assessment with Gene Ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were conducted with the overlapping DEGs. Univariable and Lasso-Cox regression analysis were applied to identify overall survival-related DEGs and propose a prognostic risk score model to stratify HCC patients. Besides, independent prognostic factors of OS were identified by multivariable Cox survival analysis. We finally identified twelve-gene signature as a robust marker with great potential in risk stratification and OS prediction in HCC patients.

Study population
In present study, we searched and downloaded mRNA expression chip data of HCC tissues from the GEO database by using the keywords of "hepatocellular carcinoma" and "Homo sapiens". Six microarray datasets (GSE121248, GSE84402, GSE65372, GSE51401, GSE45267 and GSE14520 (based on the GPL571 platform) were obtained for DEGs analysis. Details of the GEO datasets used in this study are shown in Table 1. RNA-sequencing data of 371 HCC tissues and 50 normal tissues normalized by log2 transformation were acquired from The Cancer Genome Atlas (TCGA) for analyzing the integrated DEGs from the six GEO datasets and building gene prognostic models. GSE14520 datasets (based on the GPL3921 platform) included 216 HCC tissues with complete clinical information and mRNA expression data for external validation of the prognostic gene signature. After excluding TCGA cases with incomplete clinical information, 233 HCC patients along with  [13].

Processing of gene expression data
To integrated gene expression chip data downloaded from the GEO datasets, we firstly conducted background correction, quartile normalization for the raw data followed by log2 transformation to obtain normally distributed expression values. The DEGs between HCC tissues and non-tumor tissues were identified using the "Limma" package in R [14]. The thresholds of absolute value of the log2 fold change (logFC) > 1 and adjusted P value < 0.05 were adopted. Mean expression values were applied for genes with multiprobes. Then, we used the robust rank aggregation (RRA) method to finally identify overlapping DEGs (P < 0.05) from the six GEO datasets.

Construction of a potential prognostic signature
To identify the prognostic genes, we firstly sifted 341 patients from the TCGA Liver Hepatocellular Carcinoma (TCGA-LIHC) cohort with follow-up times of more than 30 days. Then, univariable Cox regression survival analysis was performed based on the overlapping DEGs. A value of P < 0.01 in the univariable Cox regression analysis was considered statistically significant. Subsequently, the prognostic gene signature was constructed by Lassopenalized Cox regression analysis [15], and the optimal values of the penalty parameter alpha were determined through 10-times cross-validations by using R package "glmnet" [16]. Based on the optimal alpha value, a twelvegene prognostic signature with corresponding coefficients was selected, and a risk score was calculated for each TCGA-LIHC patient. Next, the HCC patients were divided into two or three groups based on the optimal cutoff of the risk score determined by "survminer" package in R and X-Tile software. To assess the performance of the twelve-gene prognostic signature, the Kaplan-Meier estimator curves and the C-index comparing the predicted and observed OS were calculated using package "survival" in R. Time-dependent receiver operating characteristic (ROC) curve analysis was also conducted by using the R packages "pROC" [17] and "survivalROC" [18]. Then, the GSE14520 datasets with complete clinical information was used to validate the prognostic performance of twelve-gene signature. The GSE14520 external validation datasets was based on the GPL3921 platform of the Affymetrix HT Human Genome U133A Array Plate Set (HT_HG-U133A, Affymetrix, Santa Clara, CA, United States).

Independence of the prognostic gene signature from other clinical parameters in TCGA
The risk score and other clinical variants, including age, body mass index (BMI), sex, tumor grade, the AJCC pathologic tumor stage, vascular invasion, residual tumor status and AFP value, were analyzed by univariable Cox regression analysis. Next, we conducted a multivariable Cox regression model that combined the risk score and the above clinical indicators (P value < 0.2) to assess the predictive performance. The univariable and multivariable Cox regression analysis were performed with TCGA-LIHC patients (n = 234) that had complete clinical information.

Building and validating a predictive nomogram
A composite nomogram [19] was constructed based on all independent prognostic parameters previous screened to predict the probability of 1-year, 3-year and 5-year OS using the "rms" package in R software. The time-dependent area under the ROC curve (AUC) was calculated to determine the discriminatory ability of the above prognostic parameters. Then we used a calibration curve to visualize the performance of the nomogram with the observed rates at corresponding time points. Based on the total number of points of the nomogram, the patients were also stratified into two or three groups according to the optimal cutoffs. The Kaplan-Meier survival curves for the different groups were then plotted.

Functional enrichment analysis of DEGs and prognosis-related genes
To detect potential biological functions and involved signaling pathways of DEGs, GO and KEGG enrichment analyses were performed by DAVID. Only P value < 0.05 was considered statistically significant. In addition, the protein-protein interaction (PPI) network was applied to explore potential interactions mRNAs via the STRING database and was visualized via Cytoscape v.3.7.1.

The correlation between DNA methylation and prognosis-related genes
The DNA methylation data of 371 HCC tissues and 50 normal tissues were acquired from TCGA database. The average DNA methylation beta-value for all CpG sites of a gene was calculated as the DNA methylation value for that gene. To examine the relationship between DNA methylation level and the corresponding mRNA expression value, the "MethylMix" package in R was utilized [20]. MethylMix identifies differential and functional DNA methylation by using a beta mixture model to identify tumor samples with different DNA methylation compared to normal tissue. Functional DNA methylation refers to a significant negative correlation between methylation and gene expression of a particular gene.

Statistical analysis
R software 3.5.0 was used for all statistical analyses. Categorical variables were analyzed by the χ 2 or Fisher's exact test. Continuous variables were analyzed using Student's t-test for paired samples. Two groups of boxplots were analyzed using the Wilcoxon-test. The heatmap of the DEGs was visualized using the "pheatmap" [21] package in R with zero-mean normalization. Kaplan-Meier survival curves were built and hazard ratio (HR) with 95% confidence interval (CI) were calculated by log-rank tests.
Correlations among the individual genes in the signature were assessed by Pearson correlation coefficients. All statistical tests were two-sided, P < 0.05 was considered statistically significant.

Identification of DEGs
The overall data processing workflow is shown in Fig. 1. Before identifying the DEGs, the six microarray datasets were normalized ( Fig. 2a; Additional file 1: Figure  S1A-F). As previously mentioned, details of the GEO datasets used in this study were presented in Table 1.
Hierarchical clustering analysis demonstrated differences in DEGs expression patterns between tumor and nontumor tissues ( Fig. 2b; Additional file 1: Figure S1A-F). Volcano plots were generated to show the distribution of the DEGs ( Fig. 2c; Additional file 1: Figure S1A-F). Next, we integrated the six sets of DEGs between tumor and non-tumor tissues by RRA method. Finally, a total of 175 overlapping DEGs, including 55 upregulated and 120 downregulated genes, were identified (Additional file 2: Table S1). The top 20 overlapping upregulated and downregulated DEGs in the six datasets are showed in Fig. 2d.

Functional enrichment and PPI network analysis of the DEGs
To further analyze biological information from the upregulated and downregulated overlapping DEGs (logFC > 1, P value < 0.05), GO enrichment analysis was performed respectively through the online DAVID tool. Concerning biological processes, the downregulated DEGs were significantly enriched in oxidation-reduction process, immune response and proteolysis. The upregulated DEGs were significantly enriched in cell division, mitotic nuclear division and G1/S transition of mitotic cell cycle. Enrichment analysis of cellular compartment  Fig. 3. KEGG pathway analysis showed that the DEGs were mainly enriched in metabolic pathways, chemical carcinogenesis, biosynthesis of antibiotics, retinol metabolism and cell cycle (Additional file 3: Figure S2A, Additional file 4: Table S2). The PPI network of the DEGs was constructed by using the STRING database. A total of 400 nodes and 4238 edges were obtained and analyzed in the PPI network. Then, the top 25 candidate hub genes were identified with the cytoHubba plugin by using MCC method from Cytoscape ( Figure S2B). Module analysis identified significant clustering modules in the PPI network. As illustrated in Figure S3A-C, the three highestscoring functional clusters of modules were selected by the MCODE app (module 1, MCODE score = 70.421; module 2, MCODE score = 11.12; module 3, MCODE score = 6.121). KEGG pathways of each module were determined and are showed in Additional file 5: Figure  S3.

Identification of survival-related DEGs and establishment of the twelve-gene prognostic signature
A total of 341 TCGA-LIHC samples with a follow-up period > 30 days were included from 365 TCGA-LIHC patients for subsequent survival analyses. The baseline characteristics of these patients were listed in Table 2.
Based on the univariable Cox regression model, 80 of total 175 DEGs were considered to be significantly correlated with OS, including 35 upregulated DEGs and 45 downregulated DEGs (P < 0.01). Then, Lasso penalized Cox regression analysis was performed, and a prognostic twelve-gene signature was identified, consisting of  Figure S4, Additional file 7: Table S3).
To validate different expressions of the twelve genes between tumor and non-tumor tissue, 369 HCC tissues and 160 normal tissues were compared using Gene Expression Profiling Interactive Analysis (GEPIA, http:// gepia .cance r-pku.cn/). The mRNA expression levels of SPP1, KIF20A, HMMR, LAPTM4B and TPX2 were significantly increased in HCC tissues, while the levels of ANX10, CYP2C9, LCAT and RDH16 were significantly decreased (Fig. 4a). ROC analysis showed well prognostic performance of the twelve-gene signature (Fig. 4b).
The Kaplan-Meier survival curves revealed that the upregulated gene with lower expression levels have better survival period, while the downregulated genes were positive correlation with survival period (P value < 0.05; Additional file 8: Figure S5). Besides, correlation analysis indicated that TPX2 and TTK had a strong positive correlation (r = 0.89, P < 0.001), and that KIF20A and ANXA10 had a moderate negative correlation (r = − 0.47, P < 0.001) ( Fig. 4c; Additional file 9: Table S4). Of the 365 TCGA-LIHC patients included in the mutation analysis, 88 (23.6%) presented with alterations in the twelve genes. Amplification was the most common type of mutation among the upregulated genes especially LAPTM4B (Fig. 4d) (Fig. 5e, f ). Time-dependent ROC curve and C-index analyses were applied to evaluate the prognostic values of the twelvegene risk scores and were then compared with those of the AJCC stage (Fig. 5g-i). The AUCs for the 1-, 3-, and 5-year OS predictions for the risk score were 0.77, 0.73, and 0.72, respectively. The AUCs for the 1-, 3-, and 5-year OS predictions for the AJCC stage were 0.63, 0.64, and 0.61, respectively. The C-index of the risk score was 0.653 (95% CI 0.606-0.700), while that of the AJCC stage was 0.591 (95% CI 0.544-0.638).

External validation of the prognostic performance of the twelve-gene signature
To validate the classification performance of the twelvegene prognostic signature with different data platforms, we used the GSE14520 cohort, which included 216 HCC patients, as an external dataset. Similarly, the patients each received exclusive risk score and were stratified into two or three groups based on cutoff value calculated by X-Tile. The Kaplan-Meier survival curves showed significant difference in the OS among the groups. Lowrisk group had obviously better outcomes than high-risk or middle-risk groups (Fig. 6e, f ). Moreover, the AUCs for the 1-, 3-, and 5-year OS predictions for the risk score were 0.63, 0.68 and 0.66, respectively, which show comparable prognostic power with the AJCC stage ( Fig. 6g-i). The C-index of the risk score was 0.614 (95% CI 0.549-0.679), while that of the AJCC stage was 0.622 (95% CI 0.573-0.671). External validation suggested that the twelve-gene signature continued to perform well at predicting OS in HCC patients.

Generation and validation of the diagnostic performance of the twelve-gene signature
As for training datasets (TCGA-LIHC) of 371 HCC samples and 50 normal samples, we established a diagnostic model with twelve genes by using the logistic regression analysis to distinguish HCC from normal tissue. Exclusive diagnostic score was calculated by corresponding coefficients as follows: 1.   (Fig. 7a, b), while 95.5% sensitivity and 90.7% specificity in the GSE14520 datasets (AUC = 0.962) of 225 HCC samples and 220 normal samples (Fig. 7d, e). Unsupervised hierarchical clustering of twelve genes could also differentiate HCC from normal tissue with high sensitivity and specificity (Fig. 7c, f ).

Correlation with clinicopathological characteristics and prognostic factor
Among the 233 patients included in TCGA-LIHC cohort with complete clinical information, a higher risk score was found to be significantly correlated with female sex, advanced tumor grade, vascular invasion and higher AFP (Table 3). Moreover, the univariable and multivariable Cox regression analyses indicated that the risk score and AJCC stage were both independent prognostic factors for OS (Table 4).

Building and validating a predictive nomogram from the TCGA-LIHC cohort
The 233 TCGA-LIHC patients with complete clinical information were adopted to build a prognostic nomogram. Risk score, age and AJCC stage were used as parameters in the nomogram (Fig. 8a). The AUCs of the 1-, 3-, and 5-year OS predictions for the nomogram were 0.76, 0.74, and 0.75, respectively (Fig. 8g-i). The C-index of the nomogram was 0.711 (95% CI 0.642-0.78), while that for the AJCC stage was 0.567 (95% CI 0.508-0.626). Thus, the nomogram was superior to the risk score or AJCC stage in predicting OS of HCC. The patients were stratified into two or three groups based on median or cutoff values generated by X-Tile according to the scoring of the nomogram. The Kaplan-Meier curves showed significant difference in the OS among groups (Fig. 8e, f ). Those with lower scores experienced significantly better survival period (P < 0.0001). Calibration plots showed that the nomogram performed well at predicting OS in HCC patients (Fig. 8d).

Validation of the DNA methylation pattern of twelve-gene signature
Based on the DNA methylation data and the paired gene expression data of twelve genes in 371 HCC tissues, functional DNA methylation analyses showed that six genes, including SPP1, RDH16, LAPTM4B, LCAT, CYP2C9 and LECT2, had a significantly strong negative correlation between with gene expression and DNA methylation, and four genes (HMMR, KIF20A, TPX2 and TTK) showed moderate or weak correlation (Fig. 9b, e, Additional file 11: Figure S7), while the methylation data involving ANXA10 and MAGEA6 were lacked. Besides, the beta mixture model had identified SPP1 and LCAT as the DNA methylation-driven genes, which the gene expression value was significantly affected by DNA methylation events. A significantly low DNA methylation were noted for SPP1 relative to high expression levels in tumor tissues, while high DNA methylation and low expression for LCAT (P < 0.0001) (Fig. 9).

Discussion
For decades, many studies have strived to elucidate the pathogenesis and epidemiology of HCC. Although great improvement on surgical and medical therapy has been made, the prognosis of HCC remains poor, which has resulted in a heavy disease burden on a global scale. Lacking of efficient detection methods on the early stage Fig. 6 External validation of the prognostic performance of the twelve-gene signature in GSE14520 dataset. a Distribution of the risk score and survival data. Alive cases showed in yellow, dead cases showed in blue. b Distribution of the risk score. c Heat map of the twelve gene expression in the GSE14520 dataset. d Calibration plot of the twelve-gene signature for predicting the probability of survival at 1-, 3-, and 5-years. e, f Kaplan-Meier survival curves of the risk score model. g-i Time-dependent ROC curve of risk score model for 1-, 3-, and 5-year overall survival predictions in compare with AJCC stage attributes to the progression of the disease. Moreover, HCC is a highly heterogeneous disease in that survival times vary significantly among patients with similar TNM stages. Consequently, screening an efficient prognostic marker that dynamically reflect the biological progression of the tumor would be important for individualized prevention and treatment of HCC.
The present study aimed to identify an effective prognostic marker to stratify HCC patients and predict the outcome of HCC. In our current study, a total of 175 overlapping DEGs between HCC tissues and non-tumor tissues were identified from six GEO datasets after integrating by the RRA method. Eighty prognosis-related genes were sifted out from TCGA datasets by using univariable Cox regression methods, which were then subjected to Lasso regression with tenfold cross-validation. Finally, a novel twelve-gene signature was determined. The risk score of each case was calculated based on the above model, which stratified HCC patients into highor low-risk groups. The twelve-gene signature was an independent prognostic factor of HCC according to the Kaplan-Meier survival curves analysis and ROC analysis and verified by TCGA datasets and GSE14520 dataset, and it was comparable or superior to AJCC stage at predicting 1-, 3-, and 5-year OS. In addition, the twelve-gene signature was independent with other clinical factors and performed better in predicting OS in combination with age and AJCC stage in a nomogram.
To date, many studies have explored the prognostic role of gene signatures in predicting the outcome of HCC. The effect of clinical practice based on gene expression profiles has shown that gene signatures might be a promising high-throughput molecular identification method [22][23][24]. In that regard, Liu et al. [25] identified a four-gene metabolic signature and Xiang et al. [26] identified seven-senescence-associated gene signature for predicting OS in HCC. Good predictive performance was obtained in their respective studies. However, our 12-gene signature had a higher 1-year, 3-year, and 5-year AUC than the above two gene signatures, which makes it more conducive to clinical application in several clinical important settings (Additional file 12: Figure S8). Sometimes, patients with similar clinicopathological features, such as AJCC stage, may have distinct outcomes, which might due to the heterogeneity of the different epigenetic and genetic backgrounds in tumor subtypes. Therefore, Fig. 7 The diagnostic performance of the twelve-gene signature in distinguishing HCC from normal samples. Crosstab of diagnostic prediction model for training (a) and validation (d) dataset. ROC curves of the diagnostic prediction models with the twelve genes for training (b) and validation (e) datasets. c, f Unsupervised hierarchical clustering of twelve genes in the diagnostic prediction model for training (e) and validation (f) datasets the combined use of the twelve gene prognostic signature and AJCC stage might be benefit to identify high-risk patients who should receive more medical supports.
The initiation and progression of HCC is a complex and multistage process regulated by both genetic and epigenetic alterations. DNA methylation, as a central epigenetic modification, would facilitate tumor development if it was dysfunctional, such as hypomethylation of oncogenes and hypermethylation of tumor suppressor [27,28]. In present study, most of the prognostic twelve genes were negatively correlated with DNA methylation in HCC tissues. As the DNA methylation-driven genes identified by the beta mixture model, SPP1 was hypomethylated and expressed at a higher level in HCC than in normal tissues, while LCAT was hypermethylated and low expression. Those results could be a validation and extension of the research initiated by Long et al. [28]. They found that the models consisting of SPP1 and LCAT were good at predicting HCC diagnosis, prognosis and recurrence. However, the marker involving in more pathogenic pathways not just DNA methylation might guide better clinical due to the complex biological process related with HCC pathogenesis.
The five of twelve-gene signature were downregulated in HCC tissues and identified as protective genes, including ANXA10, CYP2C9, LCAT, LECT2 and RDH16. ANXA10 is one of 13 annexins members of superfamily of calcium-dependent phospholipid-binding proteins [29]. Downregulation of ANXA10 is associated with worse pathological stage and poor prognosis in liver cancer [30,31]. Previous study has identified it as a prognostic biomarker of perihilar and distal cholangiocarcinoma but not intrahepatic cholangiocarcinoma, the gene promotes extrahepatic cholangiocarcinoma metastasis by facilitating the epithelial-mesenchymal transition via the PLA2G4A/PGE2/STAT3 pathway [32]. CYP2C9, a gene involved in drug absorption, distribution, metabolism and excretion, was downregulated in HCC tissue in part due to the de-differentiation of cancer cells [33]. It is associated with loss of liver-specific functions. LACT plays an important role in many cancers and was hypermethylated and downregulated in HCC compared with non-tumor tissue [34]. LECT2, a chemokine-like chemotactic factor, plays a protective anti-inflammatory role in the liver with β-catenin-induced tumorigenesis, and downregulated LECT2 expression results in the presence of inflammatory infiltrates, tumor progression and metastatic disease [35]. This protective effect is mainly targeted both tumoral hepatocytes and the hepatic immune microenvironment by inhibiting the Wnt pathway [36]. RDH16 is strongly express in normal liver tissue and is involved in retinoic acid metabolism, cellular proliferation, differentiation, and apoptosis [37]. The level of RDH16 expression is inversely associated with tumor size, microsatellite formation, thrombus and OS in HCC patients [38]. The seven of the twelve genes were upregulated in HCC tissue and were associated with poor survival, including SPP1, KIF20A, HMMR, TTK, MAGEA6, LAPTM4B and TPX2. The SPP1 gene, located on chromosome 4 in locus 4q13.22 [39], encodes a secreted chemokine-like glycophospho protein, named as the glycoprotein osteopontin. The biological functions of osteopontin are diverse [40]. It is overexpressed during the development and progression of different cancers, and has been suggested as a potential prognostic biomarker and therapeutic target [41]. KIF20A accumulates in the nucleus during the G2 phase of the cell cycle [42], and contributed to cellular proliferation, apoptosis and metastasis by regulating various signaling pathways, such as the E2F-retinoblastoma protein-p16 pathway and the PI3K/Akt signaling pathways. Intracellular HMMR participates in mitotic spindle pole formation and cytokinesis [43]. Overexpression of HMMR is associated with cancer growth, migration, and poor prognosis in various cancers, including HCC, via its stimulation of the hyaluronan-HMMR signaling cascade in addition to its oncogenic activities as noted above [44]. TTK is essential for the mitotic checkpoint and improper chromosome attachments [45]. Elevated TTK level facilitates chromosome instability and aneuploidy, thus contributing to cancer cell proliferation and invasion [46]. The essential role of TTK in HCC carcinogenesis by promoting cell survival and invasion via activation of Akt/mTOR and MDM2/p53 signaling pathways, and the regulation of miR-21 via TGF-β [47]. MAGEA6 is generally expressed in the male testis, but is re-activated in many tumor cells. MAGEA6 and TRIM28 was combined to form a cancer-specific ubiquitin ligase that inhibited AMPK signaling pathway and induced the stemness maintenance and self-renewal of HCC stem cells [48]. LAPTM4B was originally found to be overexpressed in HCC tissue, and it has inverse correlation with HCC differentiation. Evidences suggested that the overexpression of LAPTM4B may promote malignant transformation and enhance the metastasis and recurrence of HCC via activating AKT signaling pathway and some protooncogenes such as c-myc, c-jun and c-fos [49,50]. TPX2 is a nuclear proliferation microtubule-associated protein that can regulate spindle formation and stabilize spindle microtubules by promoting chromatin microtubule nucleation. Targeted silencing of TPX2 inhibited cell viability, disturbed cell cycle process and induced apoptosis of HCC cells by inhibiting the PI3K/AKT signaling transduction pathway.
Although we have identified prognosis-related gene signature which showed potentially substantial clinical significance, the present study had certain limitations. Considering the great heterogeneity of HCC, some important candidate genes affecting tumor prognosis might have been excluded before constructing the prognostic model, which could have decreased the performance of the model in the TCGA cohort. Moreover, the mechanisms of post-curative recurrence and metastasis (the major obstacles of HCC survival) might also help explain the relative low diagnostic performance, but the clinical follow-up information, including post-curative recurrence and metastasis data, were lacking in our included samples. In addition, the twelve-gene prognostic signature needs to be investigated by experimental verification, and evaluated for clinical application using multicenter randomized controlled studies.

Conclusions
In conclusion, we identified a prognostic twelve gene signature via comprehensive bioinformatics analysis with TCGA and GEO liver cancer cohorts. It could effectively stratify HCC patients into high-and low-risk group and independently predict the survival of HCC patients. Our finding indicated that the twelve gene signature may help facilitate personalized cancer management in the clinical setting. We therefore recommend using this classifier as a molecular diagnostic test to evaluate the prognostic risk in HCC patients.