Functional polymorphisms of the lncRNA H19 promoter region contribute to the cancer risk and clinical outcomes in advanced colorectal cancer

Background The long non-coding RNA H19 plays critical roles in cancer occurrence, development, and progression. The present study is the first to evaluate the association of genetic variations in the H19 promoter region with advanced colorectal cancer (CRC) susceptibility, environmental factors, and clinical outcomes. Methods Sixteen single-nucleotide polymorphisms (SNPs) were identified in the H19 gene promoter by DNA sequencing, and 3 of them (rs4930101, rs11042170, and rs2735970) were further genotyped in an expanded sample of 572 advanced CRC patients and 555 healthy controls. Results We found that harboring the SNPs rs4930101 (P = 0.009), rs2735970 (P = 0.003), or rs11042170 (P = 0.003) significantly increased the risk for CRC, as did carrying more than one combined risk genotype [P < 0.0001, adjusted OR (95% CI) 6.48 (2.97–14.15)]. In the correlation analysis with environmental factors, significant interactions were found between rs2735970 and gender, and between combined risk genotypes (> 1 vs. ≤ 1) and family history of cancer. Furthermore, a remarkably worse clinical outcome was found in patients with combined risk genotypes (> 1 vs. ≤ 1), especially in CRC patients with body weight ≥ 61 kg, smoking, and first-degree family history of cancer (Log-rank test: P = 0.006, P = 0.018, and P = 0.013, respectively). More importantly, the multivariate Cox regression analyses further verified that carrying > 1 combined risk genotype was a prognostic risk factor for CRC patients with body weight ≥ 61 kg (P = 0.002), smoking (P = 0.008), and family history of cancer (P = 0.006). In addition, MDR analysis consistently revealed that the combination of the selected SNPs and nine known risk factors gave a better prediction of prognosis and represented the best model to predict advanced CRC prognosis. Conclusion Three SNPs (rs4930101, rs11042170, and rs2735970) among the 16 identified SNPs of the H19 gene remarkably increased CRC risk. Furthermore, the combined risk genotypes showed significant interactions with environmental factors and a significant impact on clinical outcomes in advanced CRC patients with body weight ≥ 61 kg, ever-smoking, and first-degree family history of cancer. These data suggest that H19 promoter SNPs, especially in combination, might be potential functional biomarkers for predicting advanced CRC risk and prognosis. Electronic supplementary material The online version of this article (10.1186/s12935-019-0895-x) contains supplementary material, which is available to authorized users.

Background Colorectal cancer (CRC) remains the third most commonly occurring cancer in both men and women worldwide. In 2018, 1.8 million new CRC cases were diagnosed and 609,000 deaths were reported [1]. More importantly, increased incidence and mortality of CRC have been reported in young adults in Asia, including China [2][3][4]. The etiology of CRC is complex, and multiple factors are involved in carcinogenesis, including environmental exposures, lifestyle factors, and especially multiple inherited genetic variations [5][6][7][8][9]. Non-coding RNAs (ncRNAs) are regarded as "genomic dark matter", and a growing number of studies have indicated a strong association between single-nucleotide polymorphisms (SNPs) in ncRNAs and the risk for CRC [10][11][12][13][14][15][16][17]. Therefore, identifying genetic variations, including those in lncRNAs, and their interactions with environmental factors could reveal novel diagnostic and prognostic biomarkers for CRC diagnosis and for assessing treatment accuracy.
Long non-coding RNAs (lncRNAs), first identified in the 1990s [18,19], are single-stranded non-coding RNAs longer than 200 nucleotides with no open reading frame (ORF) [20]. Rather than being transcriptional noise, lncRNAs are key players with multiple functions in carcinogenesis, including regulating the cancer cell cycle, proliferation, and apoptosis through the regulation of gene transcription and posttranscriptional processing [21][22][23][24]. The H19 gene is located on human chromosome 11p15.5, within a cluster of imprinted genes that includes H19/insulin-like growth factor 2 (IGF2). The H19 gene encodes a 2.3 kb spliced and polyadenylated long non-coding RNA [25][26][27]. H19 is highly expressed in the early stages of embryogenesis and down-regulated with tissue maturation, but is re-expressed in human carcinoma tissues such as CRC [28][29][30][31]. Thus, H19 is involved in cancer initiation, development, and progression, suggesting it could be a critical diagnostic and prognostic biomarker as well as a potential novel target in cancer therapy.
Recent functional studies provide insights into the roles of genetic variants in the H19 promoter region in cancer risk, inter-individual chemotherapy response, and prognosis [10,32,33,34]. H19 expression is mainly regulated by the upstream 5′-flanking region of the H19 gene, which contains differentially methylated regions (DMRs) and mutations [35]. To date, among the more than 100 SNPs found in the H19 gene (http://www.ncbi.nlm.nih.gov/projects/SNP), some potentially functional SNPs in the promoter region play critical roles in altering individual susceptibility to cancer, interaction with environmental factors, and clinical outcomes in CRC [12,16,17,36,37,38,39]. Bhatti et al. demonstrated that the H19 rs2107425 polymorphism was closely related to radiation therapy response in breast cancer patients in the United States (n = 859) [40]. O'Brien et al. further found that the H19 rs2107425 polymorphism was significantly associated with breast cancer susceptibility among African-Americans [41]. Yang et al. also reported that the T allele of the H19 promoter SNP rs2839698 contributes to increased gastric cancer risk in a Chinese population [25]. These previous studies focused on the H19 promoter SNPs rs2107425 and rs2839698, which are not located in the high-incidence region upstream of the H19 gene. Therefore, identifying potentially functional SNPs in the H19 promoter region is urgently required and might benefit early screening.
In this study, we screened the distribution of genetic variations in approximately 3 kb upstream of the H19 promoter region and further investigated the possible association of three SNPs in the human H19 gene (rs4930101, rs11042170, and rs2735970) with advanced CRC risk, environmental factors, and clinical outcomes. This study may provide a novel diagnostic biomarker for advanced CRC patients.

Keywords: H19, Genetic polymorphisms, Susceptibility, Colorectal cancer, Prognosis

Patients and clinical information
Clinicopathological data, including age, gender, first-degree family history of CRC, smoking status, tumor size, tumor differentiation, pathological grade, and lymph-node metastases, were collected from interviewer-administered health risk questionnaires and medical records. Non-smokers were defined as individuals who had smoked < 100 cigarettes in their lifetime. BMI was calculated from self-reported height and body weight. Tumor differentiation and pathological grade of CRCs were assessed according to the World Health Organization criteria. The patients received the FOLFOX6 regimen for at least 2-3 cycles and were followed up monthly until recurrence or death. Age-, gender-, and ethnicity-matched healthy control volunteers (n = 555) were recruited from the same hospitals. After the interview, a 5 ml blood sample was collected from each participant for SNP genotyping.

Genotyping
Genomic DNA was extracted from peripheral blood leukocytes using the TIANGEN DNA Blood Mini Kit (TIANGEN Biotech CO., LTD, Beijing, China) and SNP genotyping was performed by TaqMan assay. The probes, primers and the related information about assay conditions, are available upon request. SNP allele-specific probes were labeled with the fluorescent dyes VIC and FAM by using the TaqMan SNP Genotyping Assays on the ABI 7500 Fast Real-Time PCR platform (Applied Biosystems, Life Technologies Corporation, Foster City, CA, USA). The genotyping rates of these SNPs were all above 90%. For quality control, approximately 10% of samples were randomly selected for repeated confirmation. Some of these samples were also confirmed by DNA sequencing analysis. The concordance rate of these repeated samples reached 100%, indicating that the genotyping method and results were reliable.

Statistical analysis
All data were analyzed with SPSS version 19.0 (SPSS Inc. Chicago, Illinois, USA), and P < 0.05 was considered statistically significant. Associations of genetic polymorphisms with CRC susceptibility and clinical variables were assessed as odds ratios (OR) with 95% confidence intervals (CI) by unconditional logistic regression adjusted for age, gender, body weight, and smoking status. Overall survival (OS) was defined as the time between surgery and death or last known follow-up. Disease-free survival (DFS) was the time from surgery until recurrence, death, or last known follow-up. Kaplan-Meier curves were used to assess DFS and OS, and the association of DFS or OS with SNPs was estimated by the Log-rank test. Multivariate Cox hazards regression models were used to estimate the adjusted hazard ratios and their 95% CI, and thus to evaluate the independent prognostic value of each genotype and clinical variable. High-order interactions between the SNPs and clinicopathological parameters were assessed by multifactor dimensionality reduction (MDR) analysis.
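As an illustration of the OR and 95% CI reported throughout, the following is a minimal sketch of an unadjusted odds ratio with a Wald confidence interval from a 2 × 2 table. It is not the adjusted logistic regression run in SPSS for this study; the function name `odds_ratio_wald_ci` and the counts are hypothetical and chosen only for the example.

```python
import math


def odds_ratio_wald_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval (z=1.96 -> 95%)."""
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (not data from this study):
# risk-genotype carriers vs. non-carriers among cases and controls.
or_value, ci_low, ci_high = odds_ratio_wald_ci(30, 20, 15, 35)
print(f"OR = {or_value:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

An adjusted OR, as used in the study, would instead come from the exponentiated coefficient of a logistic regression model that includes the covariates.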

Identification of SNPs in the promoter region of the H19 gene
To investigate differences in the distribution of genetic variants in the H19 promoter region, the SNPs in approximately 3 kb upstream of the H19 promoter were genotyped in CRC patients (n = 51) and healthy controls (n = 50) by DNA sequencing. Sixteen SNPs were identified by comparison with the Gene Bank (https://www.ncbi.nlm.nih.gov/snp/), including rs10840167 (G/T), rs2525883 (C/T), rs4930101 Table S1; Fig. 1a). The genotype distributions of those SNPs in the control group were in agreement with Hardy-Weinberg equilibrium (P > 0.05, Additional file 1: Table S1). To further evaluate whether those SNPs could affect CRC risk, we carried out a standard allelic association analysis on these SNPs by the Pearson χ² test and logistic regression. The frequency distributions of rs4930101 (G/T), rs2735970 (A/G), and rs11042170 (G/A) differed significantly between CRC patients and healthy controls (Additional file 1: Table S1, Fig. 1b-d). Specifically, the rs4930101 GG genotype increased the risk for CRC development 5.211-fold. The combined genotype GT/GG or the G allele showed a further significant increase in CRC risk. Harboring the rs11042170 GG or GA/GG genotypes suggested a higher risk for CRC development under the dominant model (GG vs. AA: P = 0.033, adjusted OR = 5.500, 95% CI 1.027-29.451; GA/GG vs. AA: P = 0.034, adjusted OR = 5.067, 95% CI 1.001-25.647, respectively). Moreover, a significantly increased frequency of the rs2735970 AG genotype was observed in CRC patients compared with healthy controls. No statistical association was observed between CRC susceptibility and the other SNPs of the H19 promoter loci in this cohort (Additional file 1: Table S1).
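The Hardy-Weinberg check applied to the control group can be sketched as a one-degree-of-freedom χ² goodness-of-fit test on genotype counts. This is a minimal sketch, assuming a biallelic SNP; the function name `hwe_chi_square` and the genotype counts are hypothetical and are not taken from Additional file 1: Table S1.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg equilibrium for a biallelic SNP.

    n_aa, n_ab, n_bb are observed counts of the two homozygotes and the
    heterozygote. Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical control-group counts (not data from this study).
chi2_eq, p_eq = hwe_chi_square(25, 50, 25)    # exactly at equilibrium
chi2_dev, p_dev = hwe_chi_square(50, 0, 50)   # no heterozygotes at all
print(chi2_eq, p_eq, chi2_dev, p_dev)
```

A P value above 0.05, as reported for the controls here, means the observed genotype counts do not deviate detectably from equilibrium expectations.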

The correlation of H19 rs4930101, rs11042170, rs2735970 with colorectal cancer risk
To study whether the H19 promoter SNPs rs4930101, rs11042170, and rs2735970 affect susceptibility to CRC, we enrolled 572 CRC patients and 555 age- and gender-matched healthy controls. The median ages (range) of the CRC group and the control group were 59 (26-82) years and 59 (25-80) years, respectively, with no statistical difference between the two groups (P = 0.789). Demographic data, risk factors, and related clinical variables, including tumor size, clinical stage, pathological type, lymph node metastasis status, chemotherapy regimen, and other information, are listed in Additional file 1: Table S2.

Table 1 Logistic regression analysis of associations between genotypes of H19 promoter SNPs and advanced CRC susceptibility
The significance levels are P < 0.05 for all the italics values
a The observed genotype frequency among individuals in the control group was in agreement with Hardy-Weinberg equilibrium
b P values, adjusted OR and 95% CI values were calculated by logistic regression adjusted for age, gender, body weight, smoking status, first-degree family history of cancer status
c Risk genotypes used for the calculation were H19 rs4930101GT/GG + rs2735970GA/GG + rs11042170GA/GG
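The combined risk genotype grouping (> 1 vs. ≤ 1) defined in footnote c can be sketched as a simple count over the three SNPs. This is a minimal sketch: the risk genotypes are those listed in footnote c, while the function names, the dictionary layout, and the example subjects are hypothetical.

```python
def normalize(genotype):
    """Make genotype strings order-independent, e.g. 'TG' and 'GT' match."""
    return "".join(sorted(genotype.strip().upper()))


# Risk genotypes as listed in footnote c of Table 1.
RISK_GENOTYPES = {
    snp: {normalize(g) for g in genotypes}
    for snp, genotypes in {
        "rs4930101": ("GT", "GG"),
        "rs2735970": ("GA", "GG"),
        "rs11042170": ("GA", "GG"),
    }.items()
}


def count_risk_genotypes(subject):
    """Count how many of the three SNPs carry a risk genotype."""
    return sum(
        1
        for snp, risk in RISK_GENOTYPES.items()
        if normalize(subject.get(snp, "")) in risk  # missing call counts as non-risk
    )


def risk_group(subject):
    """Assign the subject to the '> 1' or '<= 1' combined risk genotype group."""
    return "> 1" if count_risk_genotypes(subject) > 1 else "<= 1"


# Hypothetical subjects (not data from this study).
subject_a = {"rs4930101": "TG", "rs2735970": "AA", "rs11042170": "AG"}
subject_b = {"rs4930101": "TT", "rs2735970": "AA", "rs11042170": "AA"}
print(risk_group(subject_a), risk_group(subject_b))
```

Treating a missing genotype call as non-risk is a choice made only for this sketch; an analysis could equally exclude such subjects.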

The interaction between H19 promoter SNPs with environmental factors and clinical variables
To explore the clinical utility of the SNP genotypes, the interactive effects between the H19 SNPs (rs4930101, rs11042170, rs2735970) and the environmental factors or clinical variables were determined by the χ² test and unconditional logistic regression adjusted for gender, age, smoking status, and first-degree family history of cancer (Fig. 2b, Table 2 and Additional file 1: Table S3). Because body weight, smoking, and family history of cancer act as environmental high-risk factors for CRC, we further analyzed the interactions between environmental and genetic factors and found that combined risk genotypes (> 1 vs. ≤ 1) were related to family history of cancer (P = 0.028, Table 2).

Prognostic markers evaluation of H19 rs4930101, rs11042170, rs2735970 in advanced CRC patients
To further clarify whether the 3 SNPs of the H19 promoter region were independent prognostic factors in this cohort, we performed the Log-rank test and multivariate Cox hazard regression analysis including all variables that could affect DFS and OS in CRC patients treated with the FOLFOX6 regimen. Overall, there was no statistically significant correlation between the 3 SNPs of the H19 gene and prognosis. However, remarkably worse clinical outcomes were found in patients with combined risk genotypes (> 1), especially in those with body weight ≥ 61 kg, smoking, and first-degree family history of cancer (Log-rank test: P = 0.006, P = 0.018, and P = 0.013, respectively) (Fig. 3a-c). The median survival time (MST) in CRC patients with body weight ≥ 61 kg harboring more than 1 combined risk genotype [MST (95% CI) 65 (59-70) months] was much shorter than in those carrying ≤ 1 combined risk genotype [MST (95% CI) 83 (76-89) months] (Fig. 3a). Meanwhile, in comparison to the reference combined genotypes with an MST of 83 or 85 months, carrying > 1 combined risk genotype was related to worse overall survival in patients who smoked [MST (95% CI) 56 (52-60) months] (Fig. 3b) and in those with a family history of cancer [MST (95% CI) 66 (60-71) months] (Fig. 3c) (Table 3).
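To make the MST figures concrete, the following is a minimal sketch of a Kaplan-Meier estimate and the median survival time read from it, taken here as the first event time at which the estimated survival drops to 0.5 or below. The function names and the follow-up data are hypothetical; the study itself used SPSS, and confidence intervals and the Log-rank test are not reproduced here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate.

    times: follow-up times (e.g. months); events: 1 = death observed,
    0 = censored. Returns [(time, survival)] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored subjects both leave the risk set
    return curve


def median_survival(curve):
    """First event time with estimated survival <= 0.5, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up data in months (not data from this study).
km_curve = kaplan_meier([10, 20, 30, 40, 50], [1, 0, 1, 1, 0])
mst = median_survival(km_curve)
print(km_curve, mst)
```

When fewer than half of the subjects in a group have an observed event by the end of follow-up, the curve never reaches 0.5 and the median is undefined, which the sketch signals by returning `None`.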

High-order interactions with CRC prognosis by MDR analysis
To further evaluate possible gene-environment interactions associated with the clinical outcomes, high-order interactions were assessed by MDR analysis on the 3 SNPs (rs4930101, rs2735970, and rs11042170), the combined genotypes, and 8 known risk factors (i.e., age, body weight, gender, smoking status, first-degree family history of cancer, tumor size, tumor differentiation, and clinical stage). In the MDR analysis, the combination of 8 risk factors was the best model, with the highest cross-validation consistency (CVC) and the lowest prediction error, in comparison to the one-factor model among all 5 risk factors. The 12-factor model had the maximum CVC and the minimum prediction error, and the prediction error was statistically significant for both DFS and OS (Table 4). Taken together, the 12-factor model showed a better prediction of prognosis than the 8-factor model and represented the best model to predict CRC prognosis for this study population.
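The core step of MDR, as the method is generally described, pools each combination of factor levels into a "high-risk" or "low-risk" cell and then scores how well that single pooled variable classifies the outcome. The sketch below shows only this step on the full sample; the cross-validation that yields the CVC and prediction error in Table 4 is omitted. The function name `mdr_training_error`, the factor names, and the data are hypothetical, and the rule of labeling a cell high-risk when its event-to-non-event ratio exceeds the overall ratio is an assumption about the threshold.

```python
from collections import defaultdict


def mdr_training_error(samples, factors, threshold=None):
    """Pool factor-level combinations into high/low risk and score them.

    samples: list of (features_dict, outcome) with outcome 1 (event) or 0.
    factors: names of the features to combine.
    Returns (classification_error, set_of_high_risk_cells).
    """
    n_pos = sum(outcome for _, outcome in samples)
    n_neg = len(samples) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")
    if threshold is None:
        threshold = n_pos / n_neg  # overall event : non-event ratio
    counts = defaultdict(lambda: [0, 0])  # cell -> [non-events, events]
    for features, outcome in samples:
        cell = tuple(features[f] for f in factors)
        counts[cell][outcome] += 1
    # A cell is high-risk when events / non-events exceeds the threshold.
    high_risk = {
        cell for cell, (neg, pos) in counts.items() if pos > threshold * neg
    }
    errors = sum(
        1
        for features, outcome in samples
        if (tuple(features[f] for f in factors) in high_risk) != bool(outcome)
    )
    return errors / len(samples), high_risk


# Hypothetical data (not from this study): combined risk genotype group
# and smoking status against an event indicator.
toy_samples = (
    [({"risk_gt": 1, "smoking": 1}, 1)] * 3
    + [({"risk_gt": 1, "smoking": 1}, 0)] * 1
    + [({"risk_gt": 0, "smoking": 0}, 0)] * 3
    + [({"risk_gt": 0, "smoking": 0}, 1)] * 1
)
toy_error, toy_high_risk = mdr_training_error(toy_samples, ["risk_gt", "smoking"])
print(toy_error, toy_high_risk)
```

In a full analysis this scoring would be repeated over every candidate factor combination within cross-validation folds, and the model selected most consistently with the lowest prediction error would be reported.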

Discussion
Although only a small number of lncRNAs have been well characterized, current studies have revealed that lncRNAs such as H19 are functionally associated with the occurrence, development, and progression of diseases, in particular cancers [42,43]. Dysregulation of lncRNAs has been implicated in breast cancer, bladder cancer, gastric cancer, and colorectal cancer [44][45][46][47]. Dysregulation of H19 expression affects cellular functions such as cell proliferation, imprinting, migration, invasion, and metastasis [28,43,48,49,50]. Therefore, genetic variations of H19, especially in the promoter region, may play a critical role in affecting susceptibility to cancer. In the current case-control study of 572 CRC cases and 555 healthy controls from northeastern China, we explored for the first time the potential association between H19 promoter genetic polymorphisms and CRC risk. We verified that 3 of the 16 included SNPs in the DMR upstream loci of the H19 gene, namely rs4930101, rs11042170, and rs2735970, and especially the combined risk genotypes of the 3 SNPs, were remarkably associated with increased advanced CRC risk, with environmental factors, and with clinical outcomes in advanced CRC patients with body weight ≥ 61 kg, smoking, and first-degree family history of cancer.
Table 4 MDR analysis for the prediction of prognosis with and without 3 SNPs genotypes in advanced CRC patients
The best model with maximum cross-validation consistency and minimum prediction error rate was in italics
1,2,3,4,5,6,7,8,9,10,11,12 100/100 < 0.0001 85.17 (45.84-158.0) 99/100 < 0.0001 ∞

In the current study, we first detected the SNPs located at the DMR upstream loci of the H19 gene in the training set of 51 CRC patients and 50 healthy controls. A total of 16 SNPs were identified in this cohort. As the first discovered lncRNA, H19 is involved in regulating gene expression in the imprinted gene network and contributes to growth control in development [19,51,52,53,54]. Owing to their important roles in forensic identification, the 16 SNPs have also been detected in two other populations, the Chinese Han and Chinese Korean populations [55,56], which is consistent with our findings. In this study, because high-quality DNA can be easily prepared from peripheral blood, the SNPs were genotyped only on genomic DNA from blood. Van Huis-Tanja et al. [57] genotyped 11 SNPs in 9 genes in matched blood and FFPE colorectal tumor tissue samples by pyrosequencing and TaqMan techniques. They found that only GSTP1 showed significant discordance between FFPE tissue and blood genotypes, with a discordance rate of only 1.4%. Recently, Shao et al. [58] evaluated genotyping concordance between tumor tissues and peripheral blood on a genome-wide scale and found a high concordance rate (97.42%). Thus, we further investigated the relevance of those SNPs to advanced CRC risk and found that 3 of the 16 SNPs (rs4930101, rs2735970, and rs11042170) were significantly associated with cancer susceptibility. To examine the relationship of these SNPs with CRC risk, we extended the investigation to a relatively large sample of 572 advanced CRC patients and 555 healthy controls, again using genomic DNA. Specifically, a significantly increased CRC risk was observed in individuals carrying the homozygous genotype of rs4930101, rs2735970, or rs11042170, and under the dominant model.
More importantly, a remarkable 6.48-fold increase in susceptibility to CRC was determined for the first time in individuals harboring > 1 risk genotype compared with those carrying ≤ 1 risk genotype (risk genotypes: rs4930101 GT/GG + rs2735970 GA/GG + rs11042170 GA/GG). To our knowledge, it is unclear whether these 3 SNPs affect the expression of H19 and thereby alter cancer risk. However, we found a strong synergistic effect in the combined risk genotypes, suggesting they could act as a biomarker in CRC screening and diagnosis.
In this cohort, we further explored the gene-environment interactions of the H19 promoter SNPs rs4930101, rs11042170, and rs2735970 with clinicopathological parameters of CRC patients, including gender, body weight, smoking, and family history of cancer. Although no association was found between rs4930101 and clinical variables, a significantly decreased frequency of the rs2735970 AA genotype was observed in female CRC patients. Importantly, a remarkable relationship with family history of cancer was found in patients carrying the rs11042170 genotype or combined risk genotypes (> 1 vs. ≤ 1). This also indicated that the G allele might be a genetic predisposition factor in advanced CRC. The effect of combined risk genotypes (> 1 vs. ≤ 1) was more significant than that of any single genotype variation. As cancer is multifactorial, changes in combined genotypes could dramatically affect cancer development. Recent research found that some variants related to CRC risk (rs10505477, rs6983267, rs10795668, and rs11255841) are associated with family history of CRC [59]. However, the interaction between these 3 SNPs of H19 and CRC environmental factors has not previously been reported. Only one recent case-control study reported that another SNP in the H19 promoter region, rs2107425, combined with cooking smoke exposure had a greater impact on lung cancer risk than either factor individually [38]. These results indicate that the 3 tag SNPs could serve as potential biomarkers for evaluating the interaction between clinicopathological parameters and advanced CRC associated polymorphisms. Studies in other cancer types and with larger sample sizes are needed to validate these findings.
To further identify independent prognostic factors in this cohort, we performed, for the first time, the Log-rank test, multivariate Cox regression analysis, and MDR analysis on all variables that could affect DFS and OS in advanced CRC patients. No significant association was found between H19 SNPs and CRC overall survival in patients treated with the FOLFOX6 regimen. However, the stratification analysis found remarkably worse clinical outcomes in carriers of combined risk genotypes (> 1 vs. ≤ 1) among CRC patients with body weight ≥ 61 kg, smoking, and first-degree family history of cancer, which suggested that the combined genotype of the 3 SNPs may affect CRC prognosis and could be a promising biomarker for advanced CRC prognosis. As previously reported, the expression of H19 can be induced by cigarette smoke and other factors. Therefore, these data suggest that the combined genotypes of the potential SNPs could be functional biomarkers for predicting prognosis, especially in CRC patients with specific clinical characteristics including greater body weight, ever-smoking, and first-degree family history of cancer.
In this study, we extensively evaluated, for the first time, the associations between SNPs of the H19 promoter region and CRC risk, pathological features, and clinical outcome in advanced CRC patients. Our results identified 16 SNPs in the DMR upstream loci of the H19 gene. The rs4930101 G allele, rs11042170 G allele, rs2735970 G allele, and combined risk genotypes were associated with increased advanced CRC risk in a training set and in the overall cohort.