Single nucleotide polymorphisms and sporadic colorectal cancer susceptibility: a field synopsis and meta-analysis

Background Although a growing number of single nucleotide polymorphisms (SNPs) associated with non-hereditary colorectal cancer (NHCRC) have been reported, no field synopsis and meta-analysis systematically assessing the cumulative evidence has been conducted during the past 5 years. Methods We searched PubMed, Web of Science and Embase to identify publications concerning the associations between SNPs and risk of NHCRC, up to May 1st, 2017. To assess the credibility of the findings, cumulative evidence was graded according to the Venice criteria. Meta-analyses were also performed for three subgroupings: ethnicity (Asian vs Caucasian), primary cancer site (colon vs rectum) and TNM stage (I II vs III IV). We then arranged the high quality SNPs into different regions according to their locations on genes to evaluate their functional roles in CRC development. Results A total of 5114 publications were collected and 1001 of them met our inclusion criteria, covering 1788 SNPs in 793 genes or distinct chromosomal loci. In total, we performed 359 primary and subgroup meta-analyses for 160 SNPs in 96 distinct genes. Using the Venice criteria, we identified 15 high quality SNPs with 25 high credibility significant associations. Furthermore, we manually divided the high quality SNPs into groups based on their loci (exon region, intron region, promoter region, downstream region, non-coding region and intergenic region). Conclusion We have identified 15 high quality SNPs that may act as promising genetic biomarkers for clinical NHCRC susceptibility screening, and explored their functional roles in NHCRC development based on their locations on genes. Electronic supplementary material The online version of this article (10.1186/s12935-018-0656-2) contains supplementary material, which is available to authorized users.


Background
Colorectal cancer (CRC) is the third most frequent cancer and the fourth leading cause of cancer death worldwide [1]. Genetic factors play an important role in the carcinogenesis of CRC. Traditionally, CRC is divided into familial CRC (hereditary CRC, HCRC) and sporadic CRC (non-hereditary CRC, NHCRC). HCRC accounts for only 20-25% of all CRC and is mainly attributed to specific high-penetrance mutations [2]. The overwhelming majority of CRC is NHCRC, which can be influenced by common genetic variants such as single nucleotide polymorphisms rather than by any single defined mutation. Understanding genetic variation helps to strengthen the prevention, screening and early diagnosis of both HCRC and NHCRC. In a sense, the prediction and control of NHCRC deserve more attention than those of HCRC, because NHCRC accounts for the majority of CRC and control measures may be feasible and practicable.
Single nucleotide polymorphism (SNP) is a common type of genetic variation that may result in different functional products, thus affecting individual susceptibility to diseases. Hence, SNPs can be considered as biomarkers to predict the risk of sporadic tumors, including CRC. During the past three decades, numerous SNPs have been shown to correlate with CRC risk by extensive genome-wide association studies (GWAS) and candidate-gene association studies (CGAS). Meta-analyses based on different data and approached from different angles have also reported on the genetic predisposition to NHCRC. Looking across the preceding meta-analyses, most gathered only a fraction of the SNPs, and few took in the complete picture of SNPs in NHCRC from a field perspective. It is worth noting that two comprehensive field meta-analyses have summarized all CRC risk-associated variants up to 2012, providing directions for future investigators [3,4]. Inspired by these two articles, we noticed that SNPs play an essential role in the genetic predisposition to CRC, constituting nearly 80% of the significant genetic variants, which also include insertion/deletion polymorphisms and variable number of tandem repeats (VNTR). For SNPs alone, an updated field synopsis and meta-analysis is required, given that 5 years have passed since the latest field synopsis was published, and the heterogeneity arising from ethnicity, primary cancer site and TNM stage must be considered. Moreover, no study has examined the roles of the associated SNPs in CRC development as a whole, based on their locations on genes.

Open Access. Cancer Cell International. *Correspondence: qxu@cmu.edu.cn; yuanyuan@cmu.edu.cn. 1 Tumor Etiology and Screening Department of Cancer Institute and General Surgery, The First Hospital of China Medical University, No.155 NanjingBei Street, Heping District, Shenyang 110001, Liaoning, China. Full list of author information is available at the end of the article.

In the present systematic review and meta-analysis, we focus on high quality SNPs (that is, SNPs statistically associated with CRC risk at a high credibility level, as assessed by the Venice criteria) in the field of genetic predisposition to NHCRC, including the correlations of SNPs with ethnicity, primary cancer site (colon or rectum) and TNM stage (I II or III IV). We then arranged these high quality SNPs into different regions according to their locations on genes to evaluate their functional roles in CRC development.

Retrieval strategy
A comprehensive systematic literature search was performed for publications concerning the association between SNPs and risk of NHCRC. We searched PubMed, Web of Science and Embase using the search terms "(polymorphism or "single nucleotide polymorphism" or SNP or "genome wide association study" or GWAS) and (colon or rectal or rectum or colorectal) and (cancer or tumor or carcinoma or neoplasm)", up to May 1st, 2017. Moreover, each identified SNP (by gene name or rs number) was used as a keyword to refine the search, for instance 'XPG' or 'rs17655' in combination with "(colon or rectal or rectum or colorectal) and (cancer or tumor or carcinoma or neoplasm)" as the query term.
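The two query forms described above can be assembled programmatically. The following is a minimal sketch: the term lists are copied from the text, while the helper names (`or_group`, `primary_query`, `follow_up_query`) are hypothetical, and joining the keyword to the other groups with "and" is an assumption, since the text only says "in combination with".

```python
# Term groups copied from the search strategy described in the text.
VARIANT_TERMS = [
    "polymorphism",
    '"single nucleotide polymorphism"',
    "SNP",
    '"genome wide association study"',
    "GWAS",
]
SITE_TERMS = ["colon", "rectal", "rectum", "colorectal"]
DISEASE_TERMS = ["cancer", "tumor", "carcinoma", "neoplasm"]


def or_group(terms):
    """Join terms with 'or' and wrap the group in parentheses."""
    return "(" + " or ".join(terms) + ")"


def primary_query():
    """Build the primary boolean query from the three term groups."""
    groups = (VARIANT_TERMS, SITE_TERMS, DISEASE_TERMS)
    return " and ".join(or_group(g) for g in groups)


def follow_up_query(keyword):
    """Build the per-SNP follow-up query (gene name or rs number).

    Assumption: the keyword is combined with the other groups by 'and'.
    """
    return " and ".join([keyword, or_group(SITE_TERMS), or_group(DISEASE_TERMS)])
```

For example, `follow_up_query("rs17655")` yields the follow-up query for the rs number given as an example in the text.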

Inclusion and exclusion criteria
To identify all eligible studies, we adopted the following inclusion criteria: (1) case-control studies, either candidate-gene association studies (CGAS) or genome-wide association studies (GWAS); (2) studies exploring the correlation between SNPs and NHCRC. The main exclusion criteria were: (1) overlapping studies; (2) no relation to NHCRC or to SNPs; (3) no available data or unsuitable SNP genotyping methods; (4) research published solely in abstract form (e.g. conference proceedings or scientific meetings) (Fig. 1).

Data extraction
Data were independently extracted by two of the authors (Jing Wen and Qian Xu). Items collected from all eligible publications included first author, publication year (or study year for unpublished data), race of participants, sample size, genes, SNP locus, genotype counts of cases and controls, and Hardy-Weinberg equilibrium (HWE) in controls. Multiple populations reported in one publication were extracted individually. For GWAS, discovery and replication studies were regarded as separate datasets and were also extracted individually. For eligible articles with unreported data, we attempted to contact the authors.
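The HWE status of controls can be derived from the extracted genotype counts with a chi-square goodness-of-fit test, as described in the Statistics section. The sketch below is one common way to do this and is not taken from the authors' workflow: the function name `hwe_chi_square` is hypothetical, and the test uses 1 degree of freedom for a biallelic SNP.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for a biallelic SNP.

    n_aa, n_ab, n_bb are the observed genotype counts in controls.
    Returns (chi-square statistic, P value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotyped control is required")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic SNP).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value
```

With the threshold used in this study, a P value below 0.05 would flag the control group as departing significantly from HWE.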

Assessment of cumulative evidence
The epidemiological credibility of all nominally significant associations confirmed by our meta-analysis was assessed by applying the Venice criteria [5,6]. Three categories serve as the fundamental criteria defining the credibility level:
1. Amount of evidence was evaluated by the total number of cases and controls carrying the test alleles or genotypes: categories 'A', 'B' and 'C' represent large-scale, moderate and little evidence, with sample sizes over 1000, 100-1000 and less than 100, respectively.
2. Replication was graded by the heterogeneity statistic: categories 'A', 'B' and 'C' stand for little, moderate and large inconsistency (no association), with I² < 25%, 25-50% and > 50%, respectively.
3. Protection from bias was graded as 'A' when no bias was observed and bias was unlikely to explain the association, 'B' when no obvious bias was observed but bias could still account for the association, or 'C' when bias was demonstrable. The general checks for bias include loss of the association on removal of the initial study, a small magnitude of association (0.87 < OR < 1.15) and the existence of publication bias [7,8].
According to the criteria above, the cumulative evidence for each association calculated by meta-analysis was rated as high credibility (all three grades 'A'), intermediate credibility (grades 'A' or 'B' only, with at least one 'B') or low credibility (any grade 'C'). Notably, the heterogeneity and bias grades could be exempted if the P value was < 1 × 10⁻⁷ after removing the initial study [8].
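The grading rules above can be expressed as a small decision procedure. The following is a sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the bias grade is supplied by the reviewer because it rests on judgment, and treating the P < 1 × 10⁻⁷ exemption as equivalent to grade 'A' for replication and bias is our reading of the text.

```python
def grade_amount(n_test_allele):
    """Grade the amount of evidence from the number of subjects
    carrying the test allele or genotype (cases plus controls)."""
    if n_test_allele > 1000:
        return "A"
    if n_test_allele >= 100:
        return "B"
    return "C"


def grade_replication(i2_percent):
    """Grade replication consistency from the I-squared statistic (%)."""
    if i2_percent < 25:
        return "A"
    if i2_percent <= 50:
        return "B"
    return "C"


def credibility(amount, replication, bias, p_without_initial=None):
    """Combine the three Venice grades into a credibility level.

    Assumption: when the P value after removing the initial study is
    below 1e-7, the replication and bias grades are treated as 'A'.
    """
    if p_without_initial is not None and p_without_initial < 1e-7:
        replication, bias = "A", "A"
    grades = (amount, replication, bias)
    if "C" in grades:
        return "low"
    if all(g == "A" for g in grades):
        return "high"
    return "intermediate"
```

For instance, an association with grades 'A', 'B', 'A' would be rated intermediate, and any 'C' grade would make it low unless the exemption applies.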

Statistics
Statistical analyses in our study were conducted with STATA software, version 11.0 (STATA Corp., College Station, TX, USA). All tests were two-tailed and P values ≤ 0.05 were regarded as statistically significant unless otherwise stated. An association was considered to reach genome-wide significance if P < 5 × 10⁻⁸ [9]. Hardy-Weinberg equilibrium (HWE) of the genotype distributions in controls was assessed by the Chi square test, and P values < 0.05 were regarded as statistically significant disequilibrium. Associations between SNPs and colorectal cancer risk were assessed by pooled odds ratios (ORs) and 95% confidence intervals (CIs), calculated with a random-effects model when between-study heterogeneity existed [10] and with a fixed-effect model otherwise [11]. Begg's test, which accompanies funnel plot analysis, was used to detect significant asymmetry [12], and the modified Egger's test, which controls type I error, was used to evaluate bias caused by small studies [13]. A P value less than 0.10 was taken as the threshold in both Begg's and Egger's tests.
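To make the pooling step concrete, the sketch below computes fixed-effect and random-effects pooled ORs with 95% CIs, together with Cochran's Q and I², from per-study 2 × 2 allele tables. It is an illustration under assumptions rather than the procedure actually run in STATA: inverse-variance weighting and the DerSimonian-Laird estimator are assumed (the text cites [10] and [11] without naming the estimators), the function names are hypothetical, and the rule for deciding that heterogeneity "exists" is left to the caller because the text does not state it.

```python
from math import exp, log, sqrt


def log_or_and_variance(a, b, c, d):
    """Log odds ratio and its variance for one 2x2 table.

    a, b: test and reference allele counts in cases;
    c, d: test and reference allele counts in controls.
    All four counts must be non-zero.
    """
    return log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def _pool(effects, weights):
    """Weighted mean of log ORs, returned as (OR, CI lower, CI upper)."""
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / total
    se = sqrt(1 / total)
    return exp(estimate), exp(estimate - 1.96 * se), exp(estimate + 1.96 * se)


def pool_allelic_or(tables):
    """Pool study-level allelic ORs under both models."""
    effects, variances = zip(*(log_or_and_variance(*t) for t in tables))
    w_fixed = [1 / v for v in variances]
    mean_fixed = sum(w * y for w, y in zip(w_fixed, effects)) / sum(w_fixed)
    # Cochran's Q and the I-squared statistic (in percent).
    q = sum(w * (y - mean_fixed) ** 2 for w, y in zip(w_fixed, effects))
    df = len(tables) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # DerSimonian-Laird between-study variance (assumed estimator).
    c = sum(w_fixed) - sum(w * w for w in w_fixed) / sum(w_fixed)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_random = [1 / (v + tau2) for v in variances]
    return {
        "fixed": _pool(effects, w_fixed),
        "random": _pool(effects, w_random),
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }
```

When the studies agree, the two models coincide; as heterogeneity grows, the random-effects interval widens relative to the fixed-effect one.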
In addition, the q value was used as a measure of statistically significant findings in terms of the false discovery rate (FDR), which is the expected proportion of significant findings for which the null hypothesis is actually true. For instance, a 5% false discovery rate means that among all statistically significant SNPs, 5% are not actually associated with CRC risk. We also took 0.05 as the threshold for the q value [14,15].
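As an illustration of how q values relate to the P values of a family of tests, the sketch below computes Benjamini-Hochberg adjusted P values. This is an assumption on our part: the text cites [14,15] without naming the estimator, and other q value procedures exist. The function name `bh_q_values` is hypothetical.

```python
def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted P values, in the input order.

    Each P value is scaled by m / rank, then a running minimum is taken
    from the largest P value downwards so that the adjusted values are
    monotone in the original P values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values
```

Findings whose adjusted value falls below 0.05 would then be retained at the 5% FDR threshold used in this study.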

Features of eligible studies
According to the screening process shown in Fig. 1, 5114 publications were collected and 1001 of them met our inclusion criteria, covering 1788 SNPs in 793 genes or distinct chromosomal loci, with 2,200,290 subjects extracted (cases: 971,074, ratio: 44%, range: 8-10,409, mean: 550).
Taking into account the amount of evidence, replication consistency (heterogeneity) and protection from bias (derived from publication bias, initial study influence and OR value), we applied the Venice criteria to assess the epidemiological credibility of all significant findings. High, intermediate and low credibility levels of cumulative evidence accounted for 28% (n = 25), 16% (n = 14) and 56% (n = 51) of the findings, respectively. Publication bias was the most common cause of non-high-quality evidence (41/65), and inter-study heterogeneity the second (33/65). From the 25 high credibility significant associations, we identified 15 distinct high quality SNPs, which are presented in Fig. 5.

Results from subgroup analyses
Significant associations in subgroup analyses are shown in Table 2, graded as high (n = 15), intermediate (n = 11) or low quality (n = 28). Results from the three stratified analyses (ethnicity, primary cancer site and TNM stage) are described below.

Results based on ethnicity
Ethnicity-specific findings were observed for 35 significant associations (see Fig. 2). Of these, 21 (60%) were identified in Caucasians only (18 from Caucasian subgroup meta-analyses and 3 from primary meta-analyses that included only Caucasian ancestry), 9 (25.7%) were identified in Asians only (8 from Asian subgroup meta-analyses and 1 from a primary meta-analysis that covered only Asian ancestry), and 5 (14.3%) SNPs (all from subgroup meta-analyses) showed correlations in both Caucasians and Asians.

Results based on primary cancer site
Site-specific findings were observed for 8 significant SNPs (see Fig. 3). Of these, 5 (62.5%) were associated only with colon cancer in subgroup analyses, 1 (12.5%) was associated only with rectal cancer in subgroup analysis, and 2 (25%) were associated with both colon and rectal cancer. Two high quality SNPs were found in the rectum subgroup (CCND1 rs9344 and MTHFR rs1801131).

Results based on cancer TNM stage
Subgroup meta-analyses by TNM stage identified 7 SNPs with significant correlations (see Fig. 4). Of these, 4 (57.1%) correlated only with TNM stage I II, 2 (28.6%) correlated only with TNM stage III IV, and only 1 (14.3%) SNP showed its correlation with any TNM stage of CRC. Among the 7 significant SNPs, only 2 high quality SNPs (LOC105376400 rs10795668 and CCND1 rs9344) were identified.

Results based on SNP location
From the 25 high credibility significant associations, 15 distinct high quality SNPs were identified. To further explore the roles of these high quality SNPs, we manually divided them into groups based on their loci (exon region, intron region, promoter region, downstream region, non-coding region and intergenic region), as displayed in Fig. 5. We also show the chromosomal distribution of the high quality SNPs.

Discussion
In this article, we systematically reviewed the associations between 160 SNPs in 96 distinct genes or chromosomal loci and predisposition to NHCRC, overall and in subgroups defined by ethnicity (Asian vs Caucasian), primary cancer site (colon vs rectum), TNM stage (I II vs III IV) and SNP location on genes, with quality assessment of the cumulative evidence; 15 high quality SNPs were ultimately confirmed. The innovations and strengths of the present study should be noted first. First, we conducted a highly comprehensive evaluation of the literature in the field of genetic predisposition to NHCRC. Second, we report for the first time 20 SNPs in primary meta-analysis, 24 SNPs in "primary cancer site" subgroup analysis (15 for colon, 9 for rectum) and 10 SNPs in "TNM stage" subgroup analysis. Third, to explore the functional roles of high quality SNPs in NHCRC development, we are the first to divide them into six groups based on their loci on genes. This study provides the latest evidence and clues for the genetic susceptibility to NHCRC. Despite these strengths, its limitations cannot be ignored. First, we considered only the allelic genetic model, because it is widely regarded as a conservative compromise between the dominant and recessive models [16]. Second, type I errors may arise from using the same series in more than one meta-analysis; calculating q values should, however, limit their incidence. Third, we did not analyze gene-gene or gene-environment interactions because of insufficient data. Future dedicated studies should be designed to reveal these interactions.

High quality SNPs with NHCRC risk
Given the large number of SNPs with significant associations, it is crucial to evaluate the quality of those associations rigorously. Using the Venice criteria, we identified 15 high quality SNPs with 25 high credibility significant associations, which may act as promising genetic biomarkers for clinical NHCRC susceptibility screening.
For the whole population, 10 high quality SNPs were evaluated and are shown in Table 1. Comparing our results with the two published field meta-analyses [3,4], we found that 8 of the 10 SNPs were assessed as high quality SNPs for the first time, which means that they were previously non-high-quality SNPs (with intermediate or low credibility evidence) or had not been reported at all. Interestingly, on examining the functions of the genes carrying these high quality SNPs, we noticed that half of them are involved in the TGF-β/Smad signaling pathway, including TGF-β, SMAD7, BMP2, BMP4 and GREM1. This finding indirectly supports the crucial role of the TGF-β/Smad signaling pathway in CRC pathogenesis through regulation of its target genes [17]. In addition, four other high quality SNPs lie in non-coding RNAs (one microRNA: miR-27a; three long non-coding RNAs: CASC8, CCAT2 and LOC105376400), suggesting that aberrant expression of non-coding RNA may also be closely related to CRC diagnosis [18-20]. Moreover, one high quality SNP lies in the ADIPOQ (adiponectin) gene, suggesting that adiponectin deficiency might be one of the fundamental risk factors for NHCRC [21,22].
From the perspective of ethnicity, the apparent contrast between the Caucasian and Asian populations in the distribution of associated SNPs is presented in Fig. 2, suggesting that the molecular mechanisms of CRC development are not always the same across ethnicities. Of note, 6 high quality SNPs were evaluated in the Asian subgroup, all of which were identified as high quality SNPs for the Asian population for the first time, while 5 high quality SNPs were evaluated in the Caucasian subgroup, 3 of which were newly identified for the Caucasian population. Among the genes carrying these SNPs, KRAS, an important oncogene, caught our attention. It participates in the RAF/MEK/MAPK, ERK and AKT signaling pathways, regulating CRC cell proliferation and differentiation [23,24].
From the aspect of primary cancer location, the different findings for colon and rectal cancer indicate that they differ not only in anatomic site but also in molecular profile. One study showed that colon and rectal cancer differ in embryological origin, pattern of metastasis and mutational profile, requiring different neoadjuvant treatments and surgical methods [25]. Nevertheless, neither of the two earlier field meta-analyses [3,4] addressed "primary cancer location" subgroup analyses. Herein, 2 high quality SNPs were identified in the rectal subgroup analysis. These results suggest that risk factors for rectal cancer development might include aberrant expression of MTHFR (leading to abnormal folate metabolism [26-28]) or CCND1 (which could promote the cell cycle G1/S transition [29,30]).

Fig. 5 We manually divided the 15 distinct high quality SNPs into six groups, based on their SNP loci on genes

From the perspective of malignancy, the TNM stage subgroup was analyzed for the first time in our study, with a high positive rate (8/20, 40%), and differences between stage I II and stage III IV were also observed. This suggests that SNPs may not only predict NHCRC development but also indicate the degree of malignancy, guiding the frequency of physical examination for patients and the choice of treatment by doctors. Because of the limited pathological parameters provided by researchers, only 20 SNPs were analyzed in this subgroup, and 2 of them (LOC105376400 rs10795668 and CCND1 rs9344) were identified as high quality SNPs in TNM stage I II. Further studies should pay more attention to the association between polymorphisms and the degree of NHCRC malignancy.

Functional roles of high quality SNPs based on location
SNPs can influence CRC susceptibility through complicated genetic and epigenetic mechanisms, which depend on the functions of their genes and on their locations within those genes. Hence, we arranged the 15 high quality SNPs into different regions (exon, intron, promoter, non-coding and intergenic regions) to examine their possible mechanisms in facilitating NHCRC development.
In the exon region, the missense SNP (MTHFR rs1801131) contributes to NHCRC by reducing enzyme activity [31,32]. The principal mechanism for synonymous SNPs is their influence on mRNA expression level through altered splicing or stability of mRNA (such as ADIPOQ rs2241766 and CCND1 rs9344) [33-36].
SNPs in intron regions probably exert larger effects on target genes than hitherto thought, owing to the many functional elements in these regions, including cis-acting RNA elements, intronic splicing enhancers and intronic splicing silencers [37]. However, high quality SNPs in this region have been shown to be associated with mRNA expression level without a precise mechanistic explanation (such as SMAD7 rs12953717 and rs4464148) [38]. Hence, the mechanisms of high quality intronic SNPs should not be ignored, and studies of these SNPs are still lacking.
Regarding SNPs located in the promoter region, it has been shown that they can alter binding to transcription factors, affecting the transcriptional efficiency of genes (such as TGF-β1 rs1800469) [39]. Moreover, the 3′-UTR of genes contains multiple microRNA binding sites. Hence, SNPs in this region are speculated to disrupt microRNA binding sites, leading to increased expression of target genes (such as KRAS rs712, as predicted by the bioinformatics website 'snpinfo.niehs.nih.gov').
For SNPs in non-coding regions, we found high quality SNPs in both microRNAs (miRNA) and long non-coding RNAs (lncRNA), which could participate indirectly in CRC carcinogenesis by interacting with protein-coding mRNA. SNPs in miRNA have a crucial influence on its synthesis and down-regulation (such as miR-196a2 rs11614913 and miR-27a rs895819) [20,40,41] and can also regulate its binding capacity to target genes (such as miR-196a2 rs11614913) [42]. In addition, SNPs in lncRNA can lead to aberrant expression of the lncRNA by disrupting its vital regulatory region (such as CASC8 rs1505477) [43], and can regulate the expression level of target genes by modulating the binding of transcription factors (TFs) to its promoter region (such as CASC8 rs1505477, CCAT2 rs6983267 and LOC105376400 rs10795668) [44-47].
Furthermore, there were three high quality SNPs (BMP2 rs961253, BMP4 rs4444235 and GREM1 rs4779584) that are not located in known genes. Further studies are required to explain their association with CRC risk. Additionally, our data revealed that high quality SNPs are scattered across coding and non-coding regions of chromosomes 1, 3, 8, 10, 11, 12, 14, 15, 18, 19 and 20, which indicates that the complicated molecular mechanisms of CRC development involve numerous genomic and epigenomic variants.
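The location-based grouping discussed in this section can be summarized as a simple lookup structure. The sketch below records each SNP under the region in which the text cites it as an example. The region labels and names such as `HIGH_QUALITY_SNPS` and `group_by_region` are illustrative choices, and the assumption that KRAS rs712 in the 3′-UTR corresponds to the "downstream region" group of Fig. 5 is ours, not confirmed by the text.

```python
from collections import defaultdict

# (gene or locus, rs number, region) as named in the discussion above.
# Assumption: KRAS rs712 (3'-UTR) is the figure's "downstream region".
HIGH_QUALITY_SNPS = [
    ("MTHFR", "rs1801131", "exon"),
    ("ADIPOQ", "rs2241766", "exon"),
    ("CCND1", "rs9344", "exon"),
    ("SMAD7", "rs12953717", "intron"),
    ("SMAD7", "rs4464148", "intron"),
    ("TGF-β1", "rs1800469", "promoter"),
    ("KRAS", "rs712", "3'-UTR"),
    ("miR-196a2", "rs11614913", "non-coding RNA"),
    ("miR-27a", "rs895819", "non-coding RNA"),
    ("CASC8", "rs1505477", "non-coding RNA"),
    ("CCAT2", "rs6983267", "non-coding RNA"),
    ("LOC105376400", "rs10795668", "non-coding RNA"),
    ("BMP2", "rs961253", "intergenic"),
    ("BMP4", "rs4444235", "intergenic"),
    ("GREM1", "rs4779584", "intergenic"),
]


def group_by_region(snps):
    """Map each region label to the list of 'gene rsID' strings in it."""
    groups = defaultdict(list)
    for gene, rsid, region in snps:
        groups[region].append(f"{gene} {rsid}")
    return dict(groups)
```

Calling `group_by_region(HIGH_QUALITY_SNPS)` returns six groups covering the 15 SNPs, which can then be tabulated or counted per region.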

Conclusion and expectations
In this systematic review and large-scale meta-analysis, we identified 15 distinct high quality SNPs associated with NHCRC risk and report for the first time 20 SNPs in primary meta-analysis, 24 SNPs in "primary cancer site" subgroup analysis (15 for colon, 9 for rectum) and 10 SNPs in TNM stage subgroup analysis. This comprehensive survey of genetic predisposition to sporadic colorectal cancer summarizes the current state of research on NHCRC susceptibility SNPs, providing useful data for investigators designing future studies.