Association of human XPA rs1800975 polymorphism and cancer susceptibility: an integrative analysis of 71 case–control studies

The objective of the present study is to comprehensively evaluate the impact of the rs1800975 A/G polymorphism within the human xeroderma pigmentosum group A (XPA) gene on susceptibility to overall cancer by performing an integrative analysis of the current evidence. We retrieved potentially relevant publications from six electronic databases (updated to April 2020) and selected eligible case–control studies for pooled assessment. P-values of association and odds ratios (ORs) were calculated to assess the association effect. We also performed Begg's test, Egger's test, sensitivity analysis, false-positive report probability (FPRP) analysis, trial sequential analysis (TSA), and expression/splicing quantitative trait loci (eQTL/sQTL) analyses. In total, 71 case–control studies with 19,257 cases and 30,208 controls from 52 publications were included in the pooled analysis. In the Caucasian subgroup analysis, we observed increased cancer susceptibility in cases compared with negative controls under the allelic G vs. A, carrier G vs. A, homozygotic GG vs. AA, heterozygotic AG vs. AA, dominant AG + GG vs. AA and recessive GG vs. AA + AG genetic models (P < 0.05, OR > 1). A similar positive conclusion was also detected in the "skin cancer" and "skin basal cell carcinoma (BCC)" subgroup analyses of the Caucasian population. Our FPRP analysis and TSA results further confirmed the robustness of this conclusion. However, our eQTL/sQTL data did not support strong links between rs1800975 and changes in XPA gene expression or splicing in skin tissue.
In addition, although we observed a decreased risk of lung cancer under the homozygotic, heterozygotic and dominant models (P < 0.05, OR < 1) and an increased risk of colorectal cancer under the allelic, homozygotic, heterozygotic and dominant models (P < 0.05, OR > 1), our data from the FPRP analysis and from a further pooling analysis restricted to population-based controls in the Caucasian population did not support strong links between the XPA rs1800975 A/G polymorphism and the risk of lung or colorectal cancer. Our findings provide evidence of a close relationship between the XPA rs1800975 A/G polymorphism and susceptibility to skin cancer in the Caucasian population. The potential effect of XPA rs1800975 on the risk of lung or colorectal cancer still merits larger, well-designed studies.

The nucleotide excision repair (NER) pathway removes DNA damage induced by UV irradiation, tobacco, alkylating agents or pollutants, and xeroderma pigmentosum group A (XPA) acts as an essential NER member [1,2]. As a zinc finger DNA-binding protein and an important damage verifier, XPA can bind the core NER repair factors to identify the damage site in the DNA substrate [2][3][4]. Abnormal DNA repair mechanisms or mutated NER proteins are involved in mutagenesis and oncogenesis and are often linked to a group of clinical disorders [1,2]. The human XPA rs1800975 A/G polymorphism is a common single nucleotide polymorphism (SNP) in the 5′-untranslated region of the XPA gene [5]. In the present study, we aimed to comprehensively explore the possible effect of the XPA rs1800975 genetic variant on susceptibility to different cancers, such as skin, lung, breast, esophageal, gastric, colorectal or endometrial cancer.
Previous reports have reached distinct conclusions regarding the genetic relationship between the XPA rs1800975 polymorphism and cancer susceptibility in different populations. For example, the XPA rs1800975 polymorphism was reported to be related to the risk of lung cancer in Norwegian [6], German [7,8] and Korean [9] populations but not in patients from Belgium [10] or the USA [11]. These inconsistent results merit a comprehensive evaluation by means of a meta-analysis.
To the best of our knowledge, only two meta-analyses regarding the association between the XPA rs1800975 polymorphism and susceptibility to overall cancer have been reported, both in 2012 [12,13]. Nevertheless, no more than 36 case-control studies were enrolled in either of these prior meta-analyses. Therefore, we performed an updated comprehensive meta-analysis in 2020 based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [14]. In total, 71 case-control studies whose control genotypes were in Hardy-Weinberg equilibrium (HWE) were enrolled for pooling, and a series of stratified analyses, Begg's test, Egger's test, sensitivity analysis, FPRP analysis, TSA, expression pattern analysis, and eQTL and sQTL analyses were conducted.

Database retrieval
Potentially relevant publications from six online databases, including PubMed, Excerpta Medica Database (EMBASE), Cochrane, China National Knowledge Infrastructure (CNKI), WANFANG and VIP, were retrieved until April 8, 2020. We did not set up any geographical or language restrictions for publications. Additional file 1: Table S1 shows our specific search terms during the database retrieval.

Screening criteria
The articles were then screened and evaluated for eligibility according to our screening criteria. The inclusion criterion was the availability of genotypic frequency data for the XPA rs1800975 polymorphism in both cases and controls. The exclusion criteria were duplicate information; cell, plant or animal assay data; other diseases, genes or SNPs; review, meeting or meta-analysis publications; lack of normal controls; lack of full genotypic data; and a genotypic distribution in controls deviating from HWE.

Data extraction and quality evaluation
We used a table to independently extract basic information, including the first author, publication year, country, race, cancer type, control source, genotyping method, genotypic distribution, genotype frequency and sample size. Disagreements were resolved by full discussion, and we attempted to obtain missing data by contacting the corresponding author via e-mail. The HWE P-value in controls was obtained by the chi-square test. We evaluated the methodological quality of the studies using the Newcastle-Ottawa Scale (NOS), with scores ranging from one to nine; studies with an NOS score below five were considered to be of poor quality.
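A minimal sketch of the HWE check on control genotypes is shown below, assuming the standard one-degree-of-freedom goodness-of-fit chi-square test (the exact variant used is not stated). The function name and the genotype counts are hypothetical and do not come from any enrolled study.

```python
import math


def hwe_chi_square(n_aa, n_ag, n_gg):
    """Goodness-of-fit chi-square test for Hardy-Weinberg equilibrium (1 df).

    n_aa, n_ag, n_gg: AA, AG and GG genotype counts in the control group.
    Returns (chi2, p_value). Assumes both alleles are observed, so that
    no expected count is zero.
    """
    n = n_aa + n_ag + n_gg
    p = (2 * n_aa + n_ag) / (2 * n)  # A-allele frequency
    q = 1 - p                        # G-allele frequency
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ag, n_gg)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical control counts (AA, AG, GG).
chi2, p = hwe_chi_square(30, 20, 50)
print(f"chi2 = {chi2:.2f}, P = {p:.2g}")  # P < 0.05: would be excluded under the HWE criterion
```

A control group with P < 0.05 would fall under the "not in line with HWE" exclusion criterion.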

Heterogeneity and association test
If the I² value (the proportion of variation in ORs attributable to heterogeneity) was > 50% and the P-value of heterogeneity was < 0.05, we adopted a random-effects model for the association test. Otherwise, a fixed-effect model was used, owing to the absence of significant between-study heterogeneity. P-values of association, ORs and 95% confidence intervals (CIs) were calculated for the allelic (G vs. A), carrier (G vs. A), homozygotic (GG vs. AA), heterozygotic (AG vs. AA), dominant (AG + GG vs. AA) and recessive (GG vs. AA + AG) models. In addition, subgroup analyses by race, control source and genotyping method were conducted. A subgroup analysis was performed only when at least three case-control studies were available, to obtain a relatively reliable conclusion.
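The genetic-model contrasts and the heterogeneity-based model choice described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the STATA workflow used in the study: per-study log ORs use Woolf variances with a hypothetical 0.5 zero-cell correction, the fixed-effect model uses inverse-variance weights (STATA's `metan` typically uses Mantel-Haenszel weights for fixed-effect ORs, so numbers can differ slightly), the random-effects model uses the DerSimonian-Laird estimator, and the carrier model is left out because its counts are not spelled out here. All function names and counts are hypothetical.

```python
import math


def log_or(a, b, c, d):
    """Log OR and its Woolf variance for a 2x2 table.

    a/b: 'exposed'/'reference' counts in cases; c/d: the same in controls.
    Assumption: 0.5 is added to every cell when any cell is zero.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def model_tables(case, ctrl):
    """2x2 tables for five genetic models from (AA, AG, GG) counts."""
    (c_aa, c_ag, c_gg), (t_aa, t_ag, t_gg) = case, ctrl
    return {
        "allelic": (2 * c_gg + c_ag, 2 * c_aa + c_ag, 2 * t_gg + t_ag, 2 * t_aa + t_ag),
        "homozygotic": (c_gg, c_aa, t_gg, t_aa),
        "heterozygotic": (c_ag, c_aa, t_ag, t_aa),
        "dominant": (c_ag + c_gg, c_aa, t_ag + t_gg, t_aa),
        "recessive": (c_gg, c_aa + c_ag, t_gg, t_aa + t_ag),
    }


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the lower incomplete gamma series."""
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = total = 1.0 / s
    k = 1
    while term > total * 1e-15:
        term *= z / (s + k)
        total += term
        k += 1
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return max(0.0, 1.0 - lower)


def pool(effects):
    """Pool (log OR, variance) pairs; DerSimonian-Laird random effects when
    I2 > 50% and P_het < 0.05, inverse-variance fixed effect otherwise."""
    if len(effects) < 2:
        raise ValueError("need at least two studies")
    y = [e for e, _ in effects]
    w = [1 / v for _, v in effects]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    i2 = (q - df) / q if q > df else 0.0
    p_het = chi2_sf(q, df)
    model = "fixed"
    if i2 > 0.5 and p_het < 0.05:
        c = sum(w) - sum(wi * wi for wi in w) / sum(w)
        tau2 = (q - df) / c  # positive here because q > df
        w = [1 / (v + tau2) for _, v in effects]
        model = "random"
    est = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    se = math.sqrt(1 / sum(w))
    return {
        "model": model,
        "OR": math.exp(est),
        "CI": (math.exp(est - 1.96 * se), math.exp(est + 1.96 * se)),
        "P": math.erfc(abs(est / se) / math.sqrt(2)),  # two-sided z-test
        "I2": i2,
        "P_het": p_het,
    }


# Three hypothetical studies with identical genotype counts: (AA, AG, GG).
studies = [((40, 40, 20), (50, 40, 10))] * 3
dominant = [log_or(*model_tables(ca, ct)["dominant"]) for ca, ct in studies]
print(pool(dominant))  # fixed-effect model, OR of about 1.5
```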

Publication bias assessment
Begg's test and Egger's test were carried out to quantitatively evaluate potential publication bias. We obtained the P-values of Begg's and Egger's tests, Begg's funnel plot (with pseudo 95% confidence limits) and Egger's publication bias plot. A basically symmetrical funnel plot together with P-values larger than 0.05 was taken to indicate the absence of significant publication bias.
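Both tests can be sketched from per-study log ORs and variances as below. Assumptions: Begg's test uses Kendall's S with the normal approximation, without continuity correction or tie adjustment, and Egger's P-value is obtained by numerically integrating the Student t density with k − 2 degrees of freedom; STATA's `metabias` may differ in such details. The studies are hypothetical, built so that smaller studies report larger effects.

```python
import math


def begg_test(effects):
    """Begg-Mazumdar rank correlation test on (log OR, variance) pairs.

    Returns (z, two-sided P) using the no-ties normal approximation.
    """
    y = [e for e, _ in effects]
    v = [var for _, var in effects]
    w = [1 / vi for vi in v]
    mean = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    t_std = [(yi - mean) / math.sqrt(vi - 1 / sum(w)) for yi, vi in zip(y, v)]
    k = len(y)
    s = 0
    for i in range(k):
        for j in range(i + 1, k):
            prod = (t_std[i] - t_std[j]) * (v[i] - v[j])
            s += (prod > 0) - (prod < 0)  # concordant +1, discordant -1, tie 0
    z = s / math.sqrt(k * (k - 1) * (2 * k + 5) / 18)
    return z, math.erfc(abs(z) / math.sqrt(2))


def t_two_sided_p(t, df, steps=2000):
    """Two-sided Student t P-value via Simpson integration of the density on [0, |t|]."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = abs(t) / steps
    area = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return min(1.0, max(0.0, 1.0 - 2 * area * h / 3))


def egger_test(effects):
    """Egger regression of standardized effect (y/se) on precision (1/se).

    Returns (intercept, two-sided P); a non-zero intercept suggests asymmetry.
    """
    x = [1 / math.sqrt(v) for _, v in effects]
    z = [e / math.sqrt(v) for e, v in effects]
    k = len(x)
    mx, mz = sum(x) / k, sum(z) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / sxx
    intercept = mz - slope * mx
    sse = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    se_int = math.sqrt(sse / (k - 2) * (1 / k + mx * mx / sxx))
    return intercept, t_two_sided_p(intercept / se_int, k - 2)


# Hypothetical studies: standard errors and standardized effects (y / se).
se = [0.1, 0.2, 0.25, 0.4, 0.5]
z_std = [2.0, 2.2, 2.3, 2.9, 2.8]
effects = [(zi * s, s * s) for zi, s in zip(z_std, se)]
print(begg_test(effects), egger_test(effects))
```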

Data sensitivity
We also conducted sensitivity analyses under the above six genetic models. Each case-control study was removed in turn, and an obvious change in the pooled estimates after any single removal was taken to indicate a lack of statistical stability. STATA 12.0 software (StataCorp, College Station, USA) was used for the above statistical analyses.
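The leave-one-out procedure can be sketched as below, assuming an inverse-variance fixed-effect model for brevity (the main analysis switches to random effects under heterogeneity). The flag for an omission that changes whether the 95% CI excludes 1 is one hypothetical way to operationalize an "obvious change"; the data are illustrative.

```python
import math


def fixed_effect(effects):
    """Inverse-variance pooled OR with 95% CI from (log OR, variance) pairs."""
    w = [1 / v for _, v in effects]
    est = sum(wi * y for wi, (y, _) in zip(w, effects)) / sum(w)
    se = math.sqrt(1 / sum(w))
    return math.exp(est), math.exp(est - 1.96 * se), math.exp(est + 1.96 * se)


def leave_one_out(effects):
    """Re-pool after omitting each study in turn.

    Returns (index, OR, lower, upper, flipped) rows, where `flipped` marks an
    omission that changes whether the pooled 95% CI excludes 1.
    """
    _, lo, hi = fixed_effect(effects)
    full_sig = lo > 1 or hi < 1
    rows = []
    for i in range(len(effects)):
        or_i, lo_i, hi_i = fixed_effect(effects[:i] + effects[i + 1:])
        rows.append((i, or_i, lo_i, hi_i, (lo_i > 1 or hi_i < 1) != full_sig))
    return rows


# Hypothetical (log OR, variance) pairs for four studies.
studies = [(math.log(1.2), 0.02), (math.log(1.1), 0.03),
           (math.log(1.3), 0.05), (math.log(1.15), 0.04)]
for row in leave_one_out(studies):
    print(row)
```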

False-positive report probability test
Following previous studies [15][16][17], a false-positive report probability (FPRP) test was carried out to assess the probability that a genetic association is true, using an FPRP threshold of 0.2, an OR of 1.5 for the power calculation, and prior probability levels of 0.25, 0.1, 0.01, 0.001, 0.0001 and 0.00001. If the FPRP value was < 0.2 at the prior probability level of 0.1, the association between XPA rs1800975 and cancer risk was considered noteworthy.
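The FPRP calculation follows FPRP = α(1 − π) / [α(1 − π) + (1 − β)π], where π is the prior probability, α the observed P-value and 1 − β the power to detect the chosen OR. The sketch below is a minimal version assuming that the SE of log(OR) is recovered from the 95% CI and that power is evaluated at the observed α; the pooled result in the example is hypothetical.

```python
import math
from statistics import NormalDist

N = NormalDist()


def fprp(or_obs, ci_low, ci_high, prior, or_power=1.5):
    """False-positive report probability for one pooled OR (sketch).

    alpha is the observed two-sided P-value; power is the probability of
    reaching that significance level if the true OR were `or_power`
    (or its reciprocal, since both tails are counted).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * N.inv_cdf(0.975))
    z_obs = abs(math.log(or_obs)) / se
    alpha = 2 * N.cdf(-z_obs)          # observed two-sided P-value
    mu = math.log(or_power) / se       # expected z under the alternative OR
    power = N.cdf(mu - z_obs) + N.cdf(-mu - z_obs)
    return alpha * (1 - prior) / (alpha * (1 - prior) + power * prior)


# Hypothetical pooled result: OR = 1.30 (95% CI 1.10-1.54).
for prior in (0.25, 0.1, 0.01, 0.001, 0.0001, 0.00001):
    value = fprp(1.30, 1.10, 1.54, prior)
    print(f"prior {prior:g}: FPRP = {value:.3f}", "noteworthy" if value < 0.2 else "")
```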

Trial sequential analysis
Similar to several reported studies [17][18][19], we applied a trial sequential analysis (TSA) approach, using the TSA viewer software (Copenhagen Trial Unit, Copenhagen), to adjust for the risk of random errors and to estimate the required information size for pooling. The TSA plot with a two-sided boundary type was obtained with a type I error probability of 5%, a statistical power of 80%, and a relative risk reduction of 20%. For the dominant model (AG + GG vs. AA), if the cumulative Z-curve crossed the TSA monitoring boundary and reached the required information size line, the result was regarded as robust.
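The required information size (RIS) and a monitoring boundary can be approximated as below. This is a simplified sketch, not a reimplementation of the TSA viewer software: it uses the standard sample-size formula for a dichotomous outcome without heterogeneity (diversity) adjustment, treats controls as the "control group" with a hypothetical control "event" proportion (for example, the share of G carriers), and approximates the O'Brien-Fleming-type boundary as z(1 − α/2)/√t at information fraction t.

```python
import math
from statistics import NormalDist

N = NormalDist()


def required_information_size(p_control, rrr=0.20, alpha=0.05, power=0.80):
    """Total participants needed to detect a relative risk reduction `rrr`
    in a dichotomous outcome (two-sided alpha, no heterogeneity adjustment)."""
    p_exp = p_control * (1 - rrr)
    delta = p_control - p_exp
    p_bar = (p_control + p_exp) / 2
    z_sum = N.inv_cdf(1 - alpha / 2) + N.inv_cdf(power)
    return math.ceil(4 * z_sum ** 2 * p_bar * (1 - p_bar) / delta ** 2)


def monitoring_boundary(n_accrued, ris, alpha=0.05):
    """O'Brien-Fleming-type |Z| boundary at information fraction t = n / RIS."""
    t = min(1.0, n_accrued / ris)
    return N.inv_cdf(1 - alpha / 2) / math.sqrt(t)


# Hypothetical control "event" proportion of 0.5.
ris = required_information_size(0.5)
for n in (ris // 4, ris // 2, ris):
    print(n, round(monitoring_boundary(n, ris), 2))
```

A cumulative Z-curve that stays above the boundary at its accrued sample size, and that reaches the RIS, would meet the robustness criterion described above.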

Expression pattern analysis
Based on the GTEx (Genotype-Tissue Expression) analysis release V8 dataset (dbGaP accession phs000424.v8.p2) [20], we analyzed the expression profile of the XPA gene (ENSG00000136936.10) across multiple tissues, such as heart, brain, lung, stomach and colon, using log10(TPM + 1) as the scale, where TPM denotes transcripts per million. In addition, we applied the TIMER (Tumor Immune Estimation Resource) approach [21] to compare XPA expression between tumor and adjacent normal tissues across all TCGA (The Cancer Genome Atlas) tumors. The Wilcoxon test was used to assess statistical significance, and the results were visualized as violin plots or box plots.
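The log10(TPM + 1) scaling and a tumor-versus-normal Wilcoxon rank-sum comparison can be sketched as below. TIMER's own implementation is not reproduced here; this sketch assumes mid-ranks for ties and a normal approximation without tie or continuity correction, and the TPM values are hypothetical.

```python
import math


def log_tpm(values):
    """log10(TPM + 1) scaling of expression values."""
    return [math.log10(v + 1) for v in values]


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Returns (U for x, P-value) using the normal approximation.
    """
    pooled = sorted((v, g) for g, vals in ((0, x), (1, y)) for v in vals)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # mid-rank of the tie block (1-based)
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical TPM values for tumor and adjacent normal samples.
tumor = log_tpm([12.1, 15.4, 9.8, 14.2, 18.0, 16.5])
normal = log_tpm([7.2, 8.1, 6.5, 9.0, 7.7, 8.4])
print(rank_sum_test(tumor, normal))
```

Because the log transform is monotonic, it changes the plotting scale but not the ranks used by the test.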

The eQTL and sQTL analysis
Based on the GTEx dataset [20], we also analyzed the "Significant Single-Tissue" eQTLs (expression quantitative trait loci) and sQTLs (splicing quantitative trait loci) for the XPA gene and the rs1800975 SNP in all tissues. The sample number, NES (normalized effect size), P-value and m-value were obtained. A tissue with an m-value larger than 0.9 was predicted to have an eQTL effect [22]. Violin plots of the eQTLs and sQTLs, as well as multi-tissue eQTL plots of the cross-tissue meta-analysis, were generated. The normalized intron-excision ratio was used as the scale for sQTLs.
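To make the NES and m-value criteria concrete, the sketch below computes a least-squares slope of normalized expression on alternative-allele dosage (our understanding of how GTEx describes NES, here without the covariates GTEx adjusts for) and applies the m-value > 0.9 rule. The m-values themselves come from the cross-tissue meta-analysis and are simply given as input; all tissue names and values are hypothetical.

```python
def eqtl_slope(dosage, expression):
    """Least-squares slope of normalized expression on alternative-allele
    dosage (0, 1 or 2 copies); covariates are omitted in this sketch."""
    n = len(dosage)
    mx, my = sum(dosage) / n, sum(expression) / n
    sxx = sum((d - mx) ** 2 for d in dosage)
    sxy = sum((d - mx) * (e - my) for d, e in zip(dosage, expression))
    return sxy / sxx


def predicted_eqtl_tissues(m_values, threshold=0.9):
    """Tissues whose m-value exceeds the threshold used in the text."""
    return sorted(t for t, m in m_values.items() if m > threshold)


# Hypothetical genotype dosages, normalized expression and m-values.
print(eqtl_slope([0, 0, 1, 1, 2, 2], [1.0, 1.1, 1.5, 1.4, 2.0, 1.9]))
print(predicted_eqtl_tissues({"tissue_1": 1.0, "tissue_2": 0.95, "tissue_3": 0.4}))
```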

Enrolled case-control studies
A schematic illustration of the selection of eligible case-control studies is shown in Fig. 1. We initially obtained 400 publications from the six databases. After duplicate publications were excluded, the remaining 269 publications were screened, and 195 of them were removed according to our screening criteria. A further 22 full-text articles were excluded because of "lack of full genotypic data", "not in line with HWE" or "duplicate or overlapping data". We finally extracted a total of 71 case-control studies from 52 publications [6][7][8][9][10][11] for our integrative analysis. Table 1 lists the main characteristics of the enrolled case-control studies, all of which had good methodological quality (NOS score ≥ 5).

Fig. 1 Schematic illustration of case-control identification in our meta-analysis
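The counts in the selection flow can be checked with simple arithmetic. The derived values below (131 duplicates removed and 74 full-text articles assessed) are implied by, rather than stated in, the text.

```python
identified = 400            # records retrieved from the six databases
after_deduplication = 269   # records screened
excluded_by_screening = 195
excluded_full_text = 22     # incomplete genotypes, HWE deviation, duplicate/overlapping data

duplicates = identified - after_deduplication                    # 131
full_text_assessed = after_deduplication - excluded_by_screening  # 74
included = full_text_assessed - excluded_full_text                # 52 publications
print(duplicates, full_text_assessed, included)
```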

Overall meta-analysis results
As shown in Table 2, a statistically significant difference in cancer susceptibility between cases and controls was detected under the allelic (P = 0.026, OR = 1.07), carrier (P = 0.009, OR = 1.04) and recessive (P = 0.001, OR = 1.12) genetic models, whereas negative results were observed under the other models (Table 2, P > 0.05). Thus, we did not obtain consistent evidence of a relationship between the XPA rs1800975 polymorphism and overall cancer risk in the overall population.

Subgroup analysis results
Next, we conducted a series of subgroup meta-analyses stratified by race, control source and genotyping method. As shown in Table 3, an increased cancer risk was observed in cases compared with negative controls in the Caucasian subgroup analysis under the allelic G vs. A (P < 0.001, OR = 1.12), carrier G vs. A (P = 0.001, OR = 1.08), homozygotic GG vs. AA (P < 0.001, OR = 1.24), heterozygotic AG vs. AA (P = 0.046, OR = 1.10), dominant AG + GG vs. AA (P = 0.004, OR = 1.16) and recessive GG vs. AA + AG (P < 0.001, OR = 1.16) models. A similar positive conclusion was detected in the "population-based control" (PB) subgroup analysis under the allelic, carrier, homozygotic and recessive models (Table 3, P < 0.05, OR > 1). In the PCR-RFLP subgroup analysis, we observed an increased cancer risk only under the carrier (Table 3, P = 0.016, OR = 1.06) and recessive (P = 0.018, OR = 1.16) models.
There were no significant differences between cases and controls in the majority of comparisons (Tables 2, 3, 4, P > 0.05), indicating that XPA rs1800975 does not seem to contribute to the risk of specific cancer types, such as breast cancer, esophageal cancer, gastric cancer, reproductive system cancer, endometrial cancer, or head and neck cancer. Forest plots of subgroup analyses by race (

FPRP and TSA results
To strengthen our results in the "lung cancer", "colorectal cancer" and "skin cancer" subgroup analyses, we performed the FPRP test. As shown in Table 6, at the 0.1 prior probability level, the FPRP value for lung cancer was less than 0.20 under the heterozygotic and dominant models but not under the homozygotic model, suggesting that the association was not consistently noteworthy. We noted that the lung cancer pooling analysis included subjects from different populations and controls from mixed sources. Considering the above positive results in the "Caucasian" and "PB" subgroups, we also performed another pooling analysis limited to the Caucasian population. As shown in Additional file 1: Table S2, when only Caucasian subjects were included, we did not observe positive results (all P > 0.05). A similar negative conclusion was obtained in the meta-analysis restricted to PB controls in the Caucasian population (Additional file 1: Table S3, P > 0.05). Collectively, this evidence did not support a strong association between XPA rs1800975 and lung cancer risk. With regard to colorectal cancer, the FPRP value was less than 0.20 only under the allelic and homozygotic models at the 0.1 prior probability level (Table 6). Only three case-control studies in the Caucasian population [36,40,43] were included in this pooling analysis. After removing the study with HB controls [36], only two studies with 460 cases and 921 controls remained (Additional file 1: Table S3). Although we observed an increased risk of colorectal cancer under the homozygotic, heterozygotic and dominant models in these two studies (Additional file 1: Table S3, P < 0.05, OR > 1), this analysis did not meet our minimum requirement of at least three case-control studies for pooling. Therefore, no reliable conclusion could be drawn regarding the potential link between XPA rs1800975 and colorectal cancer risk.
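The stratification rule applied here (restrict by ethnicity and control source, then pool only when at least three studies remain) can be sketched as a simple filter. The `Study` record and `subgroup` function are hypothetical; the control sources of [40] and [43] are inferred from the text, which states that only [36] used HB controls.

```python
from dataclasses import dataclass


@dataclass
class Study:
    ref: int             # citation number in the reference list
    race: str
    control_source: str  # "PB" (population-based) or "HB" (hospital-based)
    cancer: str


def subgroup(studies, cancer, race=None, control_source=None, min_studies=3):
    """Return the studies for a stratified pooling, or None when fewer than
    `min_studies` remain."""
    subset = [
        s for s in studies
        if s.cancer == cancer
        and (race is None or s.race == race)
        and (control_source is None or s.control_source == control_source)
    ]
    return subset if len(subset) >= min_studies else None


crc = [
    Study(36, "Caucasian", "HB", "colorectal"),
    Study(40, "Caucasian", "PB", "colorectal"),
    Study(43, "Caucasian", "PB", "colorectal"),
]
print(len(subgroup(crc, "colorectal", "Caucasian")))   # 3
print(subgroup(crc, "colorectal", "Caucasian", "PB"))  # None: only two PB studies
```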
As shown in Table 6, at the 0.1 prior probability level, the FPRP values for skin cancer were all less than 0.20, supporting noteworthy associations. All of these case-control studies enrolled Caucasian subjects and PB controls. We further performed the TSA test; the TSA plot in Fig. 4 shows that the cumulative Z-curve of the dominant model crossed both the TSA monitoring boundary and the required information size line, suggesting a credible association between XPA rs1800975 and skin cancer susceptibility.

Publication bias and sensitivity analysis results
For the evaluation of publication bias, two-sided P-values of Begg's and Egger's tests > 0.05 (Table 2) and the absence of obvious funnel plot asymmetry under each genetic model (Fig. 5a, b show the plots for the allelic model as examples) suggested no evidence of substantial publication bias in the pooling analyses above. In addition, our leave-one-out sensitivity analysis did not detect marked changes in the ORs and 95% CIs (Fig. 5c shows the allelic model as an example).

The eQTL and sQTL analysis results
Finally, based on the GTEx datasets, we analyzed the expression profile of the XPA gene in different tissues and the correlation between XPA expression and the rs1800975 SNP. As shown in Additional file 10: Fig. S9, the XPA gene is expressed in various tissues, such as the brain, colon, esophagus, lung and skin, suggesting low tissue specificity. Based on the "Significant Single-Tissue" eQTL data (Fig. 6), we observed a potential association between XPA expression and the rs1800975 SNP in the artery aorta (P = 1.8e−9), artery tibial (P = 1.55e−6), esophagus muscularis (P = 3.59e−9) and muscle skeletal (P = 6.39e−12) tissues, but not in the skin tissues ("not sun exposed (suprapubic)", P = 7.87e−1; "sun exposed (lower leg)", P = 5.16e−1). The multi-tissue eQTL comparison also suggested that four tissues (artery aorta, artery tibial, esophagus muscularis and muscle skeletal) were predicted to have an eQTL effect (Fig. 7, all m-values = 1.00). The cross-tissue meta-analysis further showed a potential overall correlation between XPA expression and the rs1800975 SNP (Fig. 7, P = 3.07e−50). In addition, our sQTL data showed a potential association between the rs1800975 SNP and XPA splicing changes in the thyroid tissue (Fig. 8).

Discussion
Although several publications have examined the influence of XPA rs1800975 on the risk of specific cancers, such as lung cancer [69,70], head and neck cancer [71], breast cancer [72], and digestive system cancer [73,74], their evaluation strategies, study numbers and statistical power differed. We therefore aimed to comprehensively explore the impact of XPA rs1800975 on overall cancer susceptibility by pooling all currently available evidence. To date, only two meta-analyses, both from 2012 [12,13], have described the association between XPA rs1800975 and susceptibility to overall cancer. In the current study, we searched six online databases (PubMed, EMBASE, Cochrane, CNKI, WANFANG and VIP), with the last retrieval on April 8, 2020, and included a total of 71 case-control studies. Overall meta-analyses and subgroup analyses by race, control source and genotyping method were performed under six genetic models (allelic, carrier, homozygotic, heterozygotic, dominant and recessive) to assess the association between the XPA rs1800975 polymorphism and cancer risk. Additionally, Begg's test, Egger's test, sensitivity analysis, FPRP analysis and TSA were conducted. In 2012, Ding et al. included a total of thirty-six case-control or case-cohort studies from twenty-eight publications in a meta-analysis of the genetic effect of XPA rs1800975 on overall cancer susceptibility [13]. They did not detect a positive association in the overall meta-analysis but found significant differences between cases and controls in the "lung cancer" subgroup under the homozygotic and recessive models, in the "Asian" subgroup under the dominant model, and in the "skin cancer" subgroup under the homozygotic, heterozygotic, dominant and recessive models.
In our updated meta-analysis, we excluded three publications in which the genotypic distribution of the control group deviated from HWE [75][76][77] and one publication on oral premalignant lesions [78]. We also replaced one publication [79] with another [67]. In addition, we added a total of twenty-eight publications to the new pooled analysis. In 2012, Liu et al. included twenty-four publications in another meta-analysis and reported an increased colorectal cancer risk under the homozygotic and dominant models but a decreased susceptibility to lung cancer under the homozygotic and dominant models [12]. In the present study, we removed two of those publications owing to HWE deviation [75,77] and added another thirty new publications for our updated integrative analysis.
Our new findings showed a positive association in the overall meta-analysis only under the carrier and recessive models, and in the "Caucasian" subgroup analysis under every model. We failed to detect a significant difference between cases and controls in the Asian population. The difference in sample size may contribute to the inconsistency with the data of Ding et al. [13].
Additionally, we detected a decreased lung cancer risk in cases under the GG vs. AA, AG vs. AA and AG + GG vs. AA models but an increased risk of colorectal cancer under the allelic, homozygotic, heterozygotic and dominant models, indicating a possible effect of the AG genotype of XPA rs1800975 on colorectal cancer susceptibility. These findings are partly in line with the conclusions of the prior meta-analyses [12,13]. Nevertheless, our data from the FPRP analysis and from another pooling analysis restricted to population-based controls in the Caucasian population did not strongly support a role of the G allele of the XPA rs1800975 polymorphism in the risk of lung or colorectal cancer. Our data from the pooling analysis, FPRP analysis and TSA demonstrated a significant difference between skin cancer cases and negative controls under all six genetic models, suggesting that the G allele of XPA rs1800975 contributes to increased susceptibility to skin cancer. However, our GTEx eQTL and sQTL data showed that XPA rs1800975 might not be associated with XPA expression or splicing changes in skin tissue, suggesting the involvement of other molecular mechanisms.
Our pooling analysis has several strengths. No case-control study of poor quality was enrolled. We also excluded studies in which the genotypic distribution in the control group was not in Hardy-Weinberg equilibrium. In addition, there was no evidence of substantial publication bias, and the pooled data were stable in all comparisons.
Our analyses also have several limitations that need to be discussed. First, fewer than ten case-control studies were enrolled in some comparisons, such as the subgroup meta-analyses of "breast cancer", "gastric cancer", "colorectal cancer", "endometrial cancer", "head and neck cancer" and "skin cancer". Owing to the limited number of studies, several comparisons, such as subgroup analyses of "oral cancer" or "skin SCC", were not carried out. In addition, high heterogeneity was present, and the random-effects model with the DerSimonian and Laird method was applied in the overall meta-analyses under the allelic, homozygotic, heterozygotic, dominant and recessive models. Between-study heterogeneity was lower in some "Caucasian" subgroups (data not shown), indicating that ethnicity may be a source of heterogeneity.
After investigating XPA expression differences between tumor and adjacent normal tissues in the TCGA project (Additional file 11: Fig. S10), we observed higher XPA expression in CHOL (cholangiocarcinoma, P < 0.001) and LIHC (liver hepatocellular carcinoma, P < 0.001) tissues, but lower expression in BLCA (bladder urothelial carcinoma), BRCA (breast invasive carcinoma), KICH (kidney chromophobe) and UCEC (uterine corpus endometrial carcinoma) tissues (all P < 0.05), compared with the corresponding control tissues. In addition, the artery aorta, artery tibial, esophagus muscularis and muscle skeletal tissues were predicted to have an eQTL effect, while the thyroid tissue showed an sQTL effect. Thus, it would be meaningful to explore the potential genetic influence of all XPA genetic variants, or of combined variants of XPA and other relevant genes (such as xeroderma pigmentosum group D, XPD), in the pathogenesis of the above tumors and of arterial or muscular system-related diseases. Larger sample sizes are warranted, and factors such as age, sex, smoking, drinking and therapy should be adjusted for.

Conclusions
In summary, our comprehensive integrative analysis provided statistical evidence of an association between the XPA rs1800975 A/G polymorphism and susceptibility to skin cancer, especially skin BCC, in the Caucasian population. The enrollment of more case-control studies that follow the HWE principle in diverse ethnicities will help researchers further verify the potential genetic role of the XPA rs1800975 polymorphism in the risk of lung or colorectal cancer.