Europe PMC

This website requires cookies, and the limited processing of your personal data in order to function. By using the site you are agreeing to this as outlined in our privacy notice and cookie policy.

Abstract 


Background

Colorectal cancer has a strong epigenetic component that is accompanied by frequent DNA methylation (DNAm) alterations in addition to heritable genetic risk. It is of interest to understand the interrelationship of germline genetics, DNAm, and colorectal cancer risk.

Methods

We performed a genome-wide methylation quantitative trait locus (meQTL) analysis in 1,355 people, assessing the pairwise associations between genetic variants and lymphocytes methylation data. In addition, we used penalized regression with cis-genetic variants ± 1 Mb of methylation to identify genome-wide heritable DNAm. We evaluated the association of genetically predicted methylation with colorectal cancer risk based on genome-wide association studies (GWAS) of over 125,000 cases and controls using the multivariate sMiST as well as univariately via examination of marginal association with colorectal cancer risk.

Results

Of the 142 known colorectal cancer GWAS loci, 47 were identified as meQTLs. We identified four novel colorectal cancer-associated loci (NID2, ATXN10, KLHDC10, and CEP41) that reside over 1 Mb outside of known colorectal cancer loci and 10 secondary signals within 1 Mb of known loci.

Conclusions

Leveraging information of DNAm regulation into genetic association of colorectal cancer risk reveals novel pathways in colorectal cancer tumorigenesis. Our summary statistics-based framework sMiST provides a powerful approach by combining information from the effect through methylation and residual direct effects of the meQTLs on disease risk. Further validation and functional follow-up of these novel pathways are needed.

Impact

Using genotype, DNAm, and GWAS, we identified four new colorectal cancer risk loci. We studied the landscape of genetic regulation of DNAm via single-SNP and multi-SNP meQTL analyses.

Free full text 


Logo of nihpaLink to Publisher's site
Cancer Epidemiol Biomarkers Prev. Author manuscript; available in PMC 2022 Nov 4.
Published in final edited form as:
PMCID: PMC9081265
NIHMSID: NIHMS1786408
PMID: 35247911

Genetic regulation of DNA methylation yields novel discoveries in GWAS of colorectal cancer

Abstract

Background:

Colorectal cancer (CRC) has a strong epigenetic component that is accompanied by frequent DNA methylation (DNAm) alterations in addition to heritable genetic risk. It is of interest to understand the interrelationship of germline genetics, DNAm, and CRC risk.

Methods:

We performed a genome-wide methylation quantitative trait locus (meQTL) analysis in 1355 people, assessing the pairwise associations between genetic variants and lymphocytes methylation data. In addition, we used penalized regression with cis-genetic variants +/− 1Mb of methylation to identify genome-wide heritable DNAm. We evaluated the association of genetically predicted methylation with CRC risk based on GWAS of over 125,000 cases and controls using the multivariate sMiST as well as univariately via examination of marginal association with CRC risk.

Results:

Of the 142 known CRC genome-wide association studies (GWAS) loci, 47 were identified as meQTLs. We identified 4 novel CRC associated loci (NID2, ATXN10, KLHDC10 and CEP41) that reside over 1Mb outside of known CRC loci and 10 secondary signals within 1Mb of known loci.

Conclusions:

Leveraging information of DNAm regulation into genetic association of CRC risk reveals novel pathways in CRC tumorigenesis. Our summary statistics-based framework sMiST provides a powerful approach by combining information from the effect through methylation and residual direct effects of the meQTLs on disease risk. Further validation and functional follow-up of these novel pathways are needed.

Impact:

Using genotype, DNA methylation, GWAS, we identified four new CRC risk loci. We studied the landscape of genetic regulation of DNA methylation via single-SNP and multi-SNP meQTL analyses.

Keywords: DNA methylation, Biostatistics, Genetics of Cancer Risk and Outcome, Colorectal Cancer, Gene polymorphisms

Introduction

Large genome-wide association studies (GWAS) have identified 142 genetic variants associated with colorectal cancer (CRC) risk to date (1,2). While large GWAS have been essential for variant discoveries, novel information can be gained by combining GWAS with functional studies of genomic characteristics. Studying association of genetic variants with molecular features such as gene expression and DNA methylation (DNAm) may reveal underlying disease mechanisms and discover additional novel loci. With the availability of genomic studies, results from large-scale GWAS studies, and summary statistics-based methods, it is now feasible to study these relationships. In this paper, we examine the impact of genetic variants on DNAm and utilize this information to study the association of methylation quantitative trait loci (meQTL) with CRC risk.

DNAm refers to the addition of a methyl group to a cytosine phosphate guanine (CpG) site of a DNA strand. It plays a critical role in colorectal carcinogenesis (36) and is associated with survival time in CRC (7,8). In addition, a well-studied subtype of CRC is the CpG Island Methylator Phenotype (CIMP), which is defined by the degree of methylation at specific CpG islands in the tumor genome (911). Other measures of DNAm and epigenetic modifications have also been used as screening and early detection, such as ColoGuard (NDRG2 and BMP3) and Epi ProColon (SEPT9) (12).

The interrelationships between DNAm, germline genetics and complex phenotypes, have been examined previously. Freytag et al (13) developed a genetic predictive model for DNAm at each CpG site using elastic net penalization and assessed the association of predicted DNAm with complex phenotypes including schizophrenia, Alzheimer’s disease, and irritable bowel disease. Yang et al developed genetic predictive weights of DNAm within the Framingham Heart Study and used these weights to assess associations of predicted DNAm with epithelial ovarian (14) and breast cancers(15). However, to our knowledge, there has been no similar study relating genetically regulated methylation with CRC risk.

Many methods have been developed to evaluate how genetic variants that impact a genomic characteristic are associated with an outcome. Typically these methods assess the associations between SNPs and a genomic feature and then use the resulting effect sizes as weights to impute the genomic feature into a much larger genetic dataset for association with the outcome(1619). This has been expanded to use only GWAS summary statistics, alleviating both computational and data accessibility burdens. These approaches serve as a vital tool in leveraging genomic information for novel discoveries. They are closely related to two-stage least square Mendelian randomization approaches, when the genetic variants affect the outcome only through the genomic feature. MiST (Mixed effects Score Test) (19,20) and the summary statistics-based sMiST(21) allow for SNP effects through not only the predicted genomics features like in Mendelian randomization but also alternative pathways by modeling the residual genetic effects. By this, MiST or sMiST tests the total effect of SNPs on the outcome.

In this paper, we evaluated the relationship between germline SNPs and DNAm based on the GWAS and methylation data of 1,355 study participants by both single-SNP and multi-SNP meQTLs analyses. We then assessed the association of meQTLs with CRC risk by utilizing summary statistics from over 125,000 CRC cases and controls.

Materials and Methods

Description of study with DNA methylation and GWAS data

The DNAm and GWAS data were from the Ontario Familial Colon Cancer Registry (22). Cases were identified from the Provincial Cancer Registry between 1997 and 2000 and controls were recruited randomly via household telephone interviews (22,23). Informed consent was obtained from participants. All study protocols were approved by the relevant Institutional Review Boards, and informed consent was obtained from study participants in accordance with the Helsinki Accord. After QC and filtering, a total of 1,355 cases and controls had both genome-wide genotyping and methylation data available. All participants were of European descent as confirmed by GWAS data. DNAm was measured in lymphocytes. Detailed information on DNAm pre-processing and QC are provided in Supplementary Methods. After QC, there were 298,248 CpG sites. The majority of cases and controls were plated separately, except for 267 study participants (115 cases and 152 controls).

Study participants were genotyped on the Affymetrix or the Human1M/1Mduo platforms (22,23) and imputed to the Haplotype Reference Consortium panel (1). Variants were excluded if the imputed minor allele frequency (MAF) was < 1%, missing rate > 10%, or imputation R2 < 0.3. In total, there were 7,672,848 imputed genetic variants, which were coded as imputed dosage ranging from 0 to 2.

GWAS summary statistics and linkage disequilibrium (LD) reference panel

The GWAS- summary statistics of over 125,000 colorectal cancer cases and controls were obtained from Huyghe et al(1), including data from GECCO (Genetic and Epidemiology Colorectal Cancer Consortium), CCFR (Colon Cancer Family Registry) and CORECT (Colorectal Transdisciplinary Study). A full list of studies that contribute to the summary statistics is given in Table S1 with more information published previously (1,2426). We removed variants with MAF < 0.01. The LD reference panel is based on 29,303 individuals of European ancestry from GECCO (1,24).

Single-SNP MeQTL Analysis

We assessed the pairwise association of SNPs and CpGs to identify meQTLs. We undertook a two-stage discovery and replication design by splitting the participants genotyped on Affy Chip in the discovery (n = 856) and those genotyped on Human1M/1Mduo (n = 499) in the replication. We adjusted for sex, case-control status, sample plate, position on plate, age, two principal components (PC’s) for technical effects, and 6 PC’s from a subset of CpG sites to adjust for unobserved cell types. More details on the PC are provided in Supplement Methods. DNAm was logit transformed so that the range would not be restricted (27). The analysis was done using the matrixEQTL R package (28).

We classified SNP-CpG pairs into cis-meQTL when the SNP was within 1 Mb of the CpG site, and trans-meQTL otherwise. A total of 1,710,673,549 cis associations and 2,286,700,896,755 trans associations were tested. We adjusted for multiple testing using the Bonferroni correction with type I error 0.05 for cis- and trans-meQTL associations, separately. For discovery, we set the genome-wide significance threshold 0.05/1,710,673,549 ≈ 2.92e-11 for cis associations, and 0.05/2,286,700,896,755 ≈ 2.19e-14 for trans associations. For replication, the threshold was based on the number of cis (ncis) or trans (ntrans) that passed the discovery stage, i.e., 0.05/(ncis or ntrans). The direction of the association in the replication stage also needed to be the same as in the discovery stage. Top meQTL associations were defined as having the lowest p-value in the replication set.

We performed LD pruning on the meQTLs to account for the high SNP LD. For each CpG site, we selected the lead meQTL (based on the replication p-value) and removed any meQTL with LD R2 > 0.2 with the lead meQTL. We then moved to the next most significant remaining meQTL and repeated until no meQTLs left. For computational ease, we assumed no cross-chromosomal LD.

Association of single-SNP meQTL with CRC

We examined associations of meQTLs identified from the single-SNP meQTL analysis with CRC risk based on GWAS summary statistics (1). For each CpG site, we picked its lead meQTL with the largest test statistic and examined their associations with CRC risk. We adjusted for multiple testing using Bonferroni correction based on the number of SNPs being tested. An association was considered novel if there were no known CRC loci within 1MB. The known loci are those that reached the genome-wide significance level 5e-08 as previously reported (Table S2).

Multi-SNP meQTL analysis

To assess the joint effects of SNPs on DNAm, we fit a penalized regression model on each CpG site. We did not include trans SNPs due to the fact that the majority of meQTLs are cis and substantial computational burden if we include trans SNPs. To improve power, we combined both discovery and replication methylation data and adjusted for the genotyping platform in addition to the covariates in the single-SNP meQTL analysis. These covariates were not penalized. We fit four different elastic net models with α values of 0.25, 0.5, 0.75, and 1.0 (lasso), following previous literature that showed elastic net models tend to provide better prediction when some sparsity was induced (16,18,29,30). We performed ten-fold cross-validation and determined the optimal penalty λ value based on cross-validation MSE. We estimated the weights at the optimal penalty value using the entire dataset. Analyses were done using the glmnet package in R (31). We evaluated performance by the mean cross-validated partial R2 of DNAm variation explained by SNPs across 10 folds

Association of multi-SNP meQTLs with CRC

We tested the joint association of meQTLs identified from the penalized regression with CRC risk for each CpG site using sMiST (21). sMiST combines both the association of the genetically predicted DNAm at that CpG site using the weights obtained from the penalized regression and the association of residual effects of individual meQTLs (random effects) (21) (Supplementary Methods). We used Fisher’s combination procedure to combine the tests for these two components. We restricted our analysis to CpG sites with a mean cross-validated partial R2 above 1%. A p-value < 0.05/# of CpG sites tested was considered genome-wide statistically significant.

We performed conditional analysis of CpG sites that passed the genome-wide significance, adjusting for all known CRC GWAS loci. An association with p-value < 0.05/# of genome-wide significant CpG sites was considered novel if there were no known loci within 1 MB or as a novel secondary signal otherwise. As the participants with methylation data were part of GECCO, which contributed to the GWAS CRC summary statistics, we conducted a sensitivity analysis of significant associations by excluding the GECCO data.

Results

Table 1 provides descriptive statistics of the people in the methylation analysis dataset. There were 1,355 participants, with 856 genotyped using the Affy Chip assay and 499 using the Human1M/1Mduo genotyping array. Roughly 50% were men, 47.5% were CRC cases, and the mean age was 62 years old.

Table 1:

Descriptive statistics of the methylation study.

Affy Chips
(n=856)
Human1M/1Mduo
(n=499)
Overall
(n=1355)
Sex
 Female454 (53.0%)225 (45.1%)679 (50.1%)
 Male402 (47.0%)274 (54.9%)676 (49.9%)
Case/Control
 Case456 (53.3%)187 (37.5%)643 (47.5%)
 Control400 (46.7%)312 (62.5%)712 (52.5%)
Age reference
 Mean (SD)62.3 (7.62)60.3 (9.22)61.6 (8.30)
 Median [Min, Max]63.0 [29.0, 77.0]62.0 [27.0, 76.0]63.0 [27.0, 77.0]

Landscape of MeQTL

A total of 5,228,463 cis SNP-DNAm associations passed the genome-wide threshold in the discovery stage and 3,444,113 were replicated. Of these 3,444,113 cis associations, there were 32,686 unique CpG sites and 1,280,770 unique SNPs, with many SNPs associated with multiple CpG sites. For trans-associations, a total of 559,695 SNP-DNAm associations passed the genome-wide threshold in the discovery stage and 292,919 trans-associations were replicated. Of these 292,919 associations, 160,752 were CpG-SNP pairs on the same chromosome but over 1 Mb apart. Across the 2,288,411,570,304 tests performed (every CpG-SNP pair) in the discovery, the median p-value was between 0.499 to 0.501 suggesting no overall inflation.

Of the 142 CRC known variants (Table S2), all but 2 variants (rs1391441 and rs11087784) were either meQTLs (n = 47) or within 1Mb of a meQTL (n=93). For the 47 known loci that are meQTLs, the majority (n=43) were associated with CpGs on the same chromosome (Table S3 and S4). SNP rs3131043 (6:30758466_A/G) was a meQTL for 28 CpG sites including 18 cis, 8 trans on the same chromosome, and 2 trans on the different chromosomes. Amongst the 94 that were not meQTLs but within 1Mb of meQTLs (one variant was not in our dataset), only three were in high LD with a meQTL (R2 above 0.7): rs7121958, rs12594720, and rs6983267.

We examined the number of CpGs associated with each SNP stratified by cis associations and trans associations that on the same and different chromosomes (Figure 1). Not surprisingly, there was a peak of cis-associations on chromosome 6 that map to the MHC region (Figure 2). In total, there were 584,770 unique CpG-SNP associations within 5Mb on either side of HLA-A (the peak), accounting for 15.6% of all CpG-SNP associations and 42.8% of all trans associations (459,341 cis, 97,930 trans on the same chromosome, and 27,499 trans on the different chromosomes). This high amount of trans on the same chromosome is likely due to the extensive LD in the MHC region.

An external file that holds a picture, illustration, etc.
Object name is nihms-1786408-f0001.jpg

Modified Manhattan plot of frequencies of a SNP being meQTL for cis-association (top), trans on the same chromosome (middle), and trans on different chromosomes (bottom).

An external file that holds a picture, illustration, etc.
Object name is nihms-1786408-f0002.jpg

Zoom-in modified Manhattan plot on the MHC region. Line shows HLA-A.

As many cis- and trans- associations are in LD, we performed LD pruning for each CpG site. This resulted in 70,106 associations with 67,620 cis associations, 529 trans on the same chromosome, and 1,957 trans on different chromosomes. Due to the uniqueness of the HLA region, the associations where CpGs or SNPs were within 5Mb of HLA-A on either side were excluded in tallying the number of cis and trans associations. The top cis association was between rs7083920 (10:45071890_C/T) and cg02113055 (10: 45072520) (replication p-value < 1e-100) which were 630 bp apart. The top trans on the same chromosome association was rs3764855 (17:66244578_T/C) and cg27123423 (≈ 3.27 Mb, replication p-value < 1e-100). The top trans association on different chromosomes was rs12222783 (11:108531761_T/C) and cg23247274 (MRP63/SKA3, 13:21750649)(replication p-value < 1e-100). These associations were in excellent agreement between the replication and discovery datasets (Figure 3). Figure 3 also displays the next three strongest associations in all three categories (list of associations in Figure 3 are in Table S5).

An external file that holds a picture, illustration, etc.
Object name is nihms-1786408-f0003.jpg

Boxplot of DNAm by SNP genotypes for A) top four cis associations, B) top four trans on the same chromosome associations, and C) top four trans on different chromosome associations. Imputed SNPs are hard coded for ease of visualization.

Novel association of single-SNP meQTLs with CRC

We assessed the association of single-SNP meQTLs with CRC risk. Across all CpG-SNP associations, there were 33,845 unique CpG sites. Taking the top meQTL for each of these CpG sites resulted in 27,287 unique SNPs associated with DNAm, of which 3,917 SNPs were the top meQTL for multiple CpGs. Using the CRC summary statistics, 70 SNPs passed the Bonferroni significance threshold 0.05/27,287 = 1.83e-06 (Table S6). Of these 70 SNPs, 68 were within 1Mb of 142 known GWAS variants. The two SNPs outside 1Mb were in high LD (R2 = 0.99) and they were associated with increased CRC risk (SNP rs2749870, 14:52493068_reference allele/effect allele [G/A], OR = 1.04, 95% CI = 1.03–1.06, p-value = 1.2e-06; SNP rs2516600, 14:52487766_T/C, OR=1.04, 95% CI = 1.03–1.06, p-value = 1.3e-06). Conditioning on known loci did not change the associations greatly (rs2749870 p-value = 1.37e-05, rs2516600 p-value = 1.37e-05).

SNP rs2749870 up-regulated methylation at cg26923084 (14: 52536893, replication effect = 0.39, 95% CI = 0.35,0.41), while SNP rs2516600 down-regulated methylation at cg20550154 (14: 52487779, replication effect = −0.71, 95% CI = −0.76,-0.69) with DNAm at these two CpG sites negatively correlated (Pearson correlation = −0.5) (Figure S1). We performed colocalization for the region from 52.4 to 52.5Mb using coloc (32). There was strong evidence of a common signal for CRC risk and methylation at cg20550154 (posterior probability = 0.96), but not at cg26923084 (posterior probability = 0.24).

Multi-SNP meQTLs and Predictive Weights

We built genetic prediction models for DNAm at each CpG site using cis SNPs. Of the 298,248 CpG sites, 179,087 CpG sites had at least one SNP selected by an elastic net models (178,390, 175,709, 174,647, and 174,015 for α = 0.25, 0.50, 0.75, and 1, respectively). All models had similar partial R2 (Figure S2), but lasso had the smallest model size with mean number (standard deviation [SD], range) of selected SNPs 15.5 (19.2,1–755). For the remaining analyses, we focused on lasso predictive weights, as it provided the most parsimonious models. A total of 170,340 CpG sites had a mean cross-validated partial R2 for predictive DNAm above 1% and 45,965 above 10%. Beyond that the numbers decreased rapidly (Figure 4), with only 4,422 having a partial R2 above 50%. The mean and median partial R2 were 9.8% and 5.4%, respectively. All genetic weights are available upon request from the authors.

An external file that holds a picture, illustration, etc.
Object name is nihms-1786408-f0004.jpg

Histogram of the partial R2 of the predicted heritable component from lasso of DNAm.

Novel Multi-SNP meQTLs with CRC

We ran sMiST on 168,963 CpG sites with partial R2 ≥ 0.01, after removing 1,377 CpG sites where all selected SNPs had MAF < 1% in the GWAS summary statistics. A total of 853 CpG sites passed the significance threshold (0.05/168,963 ≈ 2.96e-07, Figure S3). After adjusting for the known loci, 13 CpG sites were still significant. Of these 13, 3 CpG sites cg10226744 (22: 46068542, ATXN10), cg05743278 (7:129709803, KLHDC10), and cg19037357 (7: 130081024, CEP41) had no CRC known loci within 1 Mb (Table 2). In our sensitivity analysis, removing GECCO from the GWAS summary statistics did not change the result (Table S7).

Table 2:

Multi-SNP meQTL significant associations with CRC risk based on sMiST after adjusting for known GWAS loci

CpGGeneNSNP1R22Unadj P-value3Cond P-value4Burden P-value5Random P-value6Nearest Known CRC SNP7
cg18541609 (5:430359) AHRR260.0821.50E-072.50E-053.90E-034.40E-04rs78368589
cg23330450 (6:31548285) 560.1222.00E-173.30E-055.10E-044.70E-03rs2516420
cg14426347 (6:31632827) BAT4/CSNK2B200.0427.60E-131.70E-053.60E-053.10E-02rs2516420
cg13653456 (6:31763819) VARS200.0743.80E-103.70E-064.70E-054.80E-03rs2516420
cg23945481 (6:31938984) STK19/DOM3Z140.0534.20E-115.50E-051.30E-023.00E-04rs3830041
cg22767321 (6:32143681) AGPAT1120.0431.20E-082.40E-061.90E-067.60E-02rs3830041
cg07588430 (6:41561444) FOXP440.0145.80E-103.80E-082.20E-078.30E-03rs62396735
cg05743278 (7:129709803) KLHDC1050.0311.40E-078.50E-073.30E-061.40E-02---
cg19037357 (7:130081024) CEP41170.0671.60E-072.30E-066.10E-012.30E-07---
cg07005513 (11:61595956) FADS2100.0763.70E-111.00E-065.50E-021.10E-06rs174533
cg18213653 (12:112605734) C12orf5170.0162.10E-082.20E-092.40E-033.80E-08rs597808
cg19193595 (15:67396487) SMAD350.0224.40E-142.30E-056.50E-012.50E-06rs56324967
cg10226744 (22:46068542) ATXN1070.0931.10E-072.80E-091.60E-047.30E-07---
1Number of SNPs selected by the LASSO model.
2Partial R2 of DNAm explained by genetic variants adjusting for covariates.
3sMiST p-value not adjusting for known GWAS loci.
4sMiST p-value adjusting for all known GWAS loci.
5p-value for the conditional burden component (predicted DNAm) adjusting for all known GWAS loci.
6p-value for the conditional random effects component adjusting for all known GWAS loci.
7Nearest known GWAS SNP (based on S1 Table).

Upon closer examination, for most CpG sites, both the genetically predicted CpG component and the random effects component contributed to the overall association, highlighting the power of sMiST for testing the total effects of meQTLs and the idea that the SNPs may contribute to CRC risk beyond the CpG site considered. The most significant conditional association was cg18213653 (C12orf51) on chromosome 12 (overall p-value = 2.23e-09) with burden (genetically predicted CpG) p-value = 2.42e-03 and random effects p-value = 3.82e-08. The association signal of the novel CpG site cg05743278 (KLHDC10) was mainly driven by the predicted methylation (p-value 3.3e-06). A total of 5 meQTLs in this CpG explained 3% of the variation. On the other hand, cg19037357, cg07005513, cg19193595 were mainly driven by the random effects, suggesting potential individual variant effects in addition to the effect of the predicted methylation effect. Interestingly, we did not observe much correlation amongst the individual DNAm of these 13 sites (Figure S4).

As a follow-up analysis, we examined the sMiST results of the two CpG sites from the meQTL CRC. The meQTL rs2516600 was selected as one of the 11 SNPs in the lasso regression, which together strongly predicted methylation of cg20550154 (partial R2 = 0.72). The sMiST p-value for this CpG site is 1.04e – 05 with the p-value for predicted methylation 6.9e-07 and the random effects p-value 0.99, suggesting the CRC association was likely driven by methylation. The top meQTL of the other novel CpG site, cg26923084, was also selected as one of cg26923084’s 28 SNPs by lasso. The sMiST for this CpG site had an overall p-value of 4.2e-07 with the p-value for predicted methylation 2.29e-06 and for random effects 0.01, indicating the association was also likely driven by methylation.

Discussion

Herein, we provided a systematic examination of the relationship between genetic variants and DNAm and their subsequent association with CRC risk. We used two complementary approaches: (1) examining pairwise association between SNPs and DNAm and then testing whether the significant meQTLs were associated with CRC risk; and (2) examining joint association of cis-variants with DNAm and using these weights to test the overall association of predicted methylation and residual variant effects with CRC risk via sMiST. The single-SNP meQTL approach revealed a novel CRC risk locus and that about one third of known loci were meQTLs. The multi-SNP approach revealed three novel loci, as well as several novel secondary signals of regions within 1Mb of known loci.

In our single-SNP meQTL approach, the two novel SNPs, rs2516600 and rs2749870, are in high LD and likely represent one signal. However, these two SNPs are not the most associated SNPs in the cis-region of CpG sites and rs1051069 is (Figure S5, CRC p-value = 6.59e-08). The CpG sites for which rs2516600 and rs2749870 are top meQTLs both are mapped to Nidogen 2 (NID2). Interestingly, these two CpG sites are located within two different regions of the NID2 gene, cg20550154 is within the gene body while cg26923084 is within 1500 bp of the transcription start site. NID2 may be involved in maintaining the structure of the basement membrane (33). The encoded cell-adhesion protein NID2 binds to laminins (33). Members of the laminin family, including LAMA5 and LAMC1, have been implicated in CRC risk by GWAS (34). The NID2 promotor is aberrantly methylated in human gastrointestinal cancers, including colon cancer, and methylation has been shown to silence gene expression (35). NID2 gene methylation is also robustly associated with bladder cancer and urine-based methylation biomarker panels containing NID2 have been proposed for the early detection of primary bladder cancer (36,37).

In our multi-SNP approach, we found an association with cg10226744, mapped to the body of the ATXN10 gene. The Ataxin 10 (ATXN10) gene is expressed in most tissues and identified as a downstream effector of the p53 and pRB tumour suppressor pathways involved in cellular senescence (38). Circumventing senescence has been proposed to represent an essential step in tumor progression (39). Four of the seven lasso selected variants for cg10226744 are within ATXN10, two near FAM118A, and one near WNT7B. WNT7B is a member of the WNT pathway, which plays a major role in colorectal cancer signaling (40).

Both CpG sites cg05743278 and cg19037357 from the multi-SNP analysis are mapped to the 7q32.2 loci. This is a known imprinted region, with associations with basal-cell carcinoma and type-2- diabetes (41). cg05743278 is within 1500bp of the transcription start site of KLHDC10 and cg19037357 is within 1500bp of the transcription start site of CEP41. One SNP (rs10277874) was selected for both CpG sites by lasso. KLHDC10 has been found to be a suppressor of PP5 (42). PP5 is an alias for TFPI2 which is a methylation biomarker for CRC (43,44). CEP41, though not yet implicated in CRC, has been found to express in gastrointestinal tissues (45). Besides KLHDC10 and CEP41, gene KLF14, where nearby SNPs were selected for cg05743278 and cg19037357, may also be a candidate, as it has been found to be associated with CRC (46).

The 70,106 unique CpG-SNP associations after LD pruning was made up of 55,209 unique SNPs. Of these 46,254 were associated with only one CpG. Of the remaining 8,955 that were associated with more than one CpG sites, 8,576 were only associated with cis CpG sites, making it difficult to ascertain whether the associations were hotspots or due to correlation amongst the nearby CpG sites. Amongst the remaining SNPs, 12 were meQTLs for at least five CpG sites on different chromosome. The SNP rs11673023 was also reported by Huan et al (2019) as being in hotspot 22 (H22) of their trans-meQTL hotspots (47). In addition, three variants were within the region of the previously reported hotspot 14 (47,48).

While we have found multiple potentially novel loci, there are several limitations to our study. First, to increase the sample size for determining joint associations of SNPs with DNAm, we used the entire methylation dataset to infer meQTLs. The weights from the penalized regression model are likely biased (49). Ideally, the weights would be re-estimated using an independent validation dataset. However, the power loss for identifying novel loci with CRC risk due to biased weights is likely minimal as a scale shift for genetically predicted methylation does not affect testing. Second, our participants are all of the European ancestry. It is well-recognized that many of these SNPs tag causal variants and due to LD and minor allele differences between populations the effects of these SNPs may not be generalizable to other populations. It is important to validate these results in other racial and ethnic populations. Third, the DNAm was collected in whole blood as opposed to the relevant colon and rectal tissues. However, it has been observed that cross-tissue meQTL effects are prevalent (50,51), which, to some extent, are similar to gene expression quantitative trait loci (52). We also observed that gene regulatory elements of immune cell types from blood are enriched for variants with colorectal cancer association (53). Interestingly, when we examined the gene expression of the three novel associations (NID2, KLHDC10, and CEP41) there was higher gene expression in the colon tissue compared to whole blood (Figure S6S9). Regardless, future works for identifying meQTLs in the relevant colon and rectal tissues and their associations with colorectal cancer risk are needed.

In our meQTL analyses we adjusted for case-control status, which may introduce collider bias. We examined the association of novel CRC with methylation at cg20550154 and cg26923084 ignoring case-control status or restricting the analysis to just controls, the associations did not alter however (Table S8). Further, there may be differential meQTL effects between cases and controls. Towards this end, we assessed the 3,737,032 unique associations from our discovery for case/control-SNP interaction in a subsample of cases-controls where cases and controls were on the same plate. Despite diagnosis/treatment potentially impacting DNAm, only 615 associations had a significant interaction effect (.05/3,737,032), and none overlapped the 70 associations detected in the single-SNP meQTL CRC GWAS. The limited number of interaction effects identified is likely due to low power. We further performed separate cis-meQTL analysis for cases and controls separately. The p-values of SNP-DNAm associations were highly correlated between cases and controls, suggesting meQTLs are generally shared between cases and controls (Figure S10). Thus, taking a combined analysis of cases and controls will have the most power in identifying these shared meQTLs. There were only 979 SNP-DNAm (after LD pruning) that reached our previous discovery cutoff (alpha = 2.9e-11) in cases but showed no association in controls (p-value less than 2.9e-11 in cases but greater than 0.5 in controls), and 1,393 SNP-DNAm that showed associations in controls but not cases, 15 of 979 and 103 of 1,393 of which were associated in our combined analysis. None of the SNPs were known CRC loci. To maximize the utility of the data we studied common effects across cases and controls; studies with larger sample size of cases and controls will be needed to study potentially differential effects.

Our study has many strengths. These include large sample sizes for GWAS CRC and methylation; use of novel powerful statistical methods to account for the total effect of both genetically predicted methylation and individual variant effects via alternative pathways; and the first large-scale examination of methylation association with CRC risk.

In summary, we have used a novel framework for utilizing DNAm data to discover novel loci as well as secondary signals associated with CRC risk. The findings could elucidate novel pathways in CRC tumorigenesis. These findings, in particular, the novel loci NID2, ATXN10, KLHDC10 and CEP41, warrant follow-up in future functional studies. Finally, we applied the summary sMiST analysis, a powerful statistical method to combine both information from the effect through methylation and residual direct effects of the meQTLs. We hope this paper will motivate researchers to utilize functional genomic data such as DNA methylation or gene expression and large-scale GWAS summary statistics with these statistical tools to discover new genetic associations.

Supplementary Material

2

Acknowledgments

The authors thank the participants across all studies. In addition, we thank the reviewers for valuable feedback. This project was a part of Richard Barfield’s post-doctoral work at Fred Hutch. L. Hsu was partially funded by the NIH/NCI Grant R01 CA189532. Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO): National Cancer Institute, National Institutes of Health, U.S. Department of Health and Human Services (U01 CA164930, U01 CA137088, R01 CA059045, R01 CA201407) (U. Peters). Analyses were run on the Fred Hutch cluster, which is part of the Scientific Computing Infrastructure funded by ORIP grant S10OD028685. A full list of study-specific acknowledgments is provided in the supplement.

Conflict of Interest Disclosure Statement:

Marios Giannakis receives research funds from Merck, Bristol-Myers Squibb, Servier, and Janssen unrelated to this research. Other authors have declared that no competing interests exist.

References

1. Huyghe JR, Bien SA, Harrison TA, Kang HM, Chen S, Schmit SL, et al. Discovery of common and rare genetic risk variants for colorectal cancer. Nat Genet. 2019; [Europe PMC free article] [Abstract] [Google Scholar]
2. Law PJ, Timofeeva M, Fernandez-Rozadilla C, Broderick P, Studd J, Fernandez-Tajes J, et al. Association analyses identify 31 new risk loci for colorectal cancer susceptibility. Nat Commun. 2019; [Europe PMC free article] [Abstract] [Google Scholar]
3. Okugawa Y, Grady WM, Goel A. Epigenetic Alterations in Colorectal Cancer: Emerging Biomarkers. Gastroenterology. 2015; [Europe PMC free article] [Abstract] [Google Scholar]
4. Ashktorab H, Brim H. DNA Methylation and Colorectal Cancer. Curr. Colorectal Cancer Rep 2014. [Europe PMC free article] [Abstract] [Google Scholar]
5. Weisenberger DJ, Liang G, Lenz HJ. DNA methylation aberrancies delineate clinically distinct subsets of colorectal cancer and provide novel targets for epigenetic therapies. Oncogene. 2018. [Europe PMC free article] [Abstract] [Google Scholar]
6. Ehrlich M DNA hypermethylation in disease: mechanisms and clinical relevance. Epigenetics. 2019. [Europe PMC free article] [Abstract] [Google Scholar]
7. Gündert M, Edelmann D, Benner A, Jansen L, Jia M, Walter V, et al. Genome-wide DNA methylation analysis reveals a prognostic classifier for non-metastatic colorectal cancer (ProMCol classifier). Gut. 2019; [Abstract] [Google Scholar]
8. Neumeyer S, Popanda O, Edelmann D, Butterbach K, Toth C, Roth W, et al. Genome-wide DNA methylation differences according to oestrogen receptor beta status in colorectal cancer. Epigenetics. 2019; [Europe PMC free article] [Abstract] [Google Scholar]
9. Toyota M, Ahuja N, Ohe-Toyota M, Herman JG, Baylin SB, Issa JPJ. CpG island methylator phenotype in colorectal cancer. Proc Natl Acad Sci U S A. 1999; [Europe PMC free article] [Abstract] [Google Scholar]
10. Jia M, Gao X, Zhang Y, Hoffmeister M, Brenner H. Different definitions of CpG island methylator phenotype and outcomes of colorectal cancer: a systematic review. Clin. Epigenetics 2016. [Europe PMC free article] [Abstract] [Google Scholar]
11. Advani SM, Advani PS, Brown DW, DeSantis SM, Korphaisarn K, VonVille HM, et al. Global differences in the prevalence of the CpG island methylator phenotype of colorectal cancer. BMC Cancer [Internet]. 2019;19:964. Available from: 10.1186/s12885-019-6144-9 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
12. Koch A, Joosten SC, Feng Z, De Ruijter TC, Draht MX, Melotte V, et al. Analysis of DNA methylation in cancer: Location revisited. Nat. Rev. Clin. Oncol 2018. [Abstract] [Google Scholar]
13. Freytag V, Vukojevic V, Wagner-Thelen H, Milnik A, Vogler C, Leber M, et al. Genetic estimators of DNA methylation provide insights into the molecular basis of polygenic traits. Transl Psychiatry. 2018; [Europe PMC free article] [Abstract] [Google Scholar]
14. Yang Y, Wu L, Shu X, Lu Y, Shu XO, Cai Q, et al. Genetic data from nearly 63,000 women of European descent predicts DNA methylation biomarkers and epithelial ovarian cancer risk. Cancer Res. 2019; [Europe PMC free article] [Abstract] [Google Scholar]
15. Yang Y, Wu L, Shu XO, Cai Q, Shu X, Li B, et al. Genetically Predicted Levels of DNA Methylation Biomarkers and Breast Cancer Risk: Data From 228 951 Women of European Descent. J Natl Cancer Inst. 2020; [Europe PMC free article] [Abstract] [Google Scholar]
16. Gamazon ER, Wheeler HE, Shah KP, Mozaffari SV., Aquino-Michaels K, Carroll RJ, et al. A gene-based association method for mapping traits using reference transcriptome data. Nat Genet. 2015; [Europe PMC free article] [Abstract] [Google Scholar]
17. Barbeira AN, Dickinson SP, Bonazzola R, Zheng J, Wheeler HE, Torres JM, et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat Commun. 2018; [Europe PMC free article] [Abstract] [Google Scholar]
18. Gusev A, Mancuso N, Won H, Kousi M, Finucane HK, Reshef Y, et al. Transcriptome-wide association study of schizophrenia and chromatin activity yields mechanistic disease insights. Nat Genet [Internet]. 2018/April/11. 2018;50:538–48. Available from: https://www.ncbi.nlm.nih.gov/pubmed/29632383 [Europe PMC free article] [Abstract] [Google Scholar]
19. Su YR, Di C, Bien S, Huang L, Dong X, Abecasis G, et al. A Mixed-Effects Model for Powerful Association Tests in Integrative Functional Genomics. Am J Hum Genet. 2018; [Europe PMC free article] [Abstract] [Google Scholar]
20. Sun J, Zheng Y, Hsu L. A Unified Mixed-Effects Model for Rare-Variant Association in Sequencing Studies. Genet Epidemiol. 2013; [Europe PMC free article] [Abstract] [Google Scholar]
21. Dong X, Su YR, Barfield R, Bien SA, He Q, Harrison TA, et al. A general framework for functionally informed set-based analysis: Application to a large-scale colorectal cancer study. PLoS Genet. 2020; [Europe PMC free article] [Abstract] [Google Scholar]
22. Lemire M, Zaidi SHE, Ban M, Ge B, Aïssi D, Germain M, et al. Long-range epigenetic regulation is conferred by genetic variation located at thousands of independent loci. Nat Commun. 2015; [Europe PMC free article] [Abstract] [Google Scholar]
23. Lemire M, Zaidi SHE, Zanke BW, Gallinger S, Hudson TJ, Cleary SP. The effect of 5-fluorouracil/leucovorin chemotherapy on CpG methylation, or the confounding role of leukocyte heterogeneity: An illustration. Genomics. 2015; [Europe PMC free article] [Abstract] [Google Scholar]
24. Peters U, Jiao S, Schumacher FR, Hutter CM, Aragaki AK, Baron JA, et al. Identification of genetic susceptibility loci for colorectal tumors in a genome-wide meta-analysis. Gastroenterology. 2013; [Europe PMC free article] [Abstract] [Google Scholar]
25. Schmit SL, Edlund CK, Schumacher FR, Gong J, Harrison TA, Huyghe JR, et al. Novel common genetic susceptibility loci for colorectal cancer. J Natl Cancer Inst. 2019; [Europe PMC free article] [Abstract] [Google Scholar]
26. Schumacher FR, Schmit SL, Jiao S, Edlund CK, Wang H, Zhang B, et al. Genome-wide association study of colorectal cancer identifies six new susceptibility loci. Nat Commun. 2015; [Europe PMC free article] [Abstract] [Google Scholar]
27. Du P, Zhang X, Huang CC, Jafari N, Kibbe WA, Hou L, et al. Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics [Internet]. 2010;11:587. Available from: https://www.ncbi.nlm.nih.gov/pubmed/21118553 [Europe PMC free article] [Abstract] [Google Scholar]
28. Shabalin AA. Matrix eQTL: Ultra fast eQTL analysis via large matrix operations. Bioinformatics. 2012; [Europe PMC free article] [Abstract] [Google Scholar]
29. Mogil LS, Andaleon A, Badalamenti A, Dickinson SP, Guo X, Rotter JI, et al. Genetic architecture of gene expression traits across diverse populations. PLoS Genet. 2018; [Europe PMC free article] [Abstract] [Google Scholar]
30. Wheeler HE, Shah KP, Brenner J, Garcia T, Aquino-Michaels K, Cox NJ, et al. Survey of the Heritability and Sparse Architecture of Gene Expression Traits across Human Tissues. PLoS Genet. 2016; [Europe PMC free article] [Abstract] [Google Scholar]
31. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010; [Europe PMC free article] [Abstract] [Google Scholar]
32. Giambartolomei C, Vukcevic D, Schadt EE, Franke L, Hingorani AD, Wallace C, et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet [Internet]. 2014;10:e1004383. Available from: https://www.ncbi.nlm.nih.gov/pubmed/24830394 [Google Scholar]
33. Kohfeldt E, Sasaki T, Göhring W, Timpl R. Nidogen-2: A new basement membrane protein with diverse binding properties. J Mol Biol. 1998; [Google Scholar]
34. Peters U, Bien S, Zubair N. Genetic architecture of colorectal cancer. Gut. 2015; [Google Scholar]
35. Ulazzi L, Sabbioni S, Miotto E, Veronese A, Angusti A, Gafà R, et al. Nidogen 1 and 2 gene promoters are aberrantly methylated in human gastrointestinal cancer. Mol Cancer. 2007; [Google Scholar]
36. Renard I, Joniau S, van Cleynenbreugel B, Collette C, Naômé C, Vlassenbroeck I, et al. Identification and Validation of the Methylated TWIST1 and NID2 Genes through Real-Time Methylation-Specific Polymerase Chain Reaction Assays for the Noninvasive Detection of Primary Bladder Cancer in Urine Samples. Eur Urol. 2010; [Abstract] [Google Scholar]
37. Fantony JJ, Longo TA, Gopalakrishna A, Owusu R, Lance RS, Foo WC, et al. Urinary NID2 and TWIST1 methylation to augment conventional urine cytology for the detection of bladder cancer. Cancer Biomarkers. 2017; [Abstract] [Google Scholar]
38. Rovillain E, Mansfield L, Lord CJ, Ashworth A, Jat PS. An RNA interference screen for identifying downstream effectors of the p53 and pRB tumour suppressor pathways involved in senescence. BMC Genomics. 2011; [Europe PMC free article] [Abstract] [Google Scholar]
39. Hanahan D, Weinberg RA. The hallmarks of cancer. Cell. 2000. [Abstract] [Google Scholar]
40. Bienz M, Clevers H. Linking colorectal cancer to Wnt signaling. Cell. 2000. [Abstract] [Google Scholar]
41. Kong A, Steinthorsdottir V, Masson G, Thorleifsson G, Sulem P, Besenbacher S, et al. Parental origin of sequence variants associated with complex diseases. Nature. 2009; [Europe PMC free article] [Abstract] [Google Scholar]
42. Sekine Y, Hatanaka R, Watanabe T, Sono N, Iemura S ichiro, Natsume T, et al. The Kelch Repeat Protein KLHDC10 Regulates Oxidative Stress-Induced ASK1 Activation by Suppressing PP5. Mol Cell. 2012; [Abstract] [Google Scholar]
43. Glöckner SC, Dhir M, Joo MY, McGarvey KE, Van Neste L, Louwagie J, et al. Methylation of TFPI2 in stool DNA: A potential novel biomarker for the detection of colorectal cancer. Cancer Res. 2009; [Europe PMC free article] [Abstract] [Google Scholar]
44. Hibi K, Goto T, Shirahata A, Saito M, Kigawa G, Nemoto H, et al. Detection of TFPI2 methylation in the serum of colorectal cancer patients. Cancer Lett. 2011; [Abstract] [Google Scholar]
45. Uhlén M, Fagerberg L, Hallström BM, Lindskog C, Oksvold P, Mardinoglu A, et al. Tissue-based map of the human proteome. Science (80-). 2015; [Abstract] [Google Scholar]
46. Zhou J, Lin J, Zhang H, Zhu F, Xie R. LncRNA HAND2-AS1 sponging miR-1275 suppresses colorectal cancer progression by upregulating KLF14. Biochem Biophys Res Commun. 2018; [Abstract] [Google Scholar]
47. Huan T, Joehanes R, Song C, Peng F, Guo Y, Mendelson M, et al. Genome-wide identification of DNA methylation QTLs in whole blood highlights pathways for cardiovascular disease. Nat Commun [Internet]. Nature Publishing Group; UK; 2019;10:4267. Available from: https://pubmed.ncbi.nlm.nih.gov/31537805 [Europe PMC free article] [Abstract] [Google Scholar]
48. Shi J, Marconett CN, Duan J, Hyland PL, Li P, Wang Z, et al. Characterizing the genetic basis of methylome diversity in histologically normal human lung tissue. Nat Commun [Internet]. 2014;5:3365. Available from: https://pubmed.ncbi.nlm.nih.gov/24572595 [Europe PMC free article] [Abstract] [Google Scholar]
49. Tibshirani R Regression Shrinkage and Selection Via the Lasso. J R Stat Soc Ser B. 1996; [Google Scholar]
50. Lin D, Chen J, Perrone-Bizzozero N, Bustillo JR, Du Y, Calhoun VD, et al. Characterization of cross-tissue genetic-epigenetic effects and their patterns in schizophrenia. Genome Med [Internet]. 2018;10:13. Available from: 10.1186/s13073-018-0519-4 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
51. Liu J, Lin D, Chen J, Perrone-Bizzozero N, Calhoun V, Bustillo J. 29 - CHARACTERIZATION OF CROSS-TISSUE mQTL EFFECTS AND THEIR RELEVANCE IN PSYCHIATRIC DISORDERS. Eur Neuropsychopharmacol [Internet]. 2019;29:S796–7. Available from: https://www.sciencedirect.com/science/article/pii/S0924977X17304820 [Google Scholar]
52. Aguet F, Brown AA, Castel SE, Davis JR, He Y, Jo B, et al. Genetic effects on gene expression across human tissues. Nature [Internet]. 2017;550:204–13. Available from: 10.1038/nature24277 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
53. Bien SA, Auer PL, Harrison TA, Qu C, Connolly CM, Greenside PG, et al. Enrichment of colorectal cancer associations in functional regions: Insight for using epigenomics data in the analysis of whole genome sequence-imputed GWAS data. PLoS One [Internet]. Public Library of Science; 2017;12:e0186518. Available from: 10.1371/journal.pone.0186518 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]

Citations & impact 


Impact metrics

Jump to Citations

Alternative metrics

Altmetric item for https://www.altmetric.com/details/124544577
Altmetric
Discover the attention surrounding your research
https://www.altmetric.com/details/124544577

Smart citations by scite.ai
Smart citations by scite.ai include citation statements extracted from the full text of the citing article. The number of the statements may be higher than the number of citations provided by EuropePMC if one paper cites another multiple times or lower if scite has not yet processed some of the citing articles.
Explore citation contexts and check if this article has been supported or disputed.
https://scite.ai/reports/10.1158/1055-9965.epi-21-0724

Supporting
Mentioning
Contrasting
0
1
0

Article citations

Data 


Data behind the article

This data has been text mined from the article, or deposited into data resources.

Similar Articles 


To arrive at the top five similar articles we use a word-weighted algorithm to compare words from the Title and Abstract of each citation.


Funding 


Funders who supported this work.

NCI NIH HHS (7)

NCI NIH U.S. Department of Health and Human Services (4)

NIH HHS (1)

NIH NCI (1)

ORIP (1)

World Health Organization (1)