Genome-wide Association Study in a High-Risk Isolate for Multiple Sclerosis Reveals Associated Variants in STAT3 Gene
Abstract
Genetic risk for multiple sclerosis (MS) is thought to involve both common and rare risk alleles. Recent GWAS and subsequent meta-analysis have established the critical role of the HLA locus and identified new common variants associated with MS. These variants have small odds ratios (ORs) and explain only a fraction of the genetic risk. To expose potentially rare, high-impact alleles, we conducted a GWAS of 68 distantly related cases and 136 controls from a high-risk internal isolate of Finland with increased prevalence and familial occurrence of MS. The top 27 loci with p < 10⁻⁴ were tested in 711 cases and 1029 controls from Finland, and the top two findings were validated in 3859 cases and 9110 controls from more heterogeneous populations. A SNP (rs744166) within the STAT3 gene was associated with MS (p = 2.75 × 10⁻¹⁰, OR 0.87, 95% confidence interval 0.83–0.91). The protective haplotype for MS in STAT3 is a risk allele for Crohn disease, implying that STAT3 represents a shared risk locus for at least two autoimmune diseases. This study also demonstrates the potential of special isolated populations in the search for variants contributing to complex traits.
Main Text
Multiple sclerosis (MS) (MIM #126200) is a complex inflammatory disease of the central nervous system with presumed autoimmune etiology. Both environmental and genetic factors are thought to contribute to the development of MS,¹⁻³ and the genetic risk factors likely include both common and rare risk alleles. Recent GWAS and subsequent meta-analysis have established the critical role of the HLA locus⁴⁻⁶ and identified new MS loci: IL2RA (MIM 147730),⁷ IL7R (MIM 146661),⁷⁻⁹ CLEC16A (MIM 611303),⁷,¹⁰⁻¹³ CD58 (MIM 153420),¹¹,¹²,¹⁴ TNFRSF1A (MIM 191190),¹⁵ IRF8 (MIM 601565),¹⁵ and TYK2 (MIM 176941).¹²,¹⁶,¹⁷ These associated variants, except for TYK2, are common, have small odds ratios (ORs), and explain only a fraction of the genetic risk.
The population history of Finland and the province of Southern Ostrobothnia (SO), an internal isolate with increased prevalence of MS,¹⁸⁻²² is compatible with a founder effect.²²⁻²⁴ Previous studies in Finnish MS families originating from this high-risk subisolate have demonstrated linkage to and association with the HLA locus (HLA-DRB1 [MIM 142857]),²⁵⁻²⁷ 17q22-24,²⁵,²⁸,²⁹ and 5p14-p12.²⁵,³⁰⁻³² Therefore, we hypothesized that some variants predisposing to MS have either become enriched in SO or can be more easily detected against a homogeneous background with a genome-wide, high-density SNP screen. We looked for shared alleles enriched in cases, as well as potential extended homozygous regions and copy number variations (CNVs) enriched in MS cases.
We included in our GWAS 72 cases with either both parents from the high-risk isolate or one parent from the isolate and a positive family history of MS and genotyped them with the Illumina HumanHap300 chip. Extensive genealogical research revealed that the majority of the cases could be traced to two large interrelated pedigrees (see Figure S1 available online). A total of 2206 population-based controls were genotyped with either the Illumina HumanHap300 chip or the Illumina HumanHap610-quad chip. We excluded samples and SNPs with <95% success rates, leaving 72 cases and 2196 controls for the subsequent analyses, and selected only SNPs present on both Illumina platforms (297,343 SNPs) for analyses. A gender check was performed with X chromosomal SNPs, and no discrepancies between the observed and expected gender were noted. Identity-by-descent (IBD) analysis was performed to study possible close cryptic relatedness between individuals and to identify possible samples with excess relatedness, suggestive of sample contamination. We then performed identity-by-state (IBS) and multidimensional scaling analyses: four cases initially considered to be isolate samples clustered outside the isolate sample set and were excluded from subsequent analyses (Figure S2). We selected the two closest IBS-matched controls for each case, and the final GWAS set (isolate GWAS) consisted of 68 cases and 136 controls. The genomic inflation factor suggested no major inflation (λ = 1.078) and a fairly well-matched case-control set, which was also confirmed by quantile-quantile plot analysis of single SNP association results (Figure S3). Because we had parental birthplace data for both the cases and the majority of controls (n = 2174), we could further verify that all cases and 125 of the 136 selected controls had at least one parent born in Southern Ostrobothnia, and of these, 64 cases and 90 controls had both parents born in Southern Ostrobothnia.
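As a hedged illustration of the genomic inflation factor quoted above: λ is conventionally taken as the median of the observed 1-df χ² association statistics divided by the median expected under the null (about 0.455). The sketch below assumes that definition. The function name and the simulated statistics are hypothetical and do not reproduce the study's PLINK output.

```python
import random
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364231195724


def genomic_inflation(chi2_stats):
    """Return lambda = median(observed 1-df chi-square statistics) / null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


# Hypothetical null data: the square of a standard normal draw is chi-square (1 df).
rng = random.Random(0)
null_stats = [rng.gauss(0.0, 1.0) ** 2 for _ in range(50_000)]
lam_null = genomic_inflation(null_stats)

# Inflating every statistic by a constant factor scales lambda by the same factor.
lam_inflated = genomic_inflation([1.078 * s for s in null_stats])

print(round(lam_null, 3), round(lam_inflated / lam_null, 3))
```

A value close to 1, as in the simulated null data, indicates that the bulk of the test statistics follows the null distribution; values well above 1 would point to stratification or cryptic relatedness.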
We have recently shown a correlation between geographical origin of samples (based on parental birthplace information) and genome-wide SNP data in the Finnish population.²³ Thus, IBS matching of cases and controls combined with genealogical information should minimize the risk of population substructure in our study set. All patient samples were collected with informed consent, and the study design and the Finnish sample collection have been approved by the Helsinki University Hospital Ethical Committee of Ophthalmology, Otorhinolaryngology, Neurology and Neurosurgery (permit 192/E9/02).
Taking advantage of the distant relatedness in the subisolate, we conducted homozygosity analyses with PLINK.³³ First we searched for extended regions of homozygosity (ROHs), the signature features of isolated populations, enriched in MS cases to identify loci that could influence MS susceptibility in a recessive manner. ROHs with at least 50 consecutive SNPs and a minimum length of 500 kb were identified in each individual. On average, we identified 149 (standard deviation: 12 in cases, 10 in controls) ROHs per individual with an average length of 1030 kb (500 kb–31.3 Mb) in cases and 1018 kb (500 kb–49.6 Mb) in controls. We then evaluated which overlapping homozygous regions were enriched in cases by permuting the group (case-control) labels 10,000 times. The analysis revealed three putative regions with empirical p < 10⁻³: 1q42.12 (242 kb, 24 SNPs, p = 3 × 10⁻⁴), 2q24.3 (512 kb, 39 SNPs, p = 8 × 10⁻⁵), and 12q24.33 (573 kb, 48 SNPs, p = 3 × 10⁻⁴) (Table S1 and Figure S4). Although the cases and controls are matched on the basis of their genome-wide IBS sharing and are augmented by parental birthplace information, the permutation-based approach is susceptible to population substructure, and the obtained p values should be interpreted with caution. Excess homozygous sharing of the same haplotype was observed in 13% of MS cases (9 of 68) and 7% of controls (10 of 136) for 1q42.12 and in 37% of MS cases (25 of 68) and 20% of controls (27 of 136) for 2q24.3. For the 12q24 region, we observed multiple different haplotypes (Table S1). These regions have not been previously implicated in MS except for suggestive linkage in 12q23-24,³⁴ and their putative role in MS susceptibility requires further validation. Haplotype sharing outside of the isolate in the population control samples (n = 2194) was similar to the GWAS internal isolate control population (frequencies 5.8% for 1q42.12 and 20% for 2q24.3 haplotypes).
This indicates that the homozygous haplotypes have been enriched in the subisolate MS cases, but not in the isolate controls, although the IBD analysis showed the isolate controls to be as related to each other as the isolate MS cases (Table S2).
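The label-permutation idea used for the ROH enrichment analysis can be sketched as follows. This is a simplified stand-in, not PLINK's segment-overlap procedure: it assumes each individual is reduced to a single carrier flag for one homozygous haplotype, and the function name, seed, and add-one correction are choices made here. The carrier counts (25 of 68 cases, 27 of 136 controls for 2q24.3) come from the text, but the resulting p value is not expected to match the published one.

```python
import random


def permutation_p(case_flags, control_flags, n_perm=10_000, seed=1):
    """One-sided empirical p value for carrier enrichment in cases."""
    rng = random.Random(seed)
    pooled = list(case_flags) + list(control_flags)
    n_cases = len(case_flags)
    observed = sum(case_flags)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the pooled flags is equivalent to permuting case-control labels.
        rng.shuffle(pooled)
        if sum(pooled[:n_cases]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p of exactly 0.
    return (hits + 1) / (n_perm + 1)


# 1 = homozygous carrier of the shared 2q24.3 haplotype, 0 = non-carrier.
cases = [1] * 25 + [0] * 43       # 25 of 68 cases
controls = [1] * 27 + [0] * 109   # 27 of 136 controls

p_emp = permutation_p(cases, controls)
print(p_emp)
```

With the add-one correction the smallest attainable value is 1/(n_perm + 1), so the resolution of the empirical p value is bounded by the number of permutations.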
The Illumina HumanHap300 platform has relatively sparse coverage and is devoid of probes in the most common CNV regions but could be suitable for detecting rare, large CNVs, potentially enriched in the internal isolate population. We used the QuantiSNP software³⁵ for CNV detection (GC content correction option, restricted to CNVs with log Bayes factor > 10 and length ≥ 3 SNPs) and verified these results visually with BeadStudio 3.3. All CNVs in centromeric regions were excluded. We identified altogether 106 CNV regions in 68 cases (Table S3); all but 6 of the 106 CNVs have been previously reported. Furthermore, all novel CNVs were found in only one case each. Hypothesizing that genes mapping next to the 106 CNVs identified in cases could belong to a common pathway involved in MS etiology, we used Ingenuity Pathway Analysis to search for connecting pathways. One pathway potentially regulating oligodendrocyte differentiation and myelin sheath formation³⁶⁻⁴¹ was identified, involving NRG3 (MIM 605533), ERBB4 (MIM 600543), DLG2 (MIM 603583), UTRN (MIM 128240), and LARGE (MIM 603590) (all CNVs previously reported) (Figure S5). However, when CNV deletions in these genes were genotyped in an independent set of 703 cases and 1051 controls with an in-house-developed PCR-based fragment analysis method,⁴² their frequencies in MS cases were similar to those in controls by Fisher's exact test (ERBB4: 11% of cases and 12% of controls, p = 0.388; NRG3: 4% of cases and 4% of controls, p = 0.90; and DLG2: 1% of cases and 0% of controls, p = 0.404).
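For readers unfamiliar with the test used for the CNV deletion frequencies, a minimal two-sided Fisher's exact test is sketched below. The text reports only percentages, so the carrier counts in the example (77 of 703 cases, 126 of 1051 controls, roughly 11% and 12%) are hypothetical roundings chosen for illustration, and the printed p value should not be read as a reproduction of the published one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in row 1 with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical counts: deletion carriers / non-carriers in cases and in controls.
p_cnv = fisher_exact_two_sided(77, 703 - 77, 126, 1051 - 126)
print(round(p_cnv, 3))
```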
Southern Ostrobothnia is an old isolate, and thus the expected shared haplotypes are of modest length. We therefore performed single SNP standard χ² allelic association analysis with PLINK.³³ Because of the limited power, we analyzed all 27 loci (28 of the 37 initial SNPs) showing nominal association in the GWAS (p < 10⁻⁴; Table S4) in a larger independent Finnish sample set of 711 cases and 1029 controls, of which 83 MS cases and 365 controls were from the isolate (Table 1). Population-stratified Cochran-Mantel-Haenszel (CMH) association analysis provided evidence for three SNPs: rs3135338 in the HLA region (p = 1.6 × 10⁻²⁵), rs744166 in the first intron of STAT3 (MIM 102582) on chromosome 17q21.1 (p = 0.0012), and rs1364194 on chromosome 16 (p = 0.0047) (Table S4). The non-HLA SNPs were then analyzed in an international sample of 3859 MS cases and 9110 controls from six different populations (Table 1). The combined evidence for association with STAT3 (rs744166) was significant (p = 2.75 × 10⁻¹⁰ and OR 0.87 [95% confidence interval (CI) 0.83–0.91]) (Figure 1; Table S5). The Breslow-Day analysis of heterogeneity of odds ratios revealed no significant heterogeneity (p = 0.34). When the combined replication data set was analyzed by logistic regression for additive, dominant, and recessive models with study set as a covariate in the analyses, the statistically most significant p value was obtained for the additive model (Table S6). We obtained no additional support for the chromosome 16q region.
Table 1
Study Population | Number of MS Cases | Number of Controls | Genotyping Platform |
---|---|---|---|
Southern Ostrobothnia (SO) isolate GWASᵃ | 68 | 136 | Illumina HumanHap300 or Illumina Human610-quad |
Finland SO replicationᵇ | 83 | 365 | Sequenom iPlex Gold |
Finlandᶜ | 628 | 668 | Sequenom iPlex Gold |
Norwayᵈ | 607 | 816 | Sequenom iPlex Gold |
Denmarkᵉ | 628 | 1074 | Sequenom iPlex Gold |
Gene MSA Switzerlandᶠ | 253 | 208 | Illumina HumanHap300 |
Gene MSA Netherlandsᶠ | 230 | 232 | Illumina HumanHap300 |
Gene MSA USᶠ | 486 | 431 | Illumina HumanHap300 |
IMSGC UKᵍ | 453 | 2950 | Affymetrix 500K |
IMSGC USᵍ | 342 | 1679 | Affymetrix 500K |
BWHᵍ | 860 | 1720 | Affymetrix 6.0 |
Total | 4638 | 10,279 | |
All samples have been diagnosed with clinically definite MS according to either Poser's or McDonald's criteria.
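The population-stratified analysis described above combines per-population 2×2 allele-count tables. The sketch below shows one textbook form of the Cochran-Mantel-Haenszel computation (pooled Mantel-Haenszel odds ratio and a 1-df χ² without continuity correction). Whether this matches PLINK's implementation in every detail is an assumption, and the two strata of allele counts are hypothetical, not taken from the study.

```python
from math import erfc, sqrt


def cmh(strata):
    """Mantel-Haenszel pooled OR and CMH chi-square (1 df) with its p value.

    Each stratum is (a, b, c, d): allele counts for
    case/allele 1, case/allele 2, control/allele 1, control/allele 2.
    """
    num = den = 0.0    # numerator and denominator of the pooled odds ratio
    diff = var = 0.0   # sum of (a - E[a]) and sum of Var(a) across strata
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
        row1, row2, col1, col2 = a + b, c + d, a + c, b + d
        diff += a - row1 * col1 / n
        var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    chi2 = diff * diff / var
    p_value = erfc(sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return num / den, chi2, p_value


# Hypothetical allele counts for two populations (not study data).
strata = [
    (540, 460, 1160, 840),
    (330, 270, 590, 410),
]
or_mh, chi2, p = cmh(strata)
print(round(or_mh, 3), round(chi2, 2), p)
```

Stratifying in this way keeps allele-frequency differences between populations from being mistaken for case-control differences, which is why a heterogeneity test such as Breslow-Day is checked before the pooled estimate is interpreted.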
Evaluation of the STAT3 linkage disequilibrium (LD) block that contains the associated SNP rs744166 in HapMap2 (build 23a) samples⁴³ with Haploview 4.0⁴⁴ showed that the non-risk-associated A allele of rs744166 completely tags the most common haplotype in Utah residents with Northern and Western European ancestry (CEU, 56%), Han Chinese from Beijing (CHB, 65%), and Tokyo Japanese (JPT, 57%), but the G allele is present on four different haplotypes (Table S7). In the Yoruban population from Nigeria (YRI), the A allele is present on four different haplotype backgrounds, and the most common A haplotype in CEU, CHB, and JPT populations has a frequency of 7% in the YRI population. We speculate that this notable enrichment of a single haplotype in non-African populations might suggest positive selection of the putative MS protective haplotype outside Africa, although this locus did not reach genome-wide significance in an analysis of signs of recent positive selection.⁴⁵ The rs744166 A allele also shows changes in frequency distribution in the Human Genome Diversity Panel (Figure S6).⁴⁵ The LD block carrying the haplotype is 54 kb in length in the CEU population and contains the beginning of STAT3 and its immediate promoter region (Figure 2).
We tagged the haplotypes with three SNPs (rs744166, rs6503695, and rs957970) with the Haploview 4.0 tagging option. These SNPs were genotyped in the Finnish sample set, and the data for the same SNPs were available from four other populations from a recent meta-analysis.⁷,¹⁵,⁴⁶ We phased the haplotypes with PLINK and performed a CMH analysis with populations as clusters. We could define both a putative predisposing haplotype (30.9% in MS, 27.1% in controls, OR 1.18, 95% CI 1.11–1.27) with CMH p = 1.29 × 10⁻⁶ and a tentative protective haplotype (55.0% in MS, 58.7% in controls, OR 0.86, 95% CI 0.81–0.91) (Figure 1) with CMH combined p = 1.19 × 10⁻⁶ (Table 2; Tables S7 and S8). The Breslow-Day test revealed no significant heterogeneity of odds ratios (p = 0.271 and p = 0.301, respectively). Further studies, including resequencing, will be needed to identify the true affecting variants segregating in one or both of these haplotypes.
Table 2
Haplotype | Frequency MS (n = 3255) | Frequency Control (n = 8133) | P Value | OR | 95% CI |
---|---|---|---|---|---|
CGGᵃ | 0.309 | 0.271 | 1.29 × 10⁻⁶ | 1.18 | (1.11–1.27) |
TAAᵇ | 0.550 | 0.587 | 1.19 × 10⁻⁶ | 0.86 | (0.81–0.91) |
TGG | 0.082 | 0.080 | 0.439 | 1.06 | (0.94–1.17) |
CGA | 0.057 | 0.059 | 0.591 | 0.98 | (0.85–1.10) |
The haplotypes were constructed with SNPs rs6503695, rs744166, and rs957970 and phased with PLINK. Only phased haplotypes with posterior probability of 1 were included in the analysis. Each haplotype was analyzed separately and showed no evidence for heterogeneity of odds ratios between populations in the Breslow-Day test, which allowed us to combine the haplotype results with CMH. The analysis included a total of 3255 MS cases and 8133 controls from Finnish, BWH, IMSGC UK, IMSGC US, Gene MSA US, Gene MSA CH, and Gene MSA NL sample sets. The results for individual populations are provided in Table S5.
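As a rough worked example of how pairwise LD follows from phased haplotype frequencies, the sketch below computes r² between two of the tagging SNPs from the control frequencies in Table 2. It rests on several assumptions: the four listed haplotypes are treated as the complete set and renormalized (they sum to slightly less than 1), the allele order follows the table note, and the function name is hypothetical. The study itself evaluated LD with Haploview, so the value printed here is only an approximation.

```python
def r_squared(hap_freqs, i, j):
    """Pairwise LD (r^2) between biallelic sites i and j from haplotype frequencies.

    Both sites must be polymorphic, otherwise the denominator is zero.
    """
    total = sum(hap_freqs.values())
    first = next(iter(hap_freqs))
    ref_i, ref_j = first[i], first[j]   # arbitrary reference allele at each site
    p_i = sum(f for h, f in hap_freqs.items() if h[i] == ref_i) / total
    p_j = sum(f for h, f in hap_freqs.items() if h[j] == ref_j) / total
    p_ij = sum(
        f for h, f in hap_freqs.items() if h[i] == ref_i and h[j] == ref_j
    ) / total
    d = p_ij - p_i * p_j                # LD coefficient D
    return d * d / (p_i * (1 - p_i) * p_j * (1 - p_j))


# Control frequencies from Table 2; allele order rs6503695, rs744166, rs957970.
control_freqs = {"CGG": 0.271, "TAA": 0.587, "TGG": 0.080, "CGA": 0.059}

r2_tag = r_squared(control_freqs, 0, 1)  # rs6503695 versus rs744166
print(round(r2_tag, 2))
```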
STAT3 codes for a transcription factor that is involved in multiple pathways and functions, including the Jak-STAT pathway, neuron axonal guidance, apoptosis, activation of immune responses, and Th17 cell differentiation.⁴⁷ Interestingly, the A allele of rs744166 tagging the MS-protective haplotype is a risk allele for Crohn disease,⁴⁸ and mutations in STAT3 are known to cause hyperimmunoglobulin E recurrent infection syndrome (HIES [MIM #147060]),⁴⁹,⁵⁰ a rare autosomal-dominant disorder characterized by elevated immunoglobulin E levels and inflammation. Additionally, mouse studies have shown that targeted deletion of Stat3 in CD4⁺ T cells prevents the development of experimental autoimmune encephalomyelitis (EAE), the rodent model of MS,⁵¹ and that Treg-specific ablation of Stat3 resulted in the development of a fatal intestinal inflammation due to an unrestrained Th17 response.⁵² A recent meta-analysis of GWAS in MS listed STAT3 as one of the genes with a suggestive role in at least two autoimmune disorders¹⁵ but failed to replicate the initial STAT3 association. This failure was probably due to the choice of the most significantly associated regional SNP (rs2293152) for the replication analysis: it resides just outside of the rs744166-containing LD region and is in only limited LD with rs744166 (r² = 0.35 in the HapMap2 CEU population) (Figure 2). These observations support a wider role for STAT3 in autoimmunity and add this gene to the growing list of MS-susceptibility genes with validated or substantial evidence for association in at least two inflammatory diseases.⁴⁸⁻⁵⁰ All of these together suggest a significant role of this locus in the immune system and autoimmune disease pathogenesis.
Most of the currently validated (IL2RA, IL7R, CD58, CLEC16A, IRF8, TNFRSF1A, TYK2)⁷,⁹,¹²⁻¹⁷,⁵³ and suggested (C7 [MIM 217070], CD6 [MIM 186720], IL12A [MIM 161560], OLIG3 [MIM 609323]–TNFAIP3 [MIM 191163], PTGER4 [MIM 601586], RGS1 [MIM 600323])¹⁵,³⁰ non-HLA MS susceptibility loci have known functions in the immune system and particularly in T cells. Although their independent ORs are modest, their combined effect might be larger, and a large-scale international study would be required to estimate their combined effect toward disease predisposition. The present study demonstrates the power of the founder population study design to complement large-scale GWAS in identifying genes and pathways of general significance, not only rare high-impact alleles.
Acknowledgments
We wish to thank all participating MS patients and families. Elli Kempas, Liisa Arala, Anne Vikman, Anne Nyberg, Minna Suvela, and Marja-Leena Sairanen are acknowledged for their invaluable assistance and technical support. We sincerely thank Ida Surakka, Samuli Ripatti, and Carl Anderson for their valuable assistance and advice in statistical analyses and Nicole Soranzo for guidance on Haplotter. The International Multiple Sclerosis Genetics Consortium (IMSGC) and Gene MSA Consortium are acknowledged for the data utilized in the replication phase. The Danish Multiple Sclerosis Society is acknowledged for supporting the Danish sample collection, and the Norwegian Bone Marrow Donor Registry is acknowledged for collaboration in establishment of the Norwegian control material. The Institute for Molecular Medicine Finland (FIMM) Technology Center (previously Finnish Genome Center) and the Broad Institute Center for Genotyping and Analysis are acknowledged for conducting genotyping on the Illumina GWA platform. The Health 2000 project is thanked for providing population-based controls for this GWAS. This work was supported by the National Institutes of Health (grant R01 NS 43559), the Center of Excellence for Disease Genetics of the Academy of Finland (grants 213506 and 129680), the Sigrid Juselius Foundation, the Biocentrum Helsinki Foundation, Helsinki University Central Hospital Research Foundation, the Neuropromise EU project (grant LSHM-CT-2005-018637), and The Wellcome Trust (grant 089061/Z/09/Z). The Broad Institute Center for Genotyping and Analysis is supported by the National Center for Research Resources (grant U54 RR020278). The genotyping of the Health 2000 controls was funded by the SGENE EU project (LSHM-CT-2006-037761) and Simons Foundation (R01MH71425-01A1). P.L.D.J. is a Harry Weaver Neuroscience Scholar of the National MS Society. L.P. is a member of the Board of the Orion Pharma Limited, Helsinki, Finland.
Web Resources
The URLs for data presented herein are as follows:
PLINK: Whole Genome Association Analysis Toolset, http://pngu.mgh.harvard.edu/~purcell/plink/
Online Mendelian Inheritance in Man (OMIM), http://www.ncbi.nlm.nih.gov/Omim/
Ingenuity Pathway Analysis (IPA) Software, http://www.ingenuity.com
Database of Genomic Variants (DGV), http://projects.tcag.ca/variation/
International HapMap Project, http://www.hapmap.org/
The Human Genome Diversity Project (HGDP) Selection Browser, http://hgdp.uchicago.edu/
Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics
Full text links
Read article at publisher's site: https://doi.org/10.1016/j.ajhg.2010.01.017
Read article for free, from open access legal sources, via Unpaywall: http://www.cell.com/article/S0002929710000212/pdf