Functional mapping and annotation of genetic associations with FUMA.

Watanabe K; Taskesen E; van Bochoven A; Posthuma D

doi:10.1038/s41467-017-01261-5

Affiliations

1. Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, VU University Amsterdam, Amsterdam, 1081 HV, The Netherlands.
Authors: Watanabe K¹, Taskesen E¹, Posthuma D¹
2. Faculty of Science, VU University Amsterdam, Amsterdam, 1081 HV, The Netherlands.
Authors: van Bochoven A²

Nature Communications, 28 Nov 2017, 8(1):1826
https://doi.org/10.1038/s41467-017-01261-5 PMID: 29184056 PMCID: PMC5705698

This article is based on a previously available preprint.

Abstract

A main challenge in genome-wide association studies (GWAS) is to pinpoint possible causal variants. Results from GWAS typically do not directly translate into causal variants because the majority of hits are in non-coding or intergenic regions, and the presence of linkage disequilibrium leads to effects being statistically spread out across multiple variants. Post-GWAS annotation facilitates the selection of most likely causal variant(s). Multiple resources are available for post-GWAS annotation, yet these can be time consuming and do not provide integrated visual aids for data interpretation. We, therefore, develop FUMA: an integrative web-based platform using information from multiple biological resources to facilitate functional annotation of GWAS results, gene prioritization and interactive visualization. FUMA accommodates positional, expression quantitative trait loci (eQTL) and chromatin interaction mappings, and provides gene-based, pathway and tissue enrichment results. FUMA results directly aid in generating hypotheses that are testable in functional experiments aimed at proving causal relations.

Kyoko Watanabe,¹ Erdogan Taskesen,¹,² Arjen van Bochoven,³ and Danielle Posthuma¹,⁴

Associated Data

Supplementary Materials: Supplementary Information
41467_2017_1261_MOESM1_ESM.docx (2.4M)
Peer review file
41467_2017_1261_MOESM2_ESM.pdf (453K)
Description of additional supplementary files
41467_2017_1261_MOESM3_ESM.pdf (199K)
Supplementary Data 1
41467_2017_1261_MOESM4_ESM.xlsx (58K)
Supplementary Data 2
41467_2017_1261_MOESM5_ESM.xlsx (51K)
Supplementary Data 3
41467_2017_1261_MOESM6_ESM.xlsx (64K)
Supplementary Data 4
41467_2017_1261_MOESM7_ESM.xlsx (156K)
Supplementary Data 5
41467_2017_1261_MOESM8_ESM.xlsx (103K)
Supplementary Data 6
41467_2017_1261_MOESM9_ESM.xlsx (35K)
Supplementary Data 7
41467_2017_1261_MOESM10_ESM.xlsx (42K)
Supplementary Data 8
41467_2017_1261_MOESM11_ESM.xlsx (56K)
Supplementary Data 9
41467_2017_1261_MOESM12_ESM.xlsx (49K)
Supplementary Data 10
41467_2017_1261_MOESM13_ESM.xlsx (66K)
Supplementary Data 11
41467_2017_1261_MOESM14_ESM.xlsx (123K)
Supplementary Data 12
41467_2017_1261_MOESM15_ESM.xlsx (109K)
Supplementary Data 13
41467_2017_1261_MOESM16_ESM.xlsx (83K)
Supplementary Data 14
41467_2017_1261_MOESM17_ESM.xlsx (46K)
Supplementary Data 15
41467_2017_1261_MOESM18_ESM.xlsx (68K)
Supplementary Data 16
41467_2017_1261_MOESM19_ESM.xlsx (56K)
Supplementary Data 17
41467_2017_1261_MOESM20_ESM.xlsx (67K)
Supplementary Data 18
41467_2017_1261_MOESM21_ESM.xlsx (129K)
Supplementary Data 19
41467_2017_1261_MOESM22_ESM.xlsx (50K)
Supplementary Data 20
41467_2017_1261_MOESM23_ESM.xlsx (45K)

Data Availability Statement: Data and tools used in FUMA are all publicly available from the following links (details are in Supplementary Table 1). dbSNP build 146 rsID archive: ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b146_grch137p13/database/organism_data/RsMergeArch.bcp.gz, 1000 Genomes phase 3 reference panel: ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/, CADD: http://cadd.gs.washington.edu/download, RegulomeDB: http://www.regulomedb.org/downloads, 15-core chromatin state: http://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/coreMarks/jointModel/final/, GWAS catalog: https://www.ebi.ac.uk/gwas/, GTEx v6: http://www.gtexportal.org/home/, Blood eQTL Browser: http://genenetwork.nl/bloodeqtlbrowser/, BIOS QTL Browser: http://genenetwork.nl/biosqtlbrowser/, BRAINEAC: http://www.braineac.org/, HiC (GSE87112): https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE87112, promoter/enhancer regions: http://egg2.wustl.edu/roadmap/data/byDataType/dnase/, pLI score: ftp.broadinstitute.org/pub/ExAC_release/release0.3.1/functional_gene_constraint, ncRVIS score: http://journals.plos.org/plosgenetics/article/file?type=supplementary&id=info:doi/10.1371/journal.pgen.1005492.s011, MsigDB: http://software.broadinstitute.org/gsea/msigdb/, WikiPathways: http://wikipathways.org/index.php/WikiPathways, ANNOVAR: http://annovar.openbioinformatics.org/en/latest/, and MAGMA: https://ctg.cncr.nl/software/magma. The GWAS summary statistics used in this study are available from the following sources. BMI: http://portals.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium_data_files, CD: ftp.sanger.ac.uk/pub/consortia/ibdgenetics/, SCZ: http://www.med.unc.edu/pgc/results-and-downloads.

Introduction

In the past decade, more than 2500 genome-wide association studies (GWAS) have identified thousands of genetic loci for hundreds of traits¹. The past 3 years have seen an explosive increase in GWAS sample sizes²–⁴, and these are expected to increase even further to 0.5–1 million in the next year and beyond⁵. These well-powered GWAS will not only lead to more reliable results but also to an increase in the number of detected disease-associated genetic loci. To benefit from these results, it is crucial to translate genetic loci into actionable variants that can guide functional genomics experimentation and drug target testing⁶. However, since the majority of GWAS hits are located in non-coding or intergenic regions⁷, direct inference from significantly associated single-nucleotide polymorphisms (SNPs) rarely yields functional variants. More commonly, GWAS hits span a genomic region (“GWAS risk loci”) that is characterized by multiple correlated SNPs, and may cover multiple closely located genes. Some of these genes may be relevant to the disease, while others are not, yet due to the correlated nature of closely located genetic variants, distinguishing relevant from non-relevant genes is often not possible based on association P-values alone. Pinpointing the most likely relevant, causal genes and variants requires integrating available information about regional linkage disequilibrium (LD) patterns and functional consequences of correlated SNPs, such as deleteriousness of variants, their effects on gene expression, and their role in chromatin interaction sites. Ideally, functional inferences obtained from different repositories are integrated, and annotated SNP effects are interpreted in the broader context of genes and molecular pathways.
For example, consider a genomic risk locus with one lead SNP associated with an increased risk for a disease, and several dozen other SNPs in LD with the lead SNP that also show a low association P-value, spanning multiple genes. If none of these tested SNPs and none of the other (not tested but known) SNPs in LD with the lead SNP are known to have a functional consequence (i.e., altering expression of a gene, affecting a binding site or disrupting protein structure), no causal gene can be indicated. However, if one or several of the SNPs are known to affect the function of one of the genes in the area, but not the other genes, then that single gene has a higher probability of being functionally related to the disease. Pinpointing which genes are affected by the SNPs associated with a trait, and how, is crucial in increasing our insight into the biological mechanisms underlying that trait. Interpreting SNP-trait associations requires adding functional information from several resources and repositories, such as the Genotype-Tissue Expression (GTEx) project⁸, the Encyclopedia of DNA Elements (ENCODE)⁹, the Roadmap Epigenomics Project¹⁰, or chromatin interaction information¹¹.

In practice, the extraction and interpretation of the relevant biological information from available repositories is not always straightforward, and can be time consuming as well as error prone. We have, therefore, developed FUMA, which functionally annotates GWAS findings and prioritizes the most likely causal SNPs and genes using information from 18 biological data repositories and tools. Gene prioritization is based on a combination of positional mapping, expression quantitative trait loci (eQTL) mapping and chromatin interaction mapping. Results are visualized to facilitate quick insight into the implicated molecular functions. FUMA is available as an online tool at http://fuma.ctglab.nl, where users can customize settings, for example to use only exonic SNPs for annotation, or to use only SNPs that are eQTLs in specific tissues for the annotation based on expression data. As input, FUMA requires GWAS summary statistics. Outputs include multiple tables and figures containing extensive information on, e.g., the functionality of SNPs in genomic risk loci, including protein-altering consequences, gene-expression influences, open-chromatin states, and three-dimensional (3D) chromatin interactions. The online tool includes interactive figures that can be used to explore associations in more depth and that aid, e.g., in identifying multiple lines of evidence pointing to the same prioritized gene, or in connecting hits in several genes via biological pathways.

Results

Overview of FUMA web application

FUMA incorporates 18 biological data repositories and tools to process GWAS summary statistics and provide a variety of annotations (Supplementary Table 1). To accomplish this task, FUMA consists of two separate processes, SNP2GENE and GENE2FUNC, described in detail below.

The core function of FUMA is the SNP2GENE process (Fig. 1), in which SNPs are annotated with their biological functionality and mapped to genes based on positional, eQTL and chromatin interaction information of SNPs. First, based on the provided summary statistics (the input format is described in Supplementary Note 1), FUMA identifies independent significant SNPs and their surrounding genomic loci depending on LD structure, and from these defines lead SNPs and genomic risk loci (Methods). Independent significant SNPs and SNPs that are in LD with the independent significant SNPs are then annotated for functional consequences on gene functions (based on Ensembl genes (build 85) using ANNOVAR¹²), deleteriousness score (CADD score¹³), potential regulatory functions (RegulomeDB score¹⁴ and 15-core chromatin state predicted by ChromHMM¹⁵ for 127 tissue/cell types⁹,¹⁰), effects on gene expression using eQTLs of various tissue types, and 3D structure of chromatin interactions with Hi-C data (Methods). In addition, independent significant SNPs and correlated SNPs are also linked to the GWAS catalog¹ to provide insight into previously reported associations of the SNPs in the risk loci with a variety of phenotypes.

Fig. 1

Overview of FUMA. FUMA includes two core processes, SNP2GENE and GENE2FUNC. The input is GWAS summary statistics. SNP2GENE prioritizes functional SNPs and genes, outputs tables (blue boxes), and creates Manhattan, quantile–quantile (QQ) and interactive regional plots (box at right bottom). GENE2FUNC provides four outputs: a gene expression heatmap, enrichment of differentially expressed gene (DEG) sets in a certain tissue compared to all other tissue types, overrepresentation of gene sets, and links to external biological information of input genes. All results are downloadable as text files or high-resolution images.

Functionally annotated SNPs are subsequently mapped to genes by (i) physical position on the genome (positional mapping), (ii) eQTL associations (eQTL mapping), and (iii) 3D chromatin interactions (chromatin interaction mapping). Gene mapping can be controlled by setting several parameters (Supplementary Table 2) that allow specific functional categories of SNPs to be included or excluded (Supplementary Fig. 1). Positional mapping maps SNPs to genes based on their being physically located inside a gene, using a default window of 10 kb; custom windows around a gene can be set by the user. Users can choose to use only SNPs that have specified functional consequences, such as coding or splicing SNPs, to limit positional mapping to functionally relevant SNPs. Thus, when intronic SNPs are excluded from positional mapping, genes that contain only intronic SNPs in LD with independent significant SNPs will not be prioritized by FUMA. eQTL mapping maps SNPs to genes with which they show a significant eQTL association (i.e., the expression of that gene is associated with allelic variation at the SNP). eQTL mapping uses information from four data repositories (GTEx⁸, Blood eQTL browser¹⁶, BIOS QTL browser¹⁷ and BRAINEAC¹⁸), and is currently based on cis-eQTLs, which can map SNPs to genes up to 1 Mb apart. Users can select tissue/cell types that are relevant to the phenotype of interest, and eQTLs can be filtered either by nominal P-value or by the FDR provided by the original data sources (Methods and Supplementary Note 2). Chromatin interaction mapping maps SNPs to genes when there is a significant chromatin interaction between the disease-associated regions and nearby or distant genes. Chromatin interaction mapping can involve long-range interactions because, unlike eQTL mapping, it does not have a distance boundary.
FUMA currently contains Hi-C data of 14 tissue types and seven cell lines from the study of Schmitt et al.¹¹; new chromatin interaction data will be added when they become available. FUMA also allows users to upload their own chromatin interaction matrices, which are not limited to Hi-C and can also be ChIA-PET, 5C or Capture Hi-C data (Methods and Supplementary Note 3). Since chromatin interactions are often defined at a certain resolution (as a genomic region), such as 40 kb, an interacting region may span multiple genes. To further prioritize candidate genes from chromatin interaction mapping, information on tissue/cell type specific enhancer and promoter regions from the Roadmap Epigenomics Project¹⁰ can optionally be integrated with interacting regions to filter SNPs and target genes (see Methods for details).
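The idea behind window-based positional mapping can be illustrated with a minimal sketch. This is not FUMA's implementation: the `Gene` record, the function name, the SNP identifiers and the coordinates are all hypothetical, and only the 10 kb default window is taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int  # gene start (bp), inclusive
    end: int    # gene end (bp), inclusive


def positional_mapping(snps, genes, window_bp=10_000):
    """Map each SNP to the genes whose body, extended by window_bp on both
    sides, contains the SNP position. SNPs without a hit are omitted."""
    mapped = {}
    for snp_id, (chrom, pos) in snps.items():
        hits = [
            g.symbol
            for g in genes
            if g.chrom == chrom and g.start - window_bp <= pos <= g.end + window_bp
        ]
        if hits:
            mapped[snp_id] = hits
    return mapped


# Hypothetical toy data: two genes on chr16 and three candidate SNPs.
genes = [
    Gene("GENE_A", "16", 100_000, 150_000),
    Gene("GENE_B", "16", 400_000, 420_000),
]
snps = {
    "rs_example1": ("16", 155_000),  # 5 kb downstream of GENE_A
    "rs_example2": ("16", 300_000),  # far from both genes
    "rs_example3": ("1", 410_000),   # different chromosome
}
result = positional_mapping(snps, genes)
print(result)
```

With a window of 0 bp the same SNPs would map to no gene, which shows how strongly the window setting determines which genes are prioritized by position alone.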

For each of these three mapping strategies, additional filtering of SNPs based on functional annotations (i.e., CADD, RegulomeDB, and 15-core chromatin state) is optionally available (Methods and Supplementary Table 2). For example, setting a CADD score threshold causes FUMA to use only highly deleterious SNPs, whereas filtering SNPs by RegulomeDB score or open chromatin state prioritizes SNPs that are likely to affect regulatory elements. Each filter is applied within the mapping strategy for which it is set.
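As a rough illustration of such optional filters, the sketch below keeps a SNP only if it passes every threshold that has been set. The field names, the example thresholds and the encoding of annotations are assumptions made for this example: RegulomeDB categories are compared as strings such as "1f" or "2b" (lower means stronger regulatory evidence), and the chromatin state is represented by the minimum 15-core state across tissues.

```python
def filter_snps(snps, min_cadd=None, max_rdb=None, max_chromatin_state=None):
    """Return the IDs of SNPs that pass all filters that are not None."""
    kept = []
    for snp in snps:
        if min_cadd is not None and snp["cadd"] < min_cadd:
            continue  # not deleterious enough
        if max_rdb is not None and snp["rdb"] > max_rdb:
            continue  # weaker regulatory evidence than requested
        if (
            max_chromatin_state is not None
            and snp["min_chromatin_state"] > max_chromatin_state
        ):
            continue  # no sufficiently open chromatin state in any tissue
        kept.append(snp["id"])
    return kept


# Hypothetical annotated SNPs.
annotated = [
    {"id": "rs_a", "cadd": 25.1, "rdb": "2b", "min_chromatin_state": 1},
    {"id": "rs_b", "cadd": 3.2, "rdb": "1f", "min_chromatin_state": 5},
    {"id": "rs_c", "cadd": 15.0, "rdb": "6", "min_chromatin_state": 13},
]

deleterious = filter_snps(annotated, min_cadd=12.0)
regulatory = filter_snps(annotated, max_rdb="2b", max_chromatin_state=7)
both = filter_snps(annotated, min_cadd=12.0, max_rdb="2b", max_chromatin_state=7)
print(deleterious, regulatory, both)
```

Leaving a threshold at `None` disables that filter, mirroring the optional nature of the filters described above.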

The three mapping strategies (positional, eQTL and chromatin interaction mapping) result in a set of prioritized genes, based on the GWAS input and specific user-defined filter settings. Both eQTL and chromatin interaction mapping may lead to prioritized genes that are not necessarily themselves located inside a genomic risk locus, although they are linked to SNPs within a genomic risk locus. The combination of positional mapping of deleterious coding SNPs, eQTL mapping, and chromatin interaction mapping across (relevant) tissue types may reveal multiple lines of evidence pointing towards the same genes and makes it possible to prioritize genes that are highly likely to be involved in the trait of interest.
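One simple way to look for converging evidence is to record, for each gene, which mapping strategies prioritized it. The sketch below does this with plain sets; the gene symbols are placeholders and the ranking rule (more strategies first, then alphabetical) is an illustrative choice, not a FUMA feature.

```python
from collections import defaultdict


def lines_of_evidence(positional, eqtl, chromatin):
    """Return (gene, strategies) pairs, genes with most supporting strategies first."""
    evidence = defaultdict(set)
    strategies = (
        ("positional", positional),
        ("eQTL", eqtl),
        ("chromatin", chromatin),
    )
    for label, gene_set in strategies:
        for gene in gene_set:
            evidence[gene].add(label)
    return sorted(evidence.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Hypothetical gene sets produced by the three mapping strategies.
ranked = lines_of_evidence(
    positional={"GENE_A", "GENE_B"},
    eqtl={"GENE_A", "GENE_C"},
    chromatin={"GENE_A", "GENE_C", "GENE_D"},
)
for gene, support in ranked:
    print(gene, sorted(support))
```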

To obtain insight into putative biological mechanisms of prioritized genes, the GENE2FUNC process annotates these genes in biological context (Fig. 1; see Methods for details). Specifically, biological information for each input gene is provided to gain insight into previously associated diseases as well as drug targets by mapping OMIM¹⁹ ID and DrugBank²⁰ ID. Tissue specific expression patterns based on GTEx v6 RNA-seq data⁸ for each gene are visualized as an interactive heatmap. Besides the single gene level analyses, overrepresentation in sets of differentially expressed genes (DEG; sets of genes which are more (or less) expressed in a specific tissue compared to other tissue types) for each of 53 tissue types based on GTEx v6 RNA-seq data⁸ is also provided to identify tissue specificity of prioritized genes (Methods; Supplementary Table 3). Enrichment of prioritized genes in biological pathways and functional categories is tested using the hypergeometric test against gene sets obtained from MsigDB²¹ and WikiPathways²². The proportion of overlapping genes, the enrichment P-value, and the input genes that overlap with the tested gene sets are visualized in plots as well as tables, which provides a quick overview of the shared biological functions of prioritized genes.
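The hypergeometric test for overrepresentation can be written down in a few lines. The sketch below computes the upper-tail probability of observing at least the given overlap when the input genes are drawn at random from the background; the function and parameter names are illustrative, and multiple-testing correction across gene sets is not shown.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_set, n_input, n_overlap):
    """P(X >= n_overlap), where X is the number of gene-set members among
    n_input genes drawn without replacement from n_background genes,
    n_in_set of which belong to the gene set."""
    upper = min(n_in_set, n_input)
    total = comb(n_background, n_input)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_input - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 20,000 background genes, a pathway of 100 genes,
# 400 prioritized genes, 10 of which fall in the pathway.
p_value = hypergeom_enrichment_p(20_000, 100, 400, 10)
print(f"enrichment P = {p_value:.3g}")
```

Exact integer binomial coefficients are used here to avoid a dependency on a statistics library; for many gene sets a vectorized survival function would be the more usual choice.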

The results of SNP2GENE and GENE2FUNC processes are displayed as either interactive tables or plots on the web application. Additionally, tables are downloadable as plain text files (Supplementary Note 1) and plots are downloadable as high-quality images in several formats (PNG, JPEG, PDF, and SVG).

FUMA covers various features of existing tools

As a variety of bioinformatics tools have been developed to obtain insights in GWAS results²³–²⁵, we compared the list of features available in FUMA with the features available in other tools, and describe these further below (Table 1).

Table 1

Feature comparison of bioinformatics tools and data sources

Tools	Format	GWAS summary statistics	LD	Functional consequences on genes	Regulatory elements	eQTLs	3D chromatin interactions	Prioritize SNPs	Map SNPs to genes	Gene expression	Pathways and gene sets	Prioritize genes	Visualization
LD calculation
PLINK	St	x	x
Variant annotations
ANNOVAR	St			x	x			x	x
VEP	St			x	x			x	x
SCAN	Web		x			x		x		x
RegulomeDB	Web				x	x		x					x
HaploReg	Web		x		x	x		x
Gene-based test/Gene-set analyses
VEGAS	St	x							x			x
MAGMA	St	x							x		x	x
Pascal	St	x							x		x	x
MAGENTA	St	x							x		x	x
INRICH	St	x							x		x
DEPICT	St	x							x		x	x
Visualization tools
LocusZoom	St/Web	x											x
LocusTrack	St/Web	x			x								x
3D genome browser	Web						x						x
FUMA	Web	x	x	x	x	x	x	x	x	x	x	x	x

St: standalone software; Web: web-based application

LD calculation is the first step in characterizing GWAS risk loci: population-specific LD structure is computed and used for so-called clumping, which identifies independent significant SNPs and defines the genomic risk loci. PLINK²⁶ is the most widely used software for this task; it takes GWAS summary statistics (requiring a reference panel) or genotype data as input. In FUMA, this task is automated by using pairwise LD (r²) of SNPs in the reference panel (1000 Genomes Project phase 3²⁷) pre-computed by PLINK, resulting in a list of independent significant SNPs, lead SNPs and genomic risk loci based on the GWAS input file. FUMA also adds SNPs to the identified risk loci that do not have a P-value (i.e., they were not available in the GWAS input file), but that are LD proxies of the identified lead SNPs, as these SNPs might be causally relevant. Alternatively, users can pre-compute lead SNPs or risk loci and upload these to FUMA.
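A greedy clumping procedure of this kind can be sketched as follows. This is a simplified illustration rather than PLINK's or FUMA's algorithm: the function names are hypothetical, pairwise r² is assumed to be pre-computed and stored per unordered SNP pair (missing pairs are treated as r² = 0), and the thresholds (P < 5e-8, r² < 0.6 for independent significant SNPs, r² < 0.1 for lead SNPs) are assumptions chosen for the example.

```python
def pair_r2(r2, a, b):
    """Look up pre-computed r2 for an unordered SNP pair (0.0 if absent)."""
    return r2.get(frozenset((a, b)), 0.0)


def clump(candidates, pvalues, r2, r2_threshold):
    """Greedily keep SNPs, most significant first, that are not in LD
    (r2 >= r2_threshold) with any SNP already kept."""
    kept = []
    for snp in sorted(candidates, key=lambda s: pvalues[s]):
        if all(pair_r2(r2, snp, other) < r2_threshold for other in kept):
            kept.append(snp)
    return kept


# Hypothetical summary statistics and reference-panel LD.
pvalues = {"rs1": 1e-12, "rs2": 1e-10, "rs3": 1e-9, "rs4": 1e-3}
r2 = {
    frozenset(("rs1", "rs2")): 0.9,
    frozenset(("rs1", "rs3")): 0.3,
    frozenset(("rs2", "rs3")): 0.2,
}

significant = [snp for snp, p in pvalues.items() if p < 5e-8]
independent_significant = clump(significant, pvalues, r2, r2_threshold=0.6)
lead_snps = clump(independent_significant, pvalues, r2, r2_threshold=0.1)
print(independent_significant, lead_snps)
```

In this toy example rs2 is absorbed by rs1 at the first step, and rs3 remains an independent significant SNP but is not a separate lead SNP under the stricter second threshold.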

Variant annotation is required to obtain information on biological consequences of SNPs in the risk loci. There are several tools, such as ANNOVAR¹² and VEP²⁸, which annotate functional consequences on genes and variant scores such as deleteriousness and phylogenetic conservation (an extensive review is available in Hou and Zhang²⁹). Particularly for non-coding SNPs, SCAN³⁰, RegulomeDB¹⁴ and HaploReg³¹ annotate regulatory information, such as eQTLs, enhancer/promoter regions, and transcription factor binding sites (see Tak and Farnham³² for an extensive overview). Although SCAN and HaploReg correct for LD, the input of the tools mentioned above is a list of SNPs of interest, which does not take genetic associations into account and thus requires pre-processing of GWAS results by the user. FUMA performs annotation of SNPs that are in LD with independent significant SNPs in a single flow, and does not require additional data preformatting.

Gene-based test/gene-set analyses are methods that make it possible to summarize SNP associations at the gene level and to associate sets of genes with biological pathways. For instance, VEGAS performs permutation based simulation³³,³⁴, MAGMA employs multiple linear regression³⁵ and Pascal computes the sum and maximum of chi-squared statistics³⁶ to obtain gene-based P-values. Additionally, there are several tools that perform not only gene-based tests but also gene-set analyses using the full distribution of genetic associations (e.g., MAGMA³⁵, MAGENTA³⁷, INRICH³⁸, and DEPICT³⁹). FUMA implements MAGMA gene-based analysis and gene-set analysis on the full GWAS input data. In addition, genes prioritized by SNP2GENE or by the user are also tested for overrepresentation in various gene sets in the GENE2FUNC process.

Visualization is one of the essential features that allows (quick) insights into the GWAS results, e.g., summarizing annotated information of SNPs and genes. LocusZoom is one of the most widely used visualization tools for GWAS results; it plots the LD structure of a risk locus, gene locations, and SNP association values⁴⁰. LocusTrack is an extension of LocusZoom which also plots additional information such as ChIP-seq and chromatin state⁴¹. 3D Genome Browser is a recently developed web application which contains comprehensive 3D chromatin interaction datasets such as Hi-C and ChIA-PET⁴², though it does not integrate with GWAS summary statistics. These tools are primarily focused on visualization of a subset of functionally relevant data sources. FUMA integrates results from multiple lines of evidence and provides interactive visualization of results, facilitating rapid interpretation.

The current lack of a single platform that integrates all possible resources for post-GWAS annotation hampers our understanding of GWAS results, as different GWAS may query a different selection of resources, rendering their post-GWAS interpretation incomplete and difficult to compare. FUMA provides a central place for a wide variety of post-GWAS annotation strategies and to our knowledge is the most versatile tool in doing so.

Application to GWAS of body mass index

To validate the utility of FUMA, we applied it to summary statistics of the most recent GWAS for body mass index (BMI; 236,231 individuals)⁴³. FUMA identified 95 lead SNPs (from 223 independent significant SNPs) across 77 genomic risk loci (Fig. 2 and Supplementary Data 1–3), in accordance with the original study. We first conducted positional mapping of deleterious coding SNPs and eQTL mapping (Methods), which prioritized 151 unique genes: 23 genes with deleterious coding SNPs (positional mapping) and 144 genes with eQTLs that potentially alter expression of these genes (eQTL mapping), including 16 genes that had both deleterious coding SNPs and eQTLs (Supplementary Data 4). The 151 genes consist of 55 genes that were also reported in the original study⁴³ and 96 novel genes implicated by FUMA, including 45 genes which are located outside the risk loci. These novel candidates have shared biological functions with the 55 previously known candidate genes such as “metabolism of carbohydrate”, “metabolism of lipid and lipoprotein”, “immune system”, and “calcium signaling” (Supplementary Data 5). In addition, FUMA results showed that, although several genomic loci for BMI included multiple prioritized genes, a single gene was prioritized in 22 out of 43 loci which contain at least one prioritized gene (Supplementary Fig. 2), suggesting that these 22 genes have a high probability of being the causal gene in that region. The 22 “highly likely causal genes” include several well-known genes for BMI such as NEGR1, TOMM40, and TMEM18. The strongest GWAS association signal for BMI was on 16q.12.2, where three genes were prioritized: FTO, RBL2, and IRX3 (Fig. 3). These three genes were only prioritized by eQTL mapping, as the positional mapping showed no deleterious coding SNPs located in these genes.
The original study⁴³ only mentioned FTO, because the associated SNPs were located in this gene; however, none of the associated SNPs is of a type, such as a coding SNP, that could have a direct effect on FTO. Two of the genes prioritized by FUMA (RBL2 and IRX3) are physically located outside the genomic locus and are missed when using conventional approaches that prioritize genes located in the locus of interest based on LD around the top SNP. Although the IRX3 gene was not reported in the original study⁴³, recent functional work has indeed validated this as the causal gene whose expression is affected by SNPs in the 16q.12.2 locus⁴⁴.

Fig. 2

Overview of prioritized genes from BMI GWAS by FUMA. Starting from the BMI GWAS summary statistics, boxes represent results of the SNP2GENE process. The annotated SNPs include all independent lead SNPs and SNPs which are in LD with these lead SNPs. Prioritized genes are divided into three categories: genes that are implicated by deleterious coding SNPs (colored pink), by eQTLs for these genes (colored blue), or by chromatin interactions (colored green). The prioritized genes are further categorized into previously reported genes (blue) and novel genes prioritized by FUMA (red). *These genes were not prioritized by FUMA since they do not have deleterious coding SNPs, eQTLs or chromatin interactions, although they are located within GWAS risk loci.

Fig. 3

Regional plot of the locus 16q.12.2 of BMI GWAS. a Extended region of the FTO locus, which includes prioritized genes RBL2 and IRX3. Genes prioritized by FUMA are highlighted in red. b Zoomed in regional plot of FTO locus with, from the top, GWAS P-value (SNPs are colored based on r²), CADD score, RegulomeDB score and eQTL P-value. Non-GWAS-tagged SNPs are shown at the top of the plot as rectangles since they do not have a P-value from the GWAS, but they are in LD with the lead SNP. eQTLs are plotted per gene and colored based on tissue types. In the plots of CADD score, RegulomeDB score and eQTLs, SNPs which are not mapped to any gene are colored gray.

We then performed chromatin interaction mapping using Hi-C data of 14 tissue types (Methods). FUMA prioritized 310 genes (Supplementary Data 4), of which 61 genes overlap with the genes prioritized by positional and/or eQTL mappings and 232 genes are located outside of the genomic risk loci (Fig. 2). Combining the three mapping strategies resulted in a total of 400 prioritized genes, including 330 novel candidates which were not reported in the original study (Table 2 and Supplementary Data 4). These novel candidates further supported shared biological functions with previously reported known genes, such as lipid and lipoprotein metabolism, homeostatic process and various metabolic pathways, with a greater number of genes compared to the mappings without Hi-C data (Supplementary Data 5). Out of 400 prioritized genes, 59 genes are mapped by both eQTLs and chromatin interactions, including IRX3 on the 16q.12.2 locus (Fig. 4), which further supports the hypothesis that these genes are involved in the risk of BMI. Of the 48 loci that contained at least one prioritized gene from positional and eQTL mappings, chromatin interaction mapping identified candidate genes in an additional 18 loci (Supplementary Fig. 2), including loci mapped to known genes associated with BMI such as MC4R, FOXO3, and ADCY9. The 400 prioritized genes showed enrichment in 9 GO terms, such as “response to zinc ion” and “oligopeptide binding”, overlapping with multiple metallothionein and glutathione S-transferase genes whose association with obesity risk has been reported⁴⁵,⁴⁶ (Supplementary Data 6).
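The gene counts reported for BMI follow from inclusion–exclusion on the overlapping gene sets, which can be checked with a short calculation. The helper name is hypothetical; all numbers are taken from the text above.

```python
def union_size(size_a, size_b, overlap):
    """Number of unique items in the union of two sets with a known overlap."""
    return size_a + size_b - overlap


# Positional (23 genes) and eQTL (144 genes) mapping share 16 genes.
positional_or_eqtl = union_size(23, 144, 16)

# Chromatin interaction mapping adds 310 genes, 61 of which were already
# prioritized by positional and/or eQTL mapping.
all_strategies = union_size(positional_or_eqtl, 310, 61)

print(positional_or_eqtl, all_strategies)
```

The two results correspond to the 151 genes from positional and eQTL mapping and the 400 genes from all three strategies combined.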

Table 2

Summary of FUMA application to three GWAS summary statistics

GWAS	Risk loci	Reported genes in the original study	Positional mapping	eQTL mapping	Chromatin interaction mapping	Total*	Genes located outside the risk loci	Novel candidates	Loci containing prioritized genes
BMI	77	117	23	144	310	400	263	330	67
CD	71	115	39	69	199	276	161	215	55
SCZ	109	349	36	54	33	113	26	35	45

*The number of unique genes mapped by at least one of the positional, eQTL and chromatin interaction mappings

Fig. 4

Chromatin interactions and eQTLs of BMI risk loci on chr. 16. The outermost layer is the Manhattan plot displaying SNPs with P-value<0.05. Candidate SNPs are colored based on the highest r² to one of the independent significant SNPs (red: r²>0.8, orange: r²>0.6). Other SNPs are colored in gray. rsIDs of the top SNPs per locus are labeled. The outer circle is the chromosome coordinate and genomic risk loci are highlighted in blue. Genes mapped by either Hi-C or eQTLs are shown on the inner circle. Genes mapped by Hi-C and by eQTLs are colored orange and green, respectively. Genes mapped by both are colored red. Chromatin interactions and eQTLs are shown as links colored orange and green, respectively.

Thus, using BMI summary statistics, FUMA confirmed known genes but also prioritized novel genes, including potential causal genes located outside the GWAS risk loci of BMI, which were missed in the original study.

Application to Crohn’s disease GWAS

To further illustrate its utility, we applied FUMA to the summary statistics of Crohn’s disease⁴⁷ (CD; 6333 cases and 15,056 controls). With FUMA, 95 lead SNPs from 184 independent significant SNPs across 71 genomic loci were identified for CD (Supplementary Fig. 3 and Supplementary Data 7–10). With positional mapping of deleterious coding SNPs and eQTL mapping, FUMA prioritized 95 unique genes from 32 loci (Supplementary Fig. 4), of which 39 genes were implicated by deleterious coding SNPs and 69 were implicated by eQTLs influencing expression of these genes (13 genes had both deleterious coding SNPs and eQTLs; Table 2 and Supplementary Data 11). The prioritized 95 genes include 37 known candidate genes that were also reported in the original study⁴⁷, including well-known CD-related genes such as NOD2, IL23R, and SLC22A5, while 58 genes were novel (Supplementary Fig. 3; see Supplementary Note 4 and Supplementary Figs. 5–7 for detailed results). These novel candidates include 18 genes that are physically located outside the GWAS risk loci, and the novel candidates mainly share immune system related biological functions with the 37 previously known genes (Supplementary Data 12).

Chromatin interaction mapping using Hi-C data in small bowel and liver prioritized 199 genes, of which 18 genes overlap with genes prioritized by positional and/or eQTL mappings and 149 genes are located outside of the genomic risk loci (Supplementary Data 11). That resulted in a total of 276 prioritized genes, including 215 novel candidates which were not reported in the original study (Table 2 and Supplementary Fig. 3). In addition to the 32 loci which are mapped to at least one gene by positional and eQTL mappings, a further 23 loci are mapped to candidate genes by chromatin interaction mapping, and several of the genes prioritized from those loci are involved in immune system and cytokine signaling pathways (Supplementary Fig. 4 and Supplementary Data 12). One of these 23 risk loci, the 17q12 locus, is mapped to six chemokine ligands by Hi-C in liver: CCL1, CCL2, CCL7, CCL8, CCL11, and CCL13. Additionally, prioritized genes include 11 cytokines (IL4, IL5, IL10, IL19, IL23R, IL24, IL27, IL33, IL1RL1, IL18R1, and IL18RAP), of which IL18R1 and IL18RAP are also mapped by eQTLs in whole blood and IL23R and IL27 are also mapped by deleterious coding SNPs, which further supports the involvement of these cytokine genes in CD. The role of these chemokines and cytokines in inflammatory disease has been widely studied⁴⁸, and yet chromatin interaction mapping identified additional relevant candidates from the risk loci. The prioritized 276 genes showed enrichment in 123 canonical pathways, such as immune system and cytokine related pathways, which are known to be highly relevant to CD⁴⁹ (Supplementary Data 13).

Application to schizophrenia GWAS

We also applied FUMA to the most recent schizophrenia (SCZ; 36,989 cases and 113,075 controls) GWAS summary statistics^³, and 128 lead SNPs from 269 independent significant SNPs across 109 genomic loci were identified (Supplementary Note ⁵, Supplementary Fig. ⁸ and Supplementary Data ¹⁴–¹⁷). Positional mapping of deleterious coding SNPs and eQTL mapping prioritized 84 unique genes, of which 25 genes were implicated by deleterious coding SNPs and 65 were implicated by eQTLs influencing expression of these genes (six genes had both deleterious coding SNPs and eQTLs; Supplementary Data ¹⁸). The prioritized 84 genes include 65 genes which were previously reported as candidates in the original study^³, while 19 genes were novel (Table 2), including 11 genes which are physically located outside the GWAS risk loci. These 19 novel candidates share several biological functions with the 65 previously known genes, such as “matrisome” and “neuronal system” (Supplementary Data ¹⁹). Of the 84 prioritized genes, 60 were also identified by the recent TWAS^⁵⁰ and Hi-C^⁵¹ studies, including 10 genes which are physically located outside the risk loci. The prioritized genes cover 34 of the 109 genomic loci, of which 20 loci are mapped to a single prioritized gene (Supplementary Fig. ⁹; see Supplementary Note ⁵ and Supplementary Fig. ¹⁰ for detailed results). These 20 genes are highly likely to drive the association signal in their genomic loci. These genes include CACNA1C, LRP1, PLCB2, GRIN2A, and NMUR2, which are involved in pathways such as Alzheimer’s disease, long-term potentiation, calcium signaling, and transmission across chemical synapses.

Chromatin interaction mapping using Hi-C data in hippocampus and prefrontal cortex prioritized 33 genes, of which DPYD and WBP1L are also mapped by a deleterious coding SNP, and VPS45 and PITPNM2 are also mapped by eQTLs in the brain (Supplementary Data ¹⁸). Of these 33 genes, 15 are located outside of the genomic risk loci. Together with positional and eQTL mapping, this resulted in a total of 113 candidate genes, including 35 novel candidates which are not reported in the original study (Table 2 and Supplementary Fig. ⁸). The 29 genes prioritized only by chromatin interactions share functions with other genes, such as “regulation of response to stress” (RWDD3), “intracellular signal transduction” (SGSM3), and several functions involved in regulation of transcription (OTUD7B and ZBTB18; Supplementary Data ¹⁹).

Enrichment was seen in several brain-system related pathways, such as nicotinic acetylcholine receptors (nAChR), long-term potentiation and neurotransmitter receptor binding (Supplementary Data ²⁰). nAChR is an important neuronal receptor, of which the alpha-7 subunit (CHRNA7) has recently been studied as a new schizophrenia drug target^⁵²,⁵³. nAChR was also identified as an enriched pathway in the recent study using Hi-C in human cerebral cortex^⁵¹, which suggests potential involvement of the nAChR pathway in SCZ risk.

Discussion

We introduce a web application named FUMA that processes GWAS summary statistics, annotates and prioritizes SNPs and genes, and facilitates interpretation by providing interactive visualizations. FUMA provides a single platform that is built on the most popular tools for post-GWAS annotation and includes a rich collection of data repositories to bring insights into the phenotype of interest, and annotation in FUMA typically takes only approximately 30 min. For every prioritized gene, FUMA provides the rationale for pinpointing this gene, for example when the expression of the prioritized gene is altered by a SNP that is associated with the disease of interest. Interactive regional plots (Fig. 3 and Supplementary Figs. ⁵–⁷, ¹⁰) show which genes in a genomic risk locus are prioritized and which genes are not, and the annotated SNPs in the prioritized genes facilitate the generation of hypotheses for functional validation experiments. For example, if a gene is prioritized because of an associated loss-of-function SNP, follow-up validation experiments focusing on a knock-out of this gene may provide disease relevant functional information. On the other hand, if a gene is prioritized because a risk associated allele of a SNP increases expression of this gene in brain, then an overexpression experiment of this gene in neuronal cell cultures would be a more relevant experiment.

The availability of biological resources that can aid in the interpretation of GWAS results, such as Hi-C and ChIA-PET, has dramatically increased recently, and several studies have identified novel candidates from GWAS risk loci by integrating their results with, for example, chromatin interactions^{⁵¹,⁵⁴–⁵⁷}. These technologies have the potential to identify distal interactions of promoters and enhancers. Especially for risk loci for which it has been difficult to identify target genes due to the presence of gene deserts, distal interactions might point to the causal gene. Indeed, we identified additional putative causal genes by performing chromatin interaction mapping on outcomes from three GWAS studies (BMI, CD, and SCZ), and the additionally identified genes based on chromatin interaction information were mostly located outside of the risk loci and were shown to share functions with known candidates. Although chromatin interactions are highly tissue/cell type specific, as well as time dependent, and currently available data is still limited in those aspects, FUMA provides an option to upload custom interaction matrices. Additionally, FUMA is built in such a way that newly published data including 3D chromatin interactions, eQTLs and other variant annotations can easily be included in the SNP2GENE process. This makes FUMA a flexible web tool which can be utilized not only for new GWAS results but also for previously published GWAS to re-annotate risk loci with the latest biological data sources.

In summary, FUMA provides an easy-to-use tool to functionally annotate, visualize, and interpret results from genetic association studies and to quickly gain insight into the directional biological implications of significant genetic associations. FUMA combines information of state-of-the-art biological data sources in a single platform to facilitate the generation of hypotheses for functional follow-up analysis aimed at proving causal relations between genetic variants and diseases.

Methods

Data pre-processing

All genetic data sets used in this study are based on the hg19 human assembly, and rsIDs were mapped to dbSNP build 146 if necessary. To compute minor allele frequencies and LD structure, we used the data from the 1000 Genomes Project^²⁷ phase 3 (1000G). Minor allele frequency and r ² of pairwise SNPs (minimum r ²=0.05 and maximum distance between a pair of SNPs of 1Mb) were pre-computed using PLINK^²⁶ for each of the available populations (AFR, AMR, EAS, EUR, and SAS). Functional annotations of SNPs were obtained from the following three repositories: CADD^¹³, RegulomeDB^¹⁴, and the core 15-state model of chromatin^{⁹,¹⁰,¹⁵}. Cis-eQTL information was obtained from the following four data repositories: GTEx portal v6^⁸, Blood eQTL browser^¹⁶, BIOS QTL Browser^¹⁷, and BRAINEAC^¹⁸, and genes were mapped to Ensembl gene IDs if necessary (Supplementary Note ²). Pre-processed Hi-C data for 14 tissue types and seven cell lines were obtained from GSE87112^¹¹ (Supplementary Note ³). Predicted enhancer and promoter regions for 111 epigenomes were obtained from the Roadmap Epigenomics Projects^¹⁰. Genomic coordinates of SNPs reported in the GWAS catalog^¹ were lifted over from hg38 to hg19 using the liftOver software. Normalized gene expression data (RPKM, reads per kilobase per million) from GTEx portal v6^⁸ for 53 tissue types were processed for different purposes. The details are described in the “GTEx Gene Expression Data Set” section. Curated pathways and gene sets, in which genes are assigned entrez IDs, were obtained from MsigDB v5.2^²¹ and WikiPathways^²².
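
The pre-computation of minor allele frequencies and pairwise r ² can be illustrated with a minimal Python sketch. It assumes that genotypes are coded as alternate-allele counts (0, 1 or 2) per individual and uses the squared Pearson correlation of those counts as a stand-in for the r ² reported by PLINK. The function names and the toy genotypes are hypothetical and are not part of FUMA or PLINK.

```python
from statistics import mean


def minor_allele_frequency(dosages):
    """MAF from alternate-allele counts (0, 1 or 2) of diploid individuals."""
    alt_freq = sum(dosages) / (2 * len(dosages))
    return min(alt_freq, 1 - alt_freq)


def r_squared(x, y):
    """Squared Pearson correlation between two allele-count vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var_x * var_y)


def keep_pair(pos_a, pos_b, r2, min_r2=0.05, max_distance=1_000_000):
    """Storage rule from the text: r2 >= 0.05 and SNPs at most 1Mb apart."""
    return r2 >= min_r2 and abs(pos_a - pos_b) <= max_distance


# Toy genotypes for two SNPs in four individuals (hypothetical data).
snp_a = [0, 1, 2, 1]
snp_b = [2, 1, 0, 1]
maf_a = minor_allele_frequency(snp_a)
r2_ab = r_squared(snp_a, snp_b)
```

In this toy example the two SNPs are perfectly anti-correlated, so the pair would be stored if both SNPs lie within 1Mb of each other.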

Characterization of genomic risk loci based on GWAS

To define genomic loci of interest to the trait based on provided GWAS summary statistics, pre-calculated LD structure based on 1000G of the relevant reference population (EUR for BMI, CD and SCZ) is used. First of all, independent significant SNPs with a genome-wide significant P-value (<5e-8) and independent from each other at r ²<0.6 are identified. For each independent significant SNP, all known (i.e., regardless of being available in the GWAS input) SNPs that have r ²≥0.6 with one of the independent significant SNPs are included for further annotation (candidate SNPs). These SNPs may thus include SNPs that were not available in the GWAS input, but are available in the 1000G reference panel and are in LD with an independent significant SNP. Candidate SNPs can be filtered based on a user-defined minor allele frequency (MAF, ≥0.01 by default).
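
The selection of independent significant SNPs and candidate SNPs can be sketched as follows. This is an illustration under stated assumptions, not the FUMA implementation: a greedy pass over genome-wide significant SNPs ordered by P-value is assumed, pairwise r ² is assumed to be stored in a sparse dictionary keyed by unordered SNP pairs (missing pairs are treated as r ²=0), and all function names and SNP identifiers are hypothetical.

```python
def get_r2(ld, a, b):
    """Pairwise r2 lookup; pairs absent from the sparse table count as 0."""
    if a == b:
        return 1.0
    return ld.get(frozenset((a, b)), 0.0)


def independent_significant_snps(pvalues, ld, p_threshold=5e-8, r2_threshold=0.6):
    """Greedy selection: keep a significant SNP if it has r2 < threshold
    with every SNP already kept (most significant SNPs are visited first)."""
    significant = sorted(
        (snp for snp, p in pvalues.items() if p < p_threshold),
        key=lambda snp: pvalues[snp],
    )
    selected = []
    for snp in significant:
        if all(get_r2(ld, snp, kept) < r2_threshold for kept in selected):
            selected.append(snp)
    return selected


def candidate_snps(independent, ld, maf, r2_threshold=0.6, min_maf=0.01):
    """All reference-panel SNPs in LD (r2 >= threshold) with an independent
    significant SNP, whether or not they were present in the GWAS input."""
    return {
        snp
        for snp in maf
        if maf[snp] >= min_maf
        and any(get_r2(ld, snp, ind) >= r2_threshold for ind in independent)
    }


# Toy input (hypothetical): rs5 and rs6 exist only in the reference panel.
pvalues = {"rs1": 1e-10, "rs2": 1e-9, "rs3": 2e-8, "rs4": 1e-3}
ld = {
    frozenset(("rs1", "rs2")): 0.8,
    frozenset(("rs1", "rs3")): 0.2,
    frozenset(("rs1", "rs5")): 0.7,
    frozenset(("rs3", "rs6")): 0.9,
}
maf = {"rs1": 0.3, "rs2": 0.25, "rs3": 0.1, "rs4": 0.4, "rs5": 0.2, "rs6": 0.005}

independent = independent_significant_snps(pvalues, ld)
candidates = candidate_snps(independent, ld, maf)
```

Here rs2 is absorbed by rs1 (r ²=0.8), rs5 becomes a candidate although it is absent from the GWAS input, and rs6 is dropped by the MAF filter.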

Based on the identified independent significant SNPs, independent lead SNPs are defined if they are independent from each other at r ²<0.1. Additionally, if LD blocks of independent significant SNPs are located close to each other (<250kb based on the rightmost and leftmost SNPs of each LD block), they are merged into one genomic locus. Each genomic locus can thus contain multiple independent significant SNPs and lead SNPs.
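
The merging of neighbouring LD blocks into genomic loci can be sketched in a few lines of Python. The sketch assumes that each LD block is summarized by its chromosome and the positions of its leftmost and rightmost SNPs; the function name and coordinates are hypothetical.

```python
def merge_ld_blocks(blocks, max_gap=250_000):
    """Merge LD blocks on the same chromosome whose gap is below max_gap.

    blocks: list of (chromosome, start, end) tuples, one per LD block.
    Returns the merged genomic loci as a sorted list of tuples.
    """
    loci = []
    for chrom, start, end in sorted(blocks):
        previous = loci[-1] if loci else None
        if previous and previous[0] == chrom and start - previous[2] < max_gap:
            # Close to (or overlapping) the previous locus: extend it.
            previous[2] = max(previous[2], end)
        else:
            loci.append([chrom, start, end])
    return [tuple(locus) for locus in loci]


# Toy LD blocks (hypothetical coordinates).
blocks = [
    (1, 1_300_000, 1_400_000),
    (1, 1_000_000, 1_100_000),
    (1, 2_000_000, 2_050_000),
    (2, 1_000_000, 1_100_000),
]
loci = merge_ld_blocks(blocks)
```

The first two blocks on chromosome 1 are 200kb apart and are therefore merged, whereas the third block (600kb away) and the block on chromosome 2 remain separate loci.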

Besides using FUMA to determine lead SNPs based on GWAS summary statistics, users can provide a list of pre-defined lead SNPs. In addition, users can provide a list of pre-defined genomic regions to limit all annotations carried out by FUMA to those regions.

Annotation of candidate SNPs in genomic risk loci

Functional consequences of SNPs on genes are obtained by performing ANNOVAR^¹² (“gene-based annotation”) using Ensembl genes (build 85). Note that SNPs can be annotated to more than one gene in case of intergenic SNPs which are annotated to the two closest up- and down-stream genes. CADD scores, RegulomeDB scores and 15-core chromatin state are annotated to all SNPs in 1000G phase 3 by matching chromosome, position, reference, and alternative alleles. eQTLs are also extracted by matching chromosome, position and alleles of all independent significant SNPs and SNPs which are in LD with one of the independent significant SNPs for each user-selected tissue type, wherein SNPs can have multiple eQTLs for distinct genes and tissue types (Supplementary Note ²). Information on previously known SNP-trait associations reported in the GWAS catalog is also retrieved for all SNPs of interest by matching chromosome and position.

Gene mapping

Gene annotation is based on Ensembl genes (build 85). To match external gene IDs, ENSG ID is mapped to entrez ID yielding 35,808 genes which consist of 19,436 protein-coding genes, 9249 non-coding RNA, and other 7123 genes (e.g., pseudogenes, processed transcripts, immunoglobulin genes, and T-cell receptor genes).

Positional mapping is performed based on annotations obtained from ANNOVAR^¹². Two optional filters are provided to control the maximum distance from SNPs to genes and to select specific functional consequences of SNPs on genes. When the former option is defined, FUMA maps SNPs to genes based on ANNOVAR annotation and a user-defined maximum distance is applied for intergenic SNPs. When the latter option is provided, FUMA maps only SNPs which have the selected ANNOVAR annotations (e.g., coding or splicing SNPs).
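
The two optional filters of positional mapping can be illustrated with a short Python sketch. The record layout (one dictionary per SNP-gene annotation with the keys "snp", "gene", "consequence" and "distance") and the function name are assumptions made for this example and do not reflect the ANNOVAR output format or FUMA internals.

```python
def positional_mapping(annotations, max_distance=None, consequences=None):
    """Return the genes mapped by position under the two optional filters.

    max_distance: maximum SNP-to-gene distance (bp), applied to intergenic SNPs.
    consequences: set of functional consequences to keep (e.g. exonic, splicing).
    """
    genes = set()
    for ann in annotations:
        if consequences is not None and ann["consequence"] not in consequences:
            continue
        if (
            max_distance is not None
            and ann["consequence"] == "intergenic"
            and ann["distance"] > max_distance
        ):
            continue
        genes.add(ann["gene"])
    return genes


# Toy annotations (hypothetical); intergenic rs2 is annotated to two genes.
annotations = [
    {"snp": "rs1", "gene": "GENE_A", "consequence": "exonic", "distance": 0},
    {"snp": "rs2", "gene": "GENE_B", "consequence": "intergenic", "distance": 5_000},
    {"snp": "rs2", "gene": "GENE_C", "consequence": "intergenic", "distance": 50_000},
    {"snp": "rs3", "gene": "GENE_D", "consequence": "intronic", "distance": 0},
]

within_10kb = positional_mapping(annotations, max_distance=10_000)
coding_only = positional_mapping(annotations, consequences={"exonic", "splicing"})
```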

For eQTL mapping, all independent significant SNPs and SNPs in LD with them are mapped to eQTLs in user-defined tissue types. By default, only significant SNP–gene pairs (false discovery rate (FDR)≤0.05) are used. Optionally, eQTLs can be filtered based on a user-defined P-value. eQTL mapping maps SNPs to genes up to 1Mb apart (cis-eQTLs).
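
A minimal Python sketch of eQTL mapping is given below. It assumes that each eQTL record is a dictionary with the hypothetical keys "snp", "gene", "tissue", "p" and "fdr", and that a user-defined P-value threshold, when given, is used instead of the default FDR filter; the latter is an interpretation of the text rather than a documented FUMA behaviour.

```python
def eqtl_mapping(candidate_snps, eqtls, tissues, max_fdr=0.05, max_p=None):
    """Map candidate SNPs to genes via eQTLs in the selected tissues.

    Returns {gene: {(snp, tissue), ...}} so that the supporting evidence
    for each mapped gene is retained.
    """
    mapped = {}
    for eqtl in eqtls:
        if eqtl["snp"] not in candidate_snps or eqtl["tissue"] not in tissues:
            continue
        if max_p is not None:
            significant = eqtl["p"] <= max_p  # user-defined P-value filter
        else:
            significant = eqtl["fdr"] <= max_fdr  # default: FDR <= 0.05
        if significant:
            mapped.setdefault(eqtl["gene"], set()).add((eqtl["snp"], eqtl["tissue"]))
    return mapped


# Toy eQTL records (hypothetical values).
eqtls = [
    {"snp": "rs1", "gene": "GENE_A", "tissue": "Whole_Blood", "p": 1e-8, "fdr": 0.001},
    {"snp": "rs1", "gene": "GENE_B", "tissue": "Whole_Blood", "p": 0.01, "fdr": 0.2},
    {"snp": "rs2", "gene": "GENE_A", "tissue": "Colon_Transverse", "p": 1e-6, "fdr": 0.01},
    {"snp": "rs9", "gene": "GENE_C", "tissue": "Whole_Blood", "p": 1e-9, "fdr": 0.0001},
    {"snp": "rs2", "gene": "GENE_D", "tissue": "Liver", "p": 1e-7, "fdr": 0.001},
]
selected_tissues = {"Whole_Blood", "Colon_Transverse"}

by_fdr = eqtl_mapping({"rs1", "rs2"}, eqtls, selected_tissues)
by_p = eqtl_mapping({"rs1", "rs2"}, eqtls, selected_tissues, max_p=0.05)
```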

Chromatin interaction mapping is performed by overlapping independent significant SNPs and SNPs in LD with them with one end of significantly interacting regions in user-selected tissue/cell types. These SNPs are then mapped to genes whose promoter regions (250bp up- and 500bp down-stream of the transcription start site by default) overlap with the other end of the significant interactions. Optionally, SNPs can be filtered for those overlapping with predicted enhancer regions of the user-selected epigenomes. Similarly, mapped genes can be filtered for those whose promoter regions overlap with predicted promoter regions of the user-selected epigenomes.
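
The core of chromatin interaction mapping can be sketched in Python as follows. The sketch assumes intra-chromosomal interactions given as two intervals on one chromosome, and it assumes that "up-stream" and "down-stream" are interpreted relative to gene strand; the data layout, function names and coordinates are hypothetical, and the optional enhancer/promoter filters are omitted.

```python
def promoter_region(tss, strand, upstream=250, downstream=500):
    """Promoter interval around the transcription start site (strand-aware)."""
    if strand == "+":
        return tss - upstream, tss + downstream
    return tss - downstream, tss + upstream


def overlaps(a, b):
    """True if closed intervals a and b share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def chromatin_interaction_mapping(snps, interactions, genes):
    """Map SNPs in one end of an interaction to genes whose promoter
    overlaps the other end.

    snps: {snp_id: (chrom, pos)}
    interactions: [(chrom, (start1, end1), (start2, end2)), ...]
    genes: {gene: (chrom, tss, strand)}
    """
    mapped = set()
    for chrom, end1, end2 in interactions:
        # Either end may carry the SNP, so both orientations are checked.
        for snp_end, gene_end in ((end1, end2), (end2, end1)):
            has_snp = any(
                c == chrom and snp_end[0] <= pos <= snp_end[1]
                for c, pos in snps.values()
            )
            if not has_snp:
                continue
            for gene, (g_chrom, tss, strand) in genes.items():
                if g_chrom == chrom and overlaps(promoter_region(tss, strand), gene_end):
                    mapped.add(gene)
    return mapped


# Toy example (hypothetical coordinates).
snps = {"rs1": ("1", 1_005_000)}
interactions = [("1", (1_000_000, 1_040_000), (1_400_000, 1_440_000))]
genes = {
    "GENE_A": ("1", 1_440_200, "-"),  # promoter reaches into the second end
    "GENE_B": ("1", 1_440_400, "+"),  # promoter lies just outside the second end
    "GENE_C": ("1", 1_020_000, "+"),  # located in the SNP end, not the other end
}
mapped_genes = chromatin_interaction_mapping(snps, interactions, genes)
```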

Optional filtering of SNPs based on functional annotations obtained in step 2 of SNP2GENE (i.e., CADD score, RegulomeDB score, 15-core chromatin state) can be performed for positional, eQTL and chromatin interaction mappings separately. When any of these filters is activated, candidate SNPs are filtered prior to gene mapping. Note that this filtering of SNPs based on functional annotations for a certain mapping does not affect other mappings, e.g., when SNPs are filtered by CADD score in positional mapping but not in eQTL mapping, SNPs are filtered prior to positional mapping but eQTL mapping uses the original set of candidate SNPs.

For mapped genes, two scores of intolerance to functional mutations are annotated; probability of being loss-of-function intolerant (pLI)^⁵⁸ and non-coding residual variation intolerance score (ncRVIS)^⁵⁹.

MAGMA for gene analysis and gene set analysis

FUMA uses input GWAS summary statistics to compute gene-based P-values (gene analysis) and gene set P-values (gene set analysis) using the MAGMA^³⁵ tool. For gene analysis, the gene-based P-value is computed for protein-coding genes by mapping SNPs to genes if SNPs are located within the genes. For gene set analysis, the gene set P-value is computed using the gene-based P-value for 4728 curated gene sets (including canonical pathways) and 6166 GO terms obtained from MsigDB v5.2. For both analyses, the default MAGMA settings (SNP-wise model for gene analysis and competitive model for gene set analysis) are used, and the Bonferroni correction (gene) or FDR (gene set) is used to correct for multiple testing. 1000G phase 3^²⁷ is used as a reference panel to calculate LD across SNPs and genes.

GTEx gene expression data set

Normalized gene expression values (reads per kilobase per million, RPKM) of 53 tissue types were obtained from GTEx (Supplementary Table ³). A total of 56,320 genes were available in GTEx, which we filtered on an average RPKM per tissue greater than or equal to 1 in at least one tissue type. This resulted in transcripts of 28,520 genes, of which 22,146 were mapped to entrez ID (see “Gene Mapping” section for details). In GENE2FUNC, the heatmap of prioritized genes displays two expression values: (i) the average log2(RPKM+1) per tissue per gene, in which RPKM is winsorized at 50, allowing comparison of expression level across genes and tissue types, and (ii) the average of the normalized expression (zero mean of log2(RPKM+1)) per tissue per gene, allowing comparison of expression level across tissue types within a gene.
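
The expression filter and the two heatmap values can be written out for a single gene as a minimal Python sketch. It assumes that the input is a mapping from tissue to the RPKM values of the samples of that tissue, and that the zero-mean normalization centres log2(RPKM+1) of a gene across all of its samples; the function names and the toy values are hypothetical.

```python
from math import log2
from statistics import mean


def passes_expression_filter(samples, min_avg_rpkm=1):
    """Keep a gene if its average RPKM is >= 1 in at least one tissue."""
    return any(mean(values) >= min_avg_rpkm for values in samples.values())


def average_log_expression(samples, cap=50):
    """(i) Average log2(RPKM+1) per tissue, with RPKM winsorized at the cap."""
    return {
        tissue: mean(log2(min(x, cap) + 1) for x in values)
        for tissue, values in samples.items()
    }


def average_normalized_expression(samples):
    """(ii) Average of zero-mean log2(RPKM+1) per tissue for one gene."""
    logged = {t: [log2(x + 1) for x in values] for t, values in samples.items()}
    gene_mean = mean(x for values in logged.values() for x in values)
    return {tissue: mean(values) - gene_mean for tissue, values in logged.items()}


# Toy RPKM values for one gene in two tissues (hypothetical data).
gene_samples = {"Liver": [1, 3], "Brain_Cortex": [0, 0]}

kept = passes_expression_filter(gene_samples)
avg_log = average_log_expression(gene_samples)
avg_norm = average_normalized_expression(gene_samples)
```

With these toy values, log2(RPKM+1) is 1 and 2 in liver and 0 in brain, so the average log expression is 1.5 and 0, and the normalized values are +0.75 and −0.75 around the gene-wide mean of 0.75.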

To obtain differentially expressed gene sets (DEG; genes which are significantly more or less expressed in a given tissue compared to others) for each of the 53 tissue types, the normalized expression (zero mean of log2(RPKM+1)) is used. Two-sided Student’s t-tests are performed per gene per tissue against all other tissues. After the Bonferroni correction, genes with corrected P-value <0.05 and absolute log fold change ≥0.58 are defined as a DEG set in a given tissue, i.e., for these genes, expression in the given tissue differs most from expression in all other tissues. In addition, we distinguish between genes that are upregulated and downregulated in a specific tissue compared to other tissues, by taking the sign of the t-score into account. In GENE2FUNC, genes are tested against those DEG sets by hypergeometric tests to evaluate if the prioritized genes (or a list of genes of interest) are overrepresented in DEG sets in specific tissue types.
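
The DEG thresholds can be illustrated with a short Python sketch for one tissue. The t-tests themselves are not re-implemented here; the sketch assumes that the P-value, t-score and log fold change per gene are already available, and the number of tests used for the Bonferroni correction is passed in explicitly because the text does not specify it. All names and values are hypothetical. Note that 0.58 is approximately log2(1.5).

```python
def deg_sets(results, n_tests, alpha=0.05, min_abs_log_fc=0.58):
    """Split genes into up- and down-regulated DEG sets for one tissue.

    results: {gene: {"p": raw P-value, "t": t-score, "log_fc": log fold change}}
    n_tests: number of tests used for the Bonferroni correction.
    """
    up, down = set(), set()
    for gene, res in results.items():
        p_bonferroni = min(1.0, res["p"] * n_tests)
        if p_bonferroni < alpha and abs(res["log_fc"]) >= min_abs_log_fc:
            # The sign of the t-score gives the direction of the difference.
            (up if res["t"] > 0 else down).add(gene)
    return up, down


# Toy test results for one tissue (hypothetical values).
tissue_results = {
    "GENE_A": {"p": 1e-9, "t": 8.0, "log_fc": 1.2},
    "GENE_B": {"p": 1e-9, "t": -7.5, "log_fc": -0.9},
    "GENE_C": {"p": 1e-9, "t": 6.0, "log_fc": 0.3},  # fold change too small
    "GENE_D": {"p": 0.01, "t": 2.6, "log_fc": 1.0},  # not significant after correction
}
up_genes, down_genes = deg_sets(tissue_results, n_tests=1000)
```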

Gene set enrichment test

To test for overrepresentation of biological functions, the prioritized genes (or a list of genes of interest) are tested against gene sets obtained from MsigDB (i.e., hallmark gene sets, positional gene sets, curated gene sets, motif gene sets, computational gene sets, GO gene sets, oncogenic signatures, and immunologic signatures) and WikiPathways, using hypergeometric tests. The set of background genes (i.e., the genes against which the set of prioritized genes is tested) is 19,283 protein-coding genes. Background genes can also be selected from gene types as described in the “Gene Mapping” section. Custom sets of background genes can also be provided by the users. Multiple testing correction (i.e., Benjamini–Hochberg by default) is performed per data source of tested gene sets (e.g., canonical pathways, GO biological processes, hallmark genes). FUMA reports gene sets with adjusted P-value ≤0.05 and the number of genes that overlap with the gene set >1 by default.
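
The enrichment test and the multiple testing correction can be sketched with the Python standard library. The upper-tail hypergeometric probability and the Benjamini–Hochberg adjustment below are textbook formulations and are not taken from the FUMA source code; the function names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(n_background, n_set, n_input, n_overlap):
    """P(X >= n_overlap) when n_input genes are drawn without replacement
    from n_background genes, of which n_set belong to the gene set."""
    upper = min(n_set, n_input)
    total = comb(n_background, n_input)
    tail = sum(
        comb(n_set, k) * comb(n_background - n_set, n_input - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_reported(adjusted_p, n_overlap, max_p=0.05, min_overlap=2):
    """Default reporting rule: adjusted P <= 0.05 and more than one overlapping gene."""
    return adjusted_p <= max_p and n_overlap >= min_overlap


# Toy example: 5 of 5 input genes fall in a 5-gene set out of 10 background genes.
p_full_overlap = hypergeom_pvalue(10, 5, 5, 5)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

In the toy example only one of the 252 possible draws of five genes gives a complete overlap, so the P-value is 1/252. Because the correction is performed per data source, `benjamini_hochberg` would be called once for each category of gene sets.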

FUMA parameters for application to GWAS summary statistics

In the described applications, three mapping strategies were applied to GWAS summary statistics with the following settings: positional mapping was performed by selecting exonic and splicing SNPs with CADD score ≥12.37 (defined by Kircher et al.^¹³) to restrict the mapping to deleterious coding SNPs. eQTL mapping was performed using GTEx eQTLs with FDR<0.05. Chromatin interaction mapping was performed using Hi-C data from Schmitt et al.^¹¹ and interactions were filtered by FDR<1e-6. Tissue types used for eQTLs and chromatin interaction mappings are described in the following section for each of three phenotypes. Other parameters not mentioned here were kept as default (Supplementary Table ²).

Application to BMI GWAS

Parameters were set as described in the above section and we used eQTLs in 44 tissue types from GTEx. For chromatin interaction mapping, Hi-C data of 14 tissue types (Adrenal, Aorta, Bladder, Dorsolateral Prefrontal Cortex, Hippocampus, Left Ventricle, Liver, Lung, Ovary, Pancreas, Psoas, Right Ventricle, Small Bowel and Spleen) from GSE87112 was used. Indels were excluded. rsID was mapped to dbSNP build 146 and chromosome and positions were extracted based on human genome hg19 reference. Only protein-coding genes were used in gene mapping and enrichment of DEG in 53 tissue types, Canonical Pathways and GO terms were tested.

Application to CD GWAS

We set parameters as described above and we used eQTLs in five tissue types from GTEx which are relevant to CD, i.e., Small Intestine, Colon Sigmoid, Colon Transverse, Stomach, and Whole Blood. Chromatin interaction mapping was performed using Hi-C data of two tissue types; Liver and Small Bowel from GSE87112. The MHC region and indels were excluded from the analysis. Since the input GWAS summary statistics only contained results from the discovery phase, we manually submitted the 71 reported lead SNPs to FUMA in addition to the independent lead SNPs that were identified as described above (Supplementary Data ⁷). Only protein-coding genes were used in mappings and enrichment of DEG in 53 tissue types, Canonical Pathways and GO terms were tested.

Application to SCZ GWAS

Parameters were set as described above and eQTLs in 10 brain tissues from GTEx were used. Chromatin interaction mapping was performed using Hi-C data of two brain regions: hippocampus and prefrontal cortex. The extended MHC region (25–34Mb), Chromosome X and indels were excluded from this analysis. The input GWAS summary statistics are based on the discovery phase and not all reported lead SNPs from the combined results of discovery and replication phases reached genome-wide significance. To include all reported lead SNPs, 111 non-indel lead SNPs were provided to FUMA and additional independent lead SNPs were identified at P<5e-8 (Supplementary Data ¹⁴). Only protein-coding genes were used in mappings and enrichment of DEG in 53 tissue types, Canonical Pathways and GO terms were tested.

Code availability

Source code of FUMA web application is available through a git repository at https://github.com/Kyoko-wtnb/FUMA-webapp/.

Data availability

Data and tools used in FUMA are all publicly available from the following links (details are in Supplementary Table ¹). dbSNP build 146 rsID archive: ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b146_grch137p13/database/organism_data/RsMergeArch.bcp.gz, 1000 genome phase 3 reference panel: ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/, CADD: http://cadd.gs.washington.edu/download, RegulomeDB: http://www.regulomedb.org/downloads, 15-core chromatin state: http://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/coreMarks/jointModel/final/, GWAS catalog: https://www.ebi.ac.uk/gwas/, GTEx v6: http://www.gtexportal.org/home/, Blood eQTL Browser: http://genenetwork.nl/bloodeqtlbrowser/, BIOS QTL Browser: http://genenetwork.nl/biosqtlbrowser/, BRAINEAC: http://www.braineac.org/, HiC (GSE87112): https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE87112, promoter/enhancer regions: http://egg2.wustl.edu/roadmap/data/byDataType/dnase/, pLI score: ftp.broadinstitute.org/pub/ExAC_release/release0.3.1/functional_gene_constraint, ncRVIS score: http://journals.plos.org/plosgenetics/article/file?type=supplementary&id=info:doi/10.1371/journal.pgen.1005492.s011, MsigDB: http://software.broadinstitute.org/gsea/msigdb/, WikiPathways: http://wikipathways.org/index.php/WikiPathways, ANNOVAR: http://annovar.openbioinformatics.org/en/latest/, and MAGMA: https://ctg.cncr.nl/software/magma. The GWAS summary statistics used in this study are available from the following sources: BMI: http://portals.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium_data_files, CD: ftp.sanger.ac.uk/pub/consortia/ibdgenetics/, SCZ: http://www.med.unc.edu/pgc/results-and-downloads.

Acknowledgements

This work was funded by The Netherlands Organization for Scientific Research (NWO VICI 453-14-005) and Ingrosyl. We thank the GIANT consortium, WTCCC and PGC for providing GWAS summary statistics and GTEx Portal for RNA-seq and eQTL data. We also thank Prof Patrick Sullivan for discussion of 3D chromatin interaction data.

Author contributions

D.P. conceived the study. K.W. and A.v.B. developed the web application. K.W. performed analyses and drafted the manuscript. K.W., E.T., and D.P. participated in the discussions, interpretation of the results, and editing of the manuscript. All authors provided relevant input at different stages of the project and approved the final manuscript.

Notes

Competing interests

The authors declare no competing financial interests.

Footnotes

Electronic supplementary material

Supplementary Information accompanies this paper at 10.1038/s41467-017-01261-5.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Welter D, et al. The NHGRI GWAS catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 2014;42:D1001–D1006. 10.1093/nar/gkt1229.

2. Wood AR, et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 2014;46:1173–1186. 10.1038/ng.3097.

3. Ripke S, et al. Biological insights from 108 schizophrenia-associated genetic loci. Nature. 2014;511:421–427. 10.1038/nature13595.

4. Okbay A, et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature. 2016;533:539–542. 10.1038/nature17671.

5. Sudlow C, et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015;12:1–10. 10.1371/journal.pmed.1001779.

6. Breen G, et al. Translating genome-wide association findings into new therapeutics for psychiatry. Nat. Neurosci. 2016;19:1392–1396. 10.1038/nn.4411.

7. Maurano MT, et al. Systematic localization of common disease-associated variation in regulatory DNA. Science. 2012;337:1190–1195. 10.1126/science.1222794.

8. The GTEx Consortium. The genotype-tissue expression (GTEx) pilot analysis: Multitissue gene regulation in humans. Science. 2015;348:648–660. 10.1126/science.1262110.

9. The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. 10.1038/nature11247.

10. Roadmap Epigenomics Consortium. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317–330. 10.1038/nature14248.

11. Schmitt AD, et al. A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Rep. 2016;17:2042–2059. 10.1016/j.celrep.2016.10.061.

12. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38:e164. 10.1093/nar/gkq603.

13. Kircher M, et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 2014;46:310–315. 10.1038/ng.2892.

14. Boyle AP, et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 2012;22:1790–1797. 10.1101/gr.137323.112.

15. Ernst J, Kellis M. ChromHMM: automating chromatin-state discovery and characterization. Nat. Methods. 2012;9:215–216. 10.1038/nmeth.1906.

16. Westra H-J, et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 2013;45:1238–1243. 10.1038/ng.2756.

17. Zhernakova DV, et al. Identification of context-dependent expression quantitative trait loci in whole blood. Nat. Genet. 2016;49:139–145. 10.1038/ng.3737.

18. Ramasamy A, et al. Genetic variability in the regulation of gene expression in ten regions of the human brain. Nat. Neurosci. 2014;17:1418–1428. 10.1038/nn.3801.

19. Amberger JS, Bocchini CA, Schiettecatte F, Scott AF, Hamosh A. OMIM.org: online mendelian inheritance in man (OMIM), an online catalog of human genes and genetic disorders. Nucleic Acids Res. 2015;43:D789–D798. 10.1093/nar/gku1205.

20. Wishart DS, et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006;34:D668–D672. 10.1093/nar/gkj067.

21. Liberzon A, et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics. 2011;27:1739–1740. 10.1093/bioinformatics/btr260.

22. Kutmon M, et al. WikiPathways: capturing the full diversity of pathway knowledge. Nucleic Acids Res. 2016;44:D488–D494. 10.1093/nar/gkv1024.

23. Edwards SL, Beesley J, French JD, Dunning M. Beyond GWASs: illuminating the dark road from association to function. Am. J. Hum. Genet. 2013;93:779–797. 10.1016/j.ajhg.2013.10.012.

24. Marjoram P, Zubair A, Nuzhdin SV. Post-GWAS: where next? More samples, more SNPs or more biology? Heredity (Edinb.) 2014;112:79–88. 10.1038/hdy.2013.52.

25. Visscher PM, Brown MA, McCarthy MI, Yang J. Five years of GWAS discovery. Am. J. Hum. Genet. 2012;90:7–24. 10.1016/j.ajhg.2011.11.029.

26. Purcell S, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007;81:559–575. 10.1086/519795.

27. Auton A, et al. A global reference for human genetic variation. Nature. 2015;526:68–74. 10.1038/nature15393.

28. McLaren W, et al. The ensembl variant effect predictor. Genome Biol. 2016;17:122. 10.1186/s13059-016-0974-4.

29. Hou L, Zhao H. A review of post-GWAS prioritization approaches. Front. Genet. 2013;4:2009–2014. 10.3389/fgene.2013.00280.

30. Gamazon ER, et al. SCAN: SNP and copy number annotation. Bioinformatics. 2010;26:259–262. 10.1093/bioinformatics/btp644.

31. Ward LD, Kellis M. HaploReg v4: Systematic mining of putative causal variants, cell types, regulators and target genes for human complex traits and disease. Nucleic Acids Res. 2016;44:D877–D881. 10.1093/nar/gkv1340.

32. Tak YG, Farnham PJ. Making sense of GWAS: using epigenomics and genome engineering to understand the functional relevance of SNPs in non-coding regions of the human genome. Epigenet. Chromatin. 2015;8:57. 10.1186/s13072-015-0050-4.

33. Liu JZ, et al. A versatile gene-based test for genome-wide association studies. Am. J. Hum. Genet. 2010;87:139–145. 10.1016/j.ajhg.2010.06.009.

34. Mishra A, Macgregor S. VEGAS2: Software for more flexible gene-based testing. Twin Res. Hum. Genet. 2014;18:1–6.

35. de Leeuw CA, Mooij JM, Heskes T, Posthuma D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 2015;11:1–19. 10.1371/journal.pcbi.1004219.

36. Lamparter D, Marbach D, Rueedi R, Kutalik Z, Bergmann S. Fast and rigorous computation of gene and pathway scores from SNP-based summary statistics. PLOS Comput. Biol. 2016;12:e1004714. 10.1371/journal.pcbi.1004714.

37. Ayellet VS, Groop L, Mootha VK, Daly MJ, Altshuler D. Common inherited variation in mitochondrial genes is not enriched for associations with type 2 diabetes or related glycemic traits. PLoS Genet. 2010;6:e1001058. 10.1371/journal.pgen.1001058.

38. Lee PH, O’dushlaine C, Thomas B, Purcell SM. INRICH: Interval-based enrichment analysis for genome-wide association studies. Bioinformatics. 2012;28:1797–1799. 10.1093/bioinformatics/bts191.

39. Pers TH, et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat. Commun. 2015;6:5890. 10.1038/ncomms6890.

40. Pruim RJ, et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics. 2011;27:2336–2337.

41. Cuellar-Partida G, Renteria ME, MacGregor S. LocusTrack: integrated visualization of GWAS results and genomic annotation. Source Code Biol. Med. 2015;10:1. 10.1186/s13029-015-0032-8.

42. Wang Y, et al. The 3D genome browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions. Preprint at http://biorxiv.org/content/early/2017/02/27/112268 (2017).

43. Locke AE, et al. Genetic studies of body mass index yield new insights for obesity biology. Nature. 2015;518:197–206. 10.1038/nature14177.

44. Claussnitzer M, et al. FTO obesity variant circuitry and adipocyte browning in humans. N. Engl. J. Med. 2015;373:895–907. 10.1056/NEJMoa1502214.

45. Sato M, Kawakami T, Kadota Y, Mori M, Suzuki S. Obesity and metallothionein. Curr. Pharm. Biotechnol. 2013;14:432–440. 10.2174/1389201011314040008.

46. Raza S, et al. Association of glutathione-S-transferase (GSTM1 and GSTT1) and FTO gene polymorphisms with type 2 diabetes mellitus cases in Northern India. Balkan J. Med. Genet. 2014;17:47–54.

47. Franke A, et al. Genome-wide meta-analysis increases to 71 the number of confirmed Crohn’s disease susceptibility loci. Nat. Genet. 2010;42:1118–1125. 10.1038/ng.717. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]

48. Turner MD, Nedjai B, Hurst T, Pennington DJ. Cytokines and chemokines: at the crossroads of cell signalling and in fl ammatory disease. Biochim. Biophys. Acta. 2014;1843:2563–2582. 10.1016/j.bbamcr.2014.05.014. [Abstract] [CrossRef] [Google Scholar]

49. Braat H, Peppelenbosch MP, Hommes DW. Immunology of Crohn’s disease. Ann. N. Y. Acad. Sci. 2006;1072:135–154. 10.1196/annals.1326.039. [Abstract] [CrossRef] [Google Scholar]

50. Gusev, A.. et al. Transcriptome-wide association study of schizophrenia and chromatin activity yields mechanistic disease insights. Preprint at http://biorxiv.org/content/early/2016/08/02/067355 (2016). [Europe PMC free article] [Abstract]

51. Won H, et al. Chromosome conformation elucidates regulatory relationships in developing human brain. Nature. 2016;538:523–527. 10.1038/nature19847. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]

52. Martin LF, Freedman R. Schisophrenia and the alpha 7 nicotinic acetylcholine receptor. Int. Rev. Neurobiol. 2007;78:225–246. 10.1016/S0074-7742(06)78008-4. [Abstract] [CrossRef] [Google Scholar]

53. Freedman R. α 7-Nicotinic acetylcholine receptor agonists for cognitive enhancement in schizophrenia. Annu. Rev. Med. 2014;65:245–261. 10.1146/annurev-med-092112-142937. [Abstract] [CrossRef] [Google Scholar]

54. Mifsud B, et al. Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Nat. Genet. 2015;47:598–606. 10.1038/ng.3286. [Abstract] [CrossRef] [Google Scholar]

55. Martin P, et al. Capture Hi-C reveals novel candidate genes and complex long-range interactions with related autoimmune risk loci. Nat. Commun. 2015;6:10069. 10.1038/ncomms10069. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]

56. Mcgovern A, et al. Capture Hi-C identifies a novel causal gene, IL20RA, in the pan-autoimmune genetic susceptibility region 6q23. Genome Biol. 2016;17:212. 10.1186/s13059-016-1078-x. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]

57. Promoters TG, et al. Lineage-specific genome architecture lnks enhancers and non-coding disease variants to target gene promoters. Cell. 2016;167:1369–1384. 10.1016/j.cell.2016.09.037. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]

58. Lek M, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536:285–291. 10.1038/nature19057. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]

59. Petrovski S, et al. The intolerance of regulatory sequence to genetic variation predicts gene dosage sensitivity. PLoS Genet. 2015;11:1–25. 10.1371/journal.pgen.1005492. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]

Articles from Nature Communications are provided here courtesy of Nature Publishing Group

Data

BioStudies (supplemental material and supporting data): http://www.ebi.ac.uk/biostudies/studies/S-EPMC5705698?xr=true

GEO (Gene Expression Omnibus): GSE87112

Funding

Funders who supported this work.

Dutch Research Council (NWO), Grant ID: 453-14-005

Lundbeck Foundation, Grant ID: R155-2014-1724

GWAS	Risk loci	Reported genes in the original study	Positional mapping	eQTL mapping	Chromatin interaction mapping	Total^*	Genes located outside the risk loci	Novel candidates	Loci contain prioritized genes
BMI	77	117	23	144	310	400	263	263	67
CD	71	115	39	69	199	276	161	215	55
SCZ	109	349	36	54	33	113	26	35	45
