ChromHMM: automating chromatin-state discovery and characterization.

Ernst J; Kellis M

doi:10.1038/nmeth.1906


Nat Methods. Author manuscript; available in PMC 2013 Feb 20.

Published in final edited form as:

Nat Methods. 2012 Feb 28; 9(3): 215–216.

Published online 2012 Feb 28. https://doi.org/10.1038/nmeth.1906

PMCID: PMC3577932

NIHMSID: NIHMS441097

PMID: 22373907

Jason Ernst^1,2,3 and Manolis Kellis^1,2

Associated Data

Supplementary Materials: Supplementary Figures, Note, and Data.
NIHMS441097-supplement-Supplementary_Figures__Note__and_Data.pdf (777K)

Chromatin state annotation using combinations of chromatin modification patterns has emerged as a powerful approach for discovering regulatory regions and their cell-type-specific activity patterns, and for interpreting disease-association studies^1-5. However, learning chromatin state models from large numbers of chromatin modification datasets in multiple cell types still requires extensive bioinformatics expertise, which makes the approach inaccessible to the wider scientific community. To address this challenge, we have developed ChromHMM, an automated computational system for learning chromatin states, characterizing their biological functions and correlations with large-scale functional datasets, and visualizing the resulting genome-wide maps of chromatin state annotations.

ChromHMM is based on a multivariate hidden Markov model that models the observed combination of chromatin marks using a product of independent Bernoulli random variables², which enables robust learning of complex patterns of many chromatin modifications. As input, it receives a list of aligned reads for each chromatin mark, which are automatically converted into presence or absence calls for each mark across the genome, based on a Poisson background distribution. An optional additional input of aligned reads for a control dataset can be used either to adjust the presence or absence threshold or as an independent input feature (Supplementary Note). Alternatively, the user can input files that contain calls from an independent peak caller. By default, chromatin states are analyzed at 200-base-pair intervals that roughly approximate nucleosome sizes, but smaller or larger windows can be specified. We have also developed a new parameter initialization procedure that enables relatively efficient inference of comparable models across different numbers of states (Supplementary Note).
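The conversion from aligned reads to presence or absence calls can be illustrated with a minimal Python sketch. It is not ChromHMM's actual implementation: the function names, the use of the mean read count per bin as the Poisson background rate, and the tail-probability cutoff of 1e-4 are all assumptions made for illustration.

```python
import math

def bin_read_counts(read_positions, chrom_length, bin_size=200):
    """Count reads per fixed-size bin (200 bp mirrors the default interval in the text)."""
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in read_positions:
        if 0 <= pos < chrom_length:
            counts[pos // bin_size] += 1
    return counts

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def presence_threshold(lam, p_cutoff=1e-4):
    """Smallest count k whose Poisson tail probability falls below p_cutoff.

    p_cutoff is an illustrative assumption, not a value taken from the text.
    """
    k = 1
    while poisson_tail(k, lam) >= p_cutoff:
        k += 1
    return k

def binarize(bin_counts, p_cutoff=1e-4):
    """Call a mark present (1) in bins whose count is unlikely under the background.

    Assumption: the background rate is the mean read count per bin.
    """
    lam = sum(bin_counts) / len(bin_counts)
    k = presence_threshold(lam, p_cutoff)
    return [1 if c >= k else 0 for c in bin_counts]
```

For example, `binarize(bin_read_counts(reads, chrom_length))` would yield one 0/1 call per 200-bp bin for a single mark; repeating this for every mark gives the binary vectors that the model observes.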

ChromHMM then outputs both the learned chromatin state model parameters and the state assignments for each genomic position. The learned emission and transition parameters are returned in both text and image format (Fig. 1), automatically grouping states with similar emission parameters or proximal genomic locations, although a user-specified reordering can also be used (Supplementary Figs. 1 and 2, Supplementary Note). ChromHMM enables the study of the likely biological roles of each chromatin state based on enrichment in diverse external annotations and experimental data, shown as heat maps and tables (Fig. 1), both for direct genomic overlap and at various distances from a state (Supplementary Fig. 3). ChromHMM also generates custom UCSC genome browser tracks⁶ showing the resulting chromatin state segmentation in dense view (a single color-coded track) or expanded view (each state shown separately) (Fig. 1). All the files ChromHMM produces by default are summarized on a webpage that it also creates (Supplementary Data).
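To make the link between model parameters and per-position state assignments concrete, here is a hedged Python sketch. The emission term follows the product of independent Bernoulli variables described above. The text does not say which decoding rule ChromHMM applies, so Viterbi decoding is used here as one standard choice; the function names and the toy parameters in the usage note are hypothetical.

```python
import math

def log_emission(observed, mark_probs):
    """Log-probability of a binary mark vector under independent Bernoulli marks.

    mark_probs[m] is the probability that mark m is present in this state;
    every probability is assumed to lie strictly between 0 and 1.
    """
    total = 0.0
    for present, q in zip(observed, mark_probs):
        total += math.log(q if present else 1.0 - q)
    return total

def viterbi(observations, init, trans, emit):
    """Most likely state path for a sequence of binary mark vectors.

    init[s]: initial probability of state s
    trans[r][s]: probability of moving from state r to state s
    emit[s][m]: probability that mark m is present in state s
    """
    if not observations:
        return []
    n_states = len(init)
    score = [math.log(init[s]) + log_emission(observations[0], emit[s])
             for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for obs in observations[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda r: prev[r] + math.log(trans[r][s]))
            pointers.append(best)
            score.append(prev[best] + math.log(trans[best][s])
                         + log_emission(obs, emit[s]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With two toy states, one where both marks are usually present (`emit[0] = [0.9, 0.8]`) and one where both are usually absent (`emit[1] = [0.05, 0.1]`), a run of bins carrying both marks followed by a run of empty bins is segmented into the first state and then the second.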

Figure 1

Sample Outputs of ChromHMM

(a) Example of state annotation tracks produced by ChromHMM and visualized in the UCSC genome browser⁶, including a dense view of the segmentation as a single track (top) and an expanded view of the segmentation showing each state as a separate track (bottom). (b) Heat maps automatically produced by ChromHMM show emission (left) and transition (right) parameters. (c) Example heat map for state functional enrichments automatically generated by ChromHMM. The columns indicate the relative percentage of the genome represented by each state (first column) and the relative fold enrichment for: RefSeq transcription start sites (TSS); CpG islands; 2,000-base-pair intervals around the TSS; exons; genes; transcript end sites (TES); evolutionary conservation; and nuclear lamina associated regions (Supplementary Note). (a-c) The example shown corresponds to a previous model learned across nine cell types³.
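The fold-enrichment columns in panel (c) can be read through a small Python sketch. The text does not give the exact formula, so this assumes a common definition: the fraction of a state's bins that overlap an annotation, divided by the fraction of all bins that overlap it. The function name and the bin-level inputs are hypothetical.

```python
def fold_enrichment(state_per_bin, annotated, state):
    """Fold enrichment of one state for one annotation, at bin resolution.

    state_per_bin[i]: state assigned to bin i
    annotated[i]: truthy if bin i overlaps the external annotation
    Assumed definition: (overlap / bins in state) / (annotated bins / all bins).
    """
    n_bins = len(state_per_bin)
    in_state = sum(1 for s in state_per_bin if s == state)
    in_annotation = sum(1 for a in annotated if a)
    both = sum(1 for s, a in zip(state_per_bin, annotated) if s == state and a)
    if in_state == 0 or in_annotation == 0:
        return float("nan")  # enrichment is undefined without any bins to compare
    return (both / in_state) / (in_annotation / n_bins)
```

A value above 1 means the state overlaps the annotation more often than expected from the annotation's genome-wide coverage, and a value below 1 means it overlaps less often.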

ChromHMM also enables the analysis of chromatin states across multiple cell types. When the chromatin marks are common across the cell types, a common model can be learned by a virtual ‘concatenation’ of the chromosomes of all cell types. Alternatively, a model can be learned by a virtual ‘stacking’ of all marks across cell types, or independent models can be learned in each cell type. Lastly, ChromHMM supports the comparison of models with different numbers of states based on correlations in their emission parameters (Supplementary Fig. 4).
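The difference between ‘concatenation’ and ‘stacking’ is easiest to see as a change in data layout, sketched below in Python together with a simple way to compare two models by their emission parameters. The input layout (a dictionary from cell type to per-bin binary mark vectors), the function names, and the use of Pearson correlation are assumptions for illustration; the text says only that the comparison is based on correlations.

```python
import math

def concatenate(tracks):
    """Concatenation: same marks everywhere, cell types appended as extra bins.

    tracks maps cell type -> list of per-bin mark vectors (same marks, same order).
    """
    rows = []
    for cell_type in sorted(tracks):
        rows.extend(tracks[cell_type])
    return rows

def stack(tracks):
    """Stacking: same bins, marks of every cell type placed side by side."""
    names = sorted(tracks)
    n_bins = len(tracks[names[0]])
    if any(len(tracks[name]) != n_bins for name in names):
        raise ValueError("stacking requires the same number of bins per cell type")
    return [[v for name in names for v in tracks[name][i]] for i in range(n_bins)]

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def best_matches(emit_a, emit_b):
    """For each state of model A, the highest emission correlation with any state of model B."""
    return [max(pearson(s, t) for t in emit_b) for s in emit_a]
```

Concatenation yields one set of states shared by all cell types, so a given position can be in different states in different cell types. Stacking yields one state per position whose emission vector spans every mark in every cell type.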

Acknowledgements

We thank members of the Massachusetts Institute of Technology Computational Biology group and B. Bernstein for useful discussions related to this work. The work was supported by an NSF Postdoctoral Fellowship 0905968 to JE and grants from the National Institutes of Health (NIH 1-RC1-HG005334 and NIH 1 U54 HG004570).

Footnotes

The software is written in Java, which enables it to run on virtually any computer, and is freely available with further documentation at http://compbio.mit.edu/ChromHMM.

References

1. Day N, et al. Bioinformatics. 2007;23:1424–1426.

2. Ernst J, Kellis M. Nat Biotechnol. 2010;28:817–825.

3. Ernst J, et al. Nature. 2011;473:43–49.

4. Filion GJ, et al. Cell. 2010;143:212–224.

5. Roy S, et al. Science. 2010;330:1787–1797.

6. Kent WJ, et al. Genome Res. 2002;12:996–1006.

Funding

Funders who supported this work.

NHGRI NIH HHS (5)

Grant ID: 1 U54 HG004570
Grant ID: 1-RC1-HG005334
Grant ID: RC1 HG005334
Grant ID: U54 HG004570
Grant ID: R01 HG004037
