CLIP Tool Kit (CTK): a flexible and robust pipeline to analyze CLIP sequencing data
Abstract
Summary
UV cross-linking and immunoprecipitation (CLIP), followed by high-throughput sequencing, is a powerful biochemical assay that maps in vivo protein-RNA interactions on a genome-wide scale. The CLIP Tool Kit (CTK) aims to provide a set of tools for flexible, streamlined and comprehensive CLIP data analysis. This software package extends the scope of our original CIMS package.
Availability and Implementation
The software is implemented in Perl. The source code and detailed documentation are available at http://zhanglab.c2b2.columbia.edu/index.php/CTK.
1 Introduction
Specific interaction of RNA-binding proteins (RBPs) with their target transcripts is essential for many steps of gene expression regulation. RBP interaction sites can be mapped on a genome-wide scale by UV cross-linking and immunoprecipitation of protein–RNA complexes, followed by high-throughput sequencing of the isolated RNA fragments (HITS-CLIP or CLIP-Seq) (Licatalosi et al., 2008). Since its initial development, HITS-CLIP and its variations have been applied in numerous studies (Darnell, 2010), and efforts have been made to compile published datasets (Yang et al., 2015). However, most studies implemented custom analysis tools optimized for a specific application. As a result, there remains a lack of software packages that provide flexible, streamlined and comprehensive analysis of CLIP data regardless of the CLIP protocol used. This gap poses challenges for researchers who are new to CLIP, and complicates comparing and integrating results from different studies.
We previously developed the CIMS software package for processing CLIP data and mapping protein-RNA interactions at single-nucleotide resolution (Moore et al., 2014). Single-nucleotide mapping takes advantage of crosslink-induced mutation sites (CIMS), which are nucleotide deletions or substitutions introduced at the protein–RNA crosslink sites by reverse transcriptase (Zhang and Darnell, 2011). Some variations of CLIP, such as iCLIP (Konig et al., 2010) and BrdU-CLIP (Weyn-Vanhentenryck et al., 2014), allow the capture of CLIP tags that are truncated at crosslink sites, and analysis of such crosslink-induced truncation sites (CITS) was also included in later releases of the CIMS package.
The CLIP Tool Kit (CTK), renamed to reflect more precisely its expanded scope of comprehensive CLIP data analysis, represents a major upgrade of the CIMS software package and has many advantages over existing CLIP data analysis software. Compared to the previous version of our analysis pipeline, CTK includes several algorithmic innovations, numerous optimizations and detailed documentation that significantly improve its performance and usability.
2 Software description
2.1 CLIP data preprocessing and mapping
CTK uses the Burrows-Wheeler Aligner (BWA) as its standard read aligner. BWA allows the user to specify the mismatch threshold as a rate rather than as an absolute number, which both simplifies and improves the handling of CLIP tags of varying lengths. In addition, CTK takes FASTQ files as input, to take advantage of sequence quality scores during read mapping, and operates on the resulting SAM files, the standard format for storing read mapping information. Therefore, if desired, other aligners can also be used seamlessly.
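To illustrate why a mismatch rate suits tags of varying length, the sketch below turns a rate into a per-length mismatch cap. It assumes, as we understand `bwa aln -n` to do when given a fraction, that mismatches follow a Poisson distribution with mean `read_len * base_err` (2% base error) and picks the smallest cap that loses fewer than `missing_frac` of alignments. The function name and the example values are hypothetical; check BWA's own documentation for its exact behavior.

```python
import math


def max_mismatches(read_len: int, missing_frac: float = 0.04, base_err: float = 0.02) -> int:
    """Smallest k >= 1 with P(X > k) < missing_frac, X ~ Poisson(read_len * base_err).

    Assumed model (modeled on BWA's fractional -n), not CTK code.
    """
    lam = read_len * base_err
    term = math.exp(-lam)      # P(X = 0)
    cumulative = term          # P(X <= k), starting at k = 0
    k = 0
    while True:
        k += 1
        term *= lam / k        # P(X = k) computed from P(X = k - 1)
        cumulative += term
        # 1 - cumulative is the tail P(X > k); stop once it is small enough
        if 1.0 - cumulative < missing_frac or k >= 1000:
            return k
```

With the default values, a 20-nt tag is allowed 2 mismatches while a 100-nt tag is allowed 5, so short tags are mapped more strictly than long ones under the same rate.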
CTK applies very stringent criteria to collapse PCR duplicates, which are identified using a random barcode (i.e., a unique molecular identifier or UMI) attached to CLIP tags in most current CLIP protocols. After read mapping, a model-based algorithm identifies 'sufficiently distinct' barcodes among reads with the same chromosomal start position by modeling the sequencing errors and the copy number of each duplicate sequence. Compared to the previous CIMS package, CTK uses a sparse data representation that greatly reduces memory usage and run time.
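The general shape of UMI-based duplicate collapsing can be sketched as follows. This is NOT CTK's model-based algorithm: it swaps in a simple count-aware Hamming-distance heuristic, loosely similar in spirit to adjacency methods such as the one in UMI-tools, in which a rare barcode one mismatch away from a much more abundant barcode at the same start is treated as a sequencing error. The read tuple layout (including grouping by strand), the `2 * n - 1` threshold and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Read = Tuple[str, str, int, str]   # hypothetical layout: (chrom, strand, start, barcode)
Position = Tuple[str, str, int]    # (chrom, strand, start)


def hamming(a: str, b: str) -> int:
    """Mismatches between equal-length barcodes; unequal lengths count as distant (>= 2)."""
    if len(a) != len(b):
        return len(a) + len(b) + 1
    return sum(x != y for x, y in zip(a, b))


def collapse_duplicates(reads: Iterable[Read]) -> Dict[Position, List[str]]:
    """Barcodes kept as distinct molecules at each mapped start position."""
    groups: Dict[Position, Counter] = defaultdict(Counter)
    for chrom, strand, start, barcode in reads:
        groups[(chrom, strand, start)][barcode] += 1

    distinct: Dict[Position, List[str]] = {}
    for pos, counts in groups.items():
        kept: List[Tuple[str, int]] = []
        # visit barcodes from most to least abundant (ties broken alphabetically)
        for bc, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
            # hypothetical rule: absorb bc if a kept barcode is one mismatch away
            # and at least (2n - 1) times as frequent in copy number
            if any(hamming(bc, k) == 1 and kn >= 2 * n - 1 for k, kn in kept):
                continue
            kept.append((bc, n))
        distinct[pos] = [bc for bc, _ in kept]
    return distinct
```

For example, ten copies of `AAAA`, one `AAAT` and two `GGGG` at the same start collapse to the two molecules `AAAA` and `GGGG`.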
2.2 Identifying CLIP tag clusters and peak calling
Due to the increase in CLIP library complexity and sequencing depth, multiple CLIP tag clusters or peaks might not be clearly separated, especially in abundant transcripts. To address this issue, CTK performs peak calling using a novel 'valley seeking' algorithm. In brief, CTK calculates the number of overlapping CLIP tags at each genomic position to find local maxima. Two neighboring local maxima with peak heights $h_1$ and $h_2$ are considered to be two different peaks only when they are separated by a valley of depth $d = h - v$, where $h = \min(h_1, h_2)$ and $v$ is the read coverage at the valley position. The user is asked to specify the relative valley depth (e.g. $d/h \ge 0.9$), so that the algorithm can accommodate transcripts of different abundance. To define a more stringent subset of CLIP tag peaks, CTK additionally assesses whether the observed peak height exceeds what would be expected by chance, using different background models and scan statistics.
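The valley-seeking rule above can be sketched on a coverage array as follows. This is a simplified illustration under our own assumptions, not CTK's implementation: tags are 0-based half-open intervals on a single strand, a local maximum is a plateau higher than both neighboring plateaus, and neighboring maxima are processed left to right, with a merged peak keeping the higher summit. Function names and the default threshold (the 0.9 example from the text) are hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[int, int]  # (position of the local maximum, tag depth there)


def tag_coverage(tags: List[Tuple[int, int]], length: int) -> List[int]:
    """Number of overlapping tags at each position; tags are 0-based, half-open [start, end)."""
    diff = [0] * (length + 1)
    for start, end in tags:
        diff[start] += 1
        diff[end] -= 1
    cov, depth = [], 0
    for i in range(length):
        depth += diff[i]
        cov.append(depth)
    return cov


def local_maxima(cov: List[int]) -> List[Peak]:
    """Plateaus of nonzero depth that are higher than both neighboring plateaus."""
    runs: List[Peak] = []  # (first position, depth) of each run of equal depth
    for i, c in enumerate(cov):
        if not runs or runs[-1][1] != c:
            runs.append((i, c))
    maxima = []
    for j, (start, depth) in enumerate(runs):
        left = runs[j - 1][1] if j > 0 else 0
        right = runs[j + 1][1] if j + 1 < len(runs) else 0
        if depth > 0 and depth > left and depth > right:
            maxima.append((start, depth))
    return maxima


def valley_seeking_peaks(cov: List[int], min_rel_depth: float = 0.9) -> List[Peak]:
    """Keep neighboring maxima separate only if d / h >= min_rel_depth."""
    maxima = local_maxima(cov)
    if not maxima:
        return []
    peaks = [maxima[0]]
    for pos, h2 in maxima[1:]:
        prev_pos, h1 = peaks[-1]
        v = min(cov[prev_pos:pos + 1])  # coverage at the valley between the two maxima
        h = min(h1, h2)
        d = h - v                        # valley depth
        if d / h >= min_rel_depth:
            peaks.append((pos, h2))      # deep valley: a new, separate peak
        elif h2 > h1:
            peaks[-1] = (pos, h2)        # shallow valley: merge, keep the higher summit
    return peaks
```

Because the threshold is relative to $h$, the same setting separates peaks in a transcript with tens of tags and in one with thousands; raising `min_rel_depth` merges more neighboring maxima.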
2.3 CIMS and CITS analysis
CTK uses essentially the same statistical models for CIMS and CITS as the previous package to evaluate the reproducibility of candidate sites, but it includes several important optimizations. First, spurious mutations due to sequencing errors or low-quality mapping are reduced because CTK allows fewer mismatches for shorter reads. Second, we noticed that crosslinking-induced deletions of multiple consecutive nucleotides are relatively common in CIMS analysis, and that these sites appear to show distinct properties compared to sites with single-nucleotide deletions; CTK therefore identifies oligonucleotide deletions of different sizes and performs a separate CIMS analysis for each deletion size.
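Separating deletions by size starts with reading them out of each alignment. The sketch below parses SAM CIGAR strings, records the reference position and size of every deletion (`D` operation), and tallies sites separately per size. It only covers this bookkeeping step: substitutions (which need the MD tag or the reference sequence) and CTK's reproducibility statistics are omitted. The tuple layout and function names are hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # SAM operations that advance along the reference


def deletions_from_cigar(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """(first deleted reference position, deletion size) for each D operation.

    pos is the 1-based leftmost mapping position (SAM POS field).
    """
    ref = pos
    found = []
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "D":
            found.append((ref, n))
        if op in REF_CONSUMING:
            ref += n
    return found


def deletion_sites_by_size(alignments: Iterable[Tuple[str, str, int, str]]) -> Dict[int, Counter]:
    """Tally deletion sites separately for each deletion size (1 nt, 2 nt, ...).

    alignments: hypothetical (chrom, strand, pos, cigar) tuples, one per unique tag.
    """
    by_size: Dict[int, Counter] = defaultdict(Counter)
    for chrom, strand, pos, cigar in alignments:
        for start, size in deletions_from_cigar(pos, cigar):
            by_size[size][(chrom, strand, start)] += 1
    return by_size
```

Each `by_size[k]` counter could then be fed into its own statistical test, so that single-nucleotide and oligonucleotide deletions are evaluated independently.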
We expect that these methods can be readily applied to data generated by different variations of CLIP. For example, CIMS analysis can be applied to PAR-CLIP data (Hafner et al., 2010) if one focuses on T→C transitions (U→C in the RNA), and CITS analysis can be performed on data generated by BrdU-CLIP or iCLIP.
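Two strand-aware details matter when adapting these analyses. Because SAM stores read and reference bases on the forward genomic strand, a PAR-CLIP T→C transition on a minus-strand transcript appears as A→G. For truncation sites, the sketch assumes, as described for iCLIP, that the crosslinked nucleotide is the one immediately upstream of the tag's 5' end. Coordinates are 0-based and half-open; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_par_clip_transition(ref_base: str, read_base: str, strand: str) -> bool:
    """True if a mismatch seen on the forward genomic strand is T->C on the transcript."""
    ref_base, read_base = ref_base.upper(), read_base.upper()
    if strand == "-":
        # minus-strand transcript: complement to read the change in transcript orientation
        ref_base = ref_base.translate(COMPLEMENT)
        read_base = read_base.translate(COMPLEMENT)
    return ref_base == "T" and read_base == "C"


def truncation_site(start: int, end: int, strand: str) -> int:
    """Position just upstream of the 5' end of a tag spanning [start, end)."""
    # plus strand: 5' end is `start`, so upstream is start - 1
    # minus strand: 5' end is end - 1, so upstream (in transcript direction) is `end`
    return start - 1 if strand == "+" else end
```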
3 Results
We applied CTK to the Rbfox CLIP data derived from mouse brain tissues and human cells using different protocols (Van Nostrand et al., 2016; Weyn-Vanhentenryck et al., 2014) and found significant improvement compared to our previous package. CTK yielded a larger number of unique CLIP tags because we were able to retain shorter tags, mapped with a smaller number of mismatches. In general, these additional shorter tags appeared reliable, based on their genomic distribution and several other diagnostic measures.
We also compared CTK with several other software packages (Clipper (Lovci et al., 2013), Piranha (Uren et al., 2012) and PIPE-CLIP (Chen et al., 2014)) for peak calling and identification of crosslink sites. For this comparison, we took advantage of the specific Rbfox binding motif, UGCAUG, which provides an objective measure of accuracy. CTK consistently achieved higher accuracy than the compared tools, as reflected in higher motif enrichment around peaks (Fig. 1A and B). Testing with more stringent valley depths also resulted in higher enrichment of UGCAUG, with little loss in sensitivity.
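A motif-based accuracy measure of this kind can be sketched as the fraction of peaks with a UGCAUG (TGCATG in DNA) occurrence within a fixed window of the peak center, relative to the same fraction for background positions. This is an illustrative metric under our own assumptions, not necessarily the exact measure behind Fig. 1: the sequence is assumed to be on the transcript strand, and the window size, background choice and function names are hypothetical.

```python
import re
from typing import List


def motif_starts(seq: str, motif: str) -> List[int]:
    """0-based start of every (possibly overlapping) occurrence of motif in seq."""
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]


def fraction_with_motif(seq: str, centers: List[int], motif: str = "TGCATG", flank: int = 30) -> float:
    """Fraction of centers with a full motif copy inside [center - flank, center + flank]."""
    if not centers:
        return 0.0
    k = len(motif)
    starts = motif_starts(seq.upper(), motif)
    hits = 0
    for c in centers:
        lo, hi = max(0, c - flank), c + flank + 1  # half-open window
        if any(lo <= s and s + k <= hi for s in starts):
            hits += 1
    return hits / len(centers)


def motif_enrichment(seq: str, peak_centers: List[int], background_centers: List[int],
                     motif: str = "TGCATG", flank: int = 30) -> float:
    """Motif-hit fraction at peaks divided by that at background positions."""
    fg = fraction_with_motif(seq, peak_centers, motif, flank)
    bg = fraction_with_motif(seq, background_centers, motif, flank)
    if bg == 0:
        return float("inf") if fg > 0 else float("nan")
    return fg / bg
```

A more accurate peak caller should place more peak centers near true binding motifs, raising this ratio; in practice the background would be sampled, for example, from positions within the same transcripts.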
Funding
This work was supported by grants from the National Institutes of Health (NIH) (R00GM95713 and R01NS89676) and the Simons Foundation Autism Research Initiative (307711).
Conflict of Interest: none declared.
References
- Chen B. et al. (2014) PIPE-CLIP: a comprehensive online tool for CLIP-seq data analysis. Genome Biol., 15, R18.
- Darnell R.B. (2010) HITS-CLIP: panoramic views of protein–RNA regulation in living cells. Wiley Interdiscip. Rev. RNA, 1, 266–286.
- Hafner M. et al. (2010) Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell, 141, 129–141.
- Konig J. et al. (2010) iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution. Nat. Struct. Mol. Biol., 17, 909–915.
- Licatalosi D.D. et al. (2008) HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature, 456, 464–469.
- Lovci M.T. et al. (2013) Rbfox proteins regulate alternative mRNA splicing through evolutionarily conserved RNA bridges. Nat. Struct. Mol. Biol., 20, 1434–1442.
- Moore M. et al. (2014) Mapping Argonaute and conventional RNA-binding protein interactions with RNA at single-nucleotide resolution using HITS-CLIP and CIMS analysis. Nat. Protoc., 9, 263–293.
- Uren P.J. et al. (2012) Site identification in high-throughput RNA-protein interaction data. Bioinformatics, 28, 3013–3020.
- Van Nostrand E.L. et al. (2016) Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP). Nat. Methods, 13, 508–514.
- Weyn-Vanhentenryck S. et al. (2014) HITS-CLIP and integrative modeling define the Rbfox splicing-regulatory network linked to brain development and autism. Cell Rep., 6, 1139–1152.
- Yang Y.C. et al. (2015) CLIPdb: a CLIP-seq database for protein–RNA interactions. BMC Genomics, 16, 51.
- Zhang C., Darnell R.B. (2011) Mapping in vivo protein-RNA interactions at single-nucleotide resolution from HITS-CLIP data. Nat. Biotechnol., 29, 607–614.
Articles from Bioinformatics are provided here courtesy of Oxford University Press
Read article at publisher's site: https://doi.org/10.1093/bioinformatics/btw653