CLIP Tool Kit (CTK): a flexible and robust pipeline to analyze CLIP sequencing data.

Shah A; Qian Y; Weyn-Vanhentenryck SM; Zhang C

doi:10.1093/bioinformatics/btw653

Affiliations

1. Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Center for Motor Neuron Biology and Disease, Columbia University, New York, NY 10032, USA.

Bioinformatics (Oxford, England), 01 Feb 2017, 33(4):566-567
https://doi.org/10.1093/bioinformatics/btw653 PMID: 27797762 PMCID: PMC6041811


Abstract

Summary

UV cross-linking and immunoprecipitation (CLIP), followed by high-throughput sequencing, is a powerful biochemical assay that maps in vivo protein-RNA interactions on a genome-wide scale. The CLIP Tool Kit (CTK) aims to provide a set of tools for flexible, streamlined and comprehensive CLIP data analysis. This software package extends the scope of our original CIMS package.

Availability and implementation

The software is implemented in Perl. The source code and detailed documentation are available at http://zhanglab.c2b2.columbia.edu/index.php/CTK .

Contact

[email protected].


Published online 2016 Nov 16. https://doi.org/10.1093/bioinformatics/btw653


Ankeeta Shah, Yingzhi Qian, Sebastien M Weyn-Vanhentenryck, and Chaolin Zhang

Inanc Birol, Associate Editor


1 Introduction

Specific interaction of RNA-binding proteins (RBPs) with their target transcripts is essential for many steps of gene expression regulation. RBP interaction sites can be mapped on a genome-wide scale by UV cross-linking and immunoprecipitation of protein–RNA complexes, followed by high-throughput sequencing of the isolated RNA fragments (HITS-CLIP or CLIP-Seq) (Licatalosi et al., 2008). Since its initial development, HITS-CLIP and its variations have been applied in numerous studies (Darnell, 2010), and efforts have been made to compile published datasets (Yang et al., 2015). However, most studies implemented custom analysis tools optimized for a specific application. As a result, there is still a lack of software packages that provide flexible, streamlined and comprehensive analysis of CLIP data regardless of the CLIP protocol used. This gap poses challenges for researchers who are new to CLIP and makes it difficult to compare and integrate results from different studies.

We previously developed the CIMS software package for processing CLIP data and mapping protein–RNA interactions at single-nucleotide resolution (Moore et al., 2014). Single-nucleotide mapping takes advantage of crosslink-induced mutation sites (CIMS), which are nucleotide deletions or substitutions introduced by reverse transcriptase at protein–RNA crosslink sites (Zhang and Darnell, 2011). Some variations of CLIP, such as iCLIP (Konig et al., 2010) and BrdU-CLIP (Weyn-Vanhentenryck et al., 2014), capture CLIP tags that are truncated at crosslink sites. Later releases of the CIMS package added analysis of such crosslink-induced truncation sites (CITS).

The CLIP Tool Kit (CTK), renamed to reflect its expanded scope of comprehensive CLIP data analysis, is a major upgrade of the CIMS software package and has many advantages over existing CLIP data analysis software. Compared with the previous version of our analysis pipeline, CTK includes several algorithmic innovations, numerous optimizations and detailed documentation that substantially improve its performance and usability.


2 Software description

2.1 CLIP data preprocessing and mapping

CTK uses the Burrows–Wheeler Aligner (BWA) as the standard tool for read alignment. BWA lets the user specify the mismatch threshold as a rate rather than as an absolute number, which both simplifies and improves the handling of CLIP tags of varying lengths. In addition, CTK takes FASTQ files as input, so that sequence quality scores can be used for read mapping. It then processes the aligner output in SAM format, the standard format for storing read mapping information. Therefore, other aligners can also be substituted seamlessly if desired.
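The following sketch shows why a rate-based threshold suits tags of varying length. For each read length, it picks the smallest number of differences k such that the chance of a read carrying more than k errors falls below a cutoff. This assumes a uniform per-base error rate, with errors modeled as Poisson.

Assumptions to note:
- The 2% error rate and the 0.04 cutoff are illustrative values.
- BWA's `aln` command documents a similar behavior for a fractional `-n` value, but this is not BWA's code.
- The function name `max_diff` is hypothetical.

```python
import math


def max_diff(read_len: int, err: float = 0.02, alpha: float = 0.04) -> int:
    """Smallest k with P(X > k) < alpha, where X ~ Poisson(read_len * err).

    X approximates the number of sequencing errors in a read of length
    read_len under an assumed uniform per-base error rate err (illustrative
    model only, not BWA's implementation).
    """
    lam = read_len * err
    term = math.exp(-lam)  # P(X = 0)
    cdf = term             # P(X <= k) for the current k
    k = 0
    # Increase k until the upper-tail probability drops below alpha;
    # the cap at read_len guarantees termination.
    while 1.0 - cdf >= alpha and k < read_len:
        k += 1
        term *= lam / k    # P(X = k) computed from P(X = k - 1)
        cdf += term
    return k


if __name__ == "__main__":
    for length in (20, 50, 100):
        print(length, max_diff(length))  # prints 20 2, then 50 3, then 100 5
```

Under these assumptions a 20-nt tag tolerates 2 differences, while a 100-nt read tolerates 5. A single absolute threshold would be too lenient for short tags, too strict for long ones, or both.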

CTK applies very stringent criteria to collapse PCR duplicates, which are distinguished by a random barcode (i.e. a unique molecular identifier, or UMI) attached to CLIP tags in most current CLIP protocols. After read mapping, a model-based algorithm identifies ‘sufficiently distinct’ barcodes among reads that map to the same chromosome and start position. It does so by modeling sequencing errors and the copy number of each duplicate sequence. Unlike the previous CIMS package, CTK uses a sparse data representation, which greatly reduces memory usage and run time.
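The sketch below shows the general shape of barcode-aware duplicate collapsing with a sparse layout:
- Reads are grouped only at genomic positions that actually have reads.
- Within each group, a barcode is treated as a sequencing-error copy of a more abundant barcode if it differs from it by at most one mismatch.

This greedy Hamming-distance rule is a deliberately simple stand-in, not CTK's model-based algorithm. The tuple layout and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length barcodes."""
    return sum(x != y for x, y in zip(a, b))


def collapse_duplicates(reads, max_dist=1):
    """Collapse PCR duplicates (simplified heuristic, not CTK's model).

    reads: iterable of (chrom, strand, start, barcode) tuples.
    Returns one (chrom, strand, start, barcode) tuple per inferred molecule.
    """
    # Sparse representation: only positions that have reads are stored.
    groups = defaultdict(Counter)
    for chrom, strand, start, barcode in reads:
        groups[(chrom, strand, start)][barcode] += 1

    unique = []
    for key, counts in groups.items():
        kept = []
        # Visit barcodes from most to least abundant (ties broken alphabetically).
        for barcode, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
            is_error_copy = any(
                len(barcode) == len(k) and hamming(barcode, k) <= max_dist
                for k in kept
            )
            if not is_error_copy:
                kept.append(barcode)
        unique.extend((*key, barcode) for barcode in kept)
    return unique


if __name__ == "__main__":
    reads = [("chr1", "+", 100, "AAAA")] * 5 + [
        ("chr1", "+", 100, "AAAT"),  # one mismatch from AAAA: treated as an error copy
        ("chr1", "+", 100, "GGGG"),
        ("chr1", "+", 100, "GGGG"),
        ("chr1", "+", 200, "AAAA"),  # different position: a separate molecule
    ]
    print(collapse_duplicates(reads))
    # [('chr1', '+', 100, 'AAAA'), ('chr1', '+', 100, 'GGGG'), ('chr1', '+', 200, 'AAAA')]
```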

2.2 Identifying CLIP tag clusters and peak calling

Because of increases in CLIP library complexity and sequencing depth, neighboring CLIP tag clusters or peaks may not be clearly separated, especially in abundant transcripts. To address this issue, CTK performs peak calling using a novel ‘valley seeking’ algorithm. In brief, CTK calculates the number of overlapping CLIP tags at each genomic position to find local maxima. Two neighboring local maxima with peak heights h1 and h2 are considered two different peaks only when they are separated by a valley of depth d = h − v, where h = min(h1, h2) and v is the read coverage at the valley position. The user specifies the minimum relative valley depth (e.g. d/h ≥ 0.9), so that the algorithm can accommodate transcripts of different abundance. To define a more stringent subset of CLIP tag peaks, CTK additionally assesses whether the observed peak height exceeds what would be expected by chance, using different background models and scan statistics.
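The following is a minimal sketch of the valley-seeking idea, written from the description in this section rather than from CTK's source. It works in three steps:
1. Coverage is built from tag intervals.
2. Local maxima are found, with a plateau counting as one maximum.
3. Each pair of adjacent maxima is merged unless the valley between them satisfies d/h ≥ the chosen threshold. Here h is the lower of the two heights and v is the minimum coverage between them.

When two maxima are merged, the higher one is kept. The function names, the 0-based half-open coordinates and this merge rule are assumptions.

```python
def coverage(tags, length):
    """Per-position tag coverage for 0-based, half-open tag intervals [start, end)."""
    diff = [0] * (length + 1)
    for start, end in tags:
        diff[start] += 1
        diff[end] -= 1
    cov, running = [], 0
    for i in range(length):
        running += diff[i]
        cov.append(running)
    return cov


def local_maxima(cov):
    """(position, height) of each local maximum; a plateau counts once, at its middle."""
    runs = []  # (first, last, value) for each run of equal coverage
    i = 0
    while i < len(cov):
        j = i
        while j + 1 < len(cov) and cov[j + 1] == cov[i]:
            j += 1
        runs.append((i, j, cov[i]))
        i = j + 1
    maxima = []
    for k, (first, last, value) in enumerate(runs):
        left = runs[k - 1][2] if k > 0 else 0
        right = runs[k + 1][2] if k + 1 < len(runs) else 0
        if value > 0 and value > left and value > right:
            maxima.append(((first + last) // 2, value))
    return maxima


def call_peaks(cov, min_rel_depth=0.9):
    """Keep neighboring maxima separate only if d / h = (h - v) / h >= min_rel_depth."""
    peaks = []
    for pos, height in local_maxima(cov):
        if peaks:
            prev_pos, prev_height = peaks[-1]
            v = min(cov[prev_pos:pos + 1])  # deepest coverage between the two maxima
            h = min(prev_height, height)
            if (h - v) / h < min_rel_depth:
                # Valley too shallow: merge the two maxima, keeping the higher one.
                if height > prev_height:
                    peaks[-1] = (pos, height)
                continue
        peaks.append((pos, height))
    return peaks


if __name__ == "__main__":
    cov = [0, 2, 5, 4, 4, 6, 2, 0]
    print(call_peaks(cov, 0.9))  # [(5, 6)]: d = 5 - 4 = 1, so d/h = 0.2 < 0.9
    print(call_peaks(cov, 0.1))  # [(2, 5), (5, 6)]
```

Because the threshold is relative to h, one setting behaves similarly in lowly and highly covered transcripts.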

2.3 CIMS and CITS analysis

CTK uses essentially the same statistical models for CIMS and CITS as the previous package to evaluate the reproducibility of candidate sites, but it includes several important optimizations. First, spurious mutations due to sequencing errors or low-quality mapping are reduced because CTK allows fewer mismatches for shorter reads. Second, we noticed that crosslink-induced deletions of multiple consecutive nucleotides are relatively common in CIMS analysis. These sites also appear to have properties distinct from those of single-nucleotide deletion sites. CTK therefore identifies oligonucleotide deletions of different sizes and performs a separate CIMS analysis for each size.
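Separating deletions by size can be illustrated by tallying two counts from CIGAR strings, for each deletion size and reference position:
- k: the number of tags that carry a deletion of that size starting at the position.
- m: the number of tags that cover the position.

These are counts of the kind a CIMS-style reproducibility test works with. This is a hypothetical sketch with several simplifications:
- It handles one strand of one chromosome.
- It uses a quadratic coverage count for clarity.
- It does not implement CTK's statistical model.

```python
import re
from collections import Counter, defaultdict

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def parse_tag(start, cigar):
    """Return (ref_start, ref_end_exclusive, deletions) for one aligned tag.

    deletions is a list of (ref_pos, size); positions are 0-based.
    """
    pos = start
    deletions = []
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "D":
            deletions.append((pos, n))
        if op in REF_CONSUMING:
            pos += n
    return start, pos, deletions


def deletion_sites(tags):
    """tags: list of (start, cigar) on one strand of one chromosome.

    Returns {size: {pos: (k, m)}} with deletions of each size kept separate.
    """
    parsed = [parse_tag(s, c) for s, c in tags]
    k_counts = defaultdict(Counter)
    for _, _, deletions in parsed:
        for pos, size in deletions:
            k_counts[size][pos] += 1
    result = {}
    for size, counts in k_counts.items():
        result[size] = {
            pos: (k, sum(1 for s, e, _ in parsed if s <= pos < e))
            for pos, k in counts.items()
        }
    return result


if __name__ == "__main__":
    tags = [(100, "10M1D10M"), (100, "10M1D10M"), (95, "30M"), (100, "5M2D15M")]
    print(deletion_sites(tags))  # {1: {110: (2, 4)}, 2: {105: (1, 4)}}
```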

We expect that these methods can be readily applied to data generated by different variations of CLIP. For example, CIMS analysis can be applied to PAR-CLIP data (Hafner et al., 2010) by focusing on T→C transitions, and CITS analysis can be performed on data generated by BrdU-CLIP or iCLIP.
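Two small helpers sketch how these protocol-specific signals might be read from genome-aligned tags:
- For PAR-CLIP, a T→C change in a tag from the minus strand appears as A→G relative to the plus-strand reference.
- For truncation-based protocols such as iCLIP, the candidate crosslink site is commonly taken as the nucleotide immediately upstream of the tag's 5′ end.

The function names and the 0-based, half-open coordinate convention are assumptions, not CTK's interface.

```python
def is_t_to_c(ref_base: str, read_base: str, strand: str) -> bool:
    """True if a substitution, reported against the + strand of the reference,
    is a T->C transition on the strand the tag came from."""
    if strand == "+":
        return ref_base == "T" and read_base == "C"
    # On the minus strand, T->C in the tag shows up as A->G on the + strand reference.
    return ref_base == "A" and read_base == "G"


def truncation_site(start: int, end: int, strand: str) -> int:
    """Candidate crosslink position for a tag aligned to [start, end)
    (0-based, half-open): the nucleotide just upstream of the tag's 5' end."""
    if strand == "+":
        return start - 1  # 5' end is at start, so upstream is start - 1
    return end            # 5' end is at end - 1, so upstream (transcript direction) is end


if __name__ == "__main__":
    print(is_t_to_c("A", "G", "-"))          # True
    print(truncation_site(100, 130, "+"))    # 99
    print(truncation_site(100, 130, "-"))    # 130
```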


3 Results

We applied CTK to Rbfox CLIP data derived from mouse brain tissues and human cells using different protocols (Van Nostrand et al., 2016; Weyn-Vanhentenryck et al., 2014) and found substantial improvement over our previous package. CTK yielded a larger number of unique CLIP tags because it retained shorter tags, which were mapped with a smaller number of allowed mismatches. Based on their genomic distribution and several other diagnostic measures, these additional shorter tags were generally reliable.

We also compared CTK with several other software packages for peak calling and identification of crosslink sites: CLIPper (Lovci et al., 2013), Piranha (Uren et al., 2012) and PIPE-CLIP (Chen et al., 2014). For this comparison, we took advantage of the specific Rbfox binding motif, UGCAUG, which provides an objective measure of accuracy. CTK consistently achieved higher accuracy than the other tools, as shown by the higher motif enrichment around its peaks (Fig. 1A and B). Using more stringent valley depths also resulted in higher enrichment of UGCAUG, with little loss in sensitivity.
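The accuracy measure used here and in Fig. 1 can be sketched as follows. It computes the fraction of peaks with a UGCAUG occurrence (TGCATG in DNA) within ±50 nt of the peak center, searching the reverse complement for minus-strand peaks.

The following are illustrative assumptions, not the exact procedure used for the figure:
- the genome is a plain DNA string;
- peaks are (center, strand) pairs;
- a motif must lie entirely inside the window to count.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence over A/C/G/T/N."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def fraction_with_motif(genome: str, peaks, motif: str = "TGCATG", flank: int = 50) -> float:
    """Fraction of peaks whose window [center - flank, center + flank] contains
    the motif on the peak's strand. peaks: list of (center, strand), 0-based."""
    if not peaks:
        return 0.0
    motif_minus = revcomp(motif)  # what the motif looks like on the + strand for '-' peaks
    hits = 0
    for center, strand in peaks:
        window = genome[max(0, center - flank): center + flank + 1]
        target = motif if strand == "+" else motif_minus
        if target in window:
            hits += 1
    return hits / len(peaks)


if __name__ == "__main__":
    genome = "A" * 100 + "TGCATG" + "A" * 200 + "CATGCA" + "A" * 100
    peaks = [(103, "+"), (200, "+"), (309, "-")]
    print(fraction_with_motif(genome, peaks))  # 2 of 3 peaks have the motif nearby
```

Evaluating peak sets called at several thresholds with such a function gives one value per threshold. Curves like those in Fig. 1 could be traced this way.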


Fig. 1.

Comparison of CTK and other software packages in CLIP data analysis. (A) Rbfox1–3 CLIP (mouse brain). (B) Rbfox2 eCLIP (HepG2). In each panel, CLIP tag peaks were called by different algorithms using varying thresholds. The fraction of peaks overlapping the Rbfox binding motif (UGCAUG within ±50 nt of the peak center) is shown. PIPE-CLIP did not converge in our tests, so its results are not reported.


Funding

This work was supported by grants from the National Institutes of Health (NIH) (R00GM95713 and R01NS89676) and the Simons Foundation Autism Research Initiative (307711).

Conflict of Interest: none declared.


References

Chen B. et al. (2014) PIPE-CLIP: a comprehensive online tool for CLIP-seq data analysis. Genome Biol., 15, R18.
Darnell R.B. (2010) HITS-CLIP: panoramic views of protein–RNA regulation in living cells. Wiley Interdiscip. Rev. RNA, 1, 266–286.
Hafner M. et al. (2010) Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell, 141, 129–141.
Konig J. et al. (2010) iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution. Nat. Struct. Mol. Biol., 17, 909–915.
Licatalosi D.D. et al. (2008) HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature, 456, 464–469.
Lovci M.T. et al. (2013) Rbfox proteins regulate alternative mRNA splicing through evolutionarily conserved RNA bridges. Nat. Struct. Mol. Biol., 20, 1434–1442.
Moore M. et al. (2014) Mapping Argonaute and conventional RNA-binding protein interactions with RNA at single-nucleotide resolution using HITS-CLIP and CIMS analysis. Nat. Protoc., 9, 263–293.
Uren P.J. et al. (2012) Site identification in high-throughput RNA-protein interaction data. Bioinformatics, 28, 3013–3020.
Van Nostrand E.L. et al. (2016) Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP). Nat. Methods, 13, 508–514.
Weyn-Vanhentenryck S. et al. (2014) HITS-CLIP and integrative modeling define the Rbfox splicing-regulatory network linked to brain development and autism. Cell Rep., 6, 1139–1152.
Yang Y.C. et al. (2015) CLIPdb: a CLIP-seq database for protein–RNA interactions. BMC Genomics, 16, 51.
Zhang C., Darnell R.B. (2011) Mapping in vivo protein-RNA interactions at single-nucleotide resolution from HITS-CLIP data. Nat. Biotechnol., 29, 607–614.
