ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins.

de Castro E; Sigrist CJ; Gattiker A; Bulliard V; Langendijk-Genevaux PS; Gasteiger E; Bairoch A; Hulo N

doi:10.1093/nar/gkl124

Affiliations

1. Swiss Institute of Bioinformatics, 1 rue Michel Servet, CH-1211 Geneva 4, Switzerland.

Nucleic Acids Research, 01 Jul 2006, 34(Web Server issue):W362-5
https://doi.org/10.1093/nar/gkl124 PMID: 16845026 PMCID: PMC1538847

Abstract

ScanProsite (http://www.expasy.org/tools/scanprosite/) is a new and improved version of the web-based tool for detecting PROSITE signature matches in protein sequences. For a number of PROSITE profiles, the tool now makes use of ProRules (context-dependent annotation templates) to detect functional and structural intra-domain residues. The detection of those features enhances the power of function prediction based on profiles. Both user-defined sequences and sequences from the UniProt Knowledgebase can be matched against custom patterns, or against PROSITE signatures. To improve response times, matches of sequences from UniProtKB against PROSITE signatures are now retrieved from a pre-computed match database. Several output modes are available, including simple text views and a rich mode providing an interactive match and feature viewer with a graphical representation of results.

Published online 2006 Jul 14. https://doi.org/10.1093/nar/gkl124

Edouard de Castro¹*, Christian J. A. Sigrist¹, Alexandre Gattiker³, Virginie Bulliard¹, Petra S. Langendijk-Genevaux¹, Elisabeth Gasteiger¹, Amos Bairoch¹,², and Nicolas Hulo¹

INTRODUCTION

To predict protein function, assign family identity or detect remote homologues, searches against signature databases, also known as secondary databases (1), are essential. ScanProsite provides a web interface to identify protein matches against signatures from the PROSITE database (2).

The PROSITE database consists of a large collection of biologically meaningful signatures that are described as patterns (regular expressions), used for short motif detection, or generalized profiles (weight matrices) for sensitive detection of larger domains. All signatures are built from manually derived alignments and are provided with extensive manually curated documentation (taxonomic occurrence, function, etc.). PROSITE signatures form a high quality collection and are closely tied to the UniProtKB/Swiss-Prot (3,4) annotation process.
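As an illustration of how a PROSITE pattern maps onto a regular expression, the following minimal Python sketch translates the core pattern syntax (`x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repetition, `<` and `>` for terminal anchors). It is not the ps_scan implementation; the function names `prosite_to_regex` and `find_matches` are hypothetical, the test sequence is invented, and rarer syntax (such as `>` inside square brackets) is not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern string into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")   # a trailing period ends a pattern
    anchored_start = pattern.startswith("<")
    anchored_end = pattern.endswith(">")
    pattern = pattern.strip("<>")
    parts = []
    for element in pattern.split("-"):
        # Separate an optional repetition suffix such as x(2) or x(2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        # [ABC] and single letters are already valid regex syntax.
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    regex = "".join(parts)
    return ("^" if anchored_start else "") + regex + ("$" if anchored_end else "")


def find_matches(pattern: str, sequence: str):
    """Return (start, end, matched_text) tuples, 1-based, overlaps included."""
    # A lookahead lets overlapping occurrences be reported as well.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.start() + len(m.group(1)), m.group(1))
            for m in regex.finditer(sequence)]


# N-glycosylation site pattern (PROSITE PS00001) on an invented sequence.
print(prosite_to_regex("N-{P}-[ST]-{P}."))
print(find_matches("N-{P}-[ST]-{P}", "MKNGSAANPTL"))
```

The second asparagine in the toy sequence is followed by a proline and is therefore not reported, which shows how the exclusion element `{P}` restricts the match.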

In addition, each PROSITE profile is associated with a manually curated annotation template called ProRule (5). These rules are used internally, by the Swiss-Prot group, to automate PROSITE domain annotation of UniProtKB/Swiss-Prot entries. A number of ProRules define biologically meaningful information about specific residues within their associated domain. This positional information (derived from the mapping of significant residues to the profile) is provided in the form of contextual feature annotation blocks: certain conditions—a specific sequence, the presence of other features—must be fulfilled for the annotation block to be applied, hence for the feature to be predicted.

Consequently, those ProRules add pattern-like discriminatory power and motif-specific information to their associated profiles, allowing the detection of intra-domain features such as active sites, binding sites or disulfide bridges. Combining the sensitivity of profiles with the specificity of motif detection enhances the accuracy of signature-based functional predictions.

The ScanProsite rich result viewer detects those intra-domain features by evaluating—on the fly—associated ProRules on matched profiles and integrates them with the match results.

PROSITE is a member database of InterPro (6). The InterProScan tool (http://www.ebi.ac.uk/InterProScan/) can scan sequences against PROSITE signatures (‘ProfileScan’ application), but it does not provide intra-domain feature detection, ‘rich’ graphical output or extensive scanning options, including scans against custom user patterns.

DESIGN AND IMPLEMENTATION

ScanProsite was implemented in Perl, and is served through an Apache web server running on a UNIX operating system. Care was taken to ensure that all generated pages are fully standards-compliant (valid HTML 4.01 transitional). The pre-computed match database and the ProRule database are stored in a PostgreSQL database. The entire implementation is based on open source tools.

Program output can either be displayed in the web browser (interactive mode), or sent by email (batch mode). The rich view output mode uses standard DHTML (HTML, CSS, JavaScript), and no additional plugins are required.

Input form

The job submission starting page is located at http://www.expasy.org/tools/scanprosite/ on the ExPASy web server (7).

Here the user has to enter sequence and/or signature data, choose the output format, and specify various scan behavior options before launching the analysis job. The form usage is described at http://www.expasy.org/tools/scanprosite/scanprosite-doc.html.

User sequences and/or UniProtKB sequences/databases can be scanned against user patterns and/or PROSITE signatures/database. Multiple sequences and/or signatures can be submitted at once (maximum: 8 for scans against databases, otherwise 16). In this case, every specified signature is searched against every specified sequence (logical OR).

For scans against protein databases, the search space can be reduced to specific taxa, and/or to entries containing a specific term in their description (UniProtKB DE line).

Protein databases can be randomized (reversed or shuffled) to evaluate user pattern significance (8). Pattern matching mode can be altered (see aforementioned online documentation page under ‘pattern matching mode’).
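The randomization check described above can be sketched as follows: the same pattern is searched against the real sequences and against reversed or shuffled copies, and the hit counts are compared. This is a toy illustration, not the ScanProsite code; the function names, the three short sequences and the regular expression are all invented for the example.

```python
import random
import re


def count_hits(regex: str, sequences):
    """Count the sequences that contain at least one match of the regex."""
    compiled = re.compile(regex)
    return sum(1 for seq in sequences if compiled.search(seq))


def reversed_db(sequences):
    """Reverse every sequence; length and composition are preserved."""
    return [seq[::-1] for seq in sequences]


def shuffled_db(sequences, seed: int = 0):
    """Shuffle residues within every sequence; composition is preserved."""
    rng = random.Random(seed)
    shuffled = []
    for seq in sequences:
        residues = list(seq)
        rng.shuffle(residues)
        shuffled.append("".join(residues))
    return shuffled


# Invented toy data: two of the three sequences contain the motif.
sequences = ["MKCAACGHT", "CWWCPLLK", "MAAAGGG"]
pattern = r"C.{2}C[GP]"

real_hits = count_hits(pattern, sequences)
reversed_hits = count_hits(pattern, reversed_db(sequences))
shuffled_hits = count_hits(pattern, shuffled_db(sequences))
print(real_hits, reversed_hits, shuffled_hits)
```

A pattern that matches the randomized database about as often as the real one is unlikely to be specific. On a realistic database the comparison would be made over many thousands of sequences rather than three.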

Users can choose to retrieve full sequences (in fasta format) of matched proteins together with signature match results.

For scans against the whole PROSITE signature database, ‘unspecific’ signatures with a high probability of occurrence (see PROSITE user manual at http://www.expasy.org/prosite/prosuser.html) can be excluded (this option is activated by default). Scans can also be restricted to patterns.

Users can also choose to include low-level profile matches (those with a score below the high-confidence, level 0, cut-off) in the output (this option is not activated by default).

To save bandwidth and computing power, there is a limit on the number of distinct matching sequences and total matches that can be displayed. Users can select the maximum number of matching sequences (by default, not more than 5000 matches in 1000 sequences will be displayed). A maximum of 10 000 or 100 000 sequences (50 000 or 500 000 signature matches) can only be requested in the non-interactive ‘batch mode’, where results are sent to the user via email.

The non-interactive ‘batch mode’ is used automatically when an email address is specified.

Instructions for simple programmatic access (text/plain output) are available on demand (email: prosite@expasy.org).

Sequence analysis

Once the job has been submitted, the data are posted to an application that makes use of pre-computed matches whenever possible. Results are retrieved from an internal relational database that stores matches of all PROSITE signatures (except the ‘unspecific’ signatures with a high probability of occurrence) against the UniProt Knowledgebase, including additional splice variants, and PDB (9). For analyses where no pre-computed results are available (e.g. on user sequences, user patterns or ‘unspecific’ signatures), a real-time sequence analysis is performed using the ps_scan program (10). Users who require high-throughput sequence analysis or who wish to use custom databases can download the stand-alone ps_scan tool at ftp://ftp.expasy.org/databases/prosite/tools/ps_scan/.
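The lookup-then-fallback strategy described above might be sketched as below. This is only an assumed outline: an in-memory dictionary stands in for the relational match database, a plain regular-expression search stands in for ps_scan, and the identifiers `P12345`, `PS_DEMO` and `USER_PAT` are placeholders.

```python
import re
from typing import Dict, List, Tuple

Match = Tuple[int, int]          # 1-based start and end of a match
MatchKey = Tuple[str, str]       # (sequence identifier, signature identifier)


def scan_realtime(regex: str, sequence: str) -> List[Match]:
    """Stand-in for a real-time scan: a plain regular-expression search."""
    return [(m.start() + 1, m.end()) for m in re.finditer(regex, sequence)]


def get_matches(seq_id: str, signature_id: str, sequence: str, regex: str,
                precomputed: Dict[MatchKey, List[Match]]):
    """Return stored matches when available, otherwise scan on demand."""
    key = (seq_id, signature_id)
    if key in precomputed:
        return precomputed[key], "precomputed"
    return scan_realtime(regex, sequence), "realtime"


# Hypothetical store: one database entry already matched against a signature.
precomputed = {("P12345", "PS_DEMO"): [(10, 20)]}

print(get_matches("P12345", "PS_DEMO", "", "X", precomputed))
print(get_matches("user_seq_1", "USER_PAT", "MKCAACG", r"C.{2}C", precomputed))
```

The first call never touches the sequence itself, which is the source of the response-time gain; the second call, for a user sequence and a user pattern, has no stored result and falls back to the scan.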

Available results are pooled and saved (for 12 h) on the server, for later use by the rich viewer.

The detection of intra-domain features is performed on the fly by the rich viewer over the matching domains (and is therefore only available in the rich view output mode, see below). Profile match regions are evaluated for fulfillment of feature detection conditions specified in the associated ProRules. The features themselves are represented in the ProRule as UniProtKB/Swiss-Prot annotation blocks that are applied when detection conditions are fulfilled. The feature annotation blocks and their detection conditions are retrieved from an internal database storing a relational representation of the ProRules.

Detection conditions can be specific amino acids inside the domain (regular expressions that can be grouped by logical operators) or groups of conditions in which all conditions must be fulfilled (e.g. the catalytic triad of a trypsin protease).
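To make the two kinds of detection condition concrete, here is a simplified Python sketch. It assumes that each feature carries a position relative to the matched region and a regular expression for the residue expected there, and that features sharing a group number are predicted only if every member of the group is satisfied. The class and function names are hypothetical, the region is invented, and real ProRules map positions through the profile alignment, which is not modelled here.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Feature:
    key: str          # UniProtKB/Swiss-Prot feature key, e.g. "ACT_SITE"
    description: str
    position: int     # 1-based position within the matched region (assumed)
    allowed: str      # regex for the residue required at that position
    group: int = 0    # non-zero: all features of the group must be satisfied


def condition_met(feature: Feature, region: str) -> bool:
    """Check the residue condition of one feature on the matched region."""
    if not 1 <= feature.position <= len(region):
        return False
    return re.fullmatch(feature.allowed, region[feature.position - 1]) is not None


def predict_features(features: List[Feature], region: str, region_start: int):
    """Split features into predicted and absent, in sequence coordinates."""
    met = [condition_met(f, region) for f in features]
    # A single failed member invalidates its whole condition group.
    failed_groups = {f.group for f, ok in zip(features, met) if f.group and not ok}
    predicted, absent = [], []
    for f, ok in zip(features, met):
        target = predicted if ok and f.group not in failed_groups else absent
        target.append((f.key, region_start + f.position - 1, f.description))
    return predicted, absent


# Toy catalytic triad (His, Asp, Ser) on an invented matched region.
triad = [
    Feature("ACT_SITE", "Charge relay system", 3, "H", group=1),
    Feature("ACT_SITE", "Charge relay system", 6, "D", group=1),
    Feature("ACT_SITE", "Charge relay system", 9, "S", group=1),
]
print(predict_features(triad, "AAHGGDAAS", region_start=101))
print(predict_features(triad, "AAHGGDAAA", region_start=101))
```

In the second call the serine is replaced by an alanine, so the whole group fails and all three residues are reported as absent rather than only the mutated one.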

To date, intra-domain feature detection is performed for 136 of the 595 PROSITE profiles (PROSITE release 19.20, of 07-Feb-2006). Predicted features include (Table 1) binding sites for chemical groups (such as heme or ATP), glycosylation sites, disulfide bridges, DNA binding sites, metal ion binding sites and post-translational modification sites (such as phosphotyrosine or phosphoserine). The system can easily be extended to use other UniProtKB/Swiss-Prot feature types.

Table 1

UniProtKB/Swiss-Prot features that can be predicted through ScanProsite (PROSITE release 19.20, of 07-Feb-2006)

UniProtKB/Swiss-Prot feature key name^a	No. of profiles	Example (profile AC/ID)
ACT_SITE	26	PS50240/TRYPSIN_DOM
BINDING	8	PS51007/CYTC
CA_BIND	1	PS50222/EF_HAND_2
CARBOHYD	2	PS50015/SAP_B
DISULFID	45	PS50948/PAN
DNA_BIND	17	PS51063/HTH_CRP_2
METAL	24	PS50873/PEROXIDASE_4
MOD_RES	20	PS51149/GLY_RADICAL_2
NP_BIND	9	PS50936/ENGC_GTPASE
SITE	2	PS51062/RUNT
ZN_FING	6	PS50115/ARFGAP

^aSee UniProtKB/Swiss-Prot user manual at http://www.expasy.org/sprot/userman.html#FTID.

ScanProsite output

In interactive mode, once the analysis is completed, the results are displayed directly in the user's web browser, in the selected output view mode.

In batch mode, results in simple text format are emailed to the user-specified email address, together with a link to visualize the results, stored on the server, through the rich viewer. In rich view and simple html view modes, matching UniProt Knowledgebase protein entries are shown with a link to their ExPASy NiceProt view, their description and organism. Moreover, for entries that have associated PDB structures, there are links to interactive 3D views highlighting the match region over the 3D structure.

Text modes provide no links, but are optimal for copy/pasting. See http://www.expasy.org/tools/scanprosite/scanprosite-doc.html#output for details.

The rich view can be accessed through a link from any other view.

The rich view output mode provides an interactive match viewer and intra-domain feature predictor/viewer in both text and graphical forms. See http://www.expasy.org/tools/scanprosite/scanview-doc.html for details.

The graphical view is a symbolic representation of matches (inspired by SMART (11)) and predicted features, rendered as a high-quality image (Portable Network Graphics format) that can be used in presentations or papers (Figure 1).

Figure 1

ScanProsite result page (rich view mode).

The view is generated by a specific web tool that uses saved results allowing delayed results examination for up to 12 h after the initial scan (useful in batch mode).

In addition to the match results, intra-domain feature evaluations (if any) are detailed. For each evaluated feature, its UniProtKB/Swiss-Prot feature key, boundaries, description, the detection conditions and the condition group (if any), are displayed. Features with fulfilled conditions are shown under ‘Predicted features’ and integrated into the graphical representation. Features with unfulfilled conditions are shown under ‘Absent feature’.

The rich view also provides interactive match and feature highlighting on the results of a single sequence scan. This works with most web browsers (e.g. Mozilla, Firefox, Netscape, Opera), provided JavaScript is enabled; with Internet Explorer this JavaScript functionality is too slow and was disabled. Moving the cursor over a feature information line highlights its position (green for predicted features, gray for absent features) in both the match sequence and the protein sequence. Moving the cursor over a match in the graphical view or the text view highlights its position in the protein sequence in yellow.

CONCLUSIONS

We have described the current state of the ScanProsite tool. The new implementation—online since summer 2004—brings the use of pre-computed matches whenever possible, the detection of intra-domain biological features, a new graphical result representation and an interactive result viewer.

PROSITE, through ScanProsite, provides broad intra-domain feature prediction via a flexible context-dependent annotation transfer system. Associating domain detection with an automated annotation system can significantly increase the functional predictive power of profiles.

Acknowledgments

The authors would like to thank Eric Jain for careful proof-reading. This work was supported by grant no. 3152A0-103922/1 from the Swiss National Science Foundation and by the Swiss Federal Government through the Federal Office of Education and Science. Funding to pay the Open Access publication charges for this article was provided by Swiss Institute of Bioinformatics (Swiss-Prot group), Switzerland.

Conflict of interest statement. None declared.

REFERENCES

1. Attwood T.K., Parry-Smith D.J. Introduction to Bioinformatics. Addison Wesley Longman; 1999.

2. Hulo N., Bairoch A., Bulliard V., Cerutti L., De Castro E., Langendijk-Genevaux P.S., Pagni M., Sigrist C.J.A. The PROSITE database. Nucleic Acids Res. 2006;34:D227–D230.

3. Bairoch A., Boeckmann B., Ferro S., Gasteiger E. Swiss-Prot: juggling between evolution and stability. Brief Bioinform. 2004;5:39–55.

4. Wu C.H., Apweiler R., Bairoch A., Natale D.A., Barker W.C., Boeckmann B., Ferro S., Gasteiger E., Huang H., Lopez R., et al. The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res. 2006;34:D187–D191.

5. Sigrist C.J.A., De Castro E., Langendijk-Genevaux P.S., Le Saux V., Bairoch A., Hulo N. ProRule: a new database containing functional and structural information on PROSITE profiles. Bioinformatics. 2005;21:4060–4066.

6. Mulder N.J., Apweiler R., Attwood T.K., Bairoch A., Bateman A., Binns D., Bradley P., Bork P., Bucher P., Cerutti L., et al. InterPro, progress and status in 2005. Nucleic Acids Res. 2005;33:D201–D205.

7. Gasteiger E., Gattiker A., Hoogland C., Ivanyi I., Appel R.D., Bairoch A. ExPASy: the proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res. 2003;31:3784–3788.

8. Hulo N., Sigrist C.J.A., Le Saux V., Langendijk-Genevaux P.S., Bordoli L., Gattiker A., De Castro E., Bucher P., Bairoch A. Recent improvements to the PROSITE database. Nucleic Acids Res. 2004;32:D134–D137.

9. Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H., Shindyalov I.N., Bourne P.E. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242.

10. Gattiker A., Gasteiger E., Bairoch A. ScanProsite: a reference implementation of a PROSITE scanning tool. Appl. Bioinformatics. 2002;1:107–108.

11. Letunic I., Copley R.R., Pils B., Pinkert S., Schultz J., Bork P. SMART 5: domains in the context of genomes and networks. Nucleic Acids Res. 2006;34:D257–D260.
