Abstract
Free full text
Large-scale gene function analysis with PANTHER Classification System
Associated Data
Abstract
PANTHER Classification System (www.pantherdb.org) is a comprehensive system that combines gene function, ontology, pathways and statistical analysis tools to enable biologists to analyze large-scale genome-wide experimental data. The system is built with 82 complete genomes organized into gene families and subfamilies, and their evolutionary relationships are captured in phylogenetic trees, multiple sequence alignments and statistical models (Hidden-Markov Models, or HMMs). Genes are classified by their function in several different ways: families and subfamilies are annotated with ontology terms (Gene Ontology and PANTHER Protein Class), and sequences are assigned to PANTHER pathways. The PANTHER website includes a suite of tools to allow users to browse and query gene functions, and analyze large-scale experimental data with a number of statistical tests. It is widely used by bench scientists, bioinformaticians, computer scientists and systems biologists. In the 2013 release of PANTHER (v.8.0), besides the update of the data content, the website interface was redesigned to improve the user experience and analytical capability. This protocol will provide a detailed description of how to analyze genome-wide experimental data in the PANTHER Classification System.
INTRODUCTION
The PANTHER Classification System (www.pantherdb.org) is designed to be a comprehensive platform for the analysis of gene function on a genome-wide scale1. Although its initial development aimed to classify gene and protein functions2,3, it has evolved through the years to also serve as an online resource for experimental data analysis4. The easy-to-use user interface and timely user support have made PANTHER one of the most widely used online resources for gene function classification and genome-wide data analysis.
PANTHER was initially released in 2003, and was the first database to combine both phylogenetic and functional data to define protein subfamilies of shared function and sequences3. It was also the first database to associate ontology terms describing function to statistical models (HMMs), which can be used to assign genes -- based on sequence information alone -- to subfamilies and functional classes. The novel concept was to annotate not single genes one at a time, but rather subfamilies of related genes that are likely to share function. We demonstrated the accuracy and comprehensiveness of the classifications on the D. melanogaster genome5. In 2005, we began providing annotations of biochemical pathways, viewable with the new PANTHER Pathway Applet6,7. These pathways were curated from the literature by expert biologists and diagrams were drawn with CellDesigner, a pathway-editing tool using controlled graphical notations to represent pathway knowledge8. In 2006, the first version of gene analysis tools was released4. These were primarily for the analysis of gene expression data. The tools were also designed to handle gene list data from any genome wide experiments. In the 2013 release of PANTHER 8.0, the web interface was redesigned to integrate multiple tools into one user-friendly and flexible interface, so that users can easily access all the tools and choose among them to perform gene list analysis at the genome-wide level1.
Since the initial release in 2003, PANTHER has become one of the most popular online resources for genome wide data analysis. Currently, there are over 7000 registered users, 600–800 daily users (weekdays) and over 14,000 monthly users worldwide. These counts were based on unique Internet Protocol (IP) addresses accessing the site, so the actual number is even greater. PANTHER has been cited in over 3600 publications since 2003 based on Google Scholar, and is growing steadily. Users have successfully analyzed the data from gene expression 9,10, proteomics 11,12 and genome-wide association study (GWAS) experiments13,14 in a diverse area of research, such as cancer research 10,11, neurological disorder studies 15,16, autoimmune diseases 9 and cardiac disease studies14,17.
PANTHER is currently included in a number of large international consortia. PANTHER has been a member of the InterPro Consortium of protein annotation resources since 2005, so PANTHER annotations can be automatically generated from the highly-used InterProScan software18. More recently, PANTHER also became part of the Gene Ontology (GO) Consortium19–21. The phylogenetic-inferred curation paradigm has become part of the GO curation pipeline, and therefore, PANTHER annotation reflects a more up to date GO curation22. Finally, PANTHER Pathway is in the process of being integrated into Pathway Commons23.
The PANTHER System is constructed with three functional modules (Figure 1). The core module is a large protein library that contains all protein-coding genes from 82 organisms organized first into families based on sequence homology, and then into subfamilies based on their shared functions (often a group of orthologous genes). Each family or subfamily is represented by a statistical model (HMM) annotated with ontology terms (GO terms and PANTHER Protein Class terms). The second module is the PANTHER pathway module, which contains 176 expert curated pathways. All pathways are connected, through manual curation, to individual proteins in the protein library through the pathway components, and therefore they are also linked to the phylogenetic information and statistical models. The last module is the website tool suite that contains a collection of bioinformatics tools and software, enabling users to not only query the data and classify genes and proteins, but also visualize, analyze and interpret genome-wide scale experimental data in the context of the enriched data content in the first two modules.
PANTHER protein library
The core of the PANTHER system is a collection of phylogenetically-defined protein families and subfamilies generated by computational algorithms, and curated by expert biologist using an extensive software system for associating ontology terms1,3. The current release contains over 640K proteins from 82 genomes, of which 79 are from the Reference Proteome Project (http://www.ebi.ac.uk/reference_proteomes/) (Figure 1). UniProt IDs are used as primary protein identifiers. These proteins are representatives of their respective genes. Therefore, each gene is represented by only one protein. In addition, UniProt IDmapping (http://www.uniprot.org/mapping/) is used to map the primary protein IDs to other IDs from different databases and resources, which expands the capability of PANTHER to support a wider range of ID types (see Supported IDs in Box 1). The proteins are divided into 7729 families, each of which is represented by a phylogenetic tree, an HMM, and a multiple sequence alignment (MSA) (Figure 1). Protein family trees are constructed computationally from sequence data using a phylogenetic tree inference algorithm called GIGA24. Nodes in the tree, corresponding to common ancestors of extant family members, are annotated by expert biologists with their inferred GO terms and PANTHER protein class terms, based on experiments performed on extant proteins. Subfamilies are determined based on the tree structure and often define the orthologous group, especially in organisms in the Deuterostomia superphylum. The functional annotations are propagated from the annotated ancestral nodes to the subfamily nodes, each of which is also represented by an HMM to allow classification of newly discovered protein sequences (Figure 2A). In addition, the HMM library and the scoring pipeline allow users to analyze genome-wide data from any organisms outside of the 82 within the PANTHER system (see below). Phylogenetic tree and functional annotation at the ancestral nodes have great advantage in annotation of previously unclassified genes. The PANTHER phylogenetic trees are currently used by the Gene Ontology Consortium in its curation pipeline 22.
PANTHER pathway
The PANTHER pathway dataset uses controlled vocabulary and graphical notation to describe pathway knowledge7. The graphical representation of pathways is generated by CellDesigner, a pathway-editing tool8, and is compliant with the Systems Biology Graphical Notation Process Description standard (SBGN-PD) 25 (Figure 2B). Currently, there are 176 expert-curated pathways in PANTHER. The scope of the pathways is similar to those described in textbooks or review articles, e.g., glycolysis, the PDGF signaling pathway or the p53 pathway. Each pathway contains three key classes of data (Figure 1). First is the pathway component (or molecule class), which represents a specific class of molecules that play the same mechanistic role within a pathway. It can be a protein (e.g., PDGF, Jak), a DNA region, or a simple molecule (e.g., ATP or glucose). If a pathway component is a protein, gene or transcribed RNA, it is associated with the protein sequences in the PANTHER protein library through manual curation (Figure 1 and and2).2). The individual protein sequences are instances of the pathway component. In these cases, a pathway component is typically a group of homologous/orthologous proteins across various organisms that are involved in the same biochemical reaction within the pathway. The second class is reaction, which represents biochemical relationships among different pathway components. Since all PANTHER pathways are in compliance with SBGN-PD, a typical reaction is a transition of an input (or reactant) to an output (a product) controlled by a modifier (Figure 2B). The last class is the cell/tissue type and cellular component, which provides the location where the reaction occurs.
There are two key features of the PANTHER pathway data model. First, the association between the molecule classes and protein sequences links the pathway to the phylogenetic relationships and statistical models (HMMs) of the protein families. As a result, if a protein is associated with a pathway component via experimental evidence, its orthologs can be inferred to the same component based on the PANTHER phylogenetic tree. Second, PANTHER pathway supports community standards, and all pathways are available in SBML26, BioPAX27 and SBGN25 formats.
PANTHER tools
PANTHER provides a number of useful research tools, including the PANTHER HMM Scoring Tool, cSNP Analysis Tool, and Gene List Analysis Tool. Both the PANTHER HMM Scoring Tool and cSNP Analysis Tools have online and downloadable versions. The online versions allow only one protein sequence to be analyzed at a time, while the downloadable versions allow large batch analyses. A brief description of the downloadable version of PANTHER HMM Scoring Tool can be found in Box 2. The details of how to use these tools can be found in the PANTHER help page (http://www.pantherdb.org/help/PANTHERhelp.jsp) and the user manual. In this protocol, we will focus on the Gene List Analysis Tool. Sample gene list files can be found in the Supplement materials for testing.
The Gene List Analysis Tool can be accessed directly from the PANTHER home page (www.pantherdb.org) (Figure 3). There are several options for users to input a list of genes (and optionally quantitative data) for analysis. The PANTHER database uses some database identifiers as the primary IDs for each gene, and they are the most common option for input file (e.g., a UniProt identifier, an Entrez Gene identifier, or even a gene symbol). The PANTHER database also maps identifiers from a number of different sources for 82 different organisms using the UniProt IDmapping mechanism, and the identifiers in the user’s list are automatically mapped to the primary IDs in the PANTHER database (Figure 1). A total list of the supported IDs can be found in Box 1. A report is generated that lists not only the mapping of each gene, but also the identifiers that could not be mapped, if any. Since each gene in the PANTHER database belongs to a phylogenetic tree that is annotated with GO and PANTHER ontology terms and pathways, the mapped IDs from the gene list will inherit these annotations. It needs to be pointed out that, in a very rare case, a single gene symbol or its synonym can be mapped to more than one gene from the IDmapping. We are currently working with UniProt to improve such mappings. However, this is so rare that we don’t think it affects significantly to the results from the statistical tests. If the gene list is not from the 82 organisms, a user can still use the PANTHER tools, but they need to map each of their own identifiers to PANTHER identifiers first, using the downloadable PANTHER HMM scoring tool (Figure 1 and Box 2), and creating the Generic PANTHER Mapping File with two columns (the user’s ID, and the ID of the best PANTHER HMM hit) (see Box 1 for more details of the file format). Each gene in the list will inherit the same annotations as those stored in the PANTHER database for the given HMM.
Once the gene list is classified with those functional terms, it can be analyzed in three different ways.
Functional classification – This tool provides the functional classification results of the uploaded list, and displays them in either a gene list page or pie chart.
Statistical overrepresentation test - This tool is based conceptually on the simple binomial test described previously28. It compares a test gene list uploaded by the user to a reference gene list, and determines whether a particular class (e.g., a GO biological process or PANTHER pathway) of genes is over- or under- represented. A more detailed description of the tool can be found in Box 3.
Statistical enrichment test - This tool, based on the work by the PANTHER group29 and Lander’s group30 for two different genomic data analyses at about the same time, uses the Mann-Whitney test31 to determine if any ontology class or pathway has numeric values that are non-randomly distributed with respect to the entire list of values. The numerical data can be normalized raw readouts from the microarray experiments or, more commonly, the fold-change value for each gene in a differential expression experiment, or calculated p-values from GWAS experiment. A more detailed description can be found in Box 4.
It is worth pointing out that there are other similar tools to perform overrepresentation and enrichment tests, most notably, GSEA32 and DAVID33. Although all three tools use very similar statistical algorithms in the back end, there are some differences among them. GSEA requires download and installation, so its targeted users are bioinformaticians who are more computer-savvy. Both DAVID and PANTHER are online tools and are more appealing to bench biologists. Compared to DAVID, PANTHER has a few advantages. First, PANTHER is currently part of the GO consortium, and integrates more updated GO curation data with the tools. In fact, DAVID downloads PANTHER data and integrate them in its analysis tools. Second, PANTHER allows users to analyze genome data from 82 organisms using the online tool, and any other organisms using the PANTHER scoring tool. Third, the phylogenetic trees in PANTHER protein library allow users to make more accurate ortholog prediction, and thus greatly enhance the capability of the tools. One weakness of PANTHER is that it cannot analyze data against resources outside of PANTHER, such as KEGG and Reactome.
Workspace
The Workspace is a unique feature in PANTHER that allows users to store their gene lists that they generate for future analysis. Although users do not have to register to use the PANTHER system, registration is required in order to user the workspace. Registration is free. In any PANTHER gene list display page, the user can easily send the list to the Workspace (see below).
MATERIALS
Equipment
Laptop or desktop computer with internet connection. High-speed internet connection is highly recommended.
System Requirements
Operating System
Windows XP or Windows 7 for PC users
MacOS 10.5 or later for Mac users
Minimum of 2GB RAM recommended
Browser
Microsoft Internet Explorer 8 or later (recommended for PC users)
Safari version 5 (recommended for Mac users)
Firefox version 19
Google Chrome version 26
Java
Latest version of Java (can be downloaded from http://www.java.com/en/download/)
JavaScript, Java applets and cookies must be enabled in your browser
Java applet runtime parameters set to -ms128m -mx512m -Xss16m
PROCEDURE
Note: Sample gene list files can be found in the Supplement materials for you try the tools.
Access the PANTHER website by entering www.pantherdb.org in your web browser.
Prepare input file(s) according to option A if you are working with one of the 82 genomes in the PANTHER database, or option B if you are working with an organism other than the 82 genomes in the database.
Prepare input file(s) in simple text file (.txt or .tab) with gene or protein identifiers as the first column, and numeric values as the second column if you want to use the statistical enrichment test. Detailed instruction of file format and supported IDs can be found in Box 1. (Timing: 15–30 minutes)
Prepare input file(s) by mapping your sequence identifiers to the PANTHER HMM IDs using the procedure described in Box 2. (Timing: see Box 2 for details)
!! CRITICAL STEP The input file must be in simple text file format (.txt or .tab). It also must use IDs supported in the PANTHER system and the correct tab delimited format as described in Box 1.
Upload the gene list to the PANTHER tool system using option A if you have a small list for functional classification tool, option B if you have a large list and want to analyze using all the gene list analysis tools, or option C if you previously saved the list in the Workspace.
Paste the ID list prepared in Step 2 into the Enter ID box. Alternatively you can also type IDs, one per line, into the box (blue arrow in Figure 3). Please note that this type of ID upload only supports functional classification tools.
Upload the list file prepared in Step 2 by clicking the Browse button (red arrow in Figure 3), and follow the online instruction to locate the file.
If you have previously saved your list into the Workspace, you can use it by clicking the login link (green arrow in Figure 3), and follow the online instruction to locate the file in the Workspace. Please note that numeric values cannot be saved in the Workspace, and therefore this type of upload does not support the Statistical enrichment test.
Select a corresponding list type in order for the tool to work properly. There are three list types that are supported by the tools: ID List, Previously exported text search results, and PANTHER Generic Mapping File. See Box 1 for details of the list types.
Select an organism from the drop-down menu, which lists the 12 model organisms first, followed by the rest 70 organisms ordered alphabetically.
Note: There are two purposes to select an organism at this point. First, some identifiers, such as gene symbols, are not organism specific. By selecting an organism here, it ensures that the IDs are mapped to those in the organism you are interested in. Second, if the statistical overrepresentation test is selected, the default reference gene list is based on the selected organism.
In the Select Analysis box, select one of the following 4 analyses by clicking the radial button, and then click the submit button.
Select Functional classification viewed in gene list if you want to find out the functional classification of the genes in your list (Figure 4). See the next section for more details about how to interpret the results.
Select Functional classification viewed in pie chart to get the functional classification of the genes in your list displayed as a pie chart (Figure 5).
Select Statistical overrepresentation test.
On the next page (Figure 6), you can select additional gene lists for the analysis. A total of 4 test gene lists can be uploaded for this tool. To do so,
Click Browse button
Select the gene list from your computer
Select the organism. The default is the organism selected when the first gene list is uploaded.
Select the List type. The default is the one selected when the first gene list is uploaded.
Click Upload list.
After all lists are uploaded, click the Finish selecting lists button. The tool will take you to the next page, which allows you to make the following selections.
Select a reference list. The default is always the entire proteome of the organism selected above. If you choose to change it, click the Select reference list button.
You can then select a reference list from another organism, or upload your own using an interface similar to one to upload test gene list.
Select an ontology to analyze. There are five options.
PANTHER Pathway
GO molecular function
GO biological process
GO cellular location
PANTHER protein class
Multiple test correction (Bonferroni correction) is selected by default.
Click the Launch analysis button.
On the results page (Figure 7A), you can visualize the results with the following options:
You can export the result table in a tab-delimited file by clicking the Export results button.
You can also view the results in graphs by using the View drop-down menu.
If your analysis is done in pathway as shown here, you can click the pathway name and display the pathway diagram. The pathway components that have genes in your test list will be highlighted. The color of the highlighted component can be defined at the top of the page (Figure 7A, red circle). A total of 4 test lists can be analyzed and viewed at the same time.
Select Statistical enrichment test
After clicking the submit button, on the next page, select an ontology to analyze.
Click the Launch analysis button.
On the results page (Figure 8A), you can visualize the results with the following options:
You can export the result table in a tab-delimited file clicking the Export results button.
You can compare the distribution curve in graph view. To do so, you can check the box in front of the category or pathway of your interest, and then click the Graph selected categories button (Figure 8B).
If your analysis is done in pathway as shown here, you can click the pathway name and display the pathway diagram. The pathway components are colored in “heat-map” based on the input numeric values (Figure 8C). Click the Specify color ranges of pathway diagrams button on the result page to view or specify the color ranges.
Note: In order to use this tool, make sure that the uploaded gene list contains a second column with numerical values.
Timing
Step 1, launch website: instant.
Step 2A, prepare input file: 15–30 minutes
Step 2B, prepare input file by scoring the PANTHER HMM library: varies based on the number of sequences. On average, 180 sequences per hour, plus 1.5–3.5 hour tool installation.
Step 3–6, using online tools: 5 minutes.
TROUBLESHOOTING
Table 1 lists some of the common problems and the possible solutions.
Table 1
Step | Problem | Possible reasons and solutions |
---|---|---|
3 | Fail to upload the file | This is usually because the input file is in the wrong file format. Possible solutions: 1. Make sure that your file is in simple text format (.txt or .tab). 2 If you are uploading a file with numeric values for the enrichment test, make sure that the second column contains only numeric numbers. Any rows with no values should be removed instead of leaving it blank or mark it as “n/a”, etc. 3. Make sure that there is no blank rows in the first column. |
6A, 6Cvii, 6Dii. | IDs in the uploaded file don’t have a mapped ID in PANTHER | The current PANTHER data is based on the April 2012 release of Reference Proteome Project and its ID mapping. It is possible that a small fraction of the IDs may not map due to the outdated data either in the PANTHER database or your uploaded file. There is no solution to this. If you believe that you are using the current IDs, please do the followings: 1. Make sure that the IDs in the uploaded file are supported by PANTHER. Refer to Box 1 for Supported IDs. 2. IDs from certain database may contain a version number at the end of them, e.g., NP_000242.1. Do not include the version number “.1” in the ID, and just use NP_000242. 3. Send a feedback to gro.bdrehtnap@kcabdeef. |
6Cviiic, 6Diiic | Pathway Applet can not be launched to view the pathway diagrams | This is most likely that your Java plug-in is outdated. Read the system requirement and make sure that your computer has the most updated Java version. |
ANTICIPATED RESULTS
A. Functional classification tool viewed in gene list page
This tool returns the results as a gene list webpage (Figure 5). The page displays all the IDs from the uploaded gene list and their mapped PANTHER sequence IDs as well as ontology and pathway terms. The page contains the following information (Figure 5).
Gene ID – This is the identifier for genes in the PANTHER library. The format is as follows: organism|gene database source=gene id|protein database source=protein id. For example, HUMAN|ENSEMBL=ENSG00000111262|UniProtKB=Q09470 is a human se- quence, the gene sequence is from ENSEMBL with id ENSG00000111262, and the protein sequence is from UniProt with id Q09470.
Mapped IDs – IDs from the uploaded gene list that are mapped to the gene ids in the first column.
Gene Name/Gene Symbol – The Entrez gene definition and gene symbol.
PANTHER Family/Subfamily – The name and identifier of the PANTHER family or subfamily where the gene in the first column is in.
GO Molecular Function, Biological Process, Cellular Component: These are Gene Ontology terms from PANTHER GO Slim describing the function of the gene product.
PANTHER Protein Class – This is a PANTHER Index terms describing protein classes.
Pathway – Pathway and pathway component that are linked to the PANTHER subfamily in column 4. The subfamily is linked to the pathway component when at least one of its member genes is associated to the component directly by manual curation.
Species – The organism of the gene in Column 1.
Once you reach the gene list page, you can make following changes to view the results.
Sort the list – You can always sort the list by clicking on any of the underlined column names. A yellow triangle appears in front of the column name that you choose to sort. The orientation of the triangle indicates the sort is ascending or descending.
Customize columns – You can click on the “x” button next to the column names to collapse the column.
Converting a list to another list type – Select the genes you want to convert by clicking the checkboxes. The default is for all genes in the list.
Saving the list. Select the genes you want to save by clicking the check- boxes. The default is for all genes in the list. You can select one of the followings from the pull-down menu as the destination:
Workspace – You need to register to save data to the workspace. The registration is free. When you make this selection, a pop-up window will ask you to name the list and add any comments. The name and comments can be edited at any time in the future from the Workspace page. Once the gene list is saved in your workspace, it can be returned to at any time. Only the IDs are stored, and they are mapped to the internal PANTHER gene ids, so when you access a list in the future, all information will be updated and current.
Exporting a list to a file – The list will be exported as a tab-delimited file. You can now import the file into Excel or perform any post-processing you wish.
View the list as text on the website.
Use the pie chart view by clicking the colorful pie chart icon (see Glossary for the meaning of the abbreviations). See the next section below for details about how to interpret the pie chart.
B. Functional classification tool viewed in pie chart
This tool returns the results in a pie chart, which displays an overview of all ontology terms at the first (or most general) level within the same ontology (Figure 6). When a slice of the pie chart, which represents an ontology term, is clicked, a new pie chart will appear that contains its child ontology terms. Since one gene can be classified to more than one term, the pie chart is calculated based on the number of “hits” to the terms over the total number of class hits. Class hit means independent ontology terms. For example, if a gene is classified to 2 ontology terms that are not parent or child to each other, it counts as 2 class hits.
When you place the computer mouse pointer over a slice, the category name and a series of counts are displayed. In our example in Figure 6, the name is a GO term apoptosis (GO:0006915) followed by
the number of genes (70) from the uploaded list that are classified to the term apoptosis;
the percent (14%) of genes classified to apoptosis (70) over the total number of genes (500);
the percent (4.7%) of genes classified to this apoptosis (70) over total number of class hits (1494).
C. Statistical overrepresentation test
The results of this analysis tool are displayed in a table (Figure 7A). If one test gene list is uploaded, the table contains six essential columns of data:
The first column contains the name of the PANTHER classification category. If you are doing this analysis in pathways, you can click on the pathway name to view the corresponding pathway diagram.
The second column contains the number of genes in the reference list that map to this particular PANTHER classification category.
The third column contains the number of genes in your uploaded list that map to this PANTHER classification category.
The fourth column contains the expected value (see Box 3), which is the number of genes you would expect in your list for this PANTHER category, based on the reference list.
The fifth column has either a + or −. A plus sign indicates over-representation of this category in the test list: you observed more genes than expected based on the reference list (for this category, the number of genes in your list is greater than the expected value). Conversely, a negative sign indicates under-representation, i.e. fewer genes than expected.
The sixth column is the p-value as determined by the binomial statistic (see Box 3). This is the probability that the number of genes you observed in this category occurred by chance (randomly), as determined by your reference list. A small p-value indicates that the number you observed is significant and potentially interesting. A cutoff of 0.05 is recommended as a starting point.
If more than one test list is uploaded, columns 3 to 6 are repeated for each list.
From this result page, various statistics can be exported by using the drop-down menu next to the “Export results” button. The list of genes/proteins in any functional group can be viewed by clicking on the listed counts. When pathways are chosen as the functional categories, clicking on the pathway name brings up pathway diagrams colored according to preferences specified by the user (Figure 7B). The resulting pathway diagram can be exported as an image file (.png) by choosing “File -> Export image” from the Applet menu.
D. Statistical enrichment test
The returned results are displayed in a table with four essential columns of data (Figure 8A):
The first column contains the name of the PANTHER classification category. If you are doing this analysis in terms of pathways, you can click on the pathway name to view the pathway diagram. The genes in the pathway diagram are colored according to the numeric value provided in the uploaded gene list file, and the rules for this can be specified by clicking on the ‘Specify color ranges’ button.
The second column contains the number of genes that map to this particular PANTHER classification category.
The third column has either a + or −. A plus sign indicates that for this category, the distribution of values for your uploaded list is shifted towards greater values than the overall distribution of all genes that were uploaded. A negative sign indicates that the uploaded list is shifted towards smaller values than the overall list.
The fourth column contains the p-value as calculated from the Mann-Whitney U Test (Wilcoxon Rank-Sum test) (Box 4). A large p-value indicates that the genes for this category have a distribution that is similar to randomly choosing genes from the overall distribution. In other words, the values of the uploaded genes for this category have a similar distribution to the overall list of values that were input. A small, significant p-value indicates that the distribution for this category is non-random and different than the overall distribution. A cutoff of 0.05 is recommended as a starting point.
To have a visual representation of these distributions, select the checkboxes of the categories of interest, and click on the ‘Graph selected categories’ button. The graph will be displayed in a new window (Figure 8B). The x-axis is your uploaded value. The y-axis is the cumulative fraction. The blue curve is the overall distribution for all genes. The red curve is the selected functional category. In this case, it is PDGF signaling pathway. If you look at the data point x = −2.5, y is 0.3 for the red curve and 0.1 for the blue curve. This means that 30% of your uploaded genes have a value of −0.25 or smaller, but only 10% of the overall genes have a value of −0.25 or smaller. In other words, it shows that the distribution of the category tends to be smaller than the overall distribution. We find that this is critical for interpreting any deviation between the functional category distribution and the overall distribution.
The genes/proteins in each category can also be viewed from the output page by clicking on the listed counts. In addition, for pathways, clicking on the pathway name will bring up an interactive Java applet that colors the pathway using a “heat map” derived from the input values (Figure 8C). The resulting pathway diagram can be exported as an image file (.png) by choosing “File -> Export image” from the Applet menu.
Supplementary Material
sampleTestList_NP
sampleTestList_NP_500
Acknowledgements
The authors would like to thank the Reference Proteome team, especially Craig McAnulla and Maria Martin, for their support to provide up-to-date Reference Proteome dataset, and Yukiko Matsuoka and Katoh Manami from the SBI Japan for their support on CellDesigner and pathway file update. This work is supported by NIH/NIGMS GM081084 to P.D.T.]. Funding for open access charge: University of Southern California.
Glossary
BioPAX | Biological Pathway Exchange format |
BP | biological process or GO biological process |
CC | cellular component or GO cellular component |
cSNP | coding SNP |
HMM | Hidden Markov Model |
GO | gene ontology |
GWAS | genome-wide association study |
ID | identifier |
IP | internet protocol |
MF | molecular function or GO molecular function |
MSA | multiple sequence alignment |
NGS | next generation sequencing |
PANTHER | Protein Annotation through Evolutionary Relationship |
PC | protein class or PANTHER protein class |
PDGF | platelet-derived growth factor |
Png | portable network graphics |
SBGN | Systems Biology Graphical Notation |
SBGN-PD | Systems Biology Graphical Notation Process Description |
SBML | Systems Biology Markup Language |
SNP | single nucleoside polymorphism |
VEGF | vascular endothelial growth factor |
Footnotes
Competing financial interests
The authors declare that they have no competing financial interests.
REFERENCES
Full text links
Read article at publisher's site: https://doi.org/10.1038/nprot.2013.092
Read article for free, from open access legal sources, via Unpaywall: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6519453
Citations & impact
Impact metrics
Citations of article over time
Alternative metrics
Smart citations by scite.ai
Explore citation contexts and check if this article has been
supported or disputed.
https://scite.ai/reports/10.1038/nprot.2013.092
Article citations
Use of a Novel Whole Blood Separation and Transport Device for Targeted and Untargeted Proteomics.
Biomedicines, 12(10):2318, 11 Oct 2024
Cited by: 0 articles | PMID: 39457630 | PMCID: PMC11504527
Identification of DNA methylation markers for age and Bovine Respiratory Disease in dairy cattle: A pilot study based on Reduced Representation Bisulfite Sequencing.
Commun Biol, 7(1):1251, 03 Oct 2024
Cited by: 0 articles | PMID: 39363014 | PMCID: PMC11450024
Secondary Transcriptomic Analysis of Triple-Negative Breast Cancer Reveals Reliable Universal and Subtype-Specific Mechanistic Markers.
Cancers (Basel), 16(19):3379, 02 Oct 2024
Cited by: 0 articles | PMID: 39409999 | PMCID: PMC11476281
Unveiling lignocellulolytic potential: a genomic exploration of bacterial lineages within the termite gut.
Microbiome, 12(1):201, 15 Oct 2024
Cited by: 0 articles | PMID: 39407345 | PMCID: PMC11481507
Exome sequencing of 20,979 individuals with epilepsy reveals shared and distinct ultra-rare genetic risk across disorder subtypes.
Nat Neurosci, 27(10):1864-1879, 03 Oct 2024
Cited by: 0 articles | PMID: 39363051
Go to all (1,546) article citations
Other citations
Wikipedia
Data
Data behind the article
This data has been text mined from the article, or deposited into data resources.
BioStudies: supplemental material and supporting data
Ensembl Genome Browser
- (2 citations) Ensembl - ENSG00000111262
Genes & Proteins (4)
- (2 citations) UniProt - Q09470
- (1 citation) UniProt - Q9N143
- (1 citation) UniProt - P00026
- (1 citation) UniProt - Q9VWP6
Quick GO
- (1 citation) GO - GO0006915
Similar Articles
To arrive at the top five similar articles we use a word-weighted algorithm to compare words from the Title and Abstract of each citation.
Protocol Update for large-scale genome and gene function analysis with the PANTHER classification system (v.14.0).
Nat Protoc, 14(3):703-721, 25 Feb 2019
Cited by: 736 articles | PMID: 30804569 | PMCID: PMC6519457
PANTHER version 10: expanded protein families and functions, and analysis tools.
Nucleic Acids Res, 44(d1):D336-42, 17 Nov 2015
Cited by: 543 articles | PMID: 26578592 | PMCID: PMC4702852
PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees.
Nucleic Acids Res, 41(database issue):D377-86, 27 Nov 2012
Cited by: 1105 articles | PMID: 23193289 | PMCID: PMC3531194
PANTHER: Making genome-scale phylogenetics accessible to all.
Protein Sci, 31(1):8-22, 25 Nov 2021
Cited by: 489 articles | PMID: 34717010 | PMCID: PMC8740835
Review Free full text in Europe PMC
Funding
Funders who supported this work.
NIGMS NIH HHS (2)
Grant ID: R01 GM081084
Grant ID: GM081084