mathFISH, a Web Tool That Uses Thermodynamics-Based Mathematical Models for In Silico Evaluation of Oligonucleotide Probes for Fluorescence In Situ Hybridization †
Abstract
Mathematical models of RNA-targeted fluorescence in situ hybridization (FISH) for perfectly matched and mismatched probe/target pairs are organized and automated in web-based mathFISH (http://mathfish.cee.wisc.edu). Offering the users up-to-date knowledge of hybridization thermodynamics within a theoretical framework, mathFISH is expected to maximize the probability of success during oligonucleotide probe design.
Fluorescence in situ hybridization (FISH) has been used for more than 2 decades in microbial ecology (since the publication of the article by DeLong et al. [5] in 1989) for the detection of whole cells of interest in environmental samples. Microbial ecological applications of FISH generally target rRNA as the phylogenetic marker and rely on the specific hybridization of labeled synthetic DNA probes with this complex molecule inside fixed cells. FISH offers unparalleled capabilities such as the quantification of target groups of organisms and visualization of their spatial distribution.
The main challenge in FISH applications is the optimization of sensitivity and specificity during probe design (23). Throughout the years, several experimental techniques have been developed for the optimization of probe performance, such as the use of formamide for mismatch discrimination (13, 19), the addition of unlabeled competitor oligonucleotides for improving mismatch discrimination (13), the empirical assessment of target accessibility (2, 7), the use of unlabeled helper oligonucleotides for improving accessibility (6), the amplification of signal using enzyme-labeled probes (catalyzed reporter deposition [CARD]-FISH) (16), and the use of cloned target for the determination of the formamide denaturation profiles (Clone-FISH) (18). Although highly useful, these tools have increased the complexity of probe design. In addition to the need for trial and error, experimental approaches without theoretical guidelines are limited by the physical availability of either organisms or clones belonging to target and nontarget microbial groups, making optimization cumbersome in some cases (24).
There is a plethora of literature on the universal physicochemical properties of nucleic acid hybridizations (e.g., see the review by Turner [22]), which can potentially help the prediction of probe performance in FISH. To facilitate the use of this knowledge in FISH applications, we have recently developed thermodynamics-based mathematical models of FISH (24-26). The models simulate FISH in silico using thermodynamically established parameters that define DNA/RNA interactions between the probe and targeted rRNA, DNA/DNA interactions within the probe, and RNA/RNA interactions within the rRNA.
Having been tested with model organisms (24-27), the mathematical models are expected to facilitate probe design. The models could potentially reduce the number of cycles of trial and error to reach the right probe and optimal hybridization conditions, provide a basis for evaluating the confidence level in experimental results (24), and serve as a substitute for experiments when the organisms of interest are not readily available. However, the expansion of the appropriate usage of these models within the scientific community requires automated tools, as personal communications and scientific collaborations have made clear. Thus, the goal of this study was to develop a web-based tool to provide a user-friendly environment for in silico simulations of FISH, a tool based on all available models.
Main modeling algorithm.
The main algorithm of mathFISH was established according to the thermodynamic models of FISH (25, 26). At the heart of FISH models is a hybridization scheme with three concurrent reactions, which is conveniently available to the users of mathFISH on the main page (http://mathfish.cee.wisc.edu/index.html). As shown by the flowchart in Fig. 1, the user input to the algorithm describes probe and target sequences. The probe sequence (see below for entry options) is used to obtain the free energy change of probe/target duplex formation (designated ΔG°1 according to the hybridization scheme) from published thermodynamic parameters characterizing nearest neighbors of the duplex. The probe sequence is also submitted to the UNAFold software (15), which performs in silico folding and provides the free energy change of the formation of probe structure (defined as ΔG°2 in the hybridization scheme) (Fig. 1).
Inputting the target includes not only the target sequence but also the type of molecule and the domain of life (Fig. 1). This is because if the target is an rRNA molecule, it first needs to be aligned with a reference organism for the determination of the structural domain harboring the target site (26). For both small-subunit (SSU) and large-subunit (LSU) rRNA selections, the reference organisms used by mathFISH are Escherichia coli (in the Bacteria domain), Methanosarcina barkeri (in the Archaea domain), and Saccharomyces cerevisiae (in the Eukarya domain). Based on the alignment with the reference, the structural domain that encompasses the target site is excerpted and submitted to UNAFold for the calculation of the free energy penalty for probe binding (denoted as ΔG°3) (26) (Fig. 1). When the target sequence is not indicated as rRNA (e.g., mRNA sequences), the alignment step is skipped, and the folding to calculate ΔG°3 is done with the entire sequence (Fig. 1).
The rest of the algorithm is designed to generate the two main outputs of mathFISH (Fig. 1), namely, the probe affinity (ΔG°overall) calculated from the individual free energy values according to equilibrium thermodynamics (26) and the formamide dissociation profile of the probe derived from a linear free energy model (SmM) (25). Hybridization efficiency (fraction of probe-bound target molecules) is another key variable, which is calculated as a function of ΔG°overall and probe concentration (26) (not included in Fig. 1).
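As an illustration of how the three free energy values could be combined, the following minimal Python sketch assumes that the three concurrent reactions compete at equilibrium, so that K_overall = K1 / [(1 + K2)(1 + K3)] with K_i = exp(−ΔG°i/RT), and that hybridization efficiency follows a simple saturation expression in probe concentration. These expressions are not spelled out in the text above and are stated here as assumptions; the function names, the 46°C default temperature, and the example values are hypothetical and do not represent the mathFISH implementation.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def equilibrium_constant(dg, temp_k):
    """K = exp(-dG/RT) for a free energy change dg in kcal/mol."""
    return math.exp(-dg / (R_KCAL * temp_k))


def overall_affinity(dg1, dg2, dg3, temp_c=46.0):
    """Combine dG1 (probe/target duplex), dG2 (probe folding), and
    dG3 (target folding) into an overall free energy change.

    Assumed form: K_overall = K1 / ((1 + K2) * (1 + K3)).
    """
    temp_k = temp_c + 273.15
    k1 = equilibrium_constant(dg1, temp_k)
    k2 = equilibrium_constant(dg2, temp_k)
    k3 = equilibrium_constant(dg3, temp_k)
    k_overall = k1 / ((1.0 + k2) * (1.0 + k3))
    return -R_KCAL * temp_k * math.log(k_overall)


def hybridization_efficiency(dg_overall, probe_molar, temp_c=46.0):
    """Fraction of probe-bound target, assuming probe is in excess."""
    temp_k = temp_c + 273.15
    x = probe_molar * equilibrium_constant(dg_overall, temp_k)
    return x / (1.0 + x)


if __name__ == "__main__":
    # Hypothetical free energies (kcal/mol) and probe concentration (M).
    dg = overall_affinity(dg1=-18.0, dg2=-1.0, dg3=-3.0)
    print(round(dg, 2), round(hybridization_efficiency(dg, 1e-7), 3))
```

Under these assumptions, any folding of the probe or target (negative ΔG°2 or ΔG°3) makes ΔG°overall less negative than ΔG°1, and efficiency follows a sigmoid curve when plotted against ΔG°overall.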
Usage and tools.
The structure of mathFISH organizes mathematical modeling applications using eight tools (see Fig. S1 in the supplemental material). In all of these tools, the user entry is managed by text boxes for numerical values or sequence information and by radio buttons or scroll-down menus for selections (for an example page, see Fig. S1 in the supplemental material). The results are calculated and displayed after the “Submit” button is clicked (see below for examples). A comprehensive help document, linked to every window, is included to provide detailed guidelines of usage with illustrative examples. This documentation also includes a description of thermodynamic modeling concepts, and a “Recommendations” section, which helps the user interpret the results for probe design. To give a short description of the tool functions here, we will use probe S-*-Nsm6a-0192-a-A-20 (1) as an example. This probe was arbitrarily selected from ProbeBase (11).
(i) General analysis tool.
The general analysis tool performs all available simulations for a given probe/target (or probe/nontarget) sequence pair. The probe sequence can be entered directly as shown in Fig. S1 in the supplemental material or as target site sequence or target site positioning for the user's convenience. If the target site is not directly specified, the probe is first aligned with the provided target sequence to determine the best matching location. The target sequence can be entered in different formats, including plain text, FASTA, and GenBank (valid for all tools). The example in Fig. S1 in the supplemental material uses an arbitrary perfect match target from the Nitrosomonas genus (3) for the selected probe, which targets a specific Nitrosomonas oligotropha cluster (1). In addition to sequences, mathFISH requires temperature, salt strength, and probe concentration for thermodynamic evaluations (25, 26) (see Fig. S1 in the supplemental material).
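The step of locating the best matching site for a probe can be sketched as follows. This is a deliberately simplified illustration that scans the target without gaps and counts mismatches against the reverse complement of the probe; mathFISH itself relies on an alignment tool and can handle bulged mismatches, which this sketch cannot. The function names are hypothetical, and sequences are assumed to be in the DNA alphabet (U already converted to T).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def best_target_site(probe, target):
    """Return (start, mismatches, site) for the ungapped window of the
    target that best matches the reverse complement of the probe.

    Ties are resolved in favor of the first (5'-most) window.
    """
    wanted = reverse_complement(probe)
    target = target.upper()
    n = len(wanted)
    if n == 0 or n > len(target):
        raise ValueError("probe must be non-empty and no longer than target")
    best = None
    for start in range(len(target) - n + 1):
        window = target[start:start + n]
        mismatches = sum(a != b for a, b in zip(window, wanted))
        if best is None or mismatches < best[1]:
            best = (start, mismatches, window)
    return best


if __name__ == "__main__":
    # Toy sequences, not taken from any database.
    print(best_target_site("TTGACC", "AAAAGGTCAATTTT"))
```

The mismatch count returned for a nontarget sequence gives a quick check that the intended site, and not some other region, was picked up before any thermodynamic evaluation.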
Given the input in Fig. S1 in the supplemental material, the general analysis tool outputs a probe/target alignment as a check of sequence match (not shown), a tabulated list of calculated thermodynamic variables (see Fig. S2a in the supplemental material), a graphical display of hybridization efficiency and probe affinity (see Fig. S2b in the supplemental material), and a predicted formamide dissociation profile of the probe (not shown). The graphical representation of probe affinity includes color coding that reflects recommendations on how to interpret the calculated ΔG°overall value (Fig. S2b). The interpretation is based on the theoretical sigmoid-shaped relationship between probe affinity and hybridization efficiency (26). According to this scheme, if the ΔG°overall of the analyzed probe fits in the red zone, the binding efficiency is predicted to be too low for successful hybridization. The orange region shows the upper section of the transition from high to low hybridization efficiency, and theoretically, the orange region is an ideal zone where sensitivity and specificity would be optimized. However, since uncertainties in free energy calculations (26) can cause large differences in hybridization efficiency in the transition region, we recommend that probes with ΔG°overall values in the orange zone should also be avoided, if possible. In contrast, the green color indicates the recommended range of ΔG°overall values that provide a lower risk of false-negative results while specificity should still not be compromised (i.e., low risk of false-positive results) as the probe does not have excess affinity. The yellow zone, on the other hand, implies a higher risk of false-positive results due to the strong thermodynamic affinity that the probe might have for mismatched nontarget sequences. The given example falls in the recommended range of ΔG°overall values (i.e., the green zone in Fig. S2b in the supplemental material).
Another key output of the general analysis tool is the formamide melting point ([FA]m) (see Fig. S2a in the supplemental material). Since higher melting points imply excessive binding affinity, and hence, increased difficulty in mismatch discrimination, we recommend that this value be less than 55% formamide, to conform to typical formamide concentrations used in practice. This check is strongly recommended for probes in the yellow zone of the hybridization efficiency diagram in Fig. S2b in the supplemental material. The other outputs of the general analysis tool include the free energy values of individual reactions of FISH (Fig. S2a).
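The relationship between a linear free energy model and the formamide melting point can be sketched in Python as follows. The sketch assumes, as a simplification, that ΔG°overall increases linearly with formamide concentration through a single slope (called `m_value` here) and that [FA]m is the concentration at which hybridization efficiency equals 0.5. The slope, probe concentration, and free energy in the example are hypothetical values, not published parameters, and the published model may apply the linear dependence to the individual reactions rather than to the overall free energy.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def efficiency_at_formamide(dg_overall, m_value, fa_percent,
                            probe_molar, temp_c=46.0):
    """Hybridization efficiency at a given formamide concentration (%),
    assuming dG(FA) = dG_overall + m_value * FA."""
    rt = R_KCAL * (temp_c + 273.15)
    dg = dg_overall + m_value * fa_percent
    x = probe_molar * math.exp(-dg / rt)
    return x / (1.0 + x)


def formamide_melting_point(dg_overall, m_value, probe_molar, temp_c=46.0):
    """Formamide concentration (%) at which efficiency is 0.5."""
    rt = R_KCAL * (temp_c + 273.15)
    return (rt * math.log(probe_molar) - dg_overall) / m_value


def exceeds_recommended_melting_point(fa_m, limit=55.0):
    """True if [FA]m is above the 55% limit recommended in the text."""
    return fa_m >= limit


if __name__ == "__main__":
    # Hypothetical inputs: dG in kcal/mol, slope in kcal/mol per % formamide.
    fa_m = formamide_melting_point(-14.0, 0.1, 1e-7)
    print(round(fa_m, 1), exceeds_recommended_melting_point(fa_m))
```

Evaluating `efficiency_at_formamide` over a range of concentrations yields a sigmoid dissociation profile whose midpoint is the value returned by `formamide_melting_point`.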
The general analysis tool may also be used to evaluate the hybridization potential of mismatched probe/nontarget pairs, but more specialized tools are also available for that type of analysis as explained next.
(ii) Mismatch analysis tool.
The mismatch analysis tool accepts a probe, a perfectly matching target, and a mismatched nontarget sequence as input. Then, it outputs key variables for mismatch discrimination, which were systematically evaluated in a previous study (24). Figures S3a to c in the supplemental material show the results of an analysis with the example probe when a Nitrosomonas eutropha sequence is used as a one-mismatch nontarget. The probe alignments with target and nontarget sequences (see Fig. S3a in the supplemental material) serve as a confirmation of proper data entry. The tabulated data in Fig. S3b present the key variables for both probe/target and probe/nontarget hybridizations, of which two are worth specific attention when taken as a difference. One of the two key variables is ΔΔG°1, or the free energy penalty due to the insertion of mismatches in the duplex, and the other is Δ[FA]m, the difference in the melting point of perfect match and mismatched duplexes. The former represents mismatch stability (the larger the value, the less stable, or more destabilizing, the mismatch), and the latter indicates how much the mismatch shifts the formamide profile to the left, as illustrated graphically in the output (see Fig. S3c in the supplemental material).
Since thermodynamic parameters are not available for all mismatch conformations in DNA/RNA duplexes, mathFISH uses a specific method that sums the average of DNA/DNA and RNA/RNA conformations for the free energy of mismatch loops and DNA/RNA nearest neighbor parameters for broken base pairs, as explained by Yilmaz et al. (24). With this approach, mathFISH provides thermodynamic estimations for any type of mismatch, including bulged mismatches, and combinations of mismatches.
According to the evaluations of Yilmaz et al. (24), a ΔΔG°1 value of less than 2.0 kcal/mol or a Δ[FA]m of more than −20% (i.e., less than 20% in magnitude) should be considered a sign of weak mismatch discrimination ability. In this example, the ΔΔG°1 value of 2.3 kcal/mol is only slightly higher than the corresponding threshold, while the Δ[FA]m value of −13.8% is above its threshold. Thus, the user would be recommended by mathFISH to consider the use of an unlabeled competitor oligonucleotide to block the target site in this nontarget organism. Indeed, the actual probe is used together with a competitor for this type of nontarget (1), and hybridization under these conditions can be evaluated with the next tool.
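The two thresholds quoted above translate into a simple decision rule, sketched below in Python. The function name and signature are hypothetical; only the threshold values (2.0 kcal/mol and −20% formamide) and the example values (2.3 kcal/mol and −13.8%) come from the text.

```python
def weak_mismatch_discrimination(ddg1, delta_fa_m,
                                 ddg1_threshold=2.0,
                                 delta_fa_threshold=-20.0):
    """Return True if either criterion signals weak discrimination.

    ddg1       -- free energy penalty of the mismatch (kcal/mol)
    delta_fa_m -- shift in formamide melting point (%, negative values)
    """
    return ddg1 < ddg1_threshold or delta_fa_m > delta_fa_threshold


if __name__ == "__main__":
    # Values for the example probe and its one-mismatch nontarget.
    if weak_mismatch_discrimination(2.3, -13.8):
        print("Consider an unlabeled competitor oligonucleotide.")
```

For the example probe, the first criterion passes (2.3 is not below 2.0), but the second fails (−13.8% is above −20%), so the rule flags the pair and a competitor would be suggested.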
(iii) Competitor analysis tool.
In addition to the entries in the mismatch analysis tool, the competitor analysis tool requests the sequence of the competitor oligonucleotide. To derive theoretical denaturation profiles for the target and nontarget organism (see Fig. S3d in the supplemental material), this tool employs an oligonucleotide competition model, which was previously used for quantifying the level of a specific probe competition (8), but not for obtaining a formamide series with unlabeled competitors (the details of our competition model are provided in help documents as with other models). In addition, a table of thermodynamic variables is generated for all four oligonucleotide-target combinations involved in modeling (not shown). The new profiles in Fig. S3d in the supplemental material show a clear improvement of mismatch discrimination ability over those in Fig. S3c. Another important aspect of competitor design is how much the competition lowers hybridization efficiency with the target organism, which is a moderate ~25% decrease in this case (based on the height of upper plateaus in the profiles of Fig. S3c and S3d). Although cross hybridization with the nontarget is theoretically prevented by the competitor (Fig. S3d), a high level of stringency is still recommended to minimize the risk of false-positive results. Thus, the formamide profiles derived for the example probe (Fig. S3c and S3d) suggest an optimum range of 30 to 40% formamide, where sensitivity (high hybridization efficiency with the target) would be maintained, while cross hybridization with mismatched nontargets is expected to be minimal. In this case, the prediction agrees with the reported experimental optimum formamide concentration at 35% (1), thus exemplifying how potential cycles of trial and error during optimization may be minimized by prior theoretical evaluations.
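To illustrate why an unlabeled competitor can suppress the nontarget signal with only a modest loss at the target, the sketch below assumes a simple competitive binding equilibrium in which the labeled probe and the competitor compete for the same site, and both are in excess. This is a generic textbook expression used here as an assumption; the actual competition model of mathFISH is described in its help documents and may differ. All free energies and concentrations in the example are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def probe_bound_fraction(dg_probe, dg_competitor,
                         probe_molar, competitor_molar, temp_c=46.0):
    """Fraction of sites occupied by the labeled probe when a competitor
    binds the same site (simple competitive equilibrium).

    Set competitor_molar to 0.0 to obtain the profile without competitor.
    """
    rt = R_KCAL * (temp_c + 273.15)
    x_probe = probe_molar * math.exp(-dg_probe / rt)
    x_comp = competitor_molar * math.exp(-dg_competitor / rt)
    return x_probe / (1.0 + x_probe + x_comp)


if __name__ == "__main__":
    conc = 1e-7  # hypothetical concentration (M) for probe and competitor
    # Target site: probe is a perfect match, competitor is mismatched.
    print(probe_bound_fraction(-14.0, -11.0, conc, conc))
    # Nontarget site: probe is mismatched, competitor is a perfect match.
    print(probe_bound_fraction(-11.0, -14.0, conc, conc))
```

The four free energy values passed in the two calls correspond to the four oligonucleotide-target combinations mentioned above: probe/target, competitor/target, probe/nontarget, and competitor/nontarget.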
(iv) Other tools.
Not shown here in detail, other tools of mathFISH perform tasks with multiple probe/target pairs in a single event. A set of four tools are designed to generate free energy series for the four different free energy values (ΔG°1, ΔG°2, ΔG°3, and ΔG°overall) by walking a perfect match oligonucleotide of constant length over a target sequence. Since the total number of probes to be analyzed at a time is limited by computational speed, these tools can be used only to examine a frame of a target sequence (50 to 100 nucleotides), rather than the whole sequence, to locate the best spots for probe design. In addition, a formamide curve generator tool derives formamide profiles for up to five probe/target (or nontarget) pairs manually entered by the user. With some user creativity, these tools may accelerate the design process. For instance, a ΔG°2 series can help locate and eliminate stem-loop regions that will never allow proper design due to high self complementarity of probes. The usage and function of all tools are described with illustrative examples in the help documents of mathFISH.
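The idea of walking a perfect match oligonucleotide of constant length over a target frame can be sketched as follows. The sketch only enumerates the target sites and the corresponding probes; the free energy calculation is left as a user-supplied function (`score`), because the nearest-neighbor parameters and folding software are outside the scope of this example. All names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def walk_probes(target_frame, probe_length):
    """Yield (start, target_site, probe) for every window of the frame."""
    frame = target_frame.upper()
    for start in range(len(frame) - probe_length + 1):
        site = frame[start:start + probe_length]
        yield start, site, reverse_complement(site)


def free_energy_series(target_frame, probe_length, score):
    """Apply a scoring function (a stand-in for a free energy
    calculation) to each probe and return (start, value) pairs."""
    return [(start, score(probe))
            for start, _site, probe in walk_probes(target_frame, probe_length)]


if __name__ == "__main__":
    # GC count is used here only as a placeholder for a real calculation.
    gc_count = lambda probe: sum(base in "GC" for base in probe)
    print(free_energy_series("ACGTACGGCTA", 6, gc_count))
```

Plotting such a series against the start position gives the kind of profile that can be inspected for favorable or unfavorable regions within a frame of 50 to 100 nucleotides.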
As a final note, we believe that the most effective use of mathFISH for probe design is by a combination of experimental and in silico evaluations, since there are two critical components of probe design that benefit from a structured theoretical analysis. First, the determination of organisms with the highest risk of false-positive results can be evaluated in silico with mathFISH from potentially large sets of targets with one or two mismatches to the probe. Such evaluation would also facilitate decisions on which potential false positives could be eliminated with the help of competitor probes. Second, the design of the exact probe sequence can take advantage of the most up-to-date mathematical models and thermodynamic parameters so as to maximize the probability of successful optimization without many experimental cycles of trial and error. Finally, if target or nontarget sequences are not available as clones or isolates, mathFISH can be used to theoretically design and optimize multiple probes for a target (with necessary competitors). In that case, the combined use of multiple probes can potentially increase the confidence level (17) and partially compensate for the lack of rigorous experimental tests. Another instance where mathFISH can provide useful simulations is when, because of the lack of isolates, the optimization of formamide concentrations is based on morphological observations within mixed cultures (for an example, see reference 4).
Originality of mathFISH.
The supporting software (UNAFold [15]) and parameter database (20) of mathFISH are publicly available, as well as the description of the mathematical models used (8, 24-26). Thus, mathFISH serves two main purposes: (i) to establish an organized framework for the mathematical FISH models developed and (ii) to automate the usage of this framework. Since the models are specialized for FISH, other software packages that simulate association and dissociation of nucleic acid molecules cannot perform the same functions as mathFISH. For example, DINAMELT (14) offers thermal denaturation of DNA/DNA or RNA/RNA duplexes, but such simulations do not address important components of FISH (e.g., formamide denaturation, DNA/RNA associations, and the effect of competitors). To our knowledge, mathFISH is the only provider of theoretically based FISH simulations.
Other probe support software used in designing FISH probes include ARB (12) and probeCheck (10), which are powerful in comparative sequence analyses to locate the best target sites, but the predictors of probe performance in these tools (e.g., dissociation temperature) do not yet include relevant thermodynamic parameters and mathematical models for FISH. On the other hand, ARB brings the additional advantage of utilizing accessibility data available for rRNA (9). Thus, for an effective design and optimization approach, we recommend that the users of other probe support software first obtain a good set of probe candidates and then use mathFISH for the thermodynamic evaluation of the sensitivity and specificity as explained here.
Limitations.
Currently, the main computational limitation of mathFISH is the lack of high-throughput analysis tools for large sets of probe/target pairs (or probe/nontarget pairs), which could potentially automate the probe design process. This is a memory and processing speed problem, which may change in the future. However, we do offer multiple probe testing options in a limited format, as explained above. A related issue is the presence of ambiguities in probe sequences. Currently, mathFISH rejects IUPAC codes that represent multiple nucleotides in probe sequences (e.g., W = A or T), and therefore, the different probe/target (or probe/nontarget) pairs resulting from the combination of these must be analyzed separately by the user. To facilitate this process, mathFISH brings up a table of IUPAC codes whenever degenerate positions are detected in the probe sequence (or target site entry, if that is an option). Ambiguities in the target sequence, on the other hand, are allowed but are either considered unknown mismatches (if within the target site) or ignored during thermodynamic calculations (elsewhere in the sequence), and thus, results should be handled with care if non-ACGT characters (all U's are converted to T's per the convention of the databases and tools used) are present within the vicinity of the target site. Some non-ACGT characters that may be in the input sequence (i.e., spaces, gaps, and numbers) are eliminated automatically from target sequences (mathFISH does not need a gap in the input sequences to describe a bulged mismatch due to the embedded probe alignment function).
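Because degenerate probes must be split into individual sequences by the user, a small helper such as the following Python sketch may be convenient before submitting sequences. It mimics the input handling described above (removal of spaces, gaps, and numbers, and conversion of U to T) and expands IUPAC codes into all unambiguous probe variants. The function names are hypothetical, the treatment of "-" and "." as gap characters is an assumption, and this is not the code used by mathFISH.

```python
import itertools
import re

# Standard IUPAC nucleotide codes in the DNA alphabet.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def clean_sequence(raw):
    """Strip whitespace, digits, and gap characters; convert U to T."""
    seq = re.sub(r"[\s\d\-.]", "", raw).upper()
    return seq.replace("U", "T")


def has_ambiguity(seq):
    """True if the cleaned sequence contains any non-ACGT character."""
    return any(base not in "ACGT" for base in clean_sequence(seq))


def expand_degenerate(probe):
    """Return every unambiguous sequence encoded by a degenerate probe."""
    seq = clean_sequence(probe)
    unknown = sorted(set(seq) - set(IUPAC))
    if unknown:
        raise ValueError("unsupported characters: " + "".join(unknown))
    choices = [IUPAC[base] for base in seq]
    return ["".join(combo) for combo in itertools.product(*choices)]


if __name__ == "__main__":
    print(expand_degenerate("ACW GY"))
```

Each sequence in the returned list can then be analyzed as a separate probe/target (or probe/nontarget) pair.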
Computational base of mathFISH.
Currently, mathFISH runs on a Red Hat Enterprise Linux 5 server with 2 gigabytes of random-access memory (RAM) and a 3.4-GHz processor. The hardware will be upgraded depending on the intensity of usage. The virtual engine of mathFISH is based on a package of MATLAB (The MathWorks, Inc., Natick, MA) functions wrapped into Java classes using MATLAB Builder JA. In addition to carrying out modeling calculations, the MATLAB-based engine employs the Clustal W alignment tool (21) for sequence alignment and UNAFold (15) to derive the free energy changes associated with intramolecular structures and mismatched conformations (see above). The web interface of mathFISH was developed using HTML (hypertext markup language) and JSP (Java Server Pages), and was deployed on an Apache Tomcat 6.0 server (http://tomcat.apache.org/). User input forms were designed using HTML pages which pass the inputs to JSP for processing. The JSP pages internally call the “virtual engine” for actual processing. The JSP pages then display the textual results in the form of an HTML page and plot the required graphs as PNG (portable network graphics) images by sending the numerical output to Google Charts (http://code.google.com/apis/chart/).
Conclusions.
The web-based tool mathFISH, freely available at http://mathfish.cee.wisc.edu, was developed to provide a user-friendly environment for the simulation of RNA-targeted FISH with oligonucleotide probes. The computational framework of mathFISH encompasses FISH-specific mathematical models that use up-to-date thermodynamic parameters for nucleic acid interactions. Thus, mathFISH is proposed as a sophisticated computational probe evaluation tool for FISH, which can help with the design of probes and experimental optimization methods. Using mathFISH for design and optimization can potentially minimize the time spent with wet-lab cycles of trial and error by maximizing the probability of successful hybridizations with respect to probe sensitivity and specificity.
Acknowledgments
This research was supported by National Science Foundation grant CBET-0606894. We were also supported by the Office of Science, Department of Energy, under grant DE-FG02-07ER64495.
Footnotes
Published ahead of print on 10 December 2010.
†Supplemental material for this article may be found at http://aem.asm.org/.
Articles from Applied and Environmental Microbiology are provided here courtesy of American Society for Microbiology (ASM)
Full text links
Read article at publisher's site: https://doi.org/10.1128/aem.01733-10