Predicting the potential for zoonotic transmission and host associations for novel viruses
Abstract
Host-virus associations have co-evolved under ecological and evolutionary selection pressures that shape cross-species transmission and spillover to humans. Observed virus-host associations provide relevant context for newly discovered wildlife viruses to assess knowledge gaps in host range and estimate pathways for potential human infection. Using models to predict virus-host networks, we predicted the likelihood of humans as hosts for 531 newly discovered viruses detected by large-scale wildlife surveillance at high-risk animal-human interfaces in Africa, Asia, and Latin America. Predictions indicated that novel coronaviruses are likely to infect a greater number of host species than viruses from other families. Our models further characterize novel viruses through prioritization scores and directly inform surveillance targets to identify host ranges for newly discovered viruses.
Introduction
Identifying zoonotic virus emergence events at the earliest possible stage is key to mitigating outbreaks and preventing future epidemic and pandemic threats. By the time novel viruses are recognized in humans, often as a cluster of unusual cases, public health interventions to prevent or contain an epidemic face major challenges. However, determining the potential for zoonotic transmission of newly discovered animal viruses, in the absence of documented human infection, is currently a major scientific challenge. New approaches are needed to evaluate and characterize the risk of zoonotic transmission of newly discovered animal viruses in the face of very limited data. Here we analyze human, domesticated animal, and wildlife surveillance and viral discovery data collected from 2009 to 2019, as part of a consortium-led One Health project aimed at strengthening pandemic threat detection capabilities in Africa, Asia, and Latin America1. Surveillance efforts resulted in 944 novel monophyletic clusters of virus sequences in wildlife (referred to as novel viruses henceforth) from 18 virus families sampled at high-risk animal-human disease transmission interfaces in 34 countries. As none of these viruses have yet been identified in humans, other indices were previously established to assess potential risk, such as virus host range or plasticity, and expert opinion based on integration of ecological and molecular characteristics of viruses2–5. We were able to quantify the risk of zoonotic transmission for 531 out of 944 novel animal viruses using data-driven models to predict host-virus networks.
Patterns observed across host-virus networks have been used to understand virus sharing among vertebrate species2,6,7, and to predict cryptic links between mammalian and avian hosts and their viruses8–10. Host-virus network linkages can be informed by virus traits, virus biogeography, host ecological niches, and propensity for host sharing among viruses10,11. Precedence in viral sharing among species and ecological opportunities for spillover, as characterized by network topology, can inform propensities for newly discovered viruses that lack data5. Further exploration of these networks can aid in estimating the host plasticity of viruses, an important characteristic associated with zoonotic potential2,5. Unfortunately, systematically collected surveillance data to parameterize and validate these models have been missing3. Here, we apply a network approach to gain ecological insights from viruses that have been shared among species in nature and inform potential virus-host associations and zoonotic risk of novel viruses recently discovered from wildlife.
Using data from the literature, we developed a network that included 269 known zoonotic and 307 non-zoonotic viruses infecting 885 avian and mammalian hosts (
Results and discussion
Virus-host network for known viruses ( )
We developed a unipartite network with viruses as nodes and host species as edges for all species recognized as a host for viruses based on data presented in previous studies and databases, specifically, data shared by Olival et al.,4 Pandit et al.,3 and Johnson et al.13 and GenBank. In the observed network (
The wildlife surveillance data consisted of tests for 99,379 animals, representing specimens from 861 species, mostly bats, rodents, primates, and other mammals (https://zenodo.org/record/5899054)1. To predict associations (linkages) between novel viruses and other viruses formed due to common host species, gradient boosting models were trained using network topological characteristics and the families of the viruses in each virus pair to estimate: (1) whether virus pairs have a host species in common; and (2) the taxonomical order of shared hosts (Fig. 1).
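As a rough illustration of the virus-centric network described above (viruses as nodes, linked when they share at least one host species), the following minimal sketch projects a list of virus-host records onto virus pairs. The function name, the record format, and the toy associations are hypothetical and are not taken from the published code; the authors' actual implementation is in the Zenodo repository.

```python
from collections import defaultdict
from itertools import combinations


def build_virus_network(associations):
    """Project (virus, host) records onto a virus-virus network.

    Returns a dict mapping each unordered virus pair (as a sorted tuple)
    to the set of host species the two viruses share.
    """
    hosts_by_virus = defaultdict(set)
    for virus, host in associations:
        hosts_by_virus[virus].add(host)

    edges = {}
    for v1, v2 in combinations(sorted(hosts_by_virus), 2):
        shared = hosts_by_virus[v1] & hosts_by_virus[v2]
        if shared:  # a link exists only when at least one host is shared
            edges[(v1, v2)] = shared
    return edges


# Toy, made-up associations for illustration only (hypothetical names).
toy_associations = [
    ("virus_A", "Homo sapiens"),
    ("virus_A", "Sus scrofa"),
    ("virus_B", "Sus scrofa"),
    ("virus_C", "Gallus gallus"),
    ("virus_B", "Gallus gallus"),
]
toy_edges = build_virus_network(toy_associations)
```

In this toy example, virus_A and virus_C end up unlinked because they have no host in common, while virus_B links to both.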
Characteristics of predicted network ( ) and newly discovered viruses
The binary model performed well in predicting the presence of links formed due to sharing of hosts between two virus nodes in the network (mean positive predictive value=0.99, sensitivity=0.96, F-score=0.97, Fig. S6). The distribution of predicted probabilities for all links using the binary model was clearly bimodal (Fig. S7a). The accuracy scores as a function of precision and recall indicated good model performance beyond a predicted probability of 0.15 for the binary model (Fig. S8). Hence, as a more conservative approach that gives more weight to precision, we used 0.7 as the optimum threshold for detecting a positive link between two nodes (viruses). The performance of the multilabel model varied across taxonomical orders, with high to moderate performance for predicting the taxonomical group and order of ‘humans’ and Cetartiodactyla (Figs. S7, S9). For 531 novel viruses, we identified 184,055 possible links to other viruses formed due to sharing of hosts (based on the optimum probability threshold of 0.7 identified for the binary model) generating the predicted network (
Empirical biological networks are rarely scale-free (network with large hubs and showing a power-law distribution for degree)14 but a recently published study with host-host projected networks where links are represented by sharing of pathogens between hosts, has shown scale-free nature where models with power-law distributions showed the best fit for host-parasite networks15. Similarly, both observed (
Based on a linear regression model with node-level permutations (10,000 permutations), our adjustment for search effort (PubMed hits) was found to have no effect on the degree (p=0.39, Fig. S12) and betweenness centrality (p=0.22, Fig. S13), but did significantly affect the eigenvector (p<0.05, Fig. S14) and clustering coefficient (p<0.05, Fig. S15) of novel viruses. These results indicate that sampling and reporting efforts affect our understanding of the predilection towards certain species as illustrated by clustering in the network, but do not affect the prediction of missing host links quantified by degree centrality within the network. Most of the newly discovered viruses were detected in only one species (mean=1.32, SD±0.99, n=944). Long tails of centrality distributions generated for the predicted network (
Importantly, a comparison between virus families of novel viruses showed that novel coronaviruses had a higher degree (p<0.001, Fig. 2c, Fig. S12), betweenness (p=0.02, Fig. 2d, Fig. S13), and eigenvector (p<0.001, Fig. S14) centralities in the predicted network compared to newly discovered viruses in all other virus families (Fig. 2c, d, g). In addition, the raw detection data showed significantly higher host diversity for novel coronaviruses with a mean of 2.02 (SD±2.03, n=114) unique host species (maximum of 15 species) compared to 1.22 (SD±0.70, n=830) for other novel viruses detected in this study. This finding raises concern about the ability of novel coronaviruses to infect a greater number of species than viruses from other families. The recently emerged SARS-CoV-2 and the previously emerged SARS-CoV-1 have shown a wide host breadth16. These predictions for novel coronaviruses highlight their key ecological properties that can influence spillover into humans. Following coronaviruses, novel flaviviruses showed significantly higher betweenness centrality (p<0.001). Host taxonomic order for novel viruses had no significant association with the degree centrality of the virus in the predicted network. The predicted network not only differentiates virus families by their centrality measures but also yields characteristics that are key to understanding the ecology of a novel virus and its behavior within the network community of hosts, including the expected breadth of host species most likely to be infected by that novel virus.
Prioritizing novel viruses for further characterization
For the 531 novel viruses, we developed prioritization metrics that inform on the ecological and evolutionary tendencies for spillover based on number of human links with known viruses predicted by the multiclass model. Novel viruses from Herpesviridae, Rhabdoviridae, Coronaviridae, Adenoviridae, Astroviridae, and Paramyxoviridae families not only showed a high median probability of sharing human links with known viruses (Fig. S16) but also were predicted to have large numbers of human links in the predicted network (
For a relative comparison of zoonotic risk for novel viruses, a prioritization score was developed based on the predicted probability of links being human and the number of shared human links in the predicted network for a given virus. To understand the performance of the prioritization score, we compared scores for known zoonotic and non-zoonotic viruses generated by the ensemble of the binary and multiclass models. Results indicated significantly higher prioritization scores for known zoonotic viruses (Fig. S17, p<0.001) compared to known non-zoonotic viruses. Prioritization scores were derived essentially from the prediction of new, as yet unobserved network links between a virus and another virus formed due to sharing of hosts. However, the models were unable to predict new links for well-recognized viruses that have numerous hosts, such as Rabies virus and West Nile virus, and these viruses consequently received a prioritization score of zero. Figure 3a–d shows the top ten and bottom five novel viruses from four virus families for relative comparison based on the prioritization score (Figs. S18–S24). PREDICT_CoV-15, found in two Phyllostomidae bats from South America (Artibeus lituratus, Sturnira lilium), received the highest prioritization score among all novel viruses. Other top ten novel coronaviruses based on the prioritization score included viruses detected in Phyllostomidae bats (PREDICT_CoV-4, PREDICT_CoV-13, PREDICT_CoV-11, PREDICT_CoV-5). Of these, PREDICT_CoV-11 was also detected in a Mormoopidae species (Pteronotus personatus) and PREDICT_CoV-5 was found in a Vespertilionidae species (Bauerus dubiaquercus) during the surveillance. The top ten also included coronaviruses detected in Southeast Asian Pteropodidae bat species, such as PREDICT_CoV-16 and PREDICT_CoV-22. PREDICT_CoV-22 was also detected in a Hipposideridae bat species (Hipposideros lekaguli). PREDICT_CoV-78, detected in multiple bat and rodent species of Southeast Asia, also showed a high prioritization score.
These model outcomes, especially the prioritization score, provide a data-driven tool to quantify zoonotic risk for novel viruses. Even though the model is trained on numerous data points for known zoonotic and non-zoonotic viruses, individual predictions for newly discovered viruses would only require data on hosts and virus family if used within our modeling framework.
Prioritizing future surveillance
The sharing of viruses among hosts is driven by geographical overlap and synergies in ecological niches of hosts, as well as virus-specific characteristics that enable cross-species transmission10. Novel viruses discovered in rodents, bats, primates, and other mammalian hosts were sampled from sites in close association with people, or at high-risk interfaces that can facilitate disease transmission in urban and rural settings1,13. Additional surveillance across a broader taxonomic range is essential to gain insights on newly detected viruses, further inform spillover risk, and improve the model predictions presented here. We used our network model and the taxonomy of the host in which each novel virus was first detected to prioritize host species (surveillance targets) for further surveillance of newly discovered viruses (Supplementary Data 1). Moreover, given the recent SARS-CoV-2 pandemic, we further explored surveillance targets for novel coronaviruses. Novel coronaviruses were detected in bats, rodents, birds, and primates (Fig. 4a). For novel coronaviruses detected in bats, predicted surveillance targets showed three distinct clusters (Fig. 4b). The first cluster of novel coronaviruses in bats had a higher proportion of predicted species from the Miniopteridae family (Bent-winged bats) but none from Natalidae (Neotropical funnel-eared bats). Another prominent cluster prioritized all 11 chiropteran families, while the third cluster of coronaviruses showed relatively fewer host recommendations from Miniopteridae bats. Representation of these surveillance targets through these clusters highlights host predilection of novel coronaviruses and indicates their preferential sharing of hosts. These clusters also support earlier results related to the scale-free nature of the predicted network (
Grange et al. developed a tool that ranks viruses for animal-to-human spillover using a risk-based approach validated by inputs from various experts from the fields of virology, epidemiology, and ecology5. Our approach, on the other hand, quantifies the risk of spillover agnostically and informs the predicted host range solely based on existing data available across the breadth of viruses and natural infections observed in free-ranging mammalian and avian hosts. Although numerous studies have recently been published that predict host-pathogen predilections, our framework quantifies the risk for viruses that have been recently discovered in animal hosts. Network models have been shown to perform well with the inclusion of ecological trait data10,17 and genome sequences18, but, with the limited data available for novel viruses, the approach provided here is an important step towards characterizing zoonotic potential for newly discovered animal viruses in the face of sparse data. These results may imply that network models are better at identifying a predictive signal when they are virus-centric (viruses as nodes and shared hosts as edges), particularly given that previous host-centric work has produced mixed results when using trait-agnostic network modelling approaches17. Our network approach presents some limitations, specifically for viruses that have been detected in species with limited surveillance efforts to date and are thus not part of the training data. For this reason, we were able to generate predictions for only 531 novel viruses out of 944. The remaining 413 novel viruses without predictions were detected in species that were never found positive for any virus, starkly indicating the lack of surveillance in wildlife. Further, model findings should be interpreted as associations between hosts and viruses based on the detection of viruses in samples collected from host species.
These associations require further understanding of the role of hosts in the transmission ecology of viruses, especially to establish whether hosts can serve as reservoir, amplifying, or dead-end hosts. Detection of a virus in a host species is not always correlated with that host’s ability to produce viremia for further transmission. Similarly, some of the novel viruses from Picobirnaviridae and Rhabdoviridae have been speculated to be hyperparasites, and the interpretation of these detections and predicted host associations needs further investigation.
Conclusions
Novel viruses with high scores on the prioritization metrics present a strong eco-evolutionary case for further genetic and in-vivo characterization to understand the risk of spillover. The scoring will help streamline in-depth in-vivo characterization and develop additional hypotheses related to genetic and ecological mechanisms for cross-species transmission and zoonotic spillover. Nucleotide data associated with the novel viruses presented here are short, hence the current model framework of using only host associations provides a key advantage. However, network models have been shown to improve prediction capacities when nucleotide data are included as features for prediction11. These tools will improve with further surveillance and discovery of new viruses and their hosts19, ultimately informing our understanding of the mechanisms of zoonotic emergence for viruses from wildlife.
Methods
Data collection
Virus-host data were collated from various sources. Major sources for the association databases included data shared by Olival et al.4, Pandit et al.3, and Johnson et al.13. In the data provided by Olival et al. (accessed September 2019), each host-virus association is assigned a score based on the specificity and reliability of the detection methods and tests. We used the associations identified as the most reliable (stringent data) from Olival et al.4. In addition, a query in GenBank was run to parse out hosts reported for each GenBank submission for viruses presented in each of these three databases. Initially, for each virus name, the taxonomic ID was identified using the Entrez.esearch function in the Biopython package. The taxonomic ID helped link to the GenBank databases and identify the ICTV lineage and associated data in PubMed20,21. NCBI TaxID closely follows the ICTV database, but some recent changes in ICTV might not always be reflected in NCBI, so we manually checked names to ensure matching. This included virus genus and family information along with a standard virus name. Host data were aggregated based on the taxonomic ID and associated standard name. Finally, for each virus, a search was completed in PubMed to compile the number of hits related to the virus and their vertebrate hosts using the search terms below. The number of PubMed hits (PMH1) was used as a proxy for sampling bias3,13. The virus-host association data source is presented in supplementary code and data files (https://zenodo.org/record/5899054).
Along with the PubMed terms, we also queried the nucleotide database on PubMed using the taxonomic ID to find the number of GenBank entries for these viruses (PMH2). PMH1 and PMH2 of well-recognized known viruses were highly correlated (Fig. S32), which allowed us to safely use GenBank hits for novel viruses during the prediction stage of the model.
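The correlation check between the two search-effort proxies can be sketched as follows. This is a minimal illustration only: the text does not state which correlation coefficient or transformation was used, so the Pearson coefficient on log-transformed counts is an assumption here, and the hit counts are made up.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up hit counts for five hypothetical known viruses.
pmh1 = [12, 150, 900, 4300, 25000]  # PubMed hits (PMH1)
pmh2 = [5, 80, 1100, 2600, 31000]   # GenBank entries (PMH2)

# Assumption: counts are compared on a log scale because they are heavily skewed.
r = pearson_r([math.log1p(v) for v in pmh1], [math.log1p(v) for v in pmh2])
```

A coefficient close to 1 would support substituting GenBank hits for PubMed hits when a virus has no literature record yet.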
Development of
a. Centrality measures of observed network (
To test if centrality measures (degree centrality, betweenness centrality, eigenvector centrality, clustering coefficient) for viral nodes in the observed network (
After fitting the model, node-level permutations were implemented. For each random permutation, the output variable was randomly assigned to covariate values and the model was re-fitted. Finally, a p-value was calculated by comparing the distribution of coefficients from permutations with the original model coefficient.
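The permutation procedure described above can be sketched for the simplest case of one covariate. This is an illustrative simplification, not the authors' model: the published analysis includes several covariates, whereas this sketch uses a single predictor, a two-sided comparison, and an add-one correction, all of which are assumptions. The data values are made up.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y regressed on a single covariate x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def permutation_p_value(x, y, n_permutations=10_000, seed=0):
    """Two-sided p-value for the slope from node-level permutations."""
    rng = random.Random(seed)
    observed = ols_slope(x, y)
    shuffled = list(y)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any covariate-outcome association
        if abs(ols_slope(x, shuffled)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up example: a centrality measure that rises with log search effort.
log_pubmed_hits = [0.0, 0.7, 1.1, 1.6, 2.3, 3.0, 3.4, 4.1, 4.6, 5.2]
degree = [0.01, 0.02, 0.02, 0.04, 0.05, 0.07, 0.06, 0.09, 0.10, 0.12]
p_value = permutation_p_value(log_pubmed_hits, degree, n_permutations=2000)
```

Permuting the outcome across nodes, rather than relying on the usual standard errors, avoids assuming that centrality values of nodes in the same network are independent.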
Network topology feature selection
Using the observed network (
1. The Jaccard coefficient: A commonly used similarity metric between nodes in information retrieval, also called intersection over union for two nodes in the network. In the unipartite network generated here, it represents the proportion of common neighbor viruses out of the union of neighbor viruses for two nodes. Neighbor viruses are defined as viruses with which the virus shares at least a single host.
2. Adamic/Adar (Frequency-Weighted Common Neighbors): The sum of the inverse logarithmic degree centrality of the neighbors shared by two nodes in the network24. The Adamic/Adar index is thus a weighted count of common neighbors for viruses in the network. For link prediction, the index assumes that shared neighbors with large neighborhoods are less informative about a connection between two viruses than shared neighbors with smaller neighborhoods.
Both the Jaccard and Adamic/Adar coefficients have been routinely used for generalized network prediction and have shown high accuracy in predicting missing links in networks, specifically bipartite networks25. However, the information flowing through the neighborhoods formed by two nodes might not always have similar predictive power in a unipartite network. This warrants the use of other topology features along with neighborhood-based features.
3. Resource allocation: Similarity score of two nodes defined by the weights of common neighbors of two nodes. Resource allocation is another measure to quantify the closeness of two nodes in the network and hence to understand the similarity of hosts they infect.
4. Preferential attachment coefficients: The mechanism of preferential attachment can be used to generate evolving scale-free networks, where the probability that a new link is connected to node x is proportional to the degree k of that node26.
5. Betweenness centrality: For a node in the network, betweenness centrality is the sum of the fraction of all-pairs shortest paths that pass through it. The feature that we used for training the supervised learning model was the absolute difference between the betweenness centralities of two nodes. This difference represents the difference in the sharing observed by the two viruses in the pair.
6. Degree centrality: The degree centrality for a node v is the fraction of nodes it is connected to. The feature that we used for training the supervised learning model was the absolute difference between the degree centralities of two nodes. Unlike the difference in betweenness centrality, the difference in degree centrality only captures the difference in the number of observed host-sharing links.
7. Network clustering: All nodes were classified into community clusters using the Louvain method27. A binary feature variable was generated to describe whether both nodes in the pair were part of the same cluster. Two viruses from the same cluster are expected to have more similar host predilection than two viruses from different clusters, so this feature accounts for the evolutionary predilection of viruses (or virus families) to infect a certain type of host.
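Several of the pairwise features listed above can be computed directly from an adjacency structure. The sketch below uses the standard textbook definitions of the Jaccard, Adamic/Adar, resource allocation, and preferential attachment indices plus the degree centrality difference; it is assumed, not confirmed, that these match the authors' exact implementation. Betweenness centrality and Louvain clustering are omitted for brevity, and the function name and toy network are hypothetical.

```python
import math


def pair_features(adjacency, u, v):
    """Neighborhood-based similarity features for the virus pair (u, v).

    `adjacency` maps each virus to the set of viruses it shares a host with.
    """
    nu, nv = adjacency[u], adjacency[v]
    common = nu & nv
    union = nu | nv
    n_nodes = len(adjacency)

    def degree_centrality(node):
        # Fraction of the other nodes this node is connected to.
        return len(adjacency[node]) / (n_nodes - 1)

    return {
        "jaccard": len(common) / len(union) if union else 0.0,
        # A common neighbor always has degree >= 2, so log(degree) > 0.
        "adamic_adar": sum(1.0 / math.log(len(adjacency[w])) for w in common),
        "resource_allocation": sum(1.0 / len(adjacency[w]) for w in common),
        "preferential_attachment": len(nu) * len(nv),
        "degree_difference": abs(degree_centrality(u) - degree_centrality(v)),
    }


# Toy network with made-up virus labels: A-B, A-C, B-C, B-D.
toy_adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B"},
}
features = pair_features(toy_adjacency, "A", "D")
```

For the unlinked pair (A, D), the only common neighbor is B, so every neighborhood-based score here is driven by the degree of B.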
These topological network characteristics come with certain limitations when applied to the unipartite network of viruses with links formed due to shared hosts, and might not truly represent the flow of information between nodes as compared to a bipartite network. Therefore, to account for these limitations, we used multiple network features as weak learners in model building, summarizing the network through several quantitative metrics. In addition, we estimated the feature importance of these metrics in predicting missing links between viruses to quantify the information passing through these links.
Pearson’s correlation coefficients were calculated to identify highly correlated features and for choosing features for model training (Fig. S33). Virological features included in model training were categorical variables describing the virus family of both the nodes in the pair, followed by a binary variable if both the viruses belong to the same virus family. During the model development, PubMed hits generated three predictive features for each pair of viruses on which model training and predictions were conducted. These included two features representing PubMed hits for the two viruses in the pair (PubMedV1, PubMedV2) and the absolute difference between PubMedV1 and PubMedV2 to account for differences in sampling bias between the two viruses.
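The virological and search-effort features for one virus pair, as described above, can be assembled along these lines. This is a sketch under stated assumptions: the dictionary keys "family" and "pubmed_hits", the function name, and the example values are hypothetical, and the categorical family variables would still need encoding before model training.

```python
def virological_features(virus_1, virus_2):
    """Family and search-effort features for one virus pair.

    Each virus is a dict with hypothetical keys "family" and "pubmed_hits".
    """
    return {
        "family_v1": virus_1["family"],
        "family_v2": virus_2["family"],
        # Binary indicator: 1 if both viruses belong to the same family.
        "same_family": int(virus_1["family"] == virus_2["family"]),
        "PubMedV1": virus_1["pubmed_hits"],
        "PubMedV2": virus_2["pubmed_hits"],
        # Absolute difference captures unequal sampling effort within the pair.
        "PubMed_abs_diff": abs(virus_1["pubmed_hits"] - virus_2["pubmed_hits"]),
    }


# Made-up example values.
pair_row = virological_features(
    {"family": "Coronaviridae", "pubmed_hits": 120},
    {"family": "Coronaviridae", "pubmed_hits": 45},
)
```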
Cross-validation and fitting generalized boosting machine (GBMs) models
A nested-cross-validation was implemented for the binary model while simple cross-validation was implemented for the multiclass model (multiple output categories). The parameters of the binary model were first hyper-tuned using a cross-validated grid-search method. Values were tested using a grid search to find the best-performing model parameters that showed the highest sensitivity (recall). The parameters tested for hypertuning and their performance are provided in the supplementary material (supplementary results and Table S5). For further cross-validation of the overall binary model, all the viruses were randomly assigned to five groups. For each fold, the viruses assigned to a group were dropped from the data, and a temporary training network (
The multiclass model was implemented in the same way, creating an observed network (
We used three methods to estimate the importance of features for our binary model: the improvement in accuracy brought by branching on the feature (gain), the percentage of times the feature appears in the XGBoost tree model (weight), and the relative number of observations related to the feature (cover). Results for feature importance are shown in the supplementary results (Fig. S10).
Missing links for novel viruses, binary and multiclass prediction
The wildlife surveillance data represented a sampling of 99,379 animals (94,723 wildlife, 4656 domesticated animals) conducted in 34 countries around the world between 2009 and 2019 (Table S6)1. Specimens were tested using conventional RT-PCR, quantitative PCR, Sanger sequencing, and Next Generation Sequencing protocols to detect viruses from 28 virus families or taxonomic groups (Table S7). Testing resulted in 951 novel monophyletic clusters of virus sequences (referred to as novel viruses henceforth). Of these 951 novel viruses, 944 had vertebrate hosts that were identified with certainty based on barcoding methods and field identification. Host species identification was confirmed by cytochrome b (cytb) DNA barcoding using DNA extracted from the samples28. We predicted the shared host links between novel viruses and known viruses using binary and multiclass models in the following steps. Out of 944 novel viruses discovered in the last ten years, we were able to generate predictions for 531 novel viruses that were detected in species already classified as hosts within the network. The remaining 413 viruses were the first detection of any virus in that species and thus host associations could not be informed by the observed network (
1. A new node representing the novel virus was inserted in the observed network (
2. Using
The results indicated that GenBank hits had statistically significant predictive value in predicting PubMed hits (β=0.72, p<0.005) even after accounting for various virus families. Multiple virus families showed statistically different estimates than the reference virus family (Adenoviridae), indicating a significantly different association than other virus families. Results of the generalized linear regression model are presented in Table S8.
3. Using this dataset for the novel virus, a binary presence of a link between the novel virus and known viruses was predicted using the trained binary model. The taxonomic order of the host link was predicted using the trained multiclass model.
4. For each possible link, the binary model predicted the probability of sharing a link, and the multiclass model predicted multivariate outcomes of taxonomic orders and associated probabilities. A threshold of 0.70 for the binary prediction model was used to classify if the link is present or not and only those links were explored for their corresponding multiclass model outputs.
5. The multiclass model showed higher performance for correctly classifying links as “human” hosts than for the numerous other avian and mammalian taxonomic orders. Hence, the multiclass model outputs were summarized into either humans or other taxonomic groups. For the novel virus, a list of known viruses with a predicted link was generated. Using the hosts of these known viruses and the taxonomic order in which the novel virus was detected, a list of most likely species was generated based on the overall frequency of the host species. For understanding the likelihood of infecting humans, two factors were considered to be of importance. Firstly, the number of links where humans are predicted as shared hosts with known viruses (
To test if virus family, the taxonomic order of hosts in which novel viruses were detected, and the number of times the viruses were detected (equivalent to PubMed hits for known viruses) influenced node (virus) level network centrality measures in the predicted network (
For each of the random 10,000 node-level permutations, the output variable (centrality measure) was randomly assigned to covariate values and the model was re-fitted. A p-value was calculated by comparing the distributions of coefficients with the original model coefficient. These models were fitted for degree centrality, betweenness centrality, eigenvector centrality, and clustering coefficient of novel viruses in the predicted network.
Prioritization score for novel viruses
Generalized Linear Mixed Models were used to understand the effects of virus family, taxonomic order of the host, and PubMed hits on the number of predicted human links and the mean probability of the predicted links. The models were fit using the glmmTMB package and the glm function in R. For relative comparison of zoonotic risk and for prioritizing novel viruses for further characterization, a prioritization metric was developed based on the predicted probability of sharing humans as hosts with known viruses (
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Author contributions
P.S.P., C.K.J., S.J.A, T.G, K.L.O, and J.A.K.M conceived of the research; P.S.P analyzed the data; P.S.P., S.J.A., T.G., K.J.O., M.M.D., N.R.G., B.B., W.A.S., D.W., K.G., C.M., T.K., M.U., J.H.E., C.M., M.K.R., P.D., E.H., A.S., H.L., A.A.C., A.L., C.L., T.O’R., S.H.O., L.K., A.P.M., A.P., C.D. de P., D.Z., M.V., M.LeB., D.M., A.I., V.D., M.M., Z.S., P.M., C.K., M.A., N.K., U.T., S.B. N., A.C., J.P., K.C., E.A. B., J.K., S.S., J.D., T.H., E.S., O.A., D.K., J.N., D.N., A.G., Z.S., S.W., E.A. R., B.S., G.S., L.F.A., M.R.S., T.N.D., N.T. T.N., P.L.H., D.O.J., K.S., A.F., S.M., W.K., P.D., J.A.K.M., PREDICT Consortium, & C.K.J. collected data, wrote and revised the manuscript.
Peer review
Peer review information
Communications Biology thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editors: Pei Hao and Luke R. Grinham.
Funding
This work was supported by the United States Agency for International Development (USAID) Emerging Pandemic Threat PREDICT program (Cooperative Agreement nos. GHN-A-00-09-00010-00 and AID-OAA-A-14-00102). P.S.P., C.K.J, M.U., K.G., and N.R.G. are also supported by the National Institute Of Allergy And Infectious Diseases of the National Institutes of Health under Award Number U01AI151814. The content is solely the responsibility of the authors and does not necessarily represent the official views of the USAID, National Institutes of Health, or the United States Government. We thank the governments of Bangladesh, Bolivia, Brazil, Cambodia, Cameroon, China, DR Congo, Egypt, Ethiopia, Gabon, Ghana, Guinea, India, Indonesia, Ivory Coast, Jordan, Kenya, Lao PDR, Liberia, Malaysia, Mexico, Mongolia, Myanmar, Nepal, Peru, Republic of Congo, Rwanda, Senegal, Sierra Leone, Tanzania, Thailand, Uganda, and Vietnam for permission to conduct this study, and the field teams and collaborating laboratories that performed sample collection and testing.
Data availability
Data reported in this paper are available at https://zenodo.org/record/5899054, https://data.usaid.gov/d/tqea-hwmr, https://data.usaid.gov/d/x3ij-fnrb, and https://data.usaid.gov/Global-Health-Security-in-Development-GHSD-/PREDICT-Emerging-Pandemic-Threats-Project/tqea-hwmr.
Code availability
Code used to develop models and generate results and figures presented in the paper is available at https://zenodo.org/record/5899054.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: S. J. Anthony, T. Goldstein.
A list of authors and their affiliations appears at the end of the paper.
A full list of members and their affiliations appears in the Supplementary Information.
Change history
1/10/2023
A Correction to this paper has been published: 10.1038/s42003-022-04364-y
Contributor Information
Pranav S. Pandit, Email: pspandit@ucdavis.edu.
Christine K. Johnson, Email: ckjohnson@ucdavis.edu.
PREDICT Consortium:
Supplementary information
The online version contains supplementary material available at 10.1038/s42003-022-03797-9.
Articles from Communications Biology are provided here courtesy of Nature Publishing Group