Evidence of the Recombinant Origin of a Bat Severe Acute Respiratory Syndrome (SARS)-Like Coronavirus and Its Implications on the Direct Ancestor of SARS Coronavirus
Abstract
Bats have been identified as the natural reservoir of severe acute respiratory syndrome (SARS)-like and SARS coronaviruses (SLCoV and SCoV). However, previous studies suggested that none of the currently sampled bat SLCoVs is the descendant of the direct ancestor of SCoV, based on their relatively distant phylogenetic relationship. In this study, evidence of the recombinant origin of the genome of a bat SLCoV is demonstrated. We identified a potential recombination breakpoint immediately after the consensus intergenic sequence between open reading frame 1 and the S coding region, suggesting the replication intermediates may participate in the recombination event, as previously speculated for other CoVs. Phylogenetic analysis of its parental regions suggests the presence of an uncharacterized SLCoV lineage that is phylogenetically closer to SCoVs than any of the currently sampled bat SLCoVs. Using various Bayesian molecular-clock models, interspecies transfer of this SLCoV lineage from bats to the amplifying host (e.g., civets) was estimated to have happened a median of 4.08 years before the SARS outbreak. Based on this relatively short window period, we speculate that this uncharacterized SLCoV lineage may contain the direct ancestor of SCoV. This study sheds light on the possible host bat species of the direct ancestor of SCoV, providing valuable information on the scope and focus of surveillance for the origin of SCoV.
Severe acute respiratory syndrome (SARS) is a contagious respiratory disease caused by a newly emerged coronavirus (CoV) named SARS-CoV (SCoV) (10). SCoV is phylogenetically distinct from other CoVs in animals and humans (45). SCoV was also isolated from small mammals, such as civets (Paguma larvata) and raccoon dogs (Nyctereutes procyonoides), in live-animal markets of southern China, suggesting that these mammals may have been the direct sources of the SARS epidemic in early 2003 (11). However, further studies demonstrated the lack of widespread infections in wild or farmed civets, implying that civets might act as only amplifying hosts and not a natural reservoir of SCoV (20). Recently, a group of CoVs that are closely related to SCoVs were identified in various species of horseshoe bats (Rhinolophus spp.) (29, 31). Their genomes share the same organization and an overall 88% to 92% sequence identity with that of the human and civet SCoVs (collectively designated Hu-SCoV), and thus, they are termed bat SARS-like CoVs (Bt-SLCoVs).
Genetic analysis revealed a considerable diversity among Bt-SLCoV genomes, suggesting the presence of a wide spectrum of genetically diverse Bt-SLCoVs in various bat species (43). In addition, previous studies indicated a high seroprevalence against Bt-SLCoVs among various bat populations (29, 31). Therefore, bats were proposed to be the natural reservoir of the lineage of SLCoV and SCoV. Nonetheless, based on the relatively distant phylogenetic relationship between Hu-SCoVs and Bt-SLCoVs, researchers suggested that none of the currently sampled Bt-SLCoVs is the descendant of the direct ancestor of Hu-SCoVs (51). Therefore, the direct ancestor of Hu-SCoVs, as well as its corresponding host species, remains elusive.
In this study, we reanalyzed the available Bt-SLCoV genomes and identified a possible recombination event within the genome of a Bt-SLCoV. Phylogenetic analysis of its parental regions suggests the presence of an uncharacterized SLCoV lineage that is phylogenetically closer to Hu-SCoVs than any of the currently sampled Bt-SLCoVs and is therefore a candidate for the direct ancestor of the Hu-SCoV lineage.
To investigate the time of divergence between Hu-SCoVs and this SLCoV lineage, we analyzed the SCoV and SLCoV genome data under both strict- and relaxed-molecular-clock models. Previous studies demonstrated that the rate variations among lineages can mislead estimation of the divergence date if a strict clock is assumed (54). In contrast, if the data set is clocklike, assumption of a molecular clock increases the precision of rate estimates without compromising accuracy (14). The choice of a molecular-clock model is thus crucial for accurate molecular dating. Therefore, we analyzed our data sets under various Bayesian molecular-clock models, aiming to place a robust time scale on the interspecies transmission of Bt-SLCoVs and to provide insights into the zoonotic origin of Hu-SCoVs.
MATERIALS AND METHODS
Detection of recombination.
Complete genome sequences of Hu-SCoV (n = 10) and Bt-SLCoV (n = 7) were downloaded from GenBank, and these nucleotide sequences were aligned using ClustalX with all gap columns removed. The data set was preliminarily scanned for recombination events by Recombination Detection Program (RDP) 2.0 (35), using MaxChi and Chimaera algorithms with a 0.6 and 0.05 fraction of variable sites per window, respectively. To further investigate the potential recombination event suggested by RDP, similarity plot and bootscan analyses, implemented in Simplot 3.5.1 (33), were performed on the complete genome alignment of selected strains, including Bt-SLCoV strain Rp3 (DQ071615) as the query; Hu-SCoV strains Tor2 (AY274119), SZ3 (AY304486), GD01 (AY278489), ZJ01 (AY297028), GZ04 (AY613947), and PC4 (AY613950) as potential major parents; Bt-SLCoV strain Rm1 (DQ412043) as a potential minor parent; and strain Rf1 (DQ412042) as an outgroup.
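As a rough illustration of what a similarity plot measures, the following sketch computes the fraction of identical sites between a query and a reference in sliding windows along an alignment. It is not the Simplot implementation: the function name, the window and step defaults, and the toy sequences are assumptions for illustration, and plain identity is used instead of a corrected distance.

```python
def window_similarity(query, reference, window=200, step=20):
    """Return (window_start, fraction_identical) pairs along an alignment."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    points = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        r = reference[start:start + window]
        matches = sum(1 for a, b in zip(q, r) if a == b)
        points.append((start, matches / window))
    return points


# Toy aligned sequences (hypothetical, not real CoV data): the query matches
# parent_a in its first half and parent_b in its second half.
parent_a = "ACGT" * 10
parent_b = "TGCA" * 10
query = parent_a[:20] + parent_b[20:]

sim_a = window_similarity(query, parent_a, window=10, step=10)
sim_b = window_similarity(query, parent_b, window=10, step=10)
```

A switch in which reference is most similar to the query, as in the toy data, is the pattern that motivates a closer search for a breakpoint.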
Estimation of the potential recombination breakpoint location.
The data set was further analyzed using single-breakpoint estimation algorithms implemented in Genetic Algorithms for Recombination Detection (GARD) and Likelihood Analysis of Recombination in DNA (LARD). Based on the bootscan analysis, only the approximately 2,000 nucleotides (nt) around the open reading frame 1b (ORF1b)/S junction (nt 20150 to 22202; all nucleotide numberings in this study are based on AY274119) were analyzed in order to increase the precision of recombination breakpoint estimation. Based on the RDP results, three selected taxa, Rp3, Tor2, and Rm1, were used in the analyses described below. Briefly, GARD uses a genetic algorithm to search for the best breakpoint locations (23). LARD uses a maximum likelihood (ML) method and a likelihood ratio test (LRT) to assess the significance of the inferred breakpoint (15). To demonstrate that the detected recombination event is not likely to be a result of random chance (15), the likelihood ratio (LR) of our data set was evaluated against the null distribution of LRs of 1,000 simulated data sets, generated with Seq-Gen (42) under the assumption of no recombination.
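The comparison of an observed LR with LRs from simulated, recombination-free data sets can be sketched as a Monte Carlo P value. The function name and the add-one correction are assumptions (a common convention, not something stated in this study), and the null values below are placeholders rather than Seq-Gen output.

```python
def empirical_p_value(observed_lr, null_lrs):
    """Monte Carlo P value: share of null LRs at least as large as observed.

    The +1 in numerator and denominator counts the observed data set as one
    draw from the null, so the estimate can never be exactly zero.
    """
    exceed = sum(1 for lr in null_lrs if lr >= observed_lr)
    return (exceed + 1) / (len(null_lrs) + 1)


# Placeholder null distribution of 1,000 LRs (hypothetical values).
null_lrs = [float(i % 10) for i in range(1000)]
p_value = empirical_p_value(50.0, null_lrs)
```

When the observed LR exceeds all 1,000 simulated LRs, this estimator gives 1/1001, i.e., just under 0.001.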
Investigation of the phylogenetic origin of the potential parents.
The genome regions 5′ upstream and 3′ downstream of the estimated breakpoint were designated major and minor parental regions, respectively. To investigate the phylogenetic origins of these potential parents, coding sequences of essential ORFs of the major (i.e., ORF1) and minor (i.e., S, E, M, and N genes) parental regions of selected CoV strains (n = 13) were aligned independently using ClustalX based on their codon sequences. The aligned ORFs of the two parental regions were degapped and concatenated separately, generating two alignments of 20,085 bp and 5,778 bp for the major and minor parental regions, respectively. For each of the parental regions, phylogenies were constructed using the Bayesian Markov chain Monte Carlo (BMCMC) method. The BMCMC analyses summarized the majority consensus trees produced by two sets of four tempered MCMC chains of 10^7 states sampled every 1,000th generation, with the initial 10% of states discarded. The Bayesian phylogenetic analysis was performed with MRBAYES 3 (44) under the best-fit substitution model determined by MRMODELTEST 2 (http://people.scs.fsu.edu/~nylander/). According to the BMCMC phylogeny (see Fig. 2A), the major parental lineage of Rp3 is designated the human-bat SLCoV (HB-SLCoV) lineage based on its close phylogenetic relationship with the Hu-SCoV lineage.
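The degapping and concatenation step can be sketched as follows. This is a minimal illustration, assuming each ORF alignment is a dictionary from strain name to aligned sequence; the helper names and the toy sequences are hypothetical, and the strain names are reused only as labels.

```python
def degap(alignment):
    """Drop every alignment column that contains a gap in any sequence."""
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if "-" not in col]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


def concatenate(alignments):
    """Join per-ORF alignments end to end, strain by strain."""
    names = list(alignments[0])
    return {name: "".join(aln[name] for aln in alignments) for name in names}


# Toy alignments for two ORFs (hypothetical sequences).
orf_s = {"Rp3": "ATG-CC", "Tor2": "ATGACC"}
orf_e = {"Rp3": "GGA", "Tor2": "G-A"}

minor_region = concatenate([degap(orf_s), degap(orf_e)])
```

Because the ORFs were aligned by codon, gaps would normally span whole codons, so removing gapped columns keeps the reading frame intact.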
Estimation of the time of the divergence events.
To estimate the time of the most recent common ancestor (tMRCA) of Hu-SCoVs, as well as the time of divergence events (tDIV) between the Hu-SCoV and HB-SLCoV lineages (designated tMRCA-Hu and tDIV-Hu/HB, respectively) (see Fig. 2), coding sequences of S1 (nt 21492 to 22784; n = 36) and ORF1 (nt 898 to 21479; n = 24) were analyzed under various molecular-clock models in both ML and Bayesian frameworks (details of the taxa in the two data sets are listed as supplementary material at http://evolution.hku.hk/SARS_dating.htm). The sampling times (i.e., the month and year) of the taxa were collected from the literature and used as calibration points in the clock models (41).
First, the strict molecular clock (i.e., a constant rate of evolution) of the two data sets was evaluated in an ML framework using PAML 3.15 as previously described (16, 41, 53). Briefly, the performances of the single-rate dated-tip (SRDT) (i.e., strict-clock) and the different-rate (DR) (i.e., no-clock) models in the data sets were compared using an LRT. Second, the two data sets were analyzed under the strict-clock model (CLOC), as well as the uncorrelated exponentially and lognormally distributed relaxed-clock models (UCED and UCLN) in a Bayesian framework. The CLOC model assumed a constant rate of evolution throughout the tree. The UCED and UCLN models assumed independent rates on different branches, which were drawn from an underlying exponential and lognormal distribution, respectively (6). These clock models are implemented in BEAST 1.4 (8). The MCMC chains were run for 5 × 10^6 (S1 data set) or 1 × 10^8 (ORF1 data set) states sampled every 1,000 generations with the initial 10% of burn-in samples discarded (7). For both data sets, the best-fit substitution model was the general time-reversible (GTR) model allowing four categories of gamma-distributed rate heterogeneity and a proportion of invariant sites (GTR + Γ4 + I), as determined by MODELTEST. Since the past population dynamics of the data sets were not the primary interest of our study, we assumed a constant coalescent tree prior for all analyses, with a Jeffreys prior on the constant population size hyperparameter (7). To investigate if this tree prior biased our date estimation, we also analyzed our data sets using a Yule tree prior, which assumes a constant speciation rate per lineage (6). All MCMC chains were independently run twice for the same analysis.
To use information from the S1 data set to improve our estimate of tDIV-Hu/HB from the ORF1 data set, an S1-derived prior distribution was specified on tMRCA-Hu, which is a divergence event shared by the phylogenies of both data sets. This prior distribution was based on the posterior distribution of tMRCA-Hu estimated from the S1 data set under the best-fit clock model. The mode and parameters of this distribution were estimated using distribution-fitting software, EasyFit 3.2 (MathWave Technologies). The MCMC chains for the ORF1 data set were rerun under the same configurations described above, except an S1-derived prior was specified on tMRCA-Hu. For all Bayesian analyses, median and the highest posterior density regions at 95% (HPD) of the parameters were summarized from two identical but independent MCMC chains using TRACER 1.3 (http://beast.bio.ed.ac.uk/). The adequacy of sampling was assessed via effective sample size, which was larger than 200 for all summary statistics investigated (all xml files for BEAST are available as supplementary material at http://evolution.hku.hk/SARS_dating.htm).
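For readers unfamiliar with the summary statistics, a 95% HPD region for a unimodal posterior can be approximated from MCMC samples as the shortest interval that contains 95% of them. The sketch below is an illustration of that idea, not the TRACER implementation; the function name and the sample values are assumptions.

```python
import math
import statistics


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive order statistics and keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[best], ordered[best + k - 1]


# Hypothetical samples: 100 evenly spaced values plus one extreme outlier.
samples = list(range(100)) + [1000]
lower, upper = hpd_interval(samples)
median = statistics.median(samples)
```

Unlike an equal-tailed interval, the shortest interval excludes the long tail created by the outlier, which is why HPD regions suit skewed posteriors such as divergence times.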
Comparison of the performance characteristics of Bayesian clock models.
To compare the performance of any two Bayesian clock models for the same data set, the Bayes factor (BF) was calculated. The BF is the ratio of the marginal likelihoods of the two models. A simple method described by Newton and coworkers (39) computes the BF via importance sampling. A BF of >20, or a ln BF of >2.99, is defined as strong support for the favored model. Clock models of the same data set were compared two by two, and estimates of the best-fit model were taken as the final results.
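One widely used importance-sampling estimator of the marginal likelihood is the harmonic mean of the likelihoods sampled from the posterior. The sketch below shows that estimator in log space. Whether this exact variant, or a smoothed version of it, was applied in this study is an assumption, and the function name and inputs are hypothetical.

```python
import math


def log_marginal_likelihood_hme(log_likelihoods):
    """Harmonic-mean estimate of ln p(data | model) from posterior samples.

    Uses ln p = ln n - logsumexp(-ln L_i), with the maximum factored out of
    the sum so that the exponentials do not overflow.
    """
    n = len(log_likelihoods)
    negated = [-ll for ll in log_likelihoods]
    largest = max(negated)
    log_sum = largest + math.log(sum(math.exp(x - largest) for x in negated))
    return math.log(n) - log_sum


# Hypothetical posterior log-likelihood samples for two competing models.
ln_ml_a = log_marginal_likelihood_hme([-100.0, -101.5, -99.8, -100.7])
ln_ml_b = log_marginal_likelihood_hme([-104.0, -103.2, -105.1, -104.4])
ln_bf = ln_ml_a - ln_ml_b  # positive values favor model A
```

The estimator is dominated by the lowest sampled likelihoods and can be unstable, so it is best read as a rough guide to relative model fit.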
RESULTS AND DISCUSSION
Detection of recombination and estimation of breakpoint location.
The RDP analysis suggested that Bt-SLCoV Rp3 may be a recombinant of a Bt-SLCoV strain and a strain that is closely related to Hu-SCoVs (data not shown). The similarity plot indicated that the 5′ genomic region of Rp3 shares a substantially higher similarity with the Hu-SCoVs, while its 3′ genomic region is more similar to that of the Bt-SLCoVs (Fig. 1A). Moreover, the bootscan analysis suggested discordance of phylogenetic signals between different genomic regions (Fig. 1B). Taken together, these analyses suggested a single recombination breakpoint located around the junction between the S and ORF1b coding regions (Fig. 1C).
To accurately locate the potential recombination breakpoint and to determine the level of its statistical significance, GARD and LARD analyses were performed. Both analyses estimated a potential breakpoint at nt 21495, which is the nucleotide immediately after the start codon of the S coding region (Fig. 1C). The model average support of the breakpoint estimated in GARD analysis was >0.9. In the LARD analysis, the P value of the LRT was <0.0001. Moreover, the LR for this putative breakpoint was greater than any of the LRs of the corresponding simulated data sets (data not shown). These results suggest that the discordance of phylogenetic signals within the genome of Rp3 is not a result of chance and that the recombination breakpoint estimated from both analyses is statistically significant. It should be noted that Rp3 was not plaque isolated, and its genome was obtained by direct sequencing of the PCR products amplified from the field samples (31). Therefore, if the host was infected by multiple strains, we cannot exclude the possibility that the Rp3 genome represents a mosaic sequence of a number of strains. Nonetheless, only one recombination breakpoint was identified within the 29-kb genome, and its parental regions are relatively long (about 21 and 8 kb). Given that the genome was assembled from the sequences of a number of overlapping PCR products, we believe that the probability that the detected recombination breakpoint is an artifact should be negligible.
Genomes of CoVs are reported to have relatively high recombination rates (28). For example, experimental recombination between temperature-sensitive mutants and the wild type of mouse hepatitis virus strains has been studied extensively (21, 27, 34). Moreover, evidence of recombination has also been reported in field isolates of infectious bronchitis virus (19, 25, 30) and feline CoV (13). The occurrence of a high frequency of homologous RNA recombination in CoV genomes is probably related to the unique discontinuous transcription mechanism of its mRNA, in which the nascent RNA transcripts must dissociate from the template and fuse with the leader RNA to a distant mRNA start site (28). Regular dissociation and rejoining of the complex of polymerase and nascent RNA during transcription are similar to the template-switching mechanism in the “copy choice” model of recombination in RNA viruses (26). In fact, one of the most utilized recombination sites within the mouse hepatitis virus genome is at the junction between the leader RNA and the remainder of its genome (22). In addition, a previous report suggested that the consensus intergenic sequences (IGS) and the highly conserved sequences around this region may serve as recombination “hot spots” in infectious bronchitis virus (25). In this study, we identified a potential recombination site immediately after the consensus IGS (17), suggesting that the replication intermediates may participate in the recombination event, as speculated previously in other CoVs. Previous studies suggested that the relatively high rates of recombination and mutation may facilitate the cross-species transmission of CoVs (2, 3), and therefore, CoVs were speculated to be potentially important emerging pathogens (1). A wider surveillance of Bt-SLCoVs may shed light on the possible roles of this observed recombination event in the emergence of SARS.
Phylogenetic origin of the putative parental strains.
To investigate the phylogenetic origin of the putative parents, two BMCMC phylogenies were constructed based on the major (Fig. 2A) and minor (Fig. 2B) parental regions, respectively. The minor parental region of Rp3 was clustered within the Bt-SLCoV lineage and shared monophyly with Rm1 and BtCoV/279/2005 (Fig. 2B). This suggests that the potential minor parent of Rp3 is probably a Bt-SLCoV that shared a close phylogenetic relationship with Rm1 and BtCoV/279/2005. It has been suggested that there is species-specific host restriction of CoVs in bats, since most CoVs from a single bat species grouped together in phylogenetic analyses (48). Moreover, the S protein (which is located within the minor parental region) is the primary determinant of species specificity in CoVs (12, 36), and thus, we speculate that this minor parent may be a Bt-SLCoV residing in Rhinolophus pearsoni, i.e., the host species of Rp3.
On the other hand, the major parental region of Rp3 grouped with, but clustered outside of, the Hu-SCoV lineage (Fig. 2A). Based on this observation, the potential major parent of Rp3 is possibly derived from an uncharacterized lineage that is phylogenetically closely related to Hu-SCoVs. The host species of this speculative parental lineage cannot be ascertained, as it was clustered within neither the Hu-SCoV nor the Bt-SLCoV lineage. Here, we outline three possibilities regarding the host species of this lineage. First, the lineage may originate from an unsampled group of phylogenetically distinct SCoVs residing in live-animal market mammals, like civets or raccoon dogs. However, extensive surveillance of various mammalian species over a wide range of geographic locations has been performed, and only CoVs that are highly similar to SCoVs in humans were sampled (20). Thus, this possibility seems unlikely. Second, the lineage may originate from an unknown nonbat intermediate host species, which possibly acquired an SLCoV from bats and transmitted the virus to an amplifying host, such as civets, resulting in spillover in live-animal markets in southern China. However, one of the prerequisites for recombination is coinfection of parental strains within an individual. Therefore, recombination of parental strains residing in different species, i.e., bats and the unknown intermediate host in this case, may be rare due to the relatively strict tropism barrier of CoVs (12, 52). Third, the strain may originate from an unsampled SLCoV lineage residing in a bat species that is phylogenetically closer to Hu-SCoVs than all other currently sampled Bt-SLCoVs. Based on the relatively high genetic diversity among the currently sampled Bt-SLCoVs, the existence of an unsampled phylogenetically distinct lineage of Bt-SLCoV is highly likely, and therefore, the third hypothesis seems to be the most plausible.
In the discussions below, this parental lineage is therefore referred to as the HB-SLCoV lineage, while the term “Bt-SLCoV lineage” refers to all other sampled Bt-SLCoVs (Fig. 2). This lineage is proposed to contain the major parent of Rp3 and other closely related strains, and we cannot exclude the possibility that the lineage may also contain the direct ancestor of Hu-SCoVs. To further investigate the time of this interspecies transmission event, tMRCA-Hu and tDIV-Hu/HB (Fig. 2) were estimated under various molecular-clock models in both ML and Bayesian frameworks.
Molecular clock-like behavior of the data sets and choice of Bayesian clock models.
For the ORF1 data set, under the ML framework, LRT analysis suggests that the SRDT model should be rejected in favor of the DR model (Table 1). Moreover, BF analysis suggests that the UCED model fits the ORF1 data set significantly better than the other two models (Table 2), implying that the rate variations among branches of the ORF1 phylogeny are significant and that a strict clock cannot be assumed. The Bt-SLCoV lineage may contribute to the rate variations in the ORF1 data set, since CoVs of different hosts (i.e., bats and humans or civets) may have different substitution rates.
TABLE 1.
Data set | Viral lineages included | No. of taxa | df | 2Δ | LRT (P) |
---|---|---|---|---|---|
S1 | Hu-SCoVs only | 36 | 34 | 39.29 | 0.24 |
ORF1 | All Hu-SCoVs and Bt-SLCoVs | 24 | 22 | 87.55 | <0.001 |
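The P values in Table 1 can be approximately reproduced by referring the 2Δ statistic to a chi-square distribution with the listed degrees of freedom. The sketch below is a check under that assumption, using a closed form for the upper tail that holds only for even degrees of freedom (both 34 and 22 qualify); the function name is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate, for even df only.

    For df = 2m, P(X > x) = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0  # k = 0 term
    for k in range(1, df // 2):
        term *= half / k     # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


p_s1 = chi2_sf_even_df(39.29, 34)    # S1 data set
p_orf1 = chi2_sf_even_df(87.55, 22)  # ORF1 data set
```

Under this assumption the S1 value falls well above 0.05, so the strict clock is not rejected, while the ORF1 value is far below 0.001, in line with the table.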
TABLE 2.
Parameter | Clock model | S1 data set | ORF1 data set |
---|---|---|---|
Marginal likelihood | CLOC | −3,159.31 | −52,478.47 |
Marginal likelihood | UCED | −3,155.66 | −52,452.64 |
Marginal likelihood | UCLN | −3,157.55 | −52,467.24 |
BF | UCED vs. CLOC | 3.65 | 25.82 |
BF | UCLN vs. CLOC | 1.76 | 11.23 |
BF | UCED vs. UCLN | 1.89 | 14.59 |
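The BF entries in Table 2 match, to within rounding, the differences between the corresponding marginal likelihoods, which suggests that the marginal likelihoods are natural logarithms and the BF entries are ln BF values. The sketch below recomputes them under that assumption; the dictionary layout and function name are illustrative only.

```python
import math

# Log marginal likelihoods as reported in Table 2.
LOG_MARGINAL = {
    "S1": {"CLOC": -3159.31, "UCED": -3155.66, "UCLN": -3157.55},
    "ORF1": {"CLOC": -52478.47, "UCED": -52452.64, "UCLN": -52467.24},
}

# Threshold for strong support: a BF of 20, i.e., ln BF of about 2.99.
STRONG_SUPPORT = math.log(20)


def ln_bayes_factor(data_set, model_a, model_b):
    """ln BF of model_a over model_b; positive values favor model_a."""
    scores = LOG_MARGINAL[data_set]
    return scores[model_a] - scores[model_b]


for data_set in LOG_MARGINAL:
    for a, b in [("UCED", "CLOC"), ("UCLN", "CLOC"), ("UCED", "UCLN")]:
        value = ln_bayes_factor(data_set, a, b)
        print(f"{data_set}: {a} vs. {b}: ln BF = {value:.2f}, "
              f"strong support: {value > STRONG_SUPPORT}")
```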
For the S1 data set, LRT analysis suggests the SRDT model cannot be rejected (Table 1). Moreover, the performance of the CLOC model is not significantly worse than that of the UCLN model, implying that the rate variations may not be significant among branches in the S1 phylogeny (Table 2). However, the BF analysis also suggests the UCED model performed slightly better than the CLOC model. Nonetheless, the tMRCAs estimated under the relaxed- and strict-clock models are generally consistent (Fig. 3A), suggesting that these rate variations did not have a significant impact on our estimates of tMRCA-Hu. Based on the marginal likelihoods and the BF analysis (Table 2), the estimates under the UCED model were taken as the final dating results of both data sets.
tMRCA-Hu.
Based on the analysis of the S1 data set under the UCED model, tMRCA-Hu was estimated to be at a median of 2002.74 (HPD, 2002.18 to 2003.04). This time point refers to the emergence of the common ancestor of all Hu-SCoVs. Under modest assumptions, i.e., that the root had been sampled and the emergence was the result of a single cross-species infection of a single viral lineage, this time point can be considered an estimate of the theoretical onset of the 2003 SARS outbreak. Our estimation of tMRCA-Hu is consistent with previous estimations (5, 47, 55).
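Decimal-year estimates are easier to interpret as calendar dates. The sketch below converts them, assuming the fractional part is the elapsed fraction of the calendar year; the function name is hypothetical, and the exact date convention used in the original analysis may differ by a few days.

```python
import calendar
from datetime import date, timedelta


def decimal_year_to_date(value):
    """Convert a decimal year such as 2002.74 to an approximate date."""
    year = int(value)
    days_in_year = 366 if calendar.isleap(year) else 365
    elapsed_days = int((value - year) * days_in_year)
    return date(year, 1, 1) + timedelta(days=elapsed_days)


median_date = decimal_year_to_date(2002.74)  # median tMRCA-Hu
hpd_lower = decimal_year_to_date(2002.18)    # lower HPD bound
hpd_upper = decimal_year_to_date(2003.04)    # upper HPD bound
```

Under this convention the median corresponds to late September 2002, with bounds in early March 2002 and mid-January 2003.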
tDIV-Hu/HB.
A prior was specified on tMRCA-Hu as a lognormal distribution with parameters chosen to fit the posterior distribution estimated from the S1 data set (Fig. 3B). Bayesian inference specifically provides for the incorporation of prior knowledge, and in this way, we were able to combine information from both data sets in the estimation of tDIV-Hu/HB. Under the UCED model, the medians of tMRCA-Hu estimated from the ORF1 data set with or without the S1-derived tMRCA-Hu prior were similar, and the posterior distribution of tMRCA-Hu was not solely dependent on its prior distribution (Fig. 4A), suggesting that the ORF1 data set was providing additional information in the Bayesian inference. Moreover, tDIV-Hu/HB was consistently estimated at a median around the late 1990s with or without the S1-derived tMRCA-Hu prior (Fig. 4B). It was noted that the specification of S1-derived tMRCA-Hu priors substantially narrowed the HPD of the tDIV-Hu/HB estimate by about 40%, i.e., it decreased from 12.8 to 7.7 years (Table 3). Similar results were observed under the UCLN model (the data are not shown for simplicity).
TABLE 3.
tMRCA-Hu prior | tMRCA-Hu | tDIV-Hu/HB | Branch A |
---|---|---|---|
With S1-derived prior | 2002.63 (2002.14-2002.96) | 1998.51 (1993.55-2001.32) | 4.08 (1.45-8.84) |
Without S1-derived prior | 2002.40 (2000.69-2003.01) | 1997.44 (1987.68-2001.49) | 4.86 (1.37-13.47) |
The relatively low substitution rate of ORF1 and the rate variation among branches in the ORF1 phylogeny may limit the power of the molecular-clock analysis on the ORF1 data set. These factors add uncertainty to our analysis and may widen the credible intervals of our estimates. In contrast, the S1 coding region has been identified as the most variable region among SCoVs (40) and the S1 data set was found to be more clocklike than the ORF1 data set. Therefore, we specified an informative prior on tMRCA-Hu based on the S1 data set, allowing us to combine information across the data sets and to reduce the uncertainty of our divergence time estimates.
Assuming there was an interspecies transmission of HB-SLCoVs from bats to an amplifying host (e.g., civets), the upper and lower bounds of this event should be theoretically represented by tDIV-Hu/HB and tMRCA-Hu, respectively (Fig. 5). Therefore, the time period between these two events can be considered the most conservative estimation of the period between the cross-species event and the onset of the epidemic. The median and HPD of this period were summarized by sampling the length of a particular branch (i.e., branch A in Fig. 5) of all time-scaled MCMC phylogenies under the UCED model. This period was estimated at a median of 4.08 years (HPD, 1.45 to 8.84 years) (Table 3). The estimated mean substitution rate of the ORF1 data set under the UCED model was 2.79 × 10^−3 (HPD, 1.64 × 10^−3 to 4.35 × 10^−3) substitutions per site per year. This estimate is comparable to a previous estimation for the whole genome of Hu-SCoV (i.e., 0.80 × 10^−3 to 2.38 × 10^−3) (55) and is of the same order of magnitude as in other RNA viruses (4, 9, 18, 37, 38, 50). In addition, the ORF1 data set was reanalyzed under the UCED model with a Yule tree prior assumption, and the estimate is generally consistent with the estimate under the constant coalescent tree prior assumption, suggesting our date estimation is robust to the choice of tree priors.
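Summarizing branch A per sampled tree, rather than subtracting the two summary dates, explains why the reported median (4.08 years) differs slightly from the difference of the median dates in Table 3 (2002.63 − 1998.51 = 4.12 years). The sketch below illustrates the per-sample calculation; the function name and the three paired samples are hypothetical, not values from the MCMC output.

```python
import statistics


def branch_lengths(tmrca_samples, tdiv_samples):
    """Per-tree length of branch A: tMRCA-Hu minus tDIV-Hu/HB, in years."""
    return [t_mrca - t_div for t_mrca, t_div in zip(tmrca_samples, tdiv_samples)]


# Hypothetical paired node ages from three sampled time-scaled trees.
tmrca_hu = [2002.5, 2002.7, 2002.9]
tdiv_hu_hb = [1999.0, 1994.0, 2001.0]

lengths = branch_lengths(tmrca_hu, tdiv_hu_hb)
median_length = statistics.median(lengths)
difference_of_medians = statistics.median(tmrca_hu) - statistics.median(tdiv_hu_hb)
```

In the toy data the per-tree median is 3.5 years, whereas the difference of the medians is about 3.7 years, because the two node ages are paired within each tree.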
Implications for the origin of the Hu-SCoV lineage.
Previous studies concluded that none of the currently sampled Bt-SLCoVs is the direct ancestor of the Hu-SCoV lineage based on their relatively distant phylogenetic relationships (43) and molecular-dating results of the putative interspecies transmission event (49). These reports suggest that there may be an unknown intermediate host that acquired a Bt-SLCoV from bats and transmitted it to an amplifying host, such as civets (49), or that Bt-SLCoVs that are phylogenetically closer to the Hu-SCoVs were not sampled (43). However, due to the conflicting phylogenetic relationships between different genomic regions of Rp3 and Hu-SCoV revealed in this work, previous interpretations regarding the closest related Bt-SLCoV strain must be reconsidered. This study demonstrates the recombinant origin of Rp3, emphasizing the presence of an uncharacterized lineage (i.e., the HB-SLCoV lineage) that is phylogenetically closer to Hu-SCoVs than any of the currently sampled Bt-SLCoVs. In addition, our molecular-dating analyses suggest the HB-SLCoV and Hu-SCoV lineages diverged a median of 4.08 years prior to the outbreak. Based on this relatively short window period and their close phylogenetic relationship, we speculate that strains arising from this previously uncharacterized lineage may include the direct ancestor of the SCoVs in live-market animals that contributed to the emergence of SARS in 2003. It is noted that a previous report suggested that the most closely related Bt-SLCoV (i.e., Rp3) and Hu-SCoV diverged a mean of 17 years prior to the outbreak (49). Our credible interval excludes a divergence time this long ago (Table 3). However, due to the relatively large credible interval of the earlier estimate, our estimate falls within its HPD but with improved precision. The choice of genome region for molecular dating, i.e., the HEL gene in the earlier work versus ORF1 in this study, may contribute to the observed differences.
Based on the S protein sequences of the currently sampled Bt-SLCoV, Li and coworkers (32) pointed out that substantial genetic changes in the S protein are likely to be necessary for the virus to infect humans. Due to the fact that the S protein sequence of the direct ancestor of Hu-SCoV is currently unavailable, the genetic factors (e.g., residues under positive selection) that contributed to the switch of species tropism from the bat to the amplifying hosts cannot be determined. We expect that further characterization of the S sequences of the strains of the HB-SLCoV lineage should provide important information regarding the changes that may contribute to cross-species adaptation of the virus.
The observed genetic diversity among currently sampled Bt-SLCoVs strongly suggests bats, in particular, the genus Rhinolophus, are the natural reservoir of SLCoVs and SCoVs. However, among the 69 species of the genus Rhinolophus, the specific species that harbors the direct ancestor of Hu-SCoVs is still unknown (51). One possibility is that there were two phylogenetically distinct lineages of Bt-SLCoV residing in the bat species R. pearsoni that underwent recombination, giving rise to the recombinant strain Rp3. Thus, we suggest a more focused surveillance of SLCoVs in R. pearsoni, which may provide insights into the prevalence and diversity of this recombinant genotype, as well as the possible direct ancestor of Hu-SCoVs.
Another interesting outcome of our analysis is the very young age of the common ancestor of SLCoVs in bats (i.e., the root of the phylogeny in Fig. 5; median, 1982.81; HPD, 1965.75 to 1995.83). It is noted that this estimate refers only to the tMRCA of all currently sampled Bt-SLCoVs, and characterization of more diverged Bt-SLCoVs should extend the age of the lineage. Nonetheless, this estimate precludes codivergence of Bt-SLCoVs with their host bat species. More importantly, it suggests that cross-species transmission of these viruses between different bat species is very common and occurs on an ongoing basis. Interspecies transmissions of CoVs among wildlife and livestock species are well documented (46). With SARS as an example, more comprehensive surveillance of pathogens in wildlife species should make an important contribution to the detection and control of emerging zoonotic infections (24).
Acknowledgments
This work was supported by the Research Fund for the Control of Infectious Diseases (reference number 06060672) from the Hong Kong SAR government.
We thank Susanna K. P. Lau of the Department of Microbiology, Faculty of Medicine, University of Hong Kong, for her valuable comments on the manuscript.
REFERENCES
Articles from Journal of Virology are provided here courtesy of American Society for Microbiology (ASM)
Full text links
Read article at publisher's site: https://doi.org/10.1128/jvi.01926-07
Read article for free, from open access legal sources, via Unpaywall: https://europepmc.org/articles/pmc2258724?pdf=render