Integrated genomic and fossil evidence illuminates life’s early evolution and eukaryote origins
Abstract
Establishing a unified timescale for the early evolution of Earth and Life is challenging and mired in controversy because of the paucity of fossil evidence, the difficulty of interpreting it, and dispute over the deepest branching relationships in the tree of life. Surprisingly, it remains perhaps the only episode in the history of Life where literal interpretations of the fossil record hold sway, revised with every new discovery and reinterpretation. We derive a timescale of life, combining a reappraisal of the fossil material with new molecular clock analyses. We find that the last universal common ancestor of cellular life (LUCA) predated the end of the late heavy bombardment (>3.9 Ga). The crown clades of the two primary divisions of life, Eubacteria and Archaebacteria, emerged much later (<3.4 Ga), relegating the oldest fossil evidence for life to their stem lineages. The Great Oxidation Event (GOE) significantly predates the origin of modern Cyanobacteria, indicating that oxygenic photosynthesis evolved within the cyanobacterial stem lineage. Modern eukaryotes emerged late in Earth history (<1.84 Ga), falsifying the hypothesis that the GOE facilitated their radiation. The symbiotic origin of mitochondria, at 2.053–1.21 Ga, reflects a late origin of the eukaryotes, which do not constitute a primary lineage of life.
Examining the emergence of Life and its subsequent evolution has traditionally been carried out via interpretation of the fossil record. However, this record, especially for the earliest scions of life, is sparse, and it is made harder to interpret by the difficulty of substantiating relationships among the earliest-branching lineages of the tree of life1,2. Despite these problems, the fossil record has been the main source of information on the timeline of life's evolution. We attempt to shed light on this early period by presenting a molecular timescale that draws on the ever-growing collection of genetic data and explicitly incorporates the uncertainty associated with fossil sampling, ages and interpretations1,3–5.
Calibrations are a crucial component of divergence time estimation. Relative divergence times can be inferred using alternative lines of evidence, e.g. horizontal gene transfers6. However, an absolute timescale for evolutionary history can only be derived when calibrations are included in the analyses7,8. Following best practice4, we derived a suite of calibrations for the fundamental clades within the Tree of Life, drawing on multiple lines of evidence, including physical fossils, biomarkers and isotope geochemistry2. Two key calibrations, for the last universal common ancestor (LUCA) and the oldest total-group eukaryotes, constrain the whole tree by setting a maximum on the root, while also informing the timing of the divergence of eukaryotes within Archaea9,10. Putative records of life extend back to the Eoarchaean, including microfossils11,12, stromatolites13 and isotope data14,15 from the ~3.8 Ga Isua Greenstone Belt (Greenland). However, these records have been contested16–18. Microfossils from the ~3.4 Ga Strelley Pool Formation, Australia, are the oldest conclusive evidence with which to constrain the age of LUCA19. Nanoscale imaging and Raman spectroscopy have shown that these fossils, many of which are arranged in chains of cells, exhibit a complex morphology with a central, usually hollow, lenticular body and a wall that is either smooth or, in some cases, reticulated; such features cannot be explained as pseudofossils2. The Strelley Pool Formation also contains other microfossils20–22, in association with distinct δ13Corg and δ13Cinorg signatures23 and with pyrite indicative of sulphur metabolisms24, along with stromatolites that exhibit biological structure25. Overall, these data allow us to confidently use the Strelley Pool Biota as the oldest indisputable record of life. For a maximum constraint on the age of LUCA, we considered the youngest event on Earth that life could not have survived.
Conventionally, this is taken to be the end of the late heavy bombardment, but modelling has shown that this episode would not have been violent enough to sterilise the planet26. By contrast, the Moon-forming impact, the last formative stage of Earth's formation, melted and sterilised the planet. The oldest fossil that can be ascribed to crown-Eukaryota is the ~1.1 Ga Bangiomorpha pubescens27,28, which can be confidently assigned to the red algal total group (Rhodophyta). Older fossil remains from the >1.561 Ga Chitrakoot Formation have been tentatively interpreted as red algae29; however, current knowledge of their morphology does not allow an unequivocal assignment to crown-Archaeplastida. The oldest fossil remains that can be ascribed with certainty to Eukaryota are acritarchs from the >1.6191 Ga Changcheng Formation, North China30, which are discriminated from prokaryotes by their large size (40–250 μm) and complex wall structure, including striations, longitudinal ruptures and a trilaminar organization. However, these structures do not indicate membership of any specific crown-eukaryote clade, so these records can only provide a minimum constraint on the timing of the divergence between Eukaryota and their archaebacterial sister lineage, Asgardarchaeota9,10,31. As there is no other evidence to constrain the maximum age of the divergence between Eukaryota and Asgardarchaeota, we used the same maximum as for LUCA, i.e. the Moon-forming impact. These key time constraints were combined with nine others (see SI) to calibrate a timescale of life estimated from a dataset of 29 highly conserved, mainly ribosomal, universally distributed proteins (see SI), using a relaxed molecular clock modelled in a Bayesian framework.
Results
Analytical choices can deeply affect molecular clock posterior age estimates32, so we explored a range of prior probability distributions to model our fossil calibrations and to estimate conservative credibility intervals for our divergence times. Initially, we applied a hard maximum of 4.52 Ga (the age of the Moon-forming impact) to the root of our tree and used uniform age priors (reflecting agnosticism about divergence timing relative to the constraints) for the other fossil calibrations (Fig. 1a). These analyses assumed an uncorrelated molecular clock and modelled amino acid substitution using optimal gene-specific substitution models. We then explored the impact of calibration protocols based on non-uniform age priors. First, we implemented a truncated Cauchy distribution with its mode located halfway between the minimum and maximum bounds, reflecting a prior view that true divergence times should fall between the calibration points (Fig. 1b). We then skewed the Cauchy distribution so that the mode shifted towards the minimum or the maximum constraint, reflecting prior views that the fossils used to calibrate the tree are either very good (Fig. 1c) or very poor (Fig. 1d) proxies for the true divergence times. Our results proved robust to the choice of calibration strategy, differing only in the width of the recovered credibility intervals (Fig. 2a-c).
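The calibration priors described above can be illustrated with a minimal Python sketch: a Cauchy density truncated to a calibration's minimum and maximum bounds, with its mode placed at a chosen fraction of the interval (0.5 for the midpoint, near 0 to favour the fossil minimum, near 1 to favour the maximum). This is not the MCMCTree parameterisation used in the analyses. The function name and the `mode_fraction` and `scale_fraction` parameters are hypothetical; "skewing" is represented here simply by moving the mode, and the bounds are treated as hard for simplicity. The example bounds (1.6191 and 4.52 Ga) are the minimum and maximum stated above for the Eukaryota–Asgardarchaeota divergence.

```python
import math


def truncated_cauchy_pdf(t, t_min, t_max, mode_fraction=0.5, scale_fraction=0.1):
    """Cauchy density truncated to [t_min, t_max] (ages in Ga).

    mode_fraction (hypothetical): position of the mode between the bounds;
    0.5 = midway, near 0 = close to the fossil minimum, near 1 = close to the maximum.
    scale_fraction (hypothetical): Cauchy scale as a fraction of the interval width.
    """
    if not 0.0 <= mode_fraction <= 1.0:
        raise ValueError("mode_fraction must lie in [0, 1]")
    if t < t_min or t > t_max:
        return 0.0  # no prior mass outside the calibration bounds in this sketch
    width = t_max - t_min
    loc = t_min + mode_fraction * width
    scale = scale_fraction * width

    def cdf(x):
        # CDF of the untruncated Cauchy distribution
        return 0.5 + math.atan((x - loc) / scale) / math.pi

    # Untruncated Cauchy density at t
    density = 1.0 / (math.pi * scale * (1.0 + ((t - loc) / scale) ** 2))
    # Renormalise by the probability mass falling inside the bounds
    return density / (cdf(t_max) - cdf(t_min))


if __name__ == "__main__":
    # Bounds for the Eukaryota-Asgardarchaeota divergence given in the text
    t_min, t_max = 1.6191, 4.52
    for label, q in [("towards minimum", 0.1), ("midway", 0.5), ("towards maximum", 0.9)]:
        print(label, round(truncated_cauchy_pdf(2.0, t_min, t_max, q), 3))
```

With `mode_fraction=0.1` most prior mass sits close to the fossil minimum, matching the view that the fossil is a good proxy for the divergence; `mode_fraction=0.9` expresses the opposite view.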
We also explored the impact of different strategies for modelling both the molecular clock (Fig. 1e) and the amino acid substitution process (Fig. 1f). Only minimal differences in posterior ages were found between analyses using an uncorrelated or an autocorrelated clock (Fig. 2d). Consistent with this, Bayesian cross-validation indicated that the two models do not differ significantly in their fit to the data (cross-validation score = 0.7 ± 2.96816 in favour of the uncorrelated clock). In contrast, using a single substitution model across the 29 genes or an optimal set of gene-specific substitution models inferred using PartitionFinder33 resulted in very different age estimates (Fig. 1f, 2e). Using a single substitution model recovered larger credibility intervals (Fig. 2e), a more homogeneous distribution of branch lengths across the tree, and older divergence times (compare Fig. 1f with Fig. 1a-d). An Akaike Information Criterion (AIC) test indicated that the partitioned model provides a significantly better fit to the data (AIC score = 565.21 in favour of the 29 gene-specific models), allowing us to reject the divergence times obtained with a single substitution model. As expected, divergence times estimated from individual genes were much less precise, although their posterior age estimates overlap well (SI S4.1). This indicates that the genes comprising our dataset encode a congruent signal and that the timescale inferred from the combined analysis is not biased by single-gene outliers. Furthermore, combining the genes improves the precision of the clade age estimates (Fig. 2f-j), which are clearly informed by the data (SI S4.2). We tested the effect of taxonomic sampling by doubling the number of cyanobacteria and alphaproteobacteria in our dataset.
We then explored the effect of phylogenetic uncertainty by dating a tree compatible with Woese's three-domains hypothesis34 and by dating all 15 trees in the 95% credible set from our phylogenetic analysis (SI S4.3, S4.4). Further analyses that co-estimated time and topology (SI S4.5)35 did not reach convergence (SI S4.6), but they recovered results congruent with those obtained from well-converged analyses in which topology and time were inferred sequentially (reported in SI S4.4; see SI S4.5 for a discussion). Overall, our results appear robust to topological uncertainty and to differential taxonomic sampling (SI S4.3–S4.5).
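As a worked illustration of the model comparisons above, the sketch below computes AIC = 2k − 2 ln L for two models and their difference, and summarises replicate cross-validation scores as mean ± standard deviation. The function names are hypothetical, the log-likelihoods, parameter counts and replicate scores in the usage lines are placeholders rather than values from this study, and the mean ± SD summary is an assumption about how such scores are reported.

```python
import statistics


def aic(log_likelihood, n_params):
    """Akaike Information Criterion, AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def aic_difference(lnl_a, k_a, lnl_b, k_b):
    """AIC(model A) - AIC(model B); positive values favour model B."""
    return aic(lnl_a, k_a) - aic(lnl_b, k_b)


def cv_summary(replicate_differences):
    """Mean and sample standard deviation of per-replicate cross-validation
    log-likelihood differences (model 1 minus model 2)."""
    return (statistics.mean(replicate_differences),
            statistics.stdev(replicate_differences))


if __name__ == "__main__":
    # Hypothetical values for illustration only: a single substitution model
    # (fewer parameters, lower likelihood) versus a partitioned model.
    single = (-1000.0, 10)
    partitioned = (-950.0, 40)
    print("AIC difference:", aic_difference(*single, *partitioned))
    # Hypothetical replicate differences; a mean small relative to its SD
    # suggests no clear preference between the two clock models.
    mean, sd = cv_summary([1.2, -0.5, 2.0, 0.1])
    print(f"cross-validation score = {mean:.2f} +/- {sd:.2f}")
```

The extra parameters of a partitioned model are penalised through the 2k term, so it is preferred only when its likelihood gain outweighs that penalty.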
It is not possible to discriminate between the competing calibration strategies, which reflect different interpretations of the fossil record; similarly, our model selection test indicated that the autocorrelated and independent-rates clock models fit the data equally well. Thus, in establishing an accurate timescale of life, we integrated over the uncertainties associated with the results of all these analyses (Fig. 3). The joint 95% credibility interval for LUCA (4519–4477 Ma) rejects an emergence after the late heavy bombardment (~3900 Ma)36. The crown clades of the primary divisions of life, Archaebacteria and Eubacteria, emerged over a billion years after LUCA, in the Mesoarchaean–Neoarchaean. The earliest conclusive evidence of cellular life (Strelley Pool Formation, Australia2) falls within the 95% credibility intervals for the ages of the last common ancestors of both clades, indicating that these fossils might belong to one of the two living prokaryotic lineages.
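One simple way to integrate over several analyses is to pool posterior samples of a node age, after thinning each analysis to the same number of samples so that every analysis carries equal weight, and then take an equal-tailed 95% interval. The paper does not specify this exact procedure; equal weighting, equal-tailed intervals and the function names below are assumptions made for illustration, and the usage samples are simulated placeholders.

```python
import math
import random


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1.0 - frac) + sorted_values[upper] * frac


def pooled_interval(sample_sets, level=0.95):
    """Equal-tailed credibility interval from posterior samples of one node age
    pooled across several analyses, each analysis weighted equally."""
    n = min(len(samples) for samples in sample_sets)
    pooled = []
    for samples in sample_sets:
        # Thin evenly so every analysis contributes exactly n samples
        step = len(samples) / n
        pooled.extend(samples[int(i * step)] for i in range(n))
    pooled.sort()
    tail = (1.0 - level) / 2.0
    return quantile(pooled, tail), quantile(pooled, 1.0 - tail)


if __name__ == "__main__":
    random.seed(1)
    # Simulated placeholder posterior samples (Ma) from two analyses of one node
    analysis_a = [random.gauss(4500, 10) for _ in range(2000)]
    analysis_b = [random.gauss(4495, 15) for _ in range(5000)]
    lo, hi = pooled_interval([analysis_a, analysis_b])
    print(f"joint 95% interval: {hi:.0f}-{lo:.0f} Ma")
```

Because the interval is taken over the pooled samples, it is at least as wide as needed to cover the disagreement between analyses, which is the conservative behaviour the integration aims for.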
Discussion
Methanogenesis is classically associated with Euryarchaeota. Our estimate for the age of crown-Euryarchaeota (2881–2425 Ma) is consistent with carbon isotope excursions indicating the presence of methanogens by 2 Ga37, but it is substantially younger than the earliest possible evidence of biogenic methane in the geochemical record, at ~3.5 Ga38,39. If the geochemical evidence is correct, our timescale implies that methanogenesis predated the origin of Euryarchaeota. This hypothesis would be consistent with recent environmental genomic surveys indicating that other archaeal lineages may also be capable of methane metabolism40 or methanogenesis41, and that metabolisms using the Wood–Ljungdahl pathway to fix carbon evolved no later than the archaebacterial stem lineage42,43 and might have been a characteristic of LUCA43–45.
The GOE (~2.4 Ga) was perhaps the most significant episode in the Proterozoic46, fundamentally changing the chemistry of Earth's atmosphere and oceans, and likely altering its temperature. It has been causally associated with the evolution of cyanobacteria, as a consequence of their oxygen release47,28, and implicated as an extrinsic driver of eukaryotic evolution48. Our timescale indicates that crown-Cyanobacteria and crown-Eukaryota significantly postdate the GOE. Crown-Cyanobacteria diverged 1947–1023 Ma, precluding the possibility that oxygenic photosynthesis emerged in the cyanobacterial crown ancestor. However, the cyanobacteria separated from other eubacterial lineages (Fig. 3), including their non-photosynthetic sister group (Melanibacteria; SI 4.3), in the Archaean, prior to the GOE. This is consistent with the view that oxygenic photosynthesis evolved along the cyanobacterial stem49 and compatible with a causal role for total-group Cyanobacteria in the GOE.
Crown-Eukaryota diverged considerably after both the Eukaryota–Asgardarchaeota split and the GOE, in the middle Proterozoic (1842–1210 Ma). Our study strongly rejects the idea that eukaryotes might be as old as, or older than, prokaryotes50, and agrees with a number of other studies that date the Last Eukaryote Common Ancestor (LECA) to the Proterozoic (~1866–1679 Ma)51–53. Within eukaryotes, the main extant clades emerged by the middle Proterozoic, including Opisthokonta (~1707–1125 Ma), Archaeplastida (~1667–1118 Ma) and SAR (~1645–1115 Ma). The symbiotic origin of the plastid occurred among stem archaeplastids (~1774–1118 Ma), and our 95% credibility interval for the origin of the plastid overlaps with the results of other recent studies28,50,53. The relatively long stem lineage subtending LECA is intriguing. It is recovered using both uncorrelated and autocorrelated clock models (Fig. 1e, 2d) and disappears only if a poorly fitting single substitution model is used (Fig. 1f, 2e), suggesting that it is not a modelling artefact. Analyses excluding Asgardarchaeota, the until recently unknown closest living relatives of Eukaryota9,31, had no significant impact on the span of the eukaryote stem lineage, suggesting that its length is robust to taxon sampling (SI 4.7).
Our timescale for eukaryogenesis rejects the hypothesis of an inextricable link between the GOE and the origin of eukaryotes48. Competing hypotheses for eukaryogenesis hinge on the early versus late acquisition of mitochondria relative to other key eukaryote characters55,56–59. Absolute divergence times cannot discriminate between these hypotheses. However, as the only evidence proposed in support of the mitochondria-late57 hypothesis has been shown to be artefactual58, the similar age estimates for Alphaproteobacteria and LECA are, at this stage, most conservatively interpreted as indicating that mitochondrial symbiosis drove a rapid process of eukaryogenesis. This process involved a large transfer of genes from the genome of the alphaproteobacterial symbiont to that of the archaeal host59,60, as predicted by metabolic hypotheses55,61.
The search for the earliest fossil evidence of life on Earth has created more heat than light. Though the fossil record remains integral to establishing a timescale for the Tree of Life, it is not sufficient in and of itself. Our integrative molecular timescale encompasses the uncertainty associated with fossil, geological and molecular evidence, as well as its modelling, allowing it to serve as a solid foundation for testing evolutionary hypotheses in deep time for clades that do not have a conclusive fossil record.
Materials and Methods
Dataset collation and Phylogenetic analysis
The dataset consists of 102 species and 29 universally distributed protein-coding genes (see SI). All our data and scripts are available at https://bitbucket.org/bzxdp/betts_et_al_2017. Proteomes were downloaded from GenBank62, and putative orthologs were identified using BLAST63. The top hits were compiled and aligned into gene-specific files using MUSCLE64, and poorly aligned sites were removed using trimAl65. To minimise the possible inclusion of paralogs and laterally transferred genes, we generated gene trees (under CAT-GTR+G) in PhyloBayes66 and excluded sequences when the tree topology suggested that they might be paralogs. The sequences were then concatenated into a supermatrix using FASconCAT67, and phylogenetic analyses were performed using PhyloBayes66. The superalignment was initially analysed under both GTR+G and CAT-GTR+G68. RogueNaRok69 was used to identify rogue taxa, and the analyses were repeated (under both GTR+G and CAT-GTR+G) after excluding unstable taxa. A final analysis was performed that included only the eukaryotic sequences in our dataset (under CAT-GTR+G). For all PhyloBayes analyses, convergence was tested using the PhyloBayes programs bpcomp and tracecomp.
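The convergence diagnostics named above can be approximated with two small functions: the maximum difference in bipartition frequencies between two chains (in the spirit of the `maxdiff` statistic reported by bpcomp) and an autocorrelation-based effective sample size for a parameter trace (the kind of quantity reported by tracecomp, and by Tracer for the clock analyses). This is a simplified sketch, not the PhyloBayes implementation. Representing each sampled tree as a set of bipartitions, each a frozenset of the taxon names on one side of an internal branch encoded consistently across trees, is an assumption made for illustration, as are the function names.

```python
from collections import Counter


def split_frequencies(trees):
    """Fraction of sampled trees that contain each bipartition.

    Each tree is a set of bipartitions; each bipartition is a frozenset of the
    taxon names on one side of an internal branch (hypothetical encoding)."""
    counts = Counter()
    for tree in trees:
        counts.update(tree)
    return {split: count / len(trees) for split, count in counts.items()}


def max_split_difference(chain_a, chain_b):
    """Largest absolute difference in bipartition frequency between two chains."""
    freq_a, freq_b = split_frequencies(chain_a), split_frequencies(chain_b)
    splits = set(freq_a) | set(freq_b)
    return max(abs(freq_a.get(s, 0.0) - freq_b.get(s, 0.0)) for s in splits)


def effective_sample_size(trace):
    """ESS = N / (1 + 2 * sum of autocorrelations), truncating the sum at the
    first non-positive autocorrelation (a simple, common estimator)."""
    n = len(trace)
    mean = sum(trace) / n
    var = sum((x - mean) ** 2 for x in trace) / n
    if var == 0.0:
        raise ValueError("trace is constant; ESS is undefined")
    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum((trace[i] - mean) * (trace[i + lag] - mean) for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)
```

A small maximum split difference indicates that two chains sample the same topologies at similar frequencies; a large ESS indicates that a parameter trace contains many effectively independent draws.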
Calibrations
In total we used 11 calibrations spread throughout the tree but mainly found within the Eukaryotes as this group has the best fossil record. Calibration choice was carried out conservatively using coherent criteria70. Full details of each calibration used can be found in the supplementary information.
MCMCTree analysis
For our clock analyses, we used a constraint tree based on our CAT-GTR+G and GTR+G trees (Figs S3.2, S3.3 and S4); see the results of the phylogenetic analyses in the SI for details. The complete phylogeny was rooted to separate Bacteria from the other lineages (i.e. Archaea and Eukaryota). To select the amino acid model used in our molecular clock analyses, we used PartitionFinder (v1.1.1)71. Divergence time estimation was carried out using the approximate likelihood calculation in MCMCTree (v4.9)72. We used four different calibration density distributions: uniform, skewed towards the minimum, skewed towards the maximum, and centred midway between the minimum and maximum dates. For this we used the uniform and Cauchy calibration densities in MCMCTree, whose parameters can be set to place the maximum probability of the node age at a chosen position between the calibration bounds; the values of these parameters were first generated using the MCMCtreeR73 package in R74. We investigated two strategies to model amino acid sequence evolution: a single WAG+G model or the optimal partitioned model suggested by PartitionFinder. The latter used 29 gene-specific models (28 LG+G and one WAG+G). The Akaike Information Criterion was used to test whether a single model or a partitioned model provided a better fit to the data. Rate variation across lineages was modelled using both an autocorrelated and an uncorrelated clock model. Bayesian cross-validation (implemented in PhyloBayes) was used to test which of the two relaxed molecular clock models best fitted the data.
In all our molecular clock analyses, each calibration had a hard minimum bound and a soft maximum bound with a 2.5% tail. The exceptions were the root node, to which a hard maximum was applied, and the nodes calibrated using Bangiomorpha75, to which a soft minimum with a 2.5% tail was applied. For all molecular clock analyses, convergence was assessed in Tracer76 by comparing plots of estimates from two independent chains and by checking that the effective sample size was sufficiently large for each model parameter and divergence time estimate. All reported molecular clock analyses reached excellent levels of convergence.
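The soft-bound scheme described above can be sketched as a density that is zero below the hard minimum, flat between the bounds, and decays exponentially above the maximum so that 2.5% of the prior mass lies beyond it. The exponential tail, with its rate chosen to keep the density continuous at the maximum, is an illustrative assumption and may not match MCMCTree's own soft-bound parameterisation; the function name and the bounds used in any example are hypothetical.

```python
import math


def soft_bound_pdf(t, t_min, t_max, upper_tail=0.025):
    """Calibration density with a hard minimum and a soft maximum.

    Zero below t_min, uniform between t_min and t_max, and an exponential tail
    above t_max holding `upper_tail` of the prior mass. The tail rate is chosen
    so that the density is continuous at t_max (an illustrative assumption).
    """
    if t < t_min:
        return 0.0  # hard minimum: the node cannot be younger than the fossil
    plateau = (1.0 - upper_tail) / (t_max - t_min)
    if t <= t_max:
        return plateau
    # upper_tail * rate == plateau keeps the density continuous at t_max,
    # and the tail integrates to exactly upper_tail
    rate = plateau / upper_tail
    return upper_tail * rate * math.exp(-rate * (t - t_max))
```

A soft minimum, as used for the nodes calibrated with Bangiomorpha, would mirror this construction, placing a 2.5% tail below the minimum instead of a hard cut-off.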
Supplementary Material
Reporting Summary
Supplementary information
Acknowledgments
H.C.B. was supported by a NERC GW4 PhD studentship; J.W.C. was supported by a BBSRC SWBio PhD studentship. M.N.P. was supported by a 1851 Royal Commission Fellowship, P.C.D. by a BBSRC grant BB/N000919/1. T.A.W. is supported by a Royal Society Fellowship and a NERC grant NE/P00251X/1.
Footnotes
Data Availability. All our datasets are available on Bitbucket at https://bitbucket.org/bzxdp/betts_et_al_2017.
Author contributions
D.P., P.C.D. and T.A.W. designed the study. H.C.B. assembled the datasets and performed the phylogenetic and molecular clock analyses. M.N.P. and J.W.C. contributed further molecular clock analyses. H.C.B., D.P., P.C.D. and T.A.W. wrote the manuscript. All authors edited the manuscript and approved the final version.
Competing interests
The authors declare no competing interests.
References
DOI: https://doi.org/10.1038/s41559-018-0644-x
Funding
Funders who supported this work.
Biotechnology and Biological Sciences Research Council (2)
Grant ID: 1563670
Improving Bayesian methods for estimating divergence times integrating genomic and trait data
Dr Philip Donoghue, University of Bristol
Grant ID: BB/N000919/1
Natural Environment Research Council (4)
Grant ID: NE/P013678/1
Grant ID: 1671097
Grant ID: NE/N003438/1
Grant ID: NE/P00251X/1