Version 1. Wellcome Open Res. 2024; 9: 570.
PMCID: PMC11541074
PMID: 39512383

The genome sequence of the common alder, Alnus glutinosa (L.) Gaertn. (Betulaceae)

Maarten J. M. Christenhusz [1] (Investigation, Resources, Writing – Original Draft Preparation, Writing – Review & Editing), Zoë Goodwin [2] (Investigation, Resources), David G. Bell [2] (Investigation, Resources), Claudia A. Martin [2,3] (Writing – Original Draft Preparation, Writing – Review & Editing), Royal Botanic Gardens Kew Genome Acquisition Lab, Royal Botanic Garden Edinburgh Genome Acquisition Lab, Darwin Tree of Life Barcoding collective, Plant Genome Sizing collective, Wellcome Sanger Institute Tree of Life Management, Samples and Laboratory team, Wellcome Sanger Institute Scientific Operations: Sequencing Operations, Wellcome Sanger Institute Tree of Life Core Informatics team, Tree of Life Core Informatics collective, and Darwin Tree of Life Consortium (corresponding author)

Abstract

We present a genome assembly of a diploid specimen of Alnus glutinosa (the common alder; Streptophyta; Magnoliopsida; Fagales; Betulaceae). The genome sequence has a total length of 456.80 megabases. Most of the assembly is scaffolded into 14 chromosomal pseudomolecules. The mitochondrial genome assemblies have lengths of 505.23 and 155.85 kilobases and the plastid genome is 160.82 kilobases long. Gene annotation of this assembly on Ensembl identified 23,728 protein-coding genes.

Keywords: Alnus glutinosa, common alder, genome sequence, chromosomal, Fagales

Species taxonomy

Eukaryota; Viridiplantae; Streptophyta; Streptophytina; Embryophyta; Tracheophyta; Euphyllophyta; Spermatophyta; Magnoliopsida; Mesangiospermae; eudicotyledons; Gunneridae; Pentapetalae; rosids; fabids; Fagales; Betulaceae; Alnus; Alnus glutinosa (L.) Gaertn. (NCBI:txid3517).

Background

The common alder, Alnus glutinosa, also known as black alder or European alder, is a short-lived, deciduous tree up to 30 m tall, belonging to the family Betulaceae. Predominantly found in moist environments, this species plays a pivotal role in riparian ecosystems and wet woodlands, and grows along riverbanks and lakeshores and in marshy areas (Stroh et al., 2023). In some places the species is dominant, forming the main element of alder carr.

The leaves of Alnus glutinosa emerge late in spring and drop in autumn. Flowering precedes leaf emergence in spring, with male flowers arranged in catkins that release pollen into the wind, and female catkins united in dense ellipsoid clusters that resemble small cones. Once pollinated by wind, the female catkins become woody, cone-like fruit structures that mature in autumn and persist through winter, gradually releasing seeds that are initially dispersed by wind, with further dispersal often aided by watercourses. The seeds are an important source of food for birds in the winter. The epithet glutinosa is Latin for sticky, and refers particularly to young shoots and leaves that are sticky to the touch.

Alnus glutinosa is noted for its nitrogen-fixing ability, accomplished through a symbiotic relationship with the actinomycete bacterium Frankia alni (Woronin, 1866) Von Tubeuf 1895 ( Paschke et al., 1989; Pawlowski & Newton, 2010) forming nodules on the roots. This relationship enables A. glutinosa to thrive in nutrient-poor soils, enhancing soil fertility and promoting plant biodiversity in its habitat, making it a keystone species in wetland ecosystems. It plays a significant role in these ecosystems by stabilising soil and reducing erosion, as well as regulating water levels. The ability of the tree to improve soil fertility through nitrogen fixation makes it beneficial in agroforestry systems and land reclamation projects. A. glutinosa can also grow in drier locations and sometimes occurs in mixed woodland.

Alnus glutinosa is native to Europe, southwest Asia and northern Africa. Its range extends from Scandinavia and Finland to the Mediterranean and from Ireland to the Caucasus and Volga Valley. It has naturalised in the Azores, Shetland, northeastern North America, southern South America, Tasmania, New Zealand and South Africa (POWO, 2024). It is widely distributed in the UK and Ireland and thrives in various wetland habitats, with significant populations in bogs, marshes and in the floodplains of all major rivers and along lakeshores (Preston et al., 2002; Stace et al., 2019; Stroh et al., 2023). Its adaptability to waterlogged soil conditions has facilitated its spread across diverse regions in its native range where many other tree species are unable to cope. In combination with its ability to tolerate periodic droughts, this makes A. glutinosa resilient to climate change. The genus Alnus has 41 accepted species, some of which have also established populations in Ireland and the UK (e.g. A. cordata (Loisel.) Duby, A. incana (L.) Moench, A. rubra Bong.). Alnus glutinosa is the only native species of the genus in Britain and Ireland (POWO, 2024).

Some alders in Britain have been infected by Phytophthora alni Brasier & S.A.Kirk, a fungus-like (oomycete) pathogen that causes the disease sometimes known as alder dieback. This pathogen was first recorded in the UK in 1993 and has become widespread. Evolution of hybrid strains of this pathogen may increase the susceptibility of European alder species to the disease, hence its impact is predicted to increase over time (Brasier et al., 2004; Fuller et al., 2023). The infection causes the death of roots and patches of bark. Dark spots can also form near the base of the trunk, and the leaves often turn yellow in summer.

Numerous fungi grow on alder, most of which are mutualistic or benign. Several species only grow in association with Alnus glutinosa. Taphrina alni (Berk. & Broome) Gjaerum, a fungal pathogen, causes alder tongue galls. Another harmless gall is caused by Eriophyes laevis (Nalepa, 1889), a gall mite that sucks sap from leaves, resulting in small pustules on the leaf tissue. A related mite, Aceria nalepai (Fockeu, 1890), also causes small galls, which form mostly on the midveins. Two lichen species are found only (or mostly) on alder: Stenocybe pullatula (Ach.) Stein and Menegazzia terebrata (Hoffm.) A.Massal. Dozens of insect species are known to feed on alder leaves, including several that are specific to alder, such as the striped alder sawfly (Hemichroa crocea (Geoffroy, 1785)), the May Highflyer (Hydriomena impluviata (Denis & Schiffermüller, 1775)), the Dingy Shell (Euchoeca nebulata (Scopoli, 1763)) and the Alder Kitten moth (Furcula bicuspis (Borkhausen, 1790)). The trees also provide shelter for aquatic animals and for breeding birds. The rare fern Dryopteris cristata (L.) A.Gray is associated with alder carr.

Alder has been valued for a variety of ecological and construction uses. Its wood is resistant to water and has been employed in the construction of underwater structures, including bridges, docks and mills. It was often coppiced with the wood used for water pipes or for high quality charcoal made into gunpowder or gas mask filters. Shepherds also carved the wood into flutes and sometimes clogs ( De Cleene & Lejeune, 2000).

Alnus glutinosa also has a rich cultural history in European folklore. In Norse mythology, the first woman was carved from alder wood. It was also considered a sacred tree by the Celts, who believed it possessed protective qualities. In the traditional Irish folksong Song of the Forest Trees, alder is described as “battle-witch of all woods, tree that is hottest in the fight”. Cutting alder trees in Ireland was once forbidden, but the wood had sometimes been fashioned into warrior shields by Celtic tribes. Because alder wood has a vibrant orange-red inner bark and the white wood turns red after logging, it became associated with blood. This and the fact that alder forms dark thickets in mysterious places like swamps led to its use in various rituals in local cults. It often has negative connotations, ranging from the folklore tale that the red colour resulted from the devil beating his grandmother with alder twigs, to the idea that witches used it to influence the weather. Alder also had positive folklore associated with it, such as fending off devils and witches, and blessing seeds before sowing, so that the birds would leave them alone ( De Cleene & Lejeune, 2000).

Alder leaves and bark have been utilised in traditional medicine for their anti-inflammatory and anti-aging properties. Poultices were commonly made from fresh alder leaves to heal infections, a practice that followed medical wisdom from ancient times and was maintained until the 20th century. Bark from young twigs can be brewed into a drink that helps lower fever. This liquid is still used to treat mouth sores, angina and tonsillitis (De Cleene & Lejeune, 2000).

The common alder has a diploid chromosome number of 2n = 28. Here, we present the first chromosome-level Alnus glutinosa genome, which will provide valuable insights into the genetic basis of its ecological adaptations (including host-pathogen evolution), symbiotic relationships and evolutionary history. This genomic resource will be instrumental for conservation efforts, breeding programmes and understanding the role of alder in ecosystem services. Additionally, it holds potential economic and social benefits, such as improving land reclamation strategies and enhancing the ecological health of riparian and wetland habitats.

Genome sequence report

The genome of an Alnus glutinosa specimen ( Figure 1) was sequenced, based on a total of 38-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 95-fold coverage in 10X Genomics read clouds. Using flow cytometry, the genome size (1C-value) of the Alnus glutinosa specimen was estimated to be 0.67 pg, equivalent to 660 Mb.
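The conversion from a 1C-value in picograms to megabases can be sketched as follows. The factor of 978 Mb per pg is a widely used convention and is an assumption here; the text does not state which factor was applied, and the function name is illustrative only.

```python
# Convert a flow-cytometry 1C-value from picograms of DNA to megabases.
# ASSUMPTION: 1 pg of DNA corresponds to about 978 Mb, a commonly used
# conversion factor; the text does not say which factor the authors used.
MB_PER_PG = 978.0


def pg_to_mb(c_value_pg: float) -> float:
    """Return the genome size in Mb for a 1C-value given in pg."""
    return c_value_pg * MB_PER_PG


# 1C-value reported in the text for the Alnus glutinosa specimen.
estimate_mb = pg_to_mb(0.67)
print(f"0.67 pg is about {estimate_mb:.1f} Mb")
```

Under this assumption the estimate is about 655 Mb, which agrees with the reported 660 Mb once rounded to two significant figures.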

Figure 1. Photographs of the Alnus glutinosa (dhAlnGlut1) specimen used for genome sequencing.

a) Habitat along River Thames in Canbury Gardens. b) leaves. c) female fruits (white arrows) and young male catkins (blue arrows).

Primary assembly contigs were scaffolded with chromosome conformation Hi-C data, which produced 79.17 Gbp from 524.31 million reads, yielding an approximate coverage of 173-fold. Specimen and sequencing information is summarised in Table 1.
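The Hi-C coverage figure quoted above follows from dividing the sequenced bases by the assembly span. The short sketch below reproduces that arithmetic; using the final assembly length of 456.80 Mb as the denominator is an assumption, although it does reproduce the reported value.

```python
# Approximate sequencing depth = total sequenced bases / genome span.
# ASSUMPTION: the denominator is the final assembly span (456.80 Mb).
hic_bases = 79.17e9        # Hi-C yield reported in the text (79.17 Gbp)
hic_reads = 524.31e6       # Hi-C read count reported in the text
assembly_span = 456.80e6   # final assembly length (456.80 Mb)

coverage = hic_bases / assembly_span
mean_read_length = hic_bases / hic_reads

print(f"Hi-C coverage: about {coverage:.0f}-fold")
print(f"Mean Hi-C read length: about {mean_read_length:.0f} bp")
```

The mean read length from the same numbers comes out at roughly 151 bp, in line with the 150 bp paired-end reads described in the Methods.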

Table 1.

Specimen and sequencing data for Alnus glutinosa.
Project information
Study title | Alnus glutinosa
Umbrella BioProject | PRJEB46320
Species | Alnus glutinosa
BioSample | SAMEA7522050
NCBI taxonomy ID | 3517

Specimen information
Technology | ToLID | BioSample accession | Organism part
PacBio long read sequencing | dhAlnGlut1 | SAMEA7522127 | leaf
Hi-C sequencing | dhAlnGlut1 | SAMEA7522125 | leaf
RNA sequencing | dhAlnGlut2 | SAMEA7536056 | leaf

Sequencing information
Platform | Run accession | Read count | Base count (Gb)
Hi-C Illumina NovaSeq 6000 | ERR6688533 | 5.24e+08 | 79.17
PacBio Sequel II | ERR6808004 | 7.72e+05 | 10.32
PacBio Sequel IIe | ERR6939242 | 6.14e+05 | 8.76
Chromium Illumina NovaSeq 6000 | ERR6688529 | 1.06e+08 | 16.07
Chromium Illumina NovaSeq 6000 | ERR6688530 | 1.21e+08 | 18.32
Chromium Illumina NovaSeq 6000 | ERR6688531 | 1.08e+08 | 16.28
Chromium Illumina NovaSeq 6000 | ERR6688532 | 1.11e+08 | 16.77
RNA Illumina HiSeq 4000 | ERR9435007 | 4.47e+07 | 6.75

Manual assembly curation corrected 87 missing joins or mis-joins, reducing the assembly length by 6.9%. The final assembly has a total length of 456.80 Mb in 15 sequence scaffolds, with a scaffold N50 of 30.5 Mb and 96 gaps (Table 2). The snail plot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds based on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (99.81%) of the assembly sequence was assigned to 14 chromosomal-level scaffolds. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 3). On chromosome 12, a region of tandem repeats was observed at 14.75–18.37 Mb; the order and orientation of scaffolds in this region are unknown. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial and plastid genomes were also assembled and can be found as contigs within the multifasta file of the genome submission.

Figure 2. Genome assembly of Alnus glutinosa, dhAlnGlut1.1: metrics.

The BlobToolKit snail plot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 457,583,206 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (53,352,176 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (30,513,540 and 25,901,623 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the eudicots_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/dhAlnGlut1_1/dataset/dhAlnGlut1_1/snail.

Figure 3. Genome assembly of Alnus glutinosa, dhAlnGlut1.1: Blob plot of base coverage against GC proportion for sequences in the assembly.

Sequences are coloured by phylum. Circles are sized in proportion to sequence length. Histograms show the distribution of sequence length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/dhAlnGlut1_1/dataset/dhAlnGlut1_1/blob.

Figure 4. Genome assembly of Alnus glutinosa dhAlnGlut1.1: BlobToolKit cumulative sequence plot.

The grey line shows cumulative length for all sequences. Coloured lines show cumulative lengths of sequences assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/dhAlnGlut1_1/dataset/dhAlnGlut1_1/cumulative.

Figure 5. Genome assembly of Alnus glutinosa, dhAlnGlut1.1: Hi-C contact map of the dhAlnGlut1.1 assembly, visualised using HiGlass.

Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=ewauaBn_S4-EMrvmkdS1-g.

Table 2.

Genome assembly data for Alnus glutinosa, dhAlnGlut1.1.
Genome assembly
Assembly name | dhAlnGlut1.1
Assembly accession | GCA_958979055.1
Accession of alternate haplotype | GCA_958979045.1
Span (Mb) | 456.80
Number of contigs | 114
Contig N50 length (Mb) | 24.4
Number of scaffolds | 15
Scaffold N50 length (Mb) | 30.5
Longest scaffold (Mb) | 53.35

Assembly metrics * | Value | Benchmark
Consensus quality (QV) | 52.4 | ≥ 50
k-mer completeness | 99.98% | ≥ 95%
BUSCO ** | C:98.6%[S:95.7%,D:2.9%], F:0.3%, M:1.0%, n:2,326 | C ≥ 95%
Percentage of assembly mapped to chromosomes | 99.81% | ≥ 95%
Organelles | Mitochondrial genome: 505.23 and 155.85 kb; Plastid genome: 160.82 kb | complete single alleles

Genome annotation of assembly GCA_958979055.1 at Ensembl
Number of protein-coding genes | 23,728
Number of non-coding genes | 4,860
Number of gene transcripts | 36,650

* Assembly metric benchmarks are adapted from column VGP-2020 of “Table 1: Proposed standards and metrics for defining genome assembly quality” from Rhie et al. (2021).

** BUSCO scores based on the eudicots_odb10 BUSCO set using version 5.4.3. C = complete [S = single copy, D = duplicated], F = fragmented, M = missing, n = number of orthologues in comparison. A full set of BUSCO scores is available at https://blobtoolkit.genomehubs.org/view/dhAlnGlut1_1/dataset/dhAlnGlut1_1/busco.

Table 3.

Chromosomal pseudomolecules in the genome assembly of Alnus glutinosa, dhAlnGlut1.
INSDC accession | Name | Length (Mb) | GC%
OY340898.1 | 1 | 53.35 | 36.5
OY340899.1 | 2 | 39.4 | 36.5
OY340900.1 | 3 | 38.11 | 37.0
OY340901.1 | 4 | 35.74 | 36.5
OY340902.1 | 5 | 33.31 | 36.5
OY340903.1 | 6 | 30.51 | 37.0
OY340904.1 | 7 | 30.49 | 36.5
OY340905.1 | 8 | 30.14 | 37.0
OY340906.1 | 9 | 29.14 | 36.5
OY340907.1 | 10 | 28.96 | 36.0
OY340908.1 | 11 | 28.76 | 36.5
OY340909.1 | 12 | 27.21 | 37.0
OY340910.1 | 13 | 25.9 | 39.0
OY340911.1 | 14 | 25.68 | 36.5
OY340914.1 | Pltd | 0.16 | 36.5
OY340912.1 | MT1 | 0.51 | 45.0
OY340913.1 | MT2 | 0.16 | 45.5
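
The scaffold N50 and N90 quoted for this assembly can be approximately reproduced from the sequence lengths in Table 3. The sketch below is only an illustration of the standard Nx definition applied to the rounded lengths in the table (including the three organelle sequences); the function name is hypothetical and the assembly pipeline itself used dedicated tools such as Gfastats and BlobToolKit.

```python
# Nx: the length of the sequence at which the running total of lengths,
# taken from longest to shortest, first reaches x% of the total length.
def nx(lengths, fraction):
    """Return the Nx value (e.g. fraction=0.5 for N50) of a list of lengths."""
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0.0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("lengths must be a non-empty list of positive numbers")


# Sequence lengths in Mb as listed in Table 3 (rounded values).
table3_lengths_mb = [
    53.35, 39.4, 38.11, 35.74, 33.31, 30.51, 30.49,
    30.14, 29.14, 28.96, 28.76, 27.21, 25.9, 25.68,  # chromosomes 1-14
    0.16, 0.51, 0.16,                                # Pltd, MT1, MT2
]

n50 = nx(table3_lengths_mb, 0.5)
n90 = nx(table3_lengths_mb, 0.9)
print(f"N50 = {n50} Mb, N90 = {n90} Mb")
```

With these inputs the N50 falls on chromosome 6 and the N90 on chromosome 13, matching the 30,513,540 bp and 25,901,623 bp shown in the snail plot.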

The estimated Quality Value (QV) of the final assembly is 52.4 with k-mer completeness of 99.98%, and the assembly has a BUSCO v5.4.3 completeness of 98.6% (single = 95.7%, duplicated = 2.9%), using the eudicots_odb10 reference set ( n = 2,326).
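
For readers less familiar with consensus quality values, the sketch below converts the reported QV into an approximate per-base error rate using the standard Phred relationship, QV = -10 log10(P). Multiplying by the assembly span to get an expected error count is an illustrative extrapolation, not a figure reported by the authors.

```python
# Phred-scaled quality: QV = -10 * log10(P), so P = 10 ** (-QV / 10).
def qv_to_error_rate(qv: float) -> float:
    """Return the per-base error probability implied by a Phred-scaled QV."""
    return 10 ** (-qv / 10)


assembly_qv = 52.4          # QV reported for the final assembly
assembly_span_bp = 456.80e6  # final assembly length (456.80 Mb)

error_rate = qv_to_error_rate(assembly_qv)
# ILLUSTRATIVE ONLY: expected number of erroneous bases across the assembly.
expected_errors = error_rate * assembly_span_bp

print(f"Per-base error rate: {error_rate:.2e}")
print(f"Roughly one error every {1 / error_rate:,.0f} bases")
```

By the same formula, the benchmark of QV ≥ 50 in Table 2 corresponds to at most one error per 100,000 bases.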

Metadata for specimens, BOLD barcode results, spectra estimates, sequencing runs, contaminants and pre-curation assembly statistics are given at https://links.tol.sanger.ac.uk/species/3517.

Genome annotation report

The Alnus glutinosa genome assembly (GCA_958979055.1) was annotated at the European Bioinformatics Institute (EBI) on Ensembl Rapid Release. The resulting annotation includes 36,650 transcribed mRNAs from 23,728 protein-coding and 4,860 non-coding genes (Table 2; https://rapid.ensembl.org/Alnus_glutinosa_GCA_958979055.1/Info/Index). The average transcript length is 4,410.79 bp. There are 1.28 coding transcripts per gene and 5.08 exons per transcript.
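
As a rough consistency check on the annotation counts above, the sketch below divides the transcript total by the gene totals. Reading the reported 1.28 as transcripts per gene over all annotated genes (protein-coding plus non-coding) is an assumption about how the statistic was derived; the text does not say.

```python
# Annotation counts reported in the text and in Table 2.
protein_coding_genes = 23_728
non_coding_genes = 4_860
transcripts = 36_650

# ASSUMPTION: the per-gene ratio is taken over all annotated genes.
all_genes = protein_coding_genes + non_coding_genes
per_gene_all = transcripts / all_genes

# Alternative reading, dividing by protein-coding genes only.
per_gene_coding = transcripts / protein_coding_genes

print(f"Transcripts per gene (all genes): {per_gene_all:.2f}")
print(f"Transcripts per protein-coding gene: {per_gene_coding:.2f}")
```

Only the first reading reproduces the reported value of 1.28, which suggests, without confirming, that the denominator includes non-coding genes.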

Methods

Sample acquisition, DNA barcoding and genome size estimation

A specimen of Alnus glutinosa (specimen ID KDTOL10041, ToLID dhAlnGlut1) was hand-picked from Canbury Gardens, Kingston upon Thames, Surrey, UK (latitude 51.42, longitude –0.31) on 2020-08-12. The specimen was collected and identified by Maarten Christenhusz (Royal Botanic Gardens, Kew, UK) and preserved by freezing at –80 °C. The herbarium voucher associated with the sequenced plant (K001400626) is from the collection number Christenhusz 9038 and is deposited in the herbarium of RBG Kew (K).

The specimen used for RNA sequencing (specimen ID EDTOL00069, ToLID dhAlnGlut2) was collected from Royal Botanic Garden Edinburgh (Inverleith), Midlothian, Scotland, UK (latitude 55.97, longitude –3.21) on 2020-08-12. The specimen was collected by Zoë Goodwin and David Bell and preserved by snap freezing in liquid nitrogen.

The initial species identification was verified by an additional DNA barcoding process according to the framework developed by Twyford et al. (2024). A small sample was dissected from each specimen and stored in silica ( Chase & Hills, 1991), while the remaining parts of the specimen were flash frozen at –80°C and shipped on dry ice to the Wellcome Sanger Institute (WSI). The silica-dried tissue was lysed, barcode region(s) amplified by PCR, and amplicons were sequenced and compared to a sequence database ( Crowley et al., 2023). Following whole genome sequence generation, the relevant DNA barcode region was also used alongside the initial barcoding data for sample tracking at the WSI ( Twyford et al., 2024). The standard operating procedures for Darwin Tree of Life barcoding have been deposited in protocols.io ( Beasley et al., 2023).

The genome size was estimated by flow cytometry using the fluorochrome propidium iodide and following the ‘one-step’ method as outlined in Pellicer et al. (2021). For this species, the General Purpose Buffer (GPB) supplemented with 3% PVP and 0.08% (v/v) beta-mercaptoethanol was used for isolation of nuclei ( Loureiro et al., 2007), and the internal calibration standard was Petroselinum crispum ‘Champion Moss Curled’ with an assumed 1C-value of 2,200 Mb ( Obermayer et al., 2002).
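
In flow cytometry with an internal standard, the sample genome size is generally obtained by scaling the known size of the standard by the ratio of the two fluorescence peaks. The sketch below illustrates that calculation; the peak positions are hypothetical values chosen for illustration and are not measurements from this study, and the function name is likewise an assumption.

```python
# 1C-value assumed in the text for the calibration standard
# Petroselinum crispum 'Champion Moss Curled'.
STANDARD_1C_MB = 2200.0


def estimate_1c_mb(sample_peak: float, standard_peak: float,
                   standard_1c_mb: float = STANDARD_1C_MB) -> float:
    """Scale the standard's genome size by the sample/standard peak ratio."""
    if sample_peak <= 0 or standard_peak <= 0:
        raise ValueError("peak positions must be positive")
    return standard_1c_mb * (sample_peak / standard_peak)


# HYPOTHETICAL mean G1 peak positions (arbitrary fluorescence units),
# chosen so that the ratio is 0.30; not data from the study.
example_mb = estimate_1c_mb(sample_peak=30.0, standard_peak=100.0)
print(f"Estimated 1C genome size: {example_mb:.0f} Mb")
```

A peak ratio of about 0.30 against the 2,200 Mb standard would give an estimate close to the 660 Mb reported in the genome sequence report.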

Nucleic acid extraction

The workflow for high molecular weight (HMW) DNA extraction at the WSI Tree of Life Core Laboratory includes a sequence of core procedures: sample preparation and homogenisation, DNA extraction, fragmentation and purification. Detailed protocols are available on protocols.io (Denton et al., 2023). The dhAlnGlut1 sample was weighed and dissected on dry ice (Jay et al., 2023), and leaf tissue was cryogenically disrupted using the Covaris cryoPREP® Automated Dry Pulverizer (Narváez-Gómez et al., 2023). HMW DNA was extracted using the Automated Plant MagAttract v1 protocol (Sheerin et al., 2023). HMW DNA was sheared into an average fragment size of 12–20 kb in a Megaruptor 3 system (Todorovic et al., 2023). Sheared DNA was purified by solid-phase reversible immobilisation, using AMPure PB beads to eliminate shorter fragments and concentrate the DNA (Strickland et al., 2023). The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.

RNA was extracted from leaf tissue of dhAlnGlut2 in the Tree of Life Laboratory at the WSI using the RNA Extraction: Automated MagMax™ mirVana protocol ( do Amaral et al., 2023). The RNA concentration was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer using the Qubit RNA Broad-Range Assay kit. Analysis of the integrity of the RNA was done using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Sequencing

Table 1 gives the raw read accessions and read and base counts for each sequencing technology.

Legacy Chromium 10X Genomics read cloud DNA sequencing libraries were constructed according to the manufacturer’s instructions and sequencing was performed on the Illumina NovaSeq 6000 instrument.

Pacific Biosciences HiFi circular consensus DNA sequencing libraries were constructed according to the manufacturers’ instructions. Poly(A) RNA-Seq libraries were constructed using the NEB Ultra II RNA Library Prep kit. DNA and RNA sequencing was performed by the Scientific Operations core at the WSI on Pacific Biosciences Sequel IIe (HiFi) and Illumina HiSeq 4000 (RNA-Seq) instruments.

Hi-C data were generated from the leaf tissue of dhAlnGlut1, using the Arima-HiC v2 kit. In brief, frozen tissue (–80°C) was fixed, and the DNA crosslinked using a TC buffer containing formaldehyde. The crosslinked DNA was then digested using a restriction enzyme master mix. The 5’-overhangs were then filled in and labelled with a biotinylated nucleotide and proximally ligated. The biotinylated DNA construct was fragmented to a fragment size of 400 to 600 bp using a Covaris E220 sonicator. The DNA was then enriched, barcoded, and amplified using the NEBNext Ultra II DNA Library Prep Kit, following manufacturers’ instructions. The Hi-C sequencing was performed using paired-end sequencing with a read length of 150 bp on an Illumina NovaSeq 6000 instrument.

Genome assembly, curation and evaluation

Assembly

The HiFi reads were first assembled using Hifiasm ( Cheng et al., 2021) with the --primary option. One round of polishing was performed by aligning 10X Genomics read data to the assembly with Long Ranger ALIGN, calling variants with FreeBayes ( Garrison & Marth, 2012). Haplotypic duplications were identified and removed with purge_dups ( Guan et al., 2020). The assembly was then scaffolded with Hi-C data ( Rao et al., 2014) using SALSA2 ( Ghurye et al., 2019). The scaffolded assemblies were evaluated using Gfastats ( Formenti et al., 2022), BUSCO ( Manni et al., 2021) and MERQURY.FK ( Rhie et al., 2020). The organelle genomes were assembled using OATK ( Zhou, 2023).

Curation

The assembly was checked for contamination and corrected using the gEVAL system ( Chow et al., 2016) as described previously ( Howe et al., 2021). Manual curation was performed using gEVAL, HiGlass ( Kerpedjiev et al., 2018) and Pretext ( Harry, 2022). Scaffolds were visually inspected and any identified contamination, missed joins, and mis-joins were corrected, and duplicate sequences were tagged and removed. The curation process is documented at https://gitlab.com/wtsi-grit/rapid-curation (article in preparation).

Evaluation of final assembly

A Hi-C map for the final assembly was produced using bwa-mem2 ( Vasimuddin et al., 2019) in the Cooler file format ( Abdennur & Mirny, 2020). To assess the assembly metrics, the k-mer completeness and QV consensus quality values were calculated in Merqury ( Rhie et al., 2020). This work was done using the “sanger-tol/readmapping” ( Surana et al., 2023a) and “sanger-tol/genomenote” ( Surana et al., 2023b) pipelines. The genome readmapping pipelines were developed using the nf-core tooling ( Ewels et al., 2020), use MultiQC ( Ewels et al., 2016), and make extensive use of the Conda package manager, the Bioconda initiative ( Grüning et al., 2018), the Biocontainers infrastructure ( da Veiga Leprevost et al., 2017), and the Docker ( Merkel, 2014) and Singularity ( Kurtzer et al., 2017) containerisation solutions. The genome was analysed within the BlobToolKit environment ( Challis et al., 2020) and BUSCO scores ( Manni et al., 2021) were calculated.

Table 4 contains a list of relevant software tool versions and sources.

Genome annotation

The Ensembl Genebuild annotation system ( Aken et al., 2016) was used to generate annotation for the Alnus glutinosa assembly (GCA_958979055.1) in Ensembl Rapid Release at the EBI. Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt ( UniProt Consortium, 2019).

Wellcome Sanger Institute – Legal and Governance

The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the ‘Darwin Tree of Life Project Sampling Code of Practice’, which can be found in full on the Darwin Tree of Life website. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.

Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:

•     Ethical review of provenance and sourcing of the material

•     Legality of collection, transfer and use (national and international)

Each transfer of samples is further undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Darwin Tree of Life Partner, Genome Research Limited (operating as the Wellcome Sanger Institute), and in some circumstances other Darwin Tree of Life collaborators.

Notes

[version 1; peer review: 2 approved]

Funding Statement

This work was supported by Wellcome through core funding to the Wellcome Sanger Institute [206194, https://doi.org/10.35802/206194] and the Darwin Tree of Life Discretionary Award [218328, https://doi.org/10.35802/218328].

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Data availability

European Nucleotide Archive: Alnus glutinosa. Accession number PRJEB46320; https://identifiers.org/ena.embl/PRJEB46320 ( Wellcome Sanger Institute, 2023). The genome sequence is released openly for reuse. The Alnus glutinosa genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. The genome will be annotated using available RNA-Seq data and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.

Author information

Members of the Royal Botanic Gardens Kew Genome Acquisition Lab are listed here: https://doi.org/10.5281/zenodo.12625079.

Members of the Royal Botanic Garden Edinburgh Genome Acquisition Lab are listed here: https://doi.org/10.5281/zenodo.4786682.

Members of the Plant Genome Sizing collective are listed here: https://doi.org/10.5281/zenodo.7994306.

Members of the Darwin Tree of Life Barcoding collective are listed here: https://doi.org/10.5281/zenodo.12158331.

Members of the Wellcome Sanger Institute Tree of Life Management, Samples and Laboratory team are listed here: https://doi.org/10.5281/zenodo.12162482.

Members of Wellcome Sanger Institute Scientific Operations: Sequencing Operations are listed here: https://doi.org/10.5281/zenodo.12165051.

Members of the Wellcome Sanger Institute Tree of Life Core Informatics team are listed here: https://doi.org/10.5281/zenodo.12160324.

Members of the Tree of Life Core Informatics collective are listed here: https://doi.org/10.5281/zenodo.12205391.

Members of the Darwin Tree of Life Consortium are listed here: https://doi.org/10.5281/zenodo.4783558.

References

  • Abdennur N, Mirny LA: Cooler: scalable storage for Hi-C data and other genomically labeled arrays. Bioinformatics. 2020;36(1):311–316. 10.1093/bioinformatics/btz540 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
  • Aken BL, Ayling S, Barrell D, et al. : The Ensembl gene annotation system. Database (Oxford). 2016;2016: baw093. 10.1093/database/baw093 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
  • Beasley J, Uhl R, Forrest LL, et al. : DNA barcoding SOPs for the Darwin Tree of Life project. protocols.io. 2023; [Accessed 25 June 2024].. 10.17504/protocols.io.261ged91jv47/v1 [CrossRef] [Google Scholar]
  • Brasier CM, Kirk SA, Delcan J, et al. : Phytophthora alni sp. nov. and its variants: designation of emerging heteroploid hybrid pathogens spreading on Alnus trees. Mycol Res. 2004;108(Pt 10):1172–1184. 10.1017/s0953756204001005 [Abstract] [CrossRef] [Google Scholar]
  • Challis R, Richards E, Rajan J, et al. : BlobToolKit – interactive quality assessment of genome assemblies. G3 (Bethesda). 2020;10(4):1361–1374. 10.1534/g3.119.400908 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
  • Chase MW, Hills HH: Silica gel: an ideal material for field preservation of leaf samples for DNA studies. Taxon. 1991;40(2):215–220. 10.2307/1222975 [CrossRef] [Google Scholar]
  • Cheng H, Concepcion GT, Feng X, et al. : Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 2021;18(2):170–175. 10.1038/s41592-020-01056-5 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
  • Chow W, Brugger K, Caccamo M, et al. : gEVAL  a web-based browser for evaluating genome assemblies. Bioinformatics. 2016;32(16):2508–2510. 10.1093/bioinformatics/btw159 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
  • Crowley L, Allen H, Barnes I, et al. : A sampling strategy for genome sequencing the British terrestrial arthropod fauna [version 1; peer review: 2 approved]. Wellcome Open Res. 2023;8:123. 10.12688/wellcomeopenres.18925.1 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
  • da Veiga Leprevost F, Grüning BA, Alves Aflitos S, et al. : BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017;33(16):2580–2582. 10.1093/bioinformatics/btx192 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
  • De Cleene M, Lejeune MC: Compendium van rituele planten in Europa.Gent: Uitgeverij Stichting Mens en Kultuur,2000. Reference Source [Google Scholar]
  • Denton A, Yatsenko H, Jay J, et al. : Sanger Tree of Life wet laboratory protocol collection V.1. protocols.io. 2023. 10.17504/protocols.io.8epv5xxy6g1b/v1 [CrossRef] [Google Scholar]
  • do Amaral RJV, Denton A, Yatsenko H, et al. : Sanger Tree of Life RNA extraction: automated MagMax TM mirVana. protocols.io. 2023. 10.17504/protocols.io.6qpvr36n3vmk/v1 [CrossRef] [Google Scholar]
  • Ewels P, Magnusson M, Lundin S, et al. : MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32(19):3047–3048. 10.1093/bioinformatics/btw354 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
  • Ewels PA, Peltzer A, Fillinger S, et al. : The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020;38(3):276–278. 10.1038/s41587-020-0439-x [Abstract] [CrossRef] [Google Scholar]
  • Formenti G, Abueg L, Brajuka A, et al. : Gfastats: conversion, evaluation and manipulation of genome sequences using assembly graphs. Bioinformatics. 2022;38(17):4214–4216. 10.1093/bioinformatics/btac460 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
  • Fuller E, Germaine KJ, Rathore DS: The good, the bad, and the useable microbes within the common alder ( Alnus glutinosa) microbiome—potential bio-agents to combat alder dieback. Microorganisms. 2023;11(9):2187. 10.3390/microorganisms11092187 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
  • Garrison E, Marth G: Haplotype-based variant detection from short-read sequencing.2012; [Accessed 26 July 2023]. 10.48550/arXiv.1207.3907 [CrossRef] [Google Scholar]
  • Ghurye J, Rhie A, Walenz BP, et al. : Integrating Hi-C links with assembly graphs for chromosome-scale assembly. PLoS Comput Biol. 2019;15(8): e1007273. 10.1371/journal.pcbi.1007273 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
  • Grüning B, Dale R, Sjödin A, et al. : Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018;15(7):475–476. 10.1038/s41592-018-0046-7 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
  • Guan D, McCarthy SA, Wood J, et al. : Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics. 2020;36(9):2896–2898. 10.1093/bioinformatics/btaa025 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
  • Harry E: PretextView (Paired REad TEXTure Viewer): a desktop application for viewing pretext contact maps.2022. Reference Source
  • Howe K, Chow W, Collins J, et al. : Significantly improving the quality of genome assemblies through curation. GigaScience. 2021;10(1): giaa153. 10.1093/gigascience/giaa153 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
  • Jay J, Yatsenko H, Narváez-Gómez JP, et al. : Sanger Tree of Life sample preparation: triage and dissection. protocols.io. 2023. 10.17504/protocols.io.x54v9prmqg3e/v1 [CrossRef] [Google Scholar]
  • Kerpedjiev P, Abdennur N, Lekschas F, et al. : HiGlass: web-based visual exploration and analysis of genome interaction maps. Genome Biol. 2018;19(1): 125. 10.1186/s13059-018-1486-1 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
  • Kurtzer GM, Sochat V, Bauer MW: Singularity: scientific containers for mobility of compute. PLoS One. 2017;12(5): e0177459. 10.1371/journal.pone.0177459 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
  • Loureiro J, Rodriguez E, Dolezel J, et al. : Two new nuclear isolation buffers for plant DNA flow cytometry: a test with 37 species. Ann Bot. 2007;100(4):875–888. 10.1093/aob/mcm152 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
  • Manni M, Berkeley MR, Seppey M, et al. : BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 2021;38(10):4647–4654. 10.1093/molbev/msab199 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
  • Merkel D: Docker: lightweight Linux containers for consistent development and deployment. Linux J. 2014;2014(239): 2. [Accessed 2 April 2024]. Reference Source [Google Scholar]
  • Narváez-Gómez JP, Mbye H, Oatley G, et al. : Sanger Tree of Life sample homogenisation: covaris cryoPREP ® automated dry pulverizer V.1. protocols.io. 2023. 10.17504/protocols.io.eq2lyjp5qlx9/v1 [CrossRef] [Google Scholar]
  • Obermayer R, Leitch IJ, Hanson L, et al. : Nuclear DNA C-values in 30 species double the familial representation in pteridophytes. Ann Bot. 2002;90(2):209–217. 10.1093/aob/mcf167 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
  • Paschke MW, Dawson JO, David MB: Soil nitrogen mineralization in plantations of Juglans nigra interplanted with actinorhizal Elaeagnus umbellata or Alnus glutinosa. Plant Soil. 1989;118:33–42. 10.1007/BF02232788 [CrossRef] [Google Scholar]
  • Pawlowski K, Newton WE, (eds.): Nitrogen-fixing actinorhizal symbioses. Heidelberg: Springer Science & Business Media,2010;6. [Google Scholar]
  • Pellicer J, Powell RF, Leitch IJ: The application of flow cytometry for estimating genome size, ploidy level endopolyploidy, and reproductive modes in plants.In: Besse, P. (ed.) Methods Mol Biol (Clifton, N.J.).New York, NY, Humana,2021;2222:325–361. 10.1007/978-1-0716-0997-2_17 [Abstract] [CrossRef] [Google Scholar]
  • POWO: Plants of the World Online.Royal Botanic Gardens, Kew,2024. Reference Source [Google Scholar]
  • Preston CD, Pearman DA, Trevor DD: New atlas of the British & Irish flora.Oxford: Oxford University Press,2002. Reference Source [Google Scholar]
  • Rao SSP, Huntley MH, Durand NC, et al. : A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159(7):1665–1680. 10.1016/j.cell.2014.11.021 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
  • Rhie A, McCarthy SA, Fedrigo O, et al. : Towards complete and error-free genome assemblies of all vertebrate species. Nature. 2021;592(7856):737–746. 10.1038/s41586-021-03451-0 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
  • Rhie A, Walenz BP, Koren S, et al. : Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 2020;21(1): 245. 10.1186/s13059-020-02134-9 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
  • Sheerin E, Sampaio F, Oatley G, et al. : Sanger Tree of Life HMW DNA extraction: automated plant MagAttract v.1. protocols.io. 2023; [Accessed 19 January 2024]. 10.17504/protocols.io.n2bvj3n1xlk5/v1 [CrossRef] [Google Scholar]
  • Stace CA, Thompson H, Stace M: New flora of the British Isles.4th ed. C&M Floristics,2019. [Google Scholar]
  • Strickland M, Cornwell C, Howard C: Sanger Tree of Life fragmented DNA clean up: manual SPRI. protocols.io. 2023. 10.17504/protocols.io.kxygx3y1dg8j/v1 [CrossRef] [Google Scholar]
  • Stroh PA, Walker KJ, Humphrey TA, et al. : Plant Atlas 2020. Mapping changes in the distribution of the British and Irish flora.Durham: Botanical Society of Britain and Ireland,2023;2. Reference Source [Google Scholar]
  • Surana P, Muffato M, Qi G: sanger-tol/readmapping: sanger-tol/readmapping v1.1.0 - Hebridean Black (1.1.0). Zenodo. 2023a. 10.5281/zenodo.7755669 [CrossRef] [Google Scholar]
  • Surana P, Muffato M, Sadasivan Baby C: sanger-tol/genomenote (v1.0.dev). Zenodo. 2023b. 10.5281/zenodo.6785935 [CrossRef] [Google Scholar]
  • Todorovic M, Sampaio F, Howard C: Sanger Tree of Life HMW DNA fragmentation: diagenode Megaruptor ®3 for PacBio HiFi. protocols.io. 2023. 10.17504/protocols.io.8epv5x2zjg1b/v1 [CrossRef] [Google Scholar]
  • Twyford AD, Beasley J, Barnes I, et al. : A DNA barcoding framework for taxonomic verification in the Darwin Tree of Life project [version 1; peer review: 1 approved]. Wellcome Open Res. 2024;9:339. 10.12688/wellcomeopenres.21143.1 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
  • UniProt Consortium: UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 2019;47(D1):D506–D515. 10.1093/nar/gky1049 [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
  • Vasimuddin M, Misra S, Li H, et al. : Efficient architecture-aware acceleration of BWA-MEM for multicore systems.In: 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS).IEEE,2019;314– 324. 10.1109/IPDPS.2019.00041 [CrossRef] [Google Scholar]
  • Wellcome Sanger Institute: The genome sequence of the common alder, Alnus glutinosa (L.) Gaertn. European Nucleotide Archive.[dataset], accession number PRJEB46320,2023.
  • Zhou C: c-zhou/oatk: Oatk-0.1.2023. 10.5281/zenodo.7631375 [CrossRef] [Google Scholar]
2024; 9: 570.
Published online 2024 Oct 7. 10.21956/wellcomeopenres.25477.r108087

Reviewer response for version 1

The authors present a haploid, unphased reference genome and gene annotation for Alnus glutinosa. It is assembled and annotated using modern methods and appears to be very complete based on k-mers, BUSCOs and N50/N90 statistics. My only concern is that flow cytometry indicates the genome should be 660 Mb, whereas the assembly is only 456 Mb. The authors' view on the likely cause of this difference would be a useful addition.

Are sufficient details of methods and materials provided to allow replication by others?

Yes

Is the rationale for creating the dataset(s) clearly described?

Yes

Are the datasets clearly presented in a useable and accessible format?

Yes

Are the protocols appropriate and is the work technically sound?

Yes

Reviewer Expertise:

Plant Computational Genomics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

2024; 9: 570.
Published online 2024 Oct 7. 10.21956/wellcomeopenres.25477.r108089

Reviewer response for version 1

1. The authors could use additional assembly metrics, such as LAI or GCI, to assess the quality of their finished assembly.

2. The Hi-C contact map could be partitioned along chromosome boundaries for easier visualization.

3. The genes could be functionally annotated so that biological meaning can be derived from them, for example the genes involved in symbiotic nitrogen fixation through nodulation by the actinomycete Frankia.

Are sufficient details of methods and materials provided to allow replication by others?

Yes

Is the rationale for creating the dataset(s) clearly described?

Yes

Are the datasets clearly presented in a useable and accessible format?

Yes

Are the protocols appropriate and is the work technically sound?

Yes

Reviewer Expertise:

Transcriptomics, Genomics, molecular biology, gene cloning

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.


Articles from Wellcome Open Research are provided here courtesy of The Wellcome Trust

Funding

Funders who supported this work.

Wellcome Trust (2)