Sci Data. 2024; 11: 1071.
Published online 2024 Oct 2. https://doi.org/10.1038/s41597-024-03852-6
PMCID: PMC11446949
PMID: 39358417

Chromosome-scale genome assembly of Astragalus membranaceus using PacBio and Hi-C technologies

Abstract

Astragalus membranaceus (Fisch.) Bge (AM) is a medicinal herb belonging to the Leguminosae family. In this study, we present a chromosome-scale genome assembly of AM, aiming to support molecular biology and functional studies of Astragali Radix. The genome size of AM is about 1.43 Gb, with a contig N50 of 1.67 Mb. A total of 98.16% of the assembly was anchored to 9 pseudochromosomes using Hi-C technology. The assembly completeness was estimated at 97.27% using BUSCO, with a long terminal repeat assembly index (LAI) of 16.22 and a quality value (QV) of 48.58. Additionally, the genome contained 67.98% repetitive sequences. Genome annotation predicted 29,914 protein-coding genes, including 73 genes involved in the flavonoid biosynthetic pathway and 2,048 transcription factors. The high-quality genome assembly and gene annotation resources will greatly facilitate future functional genomic studies in Leguminosae species.

Subject terms: Plant genetics, Genomics

Background & Summary

Astragalus membranaceus (Fisch.) Bge (AM) is a medicinal plant widely used worldwide1. Its dried roots, known as Astragali Radix, possess hepatoprotective, diuretic, tonic and expectorant activities and are used in Chinese medicine for anti-aging, anti-tumor and anti-neurodegeneration purposes and for regulating blood glucose and immunity2. Flavonoids are among the main active compounds in AM. They have diverse biological activities and play numerous roles in the interaction between plants and the environment, such as resisting diseases and insect pests, protecting against ultraviolet damage, and attracting insect pollinators3. Recently, the genome of Astragalus mongholicus (AMM), another authorized plant source of Astragali Radix, has been reported4,5. It is widely believed that AM and AMM differ markedly in morphology and function, and that AMM is the more heterozygous of the two. A metabolomics study6 identified a total of 53 chemical markers for the discrimination of AMM and AM. Among them, the contents of 36 components, including 14 flavonoids, were significantly higher in AM than in AMM. AM may therefore have stronger pharmacological activities than AMM.

To further understand the molecular mechanism underlying flavonoid biosynthesis, we performed chromosome-level genome sequencing of AM (2n = 18) using a combination of PacBio long reads and Hi-C scaffolding (Fig. 1). The assembled AM genome had a total length of 1.43 Gb, with a contig N50 of 1.67 Mb and a complete BUSCO score of 97.27%. A total of 1.40 Gb (98.16%) of the sequences was anchored to the 9 pseudochromosomes (Fig. 2). Genome annotation predicted 29,914 protein-coding genes and 972.44 Mb (67.98%) of repetitive sequences. Moreover, 73 genes associated with the flavonoid biosynthetic pathway (Fig. 3) and 2,048 transcription factors (TFs) were identified. The chromosome-scale genome of AM provides a genetic basis for exploring key genes and molecular regulatory mechanisms involved in the biosynthesis of important compounds, and also serves as a valuable resource for comparative genomic analysis between AM and AMM.

Fig. 1

Circos plot of the AM genome. The plot includes the following components, arranged from inside to outside: (I) Collinear regions within the AM assembly; (II) GC content in non-overlapping 1-Mb windows; (III) Percentage of repeats in 1-Mb sliding windows; (IV) Gene density in 1-Mb sliding windows; (V) Length of each pseudochromosome in megabases (Mb).

Fig. 2

Comparative genomic analysis between AM and AMM. (a) The syntenic regions. The analysis reveals intricate relationships between AM and AMM in their genomes. (b) AM protein length plotted against the orthologous protein length for AMM. (c) The density plot of SNPs between AM assembly and AMM assembly. (d) The density plot of InDels between AM assembly and AMM assembly.

Fig. 3

The genes involved in the biosynthesis of flavonoids and the TFs in the AM genome. (a) The phylogenetic tree of genes involved in the flavonoid biosynthetic pathway. Genes with IDs highlighted in gold originate from AM, those highlighted in blue are from AMM, and those in red are from M. truncatula. (b) The distribution of TF families in the AM genome. Only TF families containing 10 or more genes are shown.

Methods

Plant materials and sequencing

The plant material used for de novo genome assembly was a seven-year-old AM plant grown in Jinzhong, China. Vigorously growing leaves were collected, immediately snap-frozen in liquid nitrogen, and stored at −80 °C until DNA extraction. Genomic DNA was extracted using the DNeasy Plant Maxi kit (Qiagen, Germany). A short-fragment library with an insert size of 350 bp was prepared and sequenced on the BGISEQ platform, yielding 150 bp paired-end reads. Two PacBio libraries with an insert size of approximately 20 kb were prepared following the manufacturer’s instructions (Pacific Biosciences) and sequenced on the PacBio Sequel platform to generate continuous long reads. For chromosome conformation capture (Hi-C) sequencing, libraries were generated with the DpnII restriction enzyme according to previously described methods7 and sequenced on the BGISEQ platform. RNA-seq libraries from root, leaf, and stem tissues collected during the fruit growth period were constructed using the NEBNext® Ultra™ RNA Library Prep Kit for Illumina® (NEB, Ipswich, MA, USA) following the manufacturer’s protocol8. The cDNA libraries were then sequenced on a BGISEQ instrument, yielding 150 bp paired-end reads.

In summary, 156.2 Gb of paired-end next-generation sequencing reads (~109.2X), 280.8 Gb of PacBio subreads from two cells (~196.4X; the subread N50 length exceeded 22 kb), and 285.6 Gb of Hi-C data (~199.7X) were obtained (Table 1).
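The depth values quoted here follow from dividing the total sequenced bases by the genome size. The sketch below reproduces that arithmetic; it assumes a 1.43 Gb genome and takes the total lengths from Table 1, with the PacBio total computed as the sum of the two cells. The function and variable names are illustrative only.

```python
# Sequencing depth (fold coverage) = total sequenced bases / genome size.
GENOME_SIZE_GB = 1.43  # genome size reported in the text (assumed denominator)


def depth(total_gb: float, genome_gb: float = GENOME_SIZE_GB) -> float:
    """Return the fold coverage for a dataset of `total_gb` gigabases."""
    return total_gb / genome_gb


# Total lengths (Gb) taken from Table 1.
datasets = {
    "NGS raw reads": 156.2,
    "PacBio subreads (cell 1 + cell 2)": 143.6 + 137.2,
    "Hi-C raw data": 285.6,
}

for name, total_gb in datasets.items():
    print(f"{name}: {total_gb:.1f} Gb -> ~{depth(total_gb):.1f}X")
```

Summing the two PacBio cells gives 280.8 Gb, which is the total consistent with a depth of about 196X.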

Table 1

Summary of sequencing data of AM genome.

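| Data type | Reads number | Total length (Gb) | Genome depth (X) | N50 length of reads (bp) |
| --- | --- | --- | --- | --- |
| NGS raw reads | 1,041,321,120 | 156.2 | 109.2 | 150 |
| NGS clean reads | 1,032,333,412 | 154.6 | 108.1 | 150 |
| PacBio subreads, cell 1 | 9,476,555 | 143.6 | 100.4 | 22,132 |
| PacBio subreads, cell 2 | 9,504,443 | 137.2 | 95.9 | 22,078 |
| Hi-C raw data | 1,842,342,896 | 285.6 | 199.7 | 150 |
| Hi-C clean data | 1,801,451,914 | 271.2 | 189.7 | 150 |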

Genome survey

K-mer frequency analysis is a widely used genome survey technique. A K-mer is a sequence of K consecutive nucleotides extracted from sequencing data; a read of length L yields L − K + 1 K-mers. The 17-mer is a common choice for genome size estimation because it covers a vast number of possible combinations (4^17) and has been applied to various species such as willow (338.93 Mb)9, Dalbergia odorifera (653.45 Mb)10, camel (2.01–2.05 Gb)11, and gecko (2.55 Gb)12. Here, we counted 17-bp K-mers using Jellyfish (v 2.2.10)13 and estimated genome characteristics with GenomeScope (v2.0)14. The estimated genome size was 1.43 Gb with a heterozygosity rate of 1.01% (Table 2). This estimate is close to the result obtained via flow cytometry, which indicated a genome size of 1.52 Gb15.

Table 2

Estimation of genome and repeat fragment size, and heterozygosity of AM.

| K | K-mer Number | Genome Size (Gb) | Repeat (%) | Heterozygous Ratio (%) | Used Bases (bp) | Depth (X) |
| --- | --- | --- | --- | --- | --- | --- |
| 17 | 137,801,908,923 | 1.43 | 82.1 | 1.01 | 154,569,325,184 | 108.09 |
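The relationship between reads, K-mers, and genome size can be illustrated with a short calculation. This is a back-of-the-envelope sketch and not the model that GenomeScope fits: it assumes 150 bp reads, treats the Depth column of Table 2 as base-level depth, and estimates genome size as the total K-mer count divided by the expected K-mer depth. All function names are illustrative.

```python
from collections import Counter


def count_kmers(read: str, k: int) -> Counter:
    """Count every overlapping k-mer in one read (there are L - k + 1 of them)."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def kmer_depth(base_depth: float, read_len: int, k: int) -> float:
    """Convert base-level depth into the expected k-mer depth."""
    return base_depth * (read_len - k + 1) / read_len


def estimate_genome_size(total_kmers: int, depth: float) -> float:
    """Simplified estimator: genome size = total k-mers / k-mer depth."""
    return total_kmers / depth


# Toy read: length 10 with k = 4 gives 10 - 4 + 1 = 7 k-mers.
toy_counts = count_kmers("ACGTACGTAC", 4)
print(sum(toy_counts.values()))

# Values from Table 2; the 150 bp read length is an assumption of this sketch.
depth_17 = kmer_depth(base_depth=108.09, read_len=150, k=17)
size_bp = estimate_genome_size(137_801_908_923, depth_17)
print(f"k-mer depth ~{depth_17:.1f}X, genome size ~{size_bp / 1e9:.2f} Gb")
```

Under these assumptions the simple ratio lands close to the 1.43 Gb reported by GenomeScope.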

Genome assembly

For PacBio CLR data, Canu16, FALCON17, and MECAT218 are widely used genome assemblers. Nie et al.19 demonstrated in 2024 the high accuracy of these software packages in genome assembly. Notably, FALCON, endorsed by PacBio, has played a pivotal role in numerous high-quality plant genome projects. For instance, FALCON was used in the barley genome20, the maize Mo17 project21, Asian rice genome research22, and the coffee genome study23, showing its effectiveness for genome assembly. Here, the contigs of the AM genome were assembled using the FALCON (v2.0.5) assembler with the following parameters: -v -B48 -D250 -M24 -h600 -e.75 -l3000 -s1000 -k18 -w6 -T8 --output_multi --min_idt 0.75 --min_cov 4 --max_n_read 200 --n_core 8. After the FALCON assembly, the genome was polished with the command-line tools of SMRT Link (v4.0.0) following the Reference Guide (https://programs.pacificbiosciences.com/l/1652/2017-02-01/3rzxn6/184345/SMRT_Tools_Reference_Guide__v4.0.0_.pdf). To enhance the contiguity of the genome and reduce errors, the assembly was further polished with NGS short reads using Pilon (v1.22)24. Finally, TrimDup, a component of the Rabbit Genome Assembler (https://github.com/gigascience/rabbit-genome-assembler), was applied to eliminate redundant sequences, with the percentage parameter set to 0.3.

To anchor contigs onto pseudochromosomes, we used BWA (v 0.7.12)25 to align the Hi-C clean data to the assembled contigs. Low-quality reads were filtered out using the HiC-Pro pipeline26 with default parameters. The remaining valid reads were used to anchor contigs to chromosomes with the Juicer27 and 3D-DNA28 pipelines. Finally, the chromosome assemblies were divided into equal-length 500-kb bins, and the interaction signals generated by valid mapped read pairs between each pair of bins were visualized in a heat map.
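The binning step behind such a heat map can be sketched in a few lines. The example below is a simplified illustration, not the Juicer or 3D-DNA implementation: it assumes read pairs are already reduced to two coordinates on the same chromosome, and the function name and toy coordinates are hypothetical.

```python
def contact_matrix(pairs, chrom_length, bin_size=500_000):
    """Build a symmetric contact-count matrix from intrachromosomal read pairs.

    `pairs` is an iterable of (pos1, pos2) coordinates on one chromosome.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:  # mirror off-diagonal contacts to keep the matrix symmetric
            matrix[j][i] += 1
    return matrix


# Toy data: three read pairs on a hypothetical 1.4 Mb chromosome (3 bins).
toy_pairs = [(100, 600_000), (10, 20), (1_200_000, 700_000)]
toy_matrix = contact_matrix(toy_pairs, chrom_length=1_400_000)
for row in toy_matrix:
    print(row)
```

In a well-ordered assembly most counts concentrate near the diagonal, because loci that are close on the chromosome interact more often.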

A genome assembly spanning 1.43 Gb was generated (Fig. 1; Table 3), which was close to the genome size of AMM (1.43 Gb vs 1.47 Gb) and the estimated genome size. The contig N50 value of AM genome was 1.67 Mb, which is comparable to the recently published genome of the closely related legume Astragalus sinicus29 (1.67 Mb vs 1.5 Mb). Approximately 1.40 Gb (98.16%) of the sequences were successfully anchored to the 9 pseudochromosomes (Table 3).
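For readers unfamiliar with the metric, N50 is the length of the shortest contig among the largest contigs that together cover at least half of the assembly. A minimal sketch of the calculation, using made-up contig lengths rather than the AM contigs:

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50() requires at least one contig length")


# Toy example: total = 32, and the two longest contigs (8 + 8) reach 16.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))
```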

Table 3

Characteristics of the genome assembly in Astragalus membranaceus.

| Item | Astragalus membranaceus (AM) |
| --- | --- |
| Size of assembly (Gb) | 1.43 |
| Contig N50 (Mb) | 1.67 |
| Chromosome number | 9 |
| Anchored pseudo-chromosomes | 98.16% |
| GC content | 38.40% |
| Genome complete BUSCOs | 97.27% |
| Long terminal repeat assembly index | 16.22 |
| Quality value | 48.58 |
| Repetitive sequences | 67.98% |
| Number of protein-coding genes | 29,914 |

Annotation of repetitive sequences

Tandem repeats and interspersed repeats were identified using the method described in Qu et al.30. Approximately 67.98% of the assembled genome was classified as repetitive sequences, and interspersed repeats alone accounted for 65.99% of the genome (Table 4). The most prevalent elements were long terminal repeats (LTRs), which accounted for 60.66% of the genome.

Table 4

Statistics of interspersed repeats in AM assembly.

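| Type | Repbase TEs: length (bp) | Repbase TEs: % in genome | TE proteins: length (bp) | TE proteins: % in genome | De novo: length (bp) | De novo: % in genome | Combined TEs*: length (bp) | Combined TEs*: % in genome |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DNA | 31,435,958 | 2.20 | 16,519,868 | 1.15 | 50,391,304 | 3.52 | 72,929,673 | 5.10 |
| LINE | 10,593,042 | 0.74 | 8,014,844 | 0.56 | 9,810,845 | 0.69 | 22,455,984 | 1.57 |
| SINE | 129,143 | 0.01 | 0 | 0.00 | 208,537 | 0.01 | 336,632 | 0.02 |
| LTR | 233,807,308 | 16.34 | 232,969,052 | 16.29 | 840,527,902 | 58.76 | 867,737,646 | 60.66 |
| Other | 3,875 | 0.00 | 0 | 0.00 | 15,661 | 0.00 | 19,536 | 0.00 |
| Unclassified† | 0 | 0.00 | 0 | 0.00 | 2,177,624 | 0.15 | 2,177,624 | 0.15 |
| Total | 274,854,677 | 19.21 | 257,495,852 | 18.00 | 897,005,122 | 62.71 | 943,956,995 | 65.99 |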

Note: This table does not include tandem repeats; some elements may partly include the domain of another element.

*Combined: the non-redundant consensus of all repeat prediction/classification methods employed.

†Unclassified: the predicted repeats that cannot be classified by RepeatMasker.

LINE, long interspersed nuclear elements; SINE, short interspersed nuclear elements; LTR, long terminal repeat.

Protein-coding genes prediction and functional annotation

Protein-coding genes were annotated using a method similar to that described in Fang et al.31. To facilitate genome annotation of the AM assembly, RNA sequencing of root, stem, and leaf samples was conducted, yielding a total of 72.18 Gb of clean reads (Table 5). For transcriptome-based prediction, RNA-seq clean reads were assembled using Trinity (v 2.15.1)32 with the following parameters: ‘--max_memory 200G --CPU 40 --min_contig_length 200 --genome_guided_bam merged_sorted.bam --full_cleanup --min_kmer_cov 3 --min_glue 3 --bfly_opts '-V 5 --edge-thr=0.1 --stderr' --genome_guided_max_intron 10000 --genome_guided_min_coverage 2’. This generated 245,216 transcripts with an N50 of 1,997 bp. The assembled transcripts were aligned to the AM assembly using the Program to Assemble Spliced Alignments (PASA) (v 2.4.1)33, and gene structures were generated from valid transcript alignments. Additionally, RNA-seq clean reads were mapped to the AM assembly using HISAT2 (v 2.0.1)34. StringTie (v 1.2.2)35 and TransDecoder (v 5.7.1) (https://github.com/TransDecoder/TransDecoder) were then used to assemble the transcripts and to identify candidate coding regions for gene models. For the homology-based method, homologous genomes and gene sets, including A. membranaceus var. mongholicus (AMM)5, Cicer arietinum (GenBank accession: GCA_026016865.1)36, Medicago truncatula (GenBank accession: GCA_000219495.2)37, Trifolium pratense (GenBank accession: GCA_949352195.3)38, Glycine max (ZH13-T2T)39, and Arabidopsis thaliana (Col-PEK1.5)40, were downloaded and used as queries to search against the AM assembly with GeMoMa (v 1.9)41. Genes with a coding sequence (CDS) shorter than 150 bp were filtered out, along with single-exon genes lacking annotated protein domains. Genes that were not anchored to chromosome sequences and lacked annotated protein domains were also excluded.

Finally, the gene models were refined with PASA (v 2.4.1) to add untranslated regions and alternative splicing information, using the Trinity-assembled transcripts and isoforms from full-length transcriptomes of leaf and root tissues42. Following the method described in Bi et al.43, the integrated gene set was translated into amino-acid sequences and annotated. As a result, 29,828 genes (99.71% of the total) were successfully annotated.
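The three filtering rules applied to the predicted gene models (minimum CDS length, single-exon genes without domains, and unanchored genes without domains) can be expressed as one predicate. The sketch below is only an illustration of that logic: the `GeneModel` record, its field names, and the example genes are hypothetical and do not come from the annotation files.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    gene_id: str
    cds_length: int      # total CDS length in bp
    exon_count: int
    has_domain: bool     # True if a protein domain was annotated
    on_chromosome: bool  # True if located on an anchored pseudochromosome


def keep_gene(gene: GeneModel, min_cds: int = 150) -> bool:
    """Apply the three exclusion rules described in the text."""
    if gene.cds_length < min_cds:
        return False  # CDS shorter than 150 bp
    if gene.exon_count == 1 and not gene.has_domain:
        return False  # single-exon gene without a protein domain
    if not gene.on_chromosome and not gene.has_domain:
        return False  # unanchored gene without a protein domain
    return True


# Hypothetical gene models used only to exercise the predicate.
candidates = [
    GeneModel("g1", 1200, 5, True, True),   # kept
    GeneModel("g2", 120, 2, True, True),    # dropped: short CDS
    GeneModel("g3", 900, 1, False, True),   # dropped: single exon, no domain
    GeneModel("g4", 900, 3, False, False),  # dropped: unanchored, no domain
    GeneModel("g5", 900, 1, True, False),   # kept: domain rescues it
]
kept_ids = [g.gene_id for g in candidates if keep_gene(g)]
print(kept_ids)
```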

Table 5

Summary of RNA-seq data used for AM genome annotation.

| Sample | Total clean reads | Clean data (bp) | Clean Q20% (fq1;fq2) | GC rate (%) | Uniquely mapped reads | Total mapping ratio | Unique mapping ratio | SRA accession |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| leaf_1 | 42,134,094 | 6,320,114,100 | 95.71;94.67 | 44.04 | 28,298,792 | 71.76% | 67.16% | SRR27790544 |
| leaf_2 | 42,618,542 | 6,392,781,300 | 95.65;94.67 | 43.83 | 31,733,374 | 79.87% | 74.46% | SRR27790543 |
| leaf_3 | 42,362,198 | 6,354,329,700 | 95.68;94.83 | 43.19 | 33,032,872 | 83.56% | 77.98% | SRR27790542 |
| root_1 | 42,365,348 | 6,354,802,200 | 95.62;94.51 | 42.61 | 36,156,538 | 91.68% | 85.34% | SRR27790541 |
| root_2 | 42,176,852 | 6,326,527,800 | 95.61;94.73 | 42.59 | 35,176,690 | 90.26% | 83.40% | SRR27790540 |
| root_3 | 42,326,320 | 6,348,948,000 | 95.15;94.31 | 42.89 | 34,986,928 | 88.98% | 82.66% | SRR27790539 |
| stem_1 | 42,113,002 | 6,316,950,300 | 95.63;94.54 | 42.57 | 36,319,666 | 91.70% | 86.24% | SRR27790538 |
| stem_2 | 42,255,416 | 6,338,312,400 | 95.66;94.71 | 42.57 | 35,921,772 | 90.41% | 85.01% | SRR27790547 |
| stem_3 | 42,317,456 | 6,347,618,400 | 95.66;94.26 | 43.19 | 34,638,542 | 87.17% | 81.85% | SRR27790546 |

Overall, we predicted 29,914 protein-coding genes, with average lengths of 4,752 bp for genes, 622 bp for introns, and 1,306 bp for coding sequences. We downloaded the flavonoid biosynthetic pathway genes annotated in the AMM genome and used OrthoFinder44, a tool for inferring orthologous genes across species, to identify their counterparts in the AM genome. As a result, 73 genes associated with the flavonoid biosynthetic pathway were obtained in the AM genome. Homologous sequences were aligned with MAFFT (v 7.505)45, and the alignment was trimmed with trimAl (v 1.4.1)46 to remove poorly aligned positions. The phylogenetic tree was then generated using IQ-TREE 2 (v 2.0.6)47 with the parameter “-b 1000” and visualized using Evolview48 (Fig. 3a). Using the method described in Li et al.49, a total of 2,048 transcription factors (TFs) were identified (Fig. 3b). In brief, the plant TF domain profiles (https://planttfdb.gao-lab.org/)50 were searched against the AM protein set using the hmmsearch tool implemented in HMMER (v 3.1b2) (http://hmmer.org/). Proteins with a TF domain match at an E-value of 1E-5 or lower were retained.
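The E-value cutoff can be applied to hmmsearch tabular output with a small parser. The sketch below assumes the results were written with the `--tblout` option, in which whitespace-separated fields give the target (protein) name first, the query (profile) name third, and the full-sequence E-value fifth. The protein IDs, profile names, and scores in `SAMPLE_TBLOUT` are invented for illustration.

```python
def tf_hits(tblout_lines, max_evalue=1e-5):
    """Map each protein ID to the set of TF profiles it matches.

    Expects lines in HMMER `--tblout` format; comment lines start with '#'.
    """
    hits = {}
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        protein, profile = fields[0], fields[2]
        evalue = float(fields[4])  # full-sequence E-value column
        if evalue <= max_evalue:
            hits.setdefault(protein, set()).add(profile)
    return hits


# Hypothetical records; real output has the same leading columns.
SAMPLE_TBLOUT = [
    "# target name  accession  query name  accession  E-value  score  bias",
    "AM_g0001 - bHLH PF00010.29 1.2e-15 55.3 0.1 2.0e-15 54.6 0.1 1.0 1 0 0 1 1 1 1 -",
    "AM_g0002 - MYB PF00249.34 3.0e-04 18.2 0.0 4.1e-04 17.8 0.0 1.0 1 0 0 1 1 1 1 -",
    "AM_g0003 - WRKY PF03106.18 1e-05 30.0 0.0 1e-05 30.0 0.0 1.0 1 0 0 1 1 1 1 -",
]
sample_hits = tf_hits(SAMPLE_TBLOUT)
print(sample_hits)
```

A protein matching profiles from several families would still need a family assignment rule, which this sketch does not attempt.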

Genomic variations between AM and AMM

Using MCScan (Python version)51, we identified homologous regions between the AM and AMM genomes, with a threshold of at least ten genes per region. In total, 22,160 orthologous gene pairs were shared between the two genomes, involving 21,727 AM genes (72.63% of AM genes) and 21,474 AMM genes (77.06% of AMM genes) (Fig. 2a). The amino acid sequence lengths of the orthologous gene pairs within these collinear regions showed a significant positive correlation (Fig. 2b), further supporting their homology. Additionally, we observed two potential chromosomal rearrangement events. First, a chromosomal fusion occurred in AMM linkage group Chr7, which joins two chromosomes of the AM genome, Chr7 and Chr8. Second, a chromosomal fusion also took place in AM linkage group Chr9, involving two chromosomes of the AMM genome, Chr8 and Chr9. The causes and timing of these fusion events, and their effects on plant traits, require further investigation.

Single nucleotide polymorphisms (SNPs) and small insertions/deletions (InDels) were identified using a method similar to that previously reported52. Briefly, genome alignment between the AM and AMM assemblies was performed with the NUCmer program of MUMmer4 (v4.0.0)53 using the parameter settings “--mum -g 1000 -c 90 -l 40”. The delta-filter program was used to obtain alignment blocks with the parameter setting “-1 -l 5000”. The show-snps program was used to detect SNPs and InDels with the settings “-Clr -x 1 -T”. In total, 4,902,056 SNPs and 903,918 InDels were identified (Fig. 2c,d). These variations serve as resources for further research.
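Separating SNPs from InDels in tab-delimited show-snps output comes down to checking the two substitution columns, where a "." marks a missing base. The sketch below relies only on the first columns of the `-T` format (reference position, reference base, query base, query position) and skips header lines; the sample records are hypothetical. It counts one record per reported position, so a multi-base InDel appears as several records here, and merging adjacent positions into single events is left out.

```python
def classify_show_snps(lines):
    """Count SNP and InDel records in `show-snps -T` style output."""
    counts = {"SNP": 0, "InDel": 0}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Data rows start with an integer reference position; skip headers.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        ref_base, qry_base = fields[1], fields[2]
        if ref_base == "." or qry_base == ".":
            counts["InDel"] += 1  # base present in only one genome
        else:
            counts["SNP"] += 1
    return counts


# Hypothetical records; real output carries additional columns.
SAMPLE_SNPS = [
    "/path/to/AM.fa /path/to/AMM.fa",
    "NUCMER",
    "",
    "[P1]\t[SUB]\t[SUB]\t[P2]\t[BUFF]\t[DIST]",
    "1021\tA\tG\t998\t15\t998",
    "2040\t.\tT\t2017\t3\t2017",
    "2043\tC\t.\t2019\t3\t2019",
    "3500\tT\tC\t3476\t20\t3476",
]
sample_counts = classify_show_snps(SAMPLE_SNPS)
print(sample_counts)
```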

Data Records

The DNA and RNA sequence reads of AM have been deposited in the Sequence Read Archive (SRA) under accession number SRP486930 (ref. 54) and project number PRJNA1067739. The genome assembly has been deposited at GenBank under the WGS accession GCA_039519185.1 (ref. 55). Additionally, the genome assembly, together with files for gene structure annotation, repeat predictions, gene functional annotation, and variation information (SNPs and InDels between the AM and AMM genomes), was deposited in Figshare56.

Technical Validation

Genome assembly and gene prediction quality assessment

The quality and accuracy of the AM assembly were assessed through the following analyses. First, the Hi-C interaction map showed a strong intrachromosomal interaction signal along the diagonal (Fig. 4). Second, the GC-depth distribution indicated no apparent contamination in the assembled sequences (Fig. 5). Third, the AM assembly had an LTR assembly index of 16.22 and a BUSCO score of 97.27%, indicating high completeness (Table 3). In addition, evaluation using Merqury gave a QV of 48.58, suggesting high accuracy at the base-pair level. Lastly, 99.56% of the DNA next-generation sequencing reads and 99.25% of the error-corrected PacBio data were mapped to the AM genome assembly. The error-corrected PacBio data covered 99.40% of the genome, and the depth of each window remained consistent without significant fluctuations (Fig. 6).
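The QV is a Phred-scaled score, so it can be converted to an estimated per-base error probability with QV = −10·log10(error). A small sketch of the conversion, with illustrative function names:

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value into a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error probability back into a Phred-scaled QV."""
    return -10 * math.log10(error_rate)


assembly_qv = 48.58  # value reported for the AM assembly
error_rate = qv_to_error_rate(assembly_qv)
print(f"QV {assembly_qv} -> error rate {error_rate:.3e}")
print(f"about one error per {1 / error_rate:,.0f} bases")
```

A QV of 48.58 therefore corresponds to roughly one consensus error per 72,000 bases.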

Fig. 4

Hi-C chromosome interaction heat map of the AM assembly. The abscissa and ordinate represent the order of each bin on the corresponding chromosome group. The color scale indicates the intensity of interaction from yellow (low) to red (high).

Fig. 5

The relationship between GC content and sequencing depth based on the alignment of PacBio data.

Fig. 6

Depth of PacBio long reads mapped across the 9 chromosomes of the AM genome.

We compared the length distributions of gene elements in AM with those in AMM5, C. arietinum36, and G. max39, and found similar patterns (Fig. 7). Meanwhile, 85.39% of the RNA-seq data were aligned to the predicted exons and only 2.5% were located in intergenic regions (Fig. 8). The BUSCO analysis showed that 96.59% (single-copy: 88.97%; duplicated: 7.62%) of the 1,614 embryophyta single-copy orthologs were identified as complete, while 1.12% were fragmented and 2.29% were missing (Table 6). A total of 29,828 gene models (99.71%) were successfully annotated in diverse databases, such as NR, SwissProt, KEGG, KOG, TrEMBL and InterPro (Table 7). Taken together, these results provide strong evidence that a high-quality AM genome has been obtained.

Fig. 7

The composition of gene elements in the AM genome compared to the genomes of other species.

Fig. 8

RNA-seq clean data verified the accuracy of protein-coding gene prediction.

Table 6

Integrity assessment of predicted coding genes in AM assembly.

| Database: embryophyta_odb10 | Number of BUSCOs | Percentage |
| --- | --- | --- |
| Complete (C) | 1,559 | 96.59% |
| Complete and single-copy (S) | 1,436 | 88.97% |
| Complete and duplicated (D) | 123 | 7.62% |
| Fragmented (F) | 18 | 1.12% |
| Missing (M) | 37 | 2.29% |
| Total BUSCO groups searched | 1,614 | 100% |
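The percentages in Table 6 follow directly from the BUSCO category counts. The sketch below recomputes them from the four primary counts; the function name and the returned dictionary layout are illustrative, not BUSCO output.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Return BUSCO category percentages (rounded to 2 decimals) and the total."""
    total = single + duplicated + fragmented + missing
    complete = single + duplicated

    def pct(count):
        return round(100 * count / total, 2)

    return {
        "total": total,
        "C": pct(complete),
        "S": pct(single),
        "D": pct(duplicated),
        "F": pct(fragmented),
        "M": pct(missing),
    }


# Counts taken from Table 6.
summary = busco_summary(single=1436, duplicated=123, fragmented=18, missing=37)
print(summary)
```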

Table 7

Number of functional annotations for predicted genes in AM assembly.

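| Type | Gene number | Percentage |
| --- | --- | --- |
| Total | 29,914 | 100% |
| NR | 29,606 | 98.97% |
| SwissProt | 24,179 | 80.83% |
| KEGG | 24,103 | 80.57% |
| KOG | 23,581 | 78.83% |
| TrEMBL | 29,594 | 98.93% |
| InterPro (all) | 29,607 | 98.97% |
| InterPro (GO) | 19,069 | 63.75% |
| Annotated | 29,828 | 99.71% |
| Unannotated | 86 | 0.29% |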

Acknowledgements

This work was supported by the Project of the “Modernization Research of Traditional Chinese Medicine” Key Research and Development Program of the Ministry of Science and Technology (No. 2019YFC1710800), the Project of the Shanxi Collaborative Innovation Center of Astragali Radix Resource Industrialization and Industrial Internationalization (No. HQXTCXZX2016-005 and No. HQXTCXZX2016-016) and the Key Research and Development (R&D) project of Shanxi Province (No. 201603D3111001).

Author contributions

H.J.F., Z.C., R.Z., C.G.M. and Q.S.L. conceived the study. H.J.F. collected and prepared the samples. Z.Y.W. and X.K.Y. performed bioinformatics analysis. Z.C. and H.J.F. wrote the manuscript with significant contributions from X.K.Y., A.K.L. and H.F.S. All authors read and approved the final manuscript.

Code availability

No specific code or script was used in this work. Commands used for data processing were all executed according to the manuals and protocols of the corresponding software.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Huijie Fan, Zhi Chai, Xukui Yang.

Contributor Information

Huijie Fan, fanhuijie@sxtcm.edu.cn.

Zhi Chai, chaizhi@sxtcm.edu.cn.

Qingshan Li, sxlqs2012@163.com.

Cungen Ma, macungen@sxtcm.edu.cn.

Ran Zhou, zhour58@sohu.com.

Supplementary information

The online version contains supplementary material available at 10.1038/s41597-024-03852-6.

References

1. Fu, J. et al. Review of the botanical characteristics, phytochemistry, and pharmacology of Astragalus membranaceus (Huangqi). Phytotherapy research: PTR28, 1275–1283 (2014). [Abstract] [Google Scholar]
2. Zheng, Y. et al. A Review of the Pharmacological Action of Astragalus Polysaccharide. Frontiers in pharmacology.11, 349 (2020). [Europe PMC free article] [Abstract] [Google Scholar]
3. Chen, J. et al. Global transcriptome analysis profiles metabolic pathways in traditional herb Astragalus membranaceus Bge. var. mongolicus (Bge.) Hsiao. BMC genomics16, 1–20 (2015). [Europe PMC free article] [Abstract] [Google Scholar]
4. Chen, Y. et al. A reference-grade genome assembly for Astragalus mongholicus and insights into the biosynthesis and high accumulation of triterpenoids and flavonoids in its roots. Plant Communications 4 (2022). [Europe PMC free article] [Abstract]
5. Global Pharmacopoeia Genome Databasehttp://www.gpgenome.com/species/109 (2022).
6. Wang, Y. et al. Chemical Discrimination of Astragalus mongholicus and Astragalus membranaceus Based on Metabolomics Using UHPLC-ESI-Q-TOF-MS/MS Approach. Molecules (Basel, Switzerland)24, E4064 (2019). [Europe PMC free article] [Abstract] [Google Scholar]
7. Belton, J.-M. et al. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods58(3), 268–76 (2012). [Europe PMC free article] [Abstract] [Google Scholar]
8. Bian, X. et al. Regulatory role of non-coding RNA in ginseng rusty root symptom tissue. Scientific reports11, 9211 (2021). [Europe PMC free article] [Abstract] [Google Scholar]
9. He, X. et al. The whole-genome assembly of an endangered Salicaceae species: Chosenia arbutifolia (Pall.) A. Skv. GigaScience 11 (2022). [Europe PMC free article] [Abstract]
10. Hong, Z. et al. The chromosome-level draft genome of Dalbergia odorifera. Gigascience 9.8 (2020). [Europe PMC free article] [Abstract]
11. Wu, H. et al. Camelid genomes reveal evolution and adaptation to desert environments. Nature communications 5.1 (2014). [Abstract]
12. Liu, Y. et al. Gekko japonicus genome reveals evolution of adhesive toe pads and tail regeneration. Nature communications 6.1 (2015). [Europe PMC free article] [Abstract]
13. Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics27, 764–770 (2011). [Europe PMC free article] [Abstract] [Google Scholar]
14. Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nature communications11, 1432 (2020). [Europe PMC free article] [Abstract] [Google Scholar]
15. Fan, H. J. et al. Study of Genome Size of Medicinal Plant Astragali Radix. Chinese Journal of Basic Medicine In Traditional, 25(09), 1299–1302. (in Chinese with English abstract) (2019).
16. Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome research27.5, 722–736 (2017). [Europe PMC free article] [Abstract] [Google Scholar]
17. Pendleton, M. et al. Assembly and diploid architecture of an individual human genome via single-molecule technologies. Nature Methods12, 780–786 (2015). [Europe PMC free article] [Abstract] [Google Scholar]
18. Xiao, C. L. et al. MECAT: fast mapping, error correction, and de novo assembly for single-molecule sequencing reads. Nature Methods14.11, 1072–1074 (2017). [Abstract] [Google Scholar]
19. Nie, F. et al. De novo diploid genome assembly using long noisy reads. Nature Communications15(1), 2964 (2024). [Europe PMC free article] [Abstract] [Google Scholar]
20. Zeng, X. et al. An improved high-quality genome assembly and annotation of Tibetan hulless barley. Scientific Data7(1), 139 (2020). [Europe PMC free article] [Abstract] [Google Scholar]
21. Wang, B. et al. De novo genome assembly and analyses of 12 founder inbred lines provide insights into maize heterosis. Nature Genetics55.2, 312–323 (2023). [Abstract] [Google Scholar]
22. Zhou, Y. et al. Pan-genome inversion index reveals evolutionary insights into the subpopulation structure of Asian rice. Nat Commun14, 1567 (2023). [Europe PMC free article] [Abstract] [Google Scholar]
23. Salojärvi, J. et al. The genome and population genomics of allopolyploid Coffea arabica reveal the diversification history of modern coffee cultivars. Nat Genet56, 721–731 (2024). [Europe PMC free article] [Abstract] [Google Scholar]
24. Walker, B. J., Abeel, T., Shea, T., Priest, M. & Earl, A. M. Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. PLoS ONE 9, e112963 (2014).
25. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
26. Servant, N. et al. HiC-Pro: An optimized and flexible pipeline for Hi-C data processing. Genome Biology 16 (2015).
27. Durand, N. et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Systems 3, 95–98 (2016).
28. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, eaal3327 (2017).
29. Chang, D. et al. The chromosome-level genome assembly of Astragalus sinicus and comparative genomic analyses provide new resources and insights for understanding legume-rhizobial interactions. Plant Communications 3, 100263 (2022).
30. Qu, C. et al. Comparative genomic analyses reveal the genetic basis of the yellow-seed trait in Brassica napus. Nature Communications 14, 5194 (2023).
31. Fang, X. et al. The sequence and analysis of a Chinese pig genome. GigaScience 1, 16 (2012).
32. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology 29(7), 644–52 (2011).
33. Haas, B. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Research 31, 5654–5666 (2003).
34. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nature Methods 12, 357–360 (2015).
35. Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biology 20, 278 (2019).
39. National Genomics Data Center https://ngdc.cncb.ac.cn/gwh/Assembly/66216/show (2023).
41. Keilwagen, J., Hartung, F. & Grau, J. GeMoMa: Homology-Based Gene Prediction Utilizing Intron Position Conservation and RNA-seq Data. In: Kollmar, M. (eds) Gene Prediction. Methods in Molecular Biology, vol 1962 (2019).
42. Li, J. et al. Long read reference genome-free reconstruction of a full-length transcriptome from Astragalus membranaceus reveals transcript variants involved in bioactive compound biosynthesis. Cell Discovery 3 (2017).
43. Bi, Q. et al. The phased chromosome-scale genome of yellowhorn sheds light on the mechanism of petal color change. Horticultural Plant Journal (2023).
44. Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biology 20, 238 (2019).
45. Nakamura, T., Yamada, K. D., Tomii, K. & Katoh, K. Parallelization of MAFFT for large-scale multiple sequence alignments. Bioinformatics (Oxford, England) 34, 2490–2492 (2018).
46. Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics (Oxford, England) 25, 1972–1973 (2009).
47. Minh, B. Q. et al. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Molecular Biology and Evolution 37, 1530–1534 (2019).
48. Subramanian, B., Gao, S., Lercher, M. J., Hu, S. & Chen, W.-H. Evolview v3: a webserver for visualization, annotation, and management of phylogenetic trees. Nucleic Acids Research 47, W270–W275 (2019).
49. Li, D. et al. A high-quality genome assembly of the eggplant provides insights into the molecular basis of disease resistance and chlorogenic acid synthesis. Molecular Ecology Resources 21, 1274–1286 (2021).
50. Jin, J., Zhang, H., Kong, L., Gao, G. & Luo, J. PlantTFDB 3.0: a portal for the functional and evolutionary study of plant transcription factors. Nucleic Acids Research 42, D1182–7 (2014).
51. Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Research 40, e49 (2012).
52. Li, T. et al. Genome assembly of KA105, a new resource for maize molecular breeding and genomic research. The Crop Journal (2023).
53. Marçais, G. et al. MUMmer4: A fast and versatile genome alignment system. PLoS Computational Biology 14 (2018).
54. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP486930 (2024).
55. Fan, H. Astragalus membranaceus isolate JZ-2020, whole genome shotgun sequencing project. GenBank https://identifiers.org/ncbi/insdc.gca:GCA_039519185.1 (2024).
56. Fan, H. Genome Assembly and Annotation of Astragalus membranaceus (Fisch.) Bge (AM). figshare. Dataset. 10.6084/m9.figshare.25100393.v3 (2024).

Articles from Scientific Data are provided here courtesy of Nature Publishing Group
