First whole genome sequencing data of a Mycobacterium tuberculosis STB-T1A strain isolated from a spinal tuberculosis patient in Sabah, Malaysia.

Chin KL; Suing EJ; Andong R; Foo CH; Chan SK; Jani J; Ahmed K; Mustapha ZA

doi:10.1016/j.dib.2024.110476


Affiliations

1. Department of Biomedical Sciences, Faculty of Medicine and Health Sciences, Universiti Malaysia Sabah, Kota Kinabalu, Sabah, Malaysia (Chin KL, Suing EJ, Andong R).
2. Department of Orthopaedics, Queen Elizabeth Hospital, Ministry of Health Malaysia, Kota Kinabalu, Sabah, Malaysia (Foo CH, Chan SK).
3. Borneo Medical and Health Research Centre, Faculty of Medicine and Health Sciences, Universiti Malaysia Sabah, Kota Kinabalu, Sabah, Malaysia (Chin KL, Jani J, Ahmed K).
4. Department of Medical Education, Faculty of Medicine and Health Sciences, Universiti Malaysia Sabah, Kota Kinabalu, Sabah, Malaysia (Mustapha ZA).

Data in Brief, 25 Apr 2024, 54:110476
https://doi.org/10.1016/j.dib.2024.110476 PMID: 38725551 PMCID: PMC11079456

Abstract

Spinal tuberculosis, also known as Pott's disease, carries a significant risk of severe paralysis if not promptly detected and treated, owing to complications such as spinal cord compression and deformity. This article presents the genomic analysis of a Mycobacterium tuberculosis STB-T1A strain isolated from the spine of a 29-year-old female diagnosed with spinal tuberculosis. Genomic DNA was extracted from a pure culture and sequenced on the Illumina NovaSeq 6000 system. The draft genome of the M. tuberculosis STB-T1A strain spans 4,367,616 base pairs, with a G+C content of 65.56 % and 4174 protein-coding genes. Single nucleotide polymorphism (SNP)-based phylogenetic analysis using the Maximum Likelihood method showed that the strain falls within the Indo-Oceanic lineage (Lineage 1). It clusters with the M. tuberculosis 43-16836 strain, which was isolated from the cerebrospinal fluid of a patient with tuberculous meningitis in Thailand. The draft genome sequence has been deposited in the National Center for Biotechnology Information (NCBI) GenBank database under accession number JBBMVZ000000000.

Kai Ling Chin, Eraniyah Jastan Suing, Ruhini Andong, Choong Hoon Foo, Sook Kwan Chan, Jaeyres Jani, Kamruddin Ahmed, and Zainal Arifin Mustapha

Keywords: Pott's disease, Spine, Mycobacterium tuberculosis, Next generation sequencing, Sabah, Malaysia

Specifications Table

Subject: Health and Medical Sciences
Specific subject area: Infectious diseases
Type of data: Raw data, whole genome sequencing, gene annotation, variant calling, and comparative genomic analysis of a Mycobacterium tuberculosis strain
Data collection: Bone tissue was taken from the spine of a patient suspected to have spinal tuberculosis and tested with Xpert® MTB/RIF Ultra for tuberculosis diagnosis. The sample was then cultured, bacterial genomic DNA was extracted, and whole-genome sequencing was performed. The raw sequencing data were used for de novo assembly and phylogenetic analysis.
Data source location: Queen Elizabeth Hospital, Kota Kinabalu, Sabah
Data accessibility: Repository name: National Center for Biotechnology Information (NCBI)
Data identification numbers: BioProject PRJNA1091826; BioSample SAMN40613452; Sequence Read Archive (SRA) SRR28465663; GenBank JBBMVZ000000000
Direct URLs to data:
https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA1091826
https://www.ncbi.nlm.nih.gov/biosample/?term=SAMN40613452
https://www.ncbi.nlm.nih.gov/sra/?term=SRR28465663
https://www.ncbi.nlm.nih.gov/nuccore/JBBMVZ000000000

1. Value of the Data

• This is the first report of an M. tuberculosis strain isolated from a spinal tuberculosis patient in Sabah, Malaysia, and its whole-genome sequence may provide fundamental insights into the strain's biology and characteristics.
• The data describe the gene content, predicted gene functions, and genomic organization of the strain, providing a basis for characterizing its genetic features.
• The data offer insights into the genetic diversity and evolutionary dynamics of M. tuberculosis strains from Sabah and other regions and can support studies of transmission patterns across geographical areas.

2. Background

A 29-year-old female presented with clinical signs indicative of spinal tuberculosis, including gibbous deformity, cold abscess, paradiscal lesion, anterior vertebral loss, narrowed disc space, and paravertebral shadows. She also had tuberculosis (TB)-related symptoms such as loss of appetite, weight loss, and malnutrition, with a body mass index (BMI) below 18.5, which is a known risk factor for TB. A bone tissue sample was obtained from the spine, and the patient was diagnosed with tuberculosis using Xpert® MTB/RIF Ultra. The bacterial isolate was obtained using the BD BACTEC™ MGIT™ culture system. Bacterial DNA was extracted and whole genome sequencing (WGS) was performed.

3. Data Description

This article presents the analysis of WGS data from the M. tuberculosis STB-T1A strain from Sabah, Malaysia. The Illumina NovaSeq 6000 system generated 18,097,866 paired reads with a read length of 150 bp, giving a sequencing coverage of 615X. De novo assembly produced 146 contigs with an N50 of 161,185 bp, and the largest contig was 303,749 bp. The genome size was 4,367,616 bases with a G+C content of 65.56 %. The annotation comprises 4174 coding sequences (CDS), 45 tRNAs, one each of the 5S, 16S, and 23S rRNAs, and three ncRNAs. Mapping statistics from the variant calling workflow showed that 99.58 % of the reads mapped to the M. tuberculosis H37Rv reference genome, and 2193 single nucleotide polymorphisms (SNPs), 192 insertions, and 166 deletions were identified relative to that reference. Comparative genomic analysis with M. tuberculosis strains from different lineages showed that the STB-T1A strain belongs to the Indo-Oceanic lineage (Lineage 1) and clusters with M. tuberculosis 43-16836, which was isolated from a tuberculous meningitis patient in Thailand [1] (Fig. 1). Analysis with the Mykrobe software predicted the STB-T1A strain to be drug susceptible.
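The reported depth can be checked with the standard expected-coverage (Lander-Waterman) relation, coverage = number of reads × read length / genome size. This is a minimal sketch under two assumptions that are not stated in the text: that 18,097,866 is the total number of 150 bp reads (both mates counted), and that the genome size used was the length of the H37Rv reference NC_000962.3 (4,411,532 bp). Under those assumptions the 615X figure is reproduced, whereas dividing by the 4,367,616 bp draft assembly gives about 622X.

```python
def expected_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Expected (Lander-Waterman) depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size

# Read count and length from the Data Description. Treating 18,097,866 as the
# total read count (both mates) is an assumption, as is the choice of genome size.
N_READS = 18_097_866
READ_LENGTH = 150            # bp
H37RV_LENGTH = 4_411_532     # bp, NC_000962.3 reference
ASSEMBLY_LENGTH = 4_367_616  # bp, STB-T1A draft assembly

print(f"vs H37Rv:    {expected_coverage(N_READS, READ_LENGTH, H37RV_LENGTH):.0f}X")
print(f"vs assembly: {expected_coverage(N_READS, READ_LENGTH, ASSEMBLY_LENGTH):.0f}X")
```

The two results differ only in the denominator, so a reported coverage value is only fully interpretable when the genome size behind it is stated.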

Fig. 1

Comparative phylogenetic analysis of M. tuberculosis STB-T1A strain (red arrow). This strain belongs to Lineage 1 and clusters with the M. tuberculosis 43-16836 strain from Thailand. The phylogenetic tree was constructed from SNP data using the Maximum Likelihood method under the General Time Reversible model and was rooted with M. canettii as the outgroup.

4. Experimental Design, Materials and Methods

4.1. Sample Collection and Tuberculosis Detection

A 29-year-old female presented with clinical symptoms indicative of spinal tuberculosis at Queen Elizabeth Hospital in Kota Kinabalu, Sabah. A bone tissue sample was collected from the spine by biopsy, performed by an orthopedic surgeon. The sample was tested for tuberculosis (TB) with Xpert® MTB/RIF Ultra (Cepheid, Sunnyvale, CA, USA) following the manufacturer's protocol. The processed sample was transferred to a cartridge and loaded into a GeneXpert instrument, which performs automated DNA extraction and real-time polymerase chain reaction (qPCR) for the qualitative detection of Mycobacterium tuberculosis complex (MTBC) and rifampicin (RIF) resistance [2]. The semi-quantitative bacterial load was reported based on the cycle threshold (Ct) value.

4.2. Bacterial Culture and DNA Extraction

The bone tissue was decontaminated with BBL™ MycoPrep™ (Becton, Dickinson, NJ, USA). The processed sample was cultured in a Mycobacterium Growth Indicator Tube (MGIT) containing Middlebrook 7H9 broth supplemented with the PANTA antibiotic mixture (polymyxin B, amphotericin B, nalidixic acid, trimethoprim, azlocillin) and OADC (oleic acid, albumin, dextrose, catalase). The tube was loaded into the BD BACTEC™ MGIT™ 320 system (Becton, Dickinson, NJ, USA) and incubated at 37 °C until the system detected bacterial growth [2]. DNA was extracted using the MasterPure™ Complete DNA and RNA Purification Kit (Epicentre Biotechnologies, Madison, WI, USA) according to the manufacturer's instructions, with the proteinase K lysis step extended to 16 h. The quality of the extracted DNA was assessed with a NanoDrop 2000c spectrophotometer (Thermo Fisher Scientific, USA) and by gel electrophoresis [3].
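Spectrophotometric DNA quality checks of this kind usually rest on two numbers: the A260/A280 purity ratio and a concentration estimate derived from A260. This is a minimal sketch assuming the conventional conversion factor of 50 ng/µL per A260 unit for double-stranded DNA (1 cm path length) and an illustrative acceptance window of 1.7 to 2.0 for A260/A280 (pure DNA is typically around 1.8). The acceptance criteria applied by the authors are not stated, and the function name and absorbance readings below are hypothetical.

```python
from dataclasses import dataclass

DSDNA_NG_PER_UL_PER_A260 = 50.0  # conventional dsDNA factor for a 1 cm path length

@dataclass
class DnaQc:
    concentration_ng_per_ul: float
    a260_a280: float
    passes: bool

def assess_dna(a260: float, a280: float, dilution: float = 1.0,
               ratio_window: tuple[float, float] = (1.7, 2.0)) -> DnaQc:
    """Estimate dsDNA concentration and flag possible protein carry-over via A260/A280.

    ratio_window is an illustrative acceptance range, not the authors' criterion.
    """
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    ratio = a260 / a280
    concentration = a260 * DSDNA_NG_PER_UL_PER_A260 * dilution
    low, high = ratio_window
    return DnaQc(concentration, ratio, low <= ratio <= high)

# Hypothetical absorbance readings, for illustration only.
qc = assess_dna(a260=1.2, a280=0.65)
print(f"{qc.concentration_ng_per_ul:.0f} ng/uL, A260/A280 = {qc.a260_a280:.2f}, pass = {qc.passes}")
```

Gel electrophoresis complements this check, since absorbance ratios say nothing about whether the DNA is intact or sheared.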

4.3. Whole Genome Sequencing and Data Cleaning

The genomic DNA was sent to Apical Scientific Sdn. Bhd., Malaysia, for library preparation and whole genome sequencing on the Illumina NovaSeq 6000 platform. The sequencing data have been submitted to the National Center for Biotechnology Information (NCBI) and are available under the following accession numbers: BioProject PRJNA1091826, BioSample SAMN40613452, Sequence Read Archive (SRA) SRR28465663, and GenBank JBBMVZ000000000.

Sequencing output was delivered as FASTQ files. The quality of the raw reads was assessed with FastQC version 0.12.1 [4], and fastp version 0.23.4 was used to trim adapter sequences and discard reads shorter than 50 bp [5].
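The 50 bp length filter can be illustrated in a few lines. This is a minimal sketch, not fastp itself: it assumes the reads have already been adapter-trimmed, works on in-memory (header, sequence, quality) records, and keeps a pair only when both mates reach the minimum length, which mirrors fastp's default of discarding a pair when either mate fails a filter. In fastp the threshold is set with `--length_required` (`-l`); the names `FastqRecord`, `filter_pairs`, and `to_fastq` are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple

class FastqRecord(NamedTuple):
    header: str
    sequence: str
    quality: str

MIN_LENGTH = 50  # bp, the cutoff reported in the Methods

def filter_pairs(
    pairs: Iterable[tuple[FastqRecord, FastqRecord]],
    min_length: int = MIN_LENGTH,
) -> Iterator[tuple[FastqRecord, FastqRecord]]:
    """Yield read pairs in which both (already trimmed) mates are at least min_length long."""
    for r1, r2 in pairs:
        if len(r1.sequence) >= min_length and len(r2.sequence) >= min_length:
            yield r1, r2

def to_fastq(record: FastqRecord) -> str:
    """Serialise a record back to the four-line FASTQ layout."""
    return f"@{record.header}\n{record.sequence}\n+\n{record.quality}\n"

# Toy pairs: the second pair has a 30 bp mate after trimming and is dropped.
pairs = [
    (FastqRecord("r1/1", "A" * 150, "I" * 150), FastqRecord("r1/2", "C" * 148, "I" * 148)),
    (FastqRecord("r2/1", "G" * 120, "I" * 120), FastqRecord("r2/2", "T" * 30, "I" * 30)),
]
kept = list(filter_pairs(pairs))
print(len(kept), "of", len(pairs), "pairs kept")
```

Dropping both mates keeps the two output files synchronised, which paired-end aligners such as BWA expect.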

4.4. De novo Assembly, Variant Calling, and Phylogenetic Analysis

De novo assembly began with KmerGenie version 1.7051, which was run on the processed reads to determine the optimal k-mer size [6]. A draft genome was then generated by assembling the processed reads into contigs with SPAdes version 3.15.4 [7], and the quality of the resulting contigs was assessed with QUAST version 5.0.2 [8]. Finally, the assembled contigs were annotated with the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) to identify genes and assign their functions [9].
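N50, one of the contiguity metrics QUAST reports, is the contig length at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length. This is a minimal sketch of that definition. It applies no minimum contig length, whereas QUAST ignores contigs below its `--min-contig` setting, and the contig lengths below are hypothetical rather than the 146 STB-T1A contigs.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return N50: the length L such that contigs of length >= L cover at least half the assembly."""
    if not contig_lengths:
        raise ValueError("no contigs")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids floating-point rounding
            return length
    raise AssertionError("unreachable: the running total always reaches the total")

# Hypothetical contig lengths (bp); total 870, half 435, reached at the 200 bp contig.
contigs = [300, 200, 150, 100, 50, 40, 30]
print("N50:", n50(contigs))
print("largest contig:", max(contigs))
```

Because N50 depends only on the length distribution, it describes contiguity, not correctness; QUAST's reference-based metrics are needed for the latter.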

Variant calling began by aligning the processed reads to the M. tuberculosis H37Rv reference genome (GenBank accession number: NC_000962.3) using the Burrows-Wheeler Aligner (BWA) version 0.7.17 [10]. The mapped reads in Sequence Alignment/Map (SAM) format were converted to Binary Alignment Map (BAM) format and sorted using Samtools version 1.19.2 [11]. Variants were then called with the HaplotypeCaller tool of the Genome Analysis Toolkit (GATK), which identifies differences between the sample genome and the reference genome, including SNPs, insertions, deletions, and other genomic variations [12]. After initial variant calling, BCFtools version 1.19 was used to refine the variant calls [11]. The functional effects of the variants were annotated using SnpEff version 5.0 to predict the potential consequences of these genetic alterations for the affected genes [13].
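The SNP, insertion, and deletion counts in the Data Description rest on classifying called variants by comparing REF and ALT allele lengths. This is a minimal sketch, not the authors' BCFtools/SnpEff workflow: it assumes an uncompressed, plain-text VCF with a single ALT allele per record (multi-allelic sites would need splitting first, for example with `bcftools norm -m -`), and the records in the example are hypothetical.

```python
from collections import Counter
from typing import Iterable

def classify(ref: str, alt: str) -> str:
    """Classify a biallelic VCF record by comparing REF and ALT allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"  # e.g. multi-nucleotide substitutions of equal length

def count_variants(vcf_lines: Iterable[str]) -> Counter:
    """Tally variant classes in VCF body lines (columns: CHROM POS ID REF ALT ...)."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank, meta-information, and header lines
        fields = line.rstrip("\n").split("\t")
        ref, alt = fields[3], fields[4]
        if "," in alt:
            raise ValueError(f"multi-allelic record at {fields[0]}:{fields[1]}; split it first")
        counts[classify(ref, alt)] += 1
    return counts

# Hypothetical records against the H37Rv chromosome, for illustration only.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "NC_000962.3\t1000\t.\tC\tT\t50\tPASS\t.",
    "NC_000962.3\t2000\t.\tG\tA\t50\tPASS\t.",
    "NC_000962.3\t3000\t.\tA\tAGG\t50\tPASS\t.",
    "NC_000962.3\t4000\t.\tCTT\tC\t50\tPASS\t.",
]
print(count_variants(example_vcf))
```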

kSNP3 was used to detect SNPs and generate a SNP matrix representing the genetic variation among strains [14]. The input comprised the draft genome of the M. tuberculosis STB-T1A strain generated by SPAdes and whole genome sequences of strains from each lineage, plus an outgroup, obtained from NCBI GenBank, i.e., L1: M. tuberculosis T92 (NZ_JLDA00000000.1), M. tuberculosis MTBR1/09 (LATN00000000.1), M. tuberculosis T46 (ACHO00000000.1), M. tuberculosis EAI/OSDD271 (AQQC00000000.1), M. tuberculosis T17 (JLCV00000000.1), M. tuberculosis 43-16836 (ATNF00000000.1), M. tuberculosis PR05 (AOMG00000000.2), M. tuberculosis M1976 (KK331618.1), and M. tuberculosis EAS054 (ABOV00000000.1); L2: M. tuberculosis W-148 (NZ_CP012090.1); L3: M. tuberculosis CAS/NITR204 (CP005386.1); L4: M. tuberculosis H37Rv (NC_000962.3); L5: M. africanum CPHL_A (ACHP00000000.1); L6: M. africanum K85 (ACHQ00000000.1); L7: M. tuberculosis N3913 (NZ_CP069063.1); and M. canettii CIPT 140010059 (NC_015848.1). The resulting SNP matrix was used for phylogenetic analysis in Molecular Evolutionary Genetics Analysis version 11 (MEGA 11) [15]. After the nucleotide sequences were aligned with ClustalW, the best-fitting evolutionary model for the dataset was selected, and a Maximum Likelihood analysis with 1000 bootstrap replicates was performed to infer the phylogenetic relationships among the strains. The drug susceptibility of the strain was predicted from the raw sequencing reads using Mykrobe Predictor TB version 0.1.0 [16].
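Bootstrap support of the kind described above comes from resampling alignment columns (here, SNP positions in the kSNP3 matrix) with replacement and re-inferring a tree from each replicate; a clade's support is the fraction of replicate trees that recover it. This is a minimal sketch of the resampling step only, assuming the SNP matrix is available as one equal-length string per strain. The Maximum Likelihood inference in MEGA 11 is not reproduced, the strain names and sequences are hypothetical, and the simple pairwise SNP distance is included only for orientation; it is not the model-based distance behind Fig. 1.

```python
import random

def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Build one nonparametric bootstrap replicate by resampling columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    n_sites = lengths.pop()
    # The same column indices are applied to every strain, so each column stays intact.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}

def snp_distance(a: str, b: str) -> int:
    """Number of SNP positions at which two aligned strains differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

# Hypothetical SNP matrix: one string per strain, one character per SNP site.
snps = {
    "strain_A": "ACGTAC",
    "strain_B": "ACGTAT",
    "strain_C": "GCTTCT",
}
rng = random.Random(1)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_replicate(snps, rng) for _ in range(1000)]
print(snp_distance(snps["strain_A"], snps["strain_B"]))  # 1
print(snp_distance(snps["strain_A"], snps["strain_C"]))  # 4
```

Each of the 1000 replicates would then be passed to the same tree-building method as the original matrix.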

Limitations

None.

Ethics Statement

Ethics approval for this study was obtained from the National Medical Research Register (NMRR) and the Medical Research Ethics Committee (MREC) (NMRR ID-22-02464-T2O). Informed consent for sample collection was obtained from the participant. The authors took ethical considerations into account when collecting data and ensured that the information obtained from the participant was used only for research purposes.

CRediT authorship contribution statement

Kai Ling Chin: Conceptualization, Writing – original draft, Writing – review & editing, Visualization, Investigation, Data curation, Supervision. Eraniyah Jastan Suing: Visualization, Investigation, Data curation. Ruhini Andong: Visualization, Investigation, Data curation. Choong Hoon Foo: Conceptualization, Supervision, Data curation. Sook Kwan Chan: Data curation. Jaeyres Jani: Visualization, Investigation, Data curation. Kamruddin Ahmed: Conceptualization, Supervision. Zainal Arifin Mustapha: Conceptualization, Supervision, Funding acquisition.

Acknowledgments

The authors acknowledge the Ministry of Higher Education (MOHE) for funding under the Fundamental Research Grant Scheme (FRGS) (FRGS/1/2022/SKK12/UMS/01/2).

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability

Mycobacterium tuberculosis strain STB-T1A, whole genome shotgun sequencing project (Original data) (NCBI).

References

1. Viratyosin W., et al. Draft genome sequence of the Mycobacterium tuberculosis strain 43-16836, belonging to the indo-oceanic lineage, isolated from tuberculous meningitis in Thailand. Genome Announc. 2013;1(5). doi:10.1128/genomea.00801-13.

2. Li Z., et al. Evaluation of different diagnostic methods for spinal tuberculosis infection. BMC Infect. Dis. 2023;23(1):695.

3. Jani J., et al. The whole genome sequence data analyses of a Mycobacterium tuberculosis strain SBH321 isolated in Sabah, Malaysia, belongs to Ural family of Lineage 4. Data Brief. 2020;33.

4. Andrews S. FastQC: a quality control tool for high throughput sequence data. 2010. Available online at: https://qubeshub.org/resources/fastqc

5. Chen S., et al. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34(17):i884–i890.

6. Chikhi R., Medvedev P. Informed and automated k-mer size selection for genome assembly. Bioinformatics. 2013;30(1):31–37.

7. Prjibelski A., et al. Using SPAdes de novo assembler. Curr. Protoc. Bioinform. 2020;70(1):e102.

8. Mikheenko A., et al. Versatile genome assembly evaluation with QUAST-LG. Bioinformatics. 2018;34(13):i142–i150.

9. Tatusova T., et al. NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res. 2016;44(14):6614–6624.

10. Li H., Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25(14):1754–1760.

11. Danecek P., et al. Twelve years of SAMtools and BCFtools. Gigascience. 2021;10(2):1–4.

12. DePristo M.A., et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 2011;43(5):491–498.

13. Cingolani P., et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin). 2012;6(2):80–92.

14. Gardner S.N., Slezak T., Hall B.G. kSNP3.0: SNP detection and phylogenetic analysis of genomes without genome alignment or reference genome. Bioinformatics. 2015;31(17):2877–2878.

15. Tamura K., Stecher G., Kumar S. MEGA11: molecular evolutionary genetics analysis version 11. Mol. Biol. Evol. 2021;38(7):3022–3027.

16. Hunt M., et al. Antibiotic resistance prediction for Mycobacterium tuberculosis from genome sequence data with Mykrobe. Wellcome Open Res. 2019;4:191.
