Tumor detection by analysis of both symmetric- and hemi-methylation of plasma cell-free DNA
Abstract
Aberrant DNA methylation patterns have been used for cancer detection. However, DNA hemi-methylation, present at about 10% of CpG dinucleotides, has been less well studied. Here we show that a majority of differentially hemi-methylated regions (DHMRs) in liver tumor DNA or plasma cell-free (cf) DNA do not overlap with differentially methylated regions (DMRs) of the same samples, indicating that DHMRs could serve as independent biomarkers. Furthermore, we analyzed the cfDNA methylomes of 215 samples from individuals with liver or brain cancer and individuals without cancer (controls), and trained machine learning models using DMRs, DHMRs or both. Models incorporating both DMRs and DHMRs outperform models trained with DMRs or DHMRs alone, with AUROCs of 0.978, 0.990, and 0.983 for distinguishing control, liver cancer and brain cancer samples, respectively, in a validation cohort. This study supports the potential of utilizing both DMRs and DHMRs for multi-cancer detection.
Introduction
Cancer is a major public health threat worldwide. While the cancer death rate in the United States has fallen continuously since its peak in 1991, it is estimated that over 600,000 people died from cancer in 2021 in the United States alone1. Worldwide, almost 10 million people died from cancer in 2020, and the death rate has increased in some low- and middle-income countries in recent years2. Therefore, there remains an urgent, unmet need to combat cancer. It has been shown that early tumor detection has the potential to improve prognosis for cancer patients3. For instance, the five-year survival rate for hepatocellular carcinoma (HCC) is 34% when diagnosed at an early, localized stage, but drops to 3% when diagnosed at a late stage with distant disease1. Early cancer detection has also contributed to the reduced cancer death rate in the United States over the last few decades. It is therefore critically important to develop assays for early cancer detection.
It has been proposed that liquid biopsy offers several advantages for early cancer detection4–8. For instance, liquid biopsy samples can be obtained non-invasively and, in principle, can overcome the challenges arising from tumor heterogeneity that confound tissue biopsy procedures. Indeed, several methods have been developed that use plasma cell-free (cf) DNA for tumor detection. Plasma cfDNA is a mixture of extracellular DNA fragments released from apoptotic and/or necrotic cells or via active secretion6,7. While the majority of plasma cfDNA comes from normal cells such as lymphocytes, cancer cells also release DNA fragments into circulation9. Analysis of cancer-related mutations in plasma cfDNA has been reported for early cancer detection. However, because cancer cells carry a limited number of mutations and these mutations evolve, detecting specific mutations requires a large amount of test material (7.5–10 ml plasma) as well as high sequencing depth10,11. In addition to genetic mutations, other cfDNA features, including fragmentomics12,13 and epigenetic features such as nucleosome patterns14–16 and DNA methylation17–22, have also been analyzed for detection of a variety of cancer types. Among the features analyzed so far, models trained with DNA methylation were reported to perform better in tumor detection than models trained using other features, including single nucleotide variants and cfDNA pan-features such as fragment length23, supporting the idea that analysis of plasma cfDNA methylomes is a particularly promising approach for tumor detection.
The vast majority of DNA methylation in mammalian cells occurs at CpG dinucleotides in a symmetric manner: the cytosines (C) of a CpG dinucleotide on both the Watson strand and its complementary Crick strand are methylated24,25. During DNA replication, hemi-methylated CpG dinucleotides, consisting of a methylated CpG on the parental strand and a non-methylated CpG on the complementary nascent strand, are rapidly converted into symmetric, fully methylated CpGs to maintain DNA methylation patterns25. Early studies indicate that failure of symmetric methylation and intermediates of active demethylation during carcinogenesis contribute to the generation of DNA hemi-methylation (HM)26. Interestingly, about 10% of CpG dinucleotides in human embryonic stem cells have been observed to be hemi-methylated, and these hemi-methylated regions (HMRs) can be maintained over multiple cell divisions27,28. Moreover, it has recently been shown that HM on the motif strand of CTCF, a critical regulator of genome organization, inhibits CTCF binding, whereas HM on the opposite strand stimulates it29. HM is therefore an epigenetic mark that likely plays an important role in regulating genome organization and gene transcription. However, while HMRs have been analyzed in various cell lines using bisulfite sequencing (BS-Seq) or MeDIP-Seq30–35, few studies, if any, have explored hemi-methylated regions, alone or in combination with symmetrically methylated CpGs, for tumor detection or in tumorigenesis.
Various methods are currently used to analyze cfDNA methylomes for cancer detection. For instance, the CCGA (Circulating Cell-Free Genome Atlas) study employed targeted bisulfite sequencing to analyze methylated regions of cfDNA in more than 50 cancer types21,22. Because bisulfite treatment causes a marked loss of DNA, this method typically requires up to 8–10 ml of plasma and over 100 million sequencing reads per sample for tumor detection. Shen et al.20 and Nassiri et al.19 developed a cell-free methylated DNA immunoprecipitation and high-throughput sequencing (cfMeDIP-seq) method based on double-stranded DNA ligation, which requires cfDNA purified from 0.5–3.5 ml plasma samples. Because a fraction of cfDNA molecules are single-stranded (ss) DNA fragments or damaged double-stranded DNA fragments, these molecules are, in principle, not captured by traditional double-stranded DNA-based library preparation. Single-stranded DNA (ssDNA) library preparation, by contrast, has been shown to include all cfDNA molecules (ssDNA, dsDNA and damaged DNA) for sequencing analysis, which increases sensitivity compared with traditional library preparation methods36,37. Therefore, using ssDNA library preparation for cfDNA methylome analysis will likely increase sensitivity compared with methods relying on double-stranded DNA ligation. Importantly, none of these studies used DNA hemi-methylation for tumor detection, likely because it was not known whether differentially hemi-methylated regions (DHMRs) are biomarkers independent of differentially methylated regions (DMRs).
We developed two methods combining methylated DNA immunoprecipitation with strand-specific (ss) sequencing (MeDIP-Seq) to analyze methylomes: one for genomic DNA (ssg-MeDIP-Seq) and one for plasma cell-free (cf) DNA (sscf-MeDIP-Seq). The sscf-MeDIP-Seq method can analyze methylomes of all cfDNA molecules, including ssDNA, dsDNA and damaged DNA, which allowed us to produce reliable sscf-MeDIP-Seq datasets with cfDNA isolated from 300–500μl of plasma. Importantly, both methods can detect both symmetrically methylated and hemi-methylated regions. Through in-depth analysis of DMRs and DHMRs of liver tumor DNA and cfDNA samples, we found that the vast majority of DHMRs in both tumor DNA and cfDNA do not overlap the DMRs of the same samples, suggesting that DHMRs can serve as biomarkers independent of DMRs. Indeed, based on analysis of 271 plasma cfDNA samples, we found that machine learning models using both DMRs and DHMRs as input features outperform models trained with DMRs or DHMRs alone for tumor detection. Together, our studies indicate that using both DMRs and DHMRs identified by sscf-MeDIP-Seq as biomarkers will likely improve the accuracy of multi-cancer detection.
Results
Develop genomic methylated DNA immunoprecipitation with a strand-specific sequencing method (ssg-MeDIP-Seq)
MeDIP-seq has been used to analyze DNA methylation (5-mC), and almost all published MeDIP-Seq procedures rely on sonication of genomic DNA into small fragments followed by immunoprecipitation with antibodies against methylated DNA35. Because Tn5 transposase has been used to fragment genomic DNA when generating libraries for next-generation sequencing, we tested whether Tn5 can be used to fragment genomic DNA before immunoprecipitation (Fig. 1a). Briefly, 100ng of genomic DNA isolated from tissues was incubated with pA-Tn5 transposase, which fragments dsDNA and inserts an adaptor in a sequence-independent manner. Because pA-Tn5 covalently ligates the adaptor to the 5’ end of the target DNA, we then ligated a different adaptor to the 3’ end through an oligo-replacement step. In this way, we could analyze DNA methylation patterns in a strand-specific manner, which in turn allows us to detect both symmetric DNA methylation (SM) and hemi-methylation (HM). We termed this method ssg-MeDIP-Seq. Following adaptor ligation, DNA fragments were denatured into single-stranded DNA (ssDNA), and methylated DNA was immunoprecipitated using antibodies against 5-mC. The enriched methylated ssDNA was amplified by PCR for library preparation and subsequent sequencing (Fig. 1a). Using this method, we first analyzed DNA methylation of 16 tissue samples: eight liver tumors and their eight corresponding adjacent non-tumor (Adj-NT) tissues. The ssg-MeDIP-seq signals of the eight tumor samples were depleted at the promoters of genes with CpG islands (CGIs) compared with those without CGIs (Supplementary Fig. 1a), a pattern consistent with DNA methylation detected using other methods. A similar pattern was also detected for plasma cfDNA methylation analyzed by sscf-MeDIP-seq, described below (Supplemental Fig. 1b–d).
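To make the strand-specific readout concrete, here is a minimal Python sketch of how reads from such a library could be tallied per methylation block and per strand. It assumes, hypothetically, that read 1 aligning to the forward strand reports a Watson-strand molecule and read 1 aligning in reverse reports a Crick-strand molecule; the real mapping depends on the adaptor design. The read tuples, block layout and function names are illustrative and not part of the published pipeline.

```python
from bisect import bisect_right
from collections import defaultdict


def assign_strand(is_reverse: bool) -> str:
    # ASSUMPTION: read 1 on the forward strand = Watson molecule,
    # read 1 on the reverse strand = Crick molecule.
    return "crick" if is_reverse else "watson"


def count_reads_per_block(reads, blocks):
    """Tally Watson and Crick reads per methylation block.

    reads:  iterable of (chrom, pos, is_reverse) for read 1 of each pair,
            with pos the 0-based leftmost aligned base (hypothetical format).
    blocks: dict chrom -> sorted, non-overlapping half-open (start, end) blocks.
    Returns dict (chrom, block_index) -> {"watson": n, "crick": m}.
    """
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in blocks.items()}
    counts = defaultdict(lambda: {"watson": 0, "crick": 0})
    for chrom, pos, is_reverse in reads:
        if chrom not in blocks:
            continue
        # Index of the last block that starts at or before pos.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos < blocks[chrom][i][1]:
            counts[(chrom, i)][assign_strand(is_reverse)] += 1
    return dict(counts)
```

Assigning a read by its leftmost position is a simplification; a real pipeline might instead use the fragment midpoint or require the read to lie fully inside a block.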
Next, we compared the methylomes of the eight liver tumors with those of their corresponding adjacent non-tumor tissues at 2,002,724 DNA methylation blocks, each consisting of at least four CpGs and together covering 70% of the CpG dinucleotides in the genome38, and identified 11,930 hypermethylated and 12,974 hypomethylated DMRs (Fig. 1b). For instance, a tumor-specific DMR was identified at the locus of TBX2, a gene known to be methylated in liver cancer39 (Fig. 1c). To determine whether these DMRs were concordant with liver tumor DMRs from an independent source, we analyzed the DNA methylation profiles of 50 liver cancer samples from TCGA, which were generated using the 450K CpG methylation microarray and are therefore not suitable for analysis of HM regions (HMRs) (see below). Despite the substantial technical differences between ssg-MeDIP-Seq and 450K methylation arrays, hypomethylated and hypermethylated DMRs identified in liver tumors using ssg-MeDIP-seq overlapped significantly with hypomethylated and hypermethylated DMRs identified using the TCGA liver cancer datasets, respectively (groups “A” and “D” in Fig. 1d). In contrast, the overlap between hypermethylated DMRs identified using ssg-MeDIP-Seq and hypomethylated DMRs in the TCGA datasets, and vice versa, was much less significant (groups “B” and “C” in Fig. 1d). Fisher's exact test of the overlaps between liver tumor DMRs from this study and from TCGA gave similar results (Fig. 1e). Finally, hypermethylated DMRs were enriched at exons, promoters and CGIs, whereas hypomethylated DMRs were enriched at intergenic regions, satellites and SINEs (Fig. 1f). Taken together, these results indicate that the ssg-MeDIP-seq procedure can be used to analyze genomic DNA methylomes.
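The overlap tests in Fig. 1d, e compare two sets of regions against a common background. A minimal sketch of a one-sided Fisher's exact test on the resulting 2×2 table, written directly from the hypergeometric distribution in pure Python, is shown below. The background (for example, all ~2M methylation blocks) and the one-sided alternative are assumptions, since the text does not state them. For genome-scale counts, `scipy.stats.fisher_exact(table, alternative="greater")` computes the same one-sided p-value far more efficiently.

```python
from math import comb


def fisher_overlap(n_both, n_a_only, n_b_only, n_neither):
    """One-sided Fisher's exact test for enrichment of overlap.

    2x2 table of regions (e.g. methylation blocks):
                      in set B    not in set B
        in set A      n_both      n_a_only
        not in set A  n_b_only    n_neither

    Returns (sample odds ratio, P(X >= n_both)), where X follows the
    hypergeometric distribution given the table margins.
    """
    row_a = n_both + n_a_only          # size of set A
    col_b = n_both + n_b_only          # size of set B
    total = row_a + n_b_only + n_neither
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    tail = sum(
        comb(row_a, k) * comb(total - row_a, col_b - k)
        for k in range(n_both, min(row_a, col_b) + 1)
    )
    p_value = tail / comb(total, col_b)
    if n_a_only * n_b_only == 0:
        odds_ratio = float("inf")      # zero denominator; report infinity
    else:
        odds_ratio = (n_both * n_neither) / (n_a_only * n_b_only)
    return odds_ratio, p_value
```

For example, a perfectly concordant 3/3 split of six blocks, `fisher_overlap(3, 0, 0, 3)`, gives an infinite odds ratio and a one-sided p-value of 1/20.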
Liver tumor DNA DHMRs and DMRs are likely independent biomarkers
It has recently been shown that about 10% of CpG dinucleotides are hemi-methylated (Fig. 2a) and that this hemi-methylation is heritable27,28. However, to our knowledge, no study has systematically compared DHMRs with DMRs in the same samples. Because the ssg-MeDIP-Seq method detects DNA methylation on the Watson and Crick strands separately, we analyzed hemi-methylated regions (HMRs) at the 2,002,724 blocks38 in the eight liver tumor samples and their matched Adj-NT tissues using the formula shown in Fig. 2a. To minimize false-positive HMRs arising from the experimental procedures and sequencing depth, we first prepared libraries of two input samples, one liver tumor and one Adj-NT, following the same ssg-MeDIP-Seq procedure except that the DNA was not subjected to methylated DNA immunoprecipitation. In principle, these input samples should not exhibit HM at the ~2M methylation blocks. Indeed, the majority of the ~2M blocks did not show strand-bias signals (Supplemental Fig. 2a, b). In contrast, a marked number of blocks showed strand-bias (HM) signals in the 16 ssg-MeDIP-seq samples based on the cutoff (Watson-Crick)/(Watson+Crick)>0.3 (Supplemental Fig. 2c, d). Because the HM signal at each block is calculated as (Watson-Crick)/(Watson+Crick), sequencing depth may affect HMR identification. We therefore tested different minimum reads per million (RPM) at each block as additional cutoffs. The number of blocks showing strand bias in the two input samples was reduced dramatically with RPM>1 at each block as the cutoff compared with RPM>0.5 (Supplemental Fig. 2b). Further increasing the cutoff to RPM>1.5 or RPM>2 did not markedly reduce the number of biased blocks (Supplemental Fig. 2b). Similar results were obtained for eight input samples of plasma cfDNA (Supplemental Fig. 2e, f; see below).
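The strand-bias score and depth filter described above can be sketched as follows. The score is the formula from the text, (Watson-Crick)/(Watson+Crick); applying the RPM cutoff to the combined Watson+Crick reads of a block is an assumption, as is the toy data in the usage note. The sketch illustrates why low-depth blocks inflate strand-bias calls: a block supported by a single read always scores ±1.

```python
def rpm(count: float, total_reads: int) -> float:
    """Reads per million mapped reads."""
    return count * 1e6 / total_reads


def hm_score(watson: float, crick: float) -> float:
    """Strand-bias score (Watson - Crick) / (Watson + Crick), in [-1, 1]."""
    depth = watson + crick
    return 0.0 if depth == 0 else (watson - crick) / depth


def count_biased_blocks(block_counts, total_reads, score_cut=0.3, rpm_cut=1.0):
    """Count blocks whose strand-bias score exceeds score_cut.

    block_counts: iterable of (watson_reads, crick_reads) per block.
    ASSUMPTION: a block is kept only if its combined Watson+Crick depth
    exceeds rpm_cut RPM. As written in the text, the comparison is
    one-directional (Watson excess); Crick-biased blocks score negative.
    """
    n = 0
    for watson, crick in block_counts:
        if rpm(watson + crick, total_reads) <= rpm_cut:
            continue
        if hm_score(watson, crick) > score_cut:
            n += 1
    return n
```

With `total_reads=1_000_000`, the blocks `[(1, 0), (2, 0), (3, 1), (1, 1)]` give 3 biased blocks at `rpm_cut=0.5` but only 2 at `rpm_cut=1.0`, because the single-read block is filtered out.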
We therefore used the cutoffs (Watson-Crick)/(Watson+Crick)>0.3, RPM>1 and p<0.01 to identify HMRs in the eight liver tumor DNA samples and their corresponding Adj-NT tissues, identifying 192,106 and 228,575 HMRs in the liver tumor and Adj-NT groups, respectively. In both groups, the number of HMRs was roughly 10% of the ~2M methylation blocks analyzed. Furthermore, HMRs of both liver tumor and Adj-NT samples were most enriched at SINEs, CpG islands, promoters and exons, with a slight enrichment at satellites and introns (Fig. 2c). Finally, we identified 6864 DHMRs in liver tumor DNA samples compared with their corresponding Adj-NT tissues. These DHMRs included 2330 regions with increased HM and 4534 regions with reduced HM on either the Watson or Crick strand compared with the controls (Fig. 2d). Remarkably, the majority of liver tumor DHMRs (4474 out of 6562) did not overlap with DMRs (Fig. 2e). DHMRs with increased HM in liver tumor samples were enriched at SINEs, CpG islands, promoters and exons, whereas DHMRs with reduced HM were enriched at SINEs and CpG islands but not at promoters (Fig. 2f). Interestingly, the closest genes within 20kb of these liver tumor HMRs (Fig. 2g) and of DHMRs with increased HM (Fig. 2h) were enriched in processes linked to cellular metabolism. These results suggest that DHMRs likely represent independent biomarkers, consistent with the idea that DNA hemi-methylation is an epigenetic mark.
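Counting DHMRs that do not overlap any DMR (Fig. 2e) is an interval-overlap query. A minimal Python sketch is shown below, assuming half-open (chrom, start, end) intervals and non-overlapping DMRs on each chromosome; the function name and example coordinates are hypothetical. If DMRs and DHMRs are both defined on the same fixed set of methylation blocks, the same count reduces to a set difference on block identifiers.

```python
from bisect import bisect_right


def non_overlapping(query, reference):
    """Return query intervals that overlap no reference interval.

    Intervals are half-open (chrom, start, end). Reference intervals on a
    chromosome are assumed not to overlap each other, so their ends are
    sorted whenever their starts are.
    """
    by_chrom = {}
    for chrom, start, end in reference:
        by_chrom.setdefault(chrom, []).append((start, end))
    for ivs in by_chrom.values():
        ivs.sort()
    ends = {chrom: [e for _, e in ivs] for chrom, ivs in by_chrom.items()}

    result = []
    for chrom, start, end in query:
        ivs = by_chrom.get(chrom, [])
        # First reference interval that ends after the query starts.
        i = bisect_right(ends.get(chrom, []), start)
        # It overlaps the query only if it also starts before the query ends.
        if i < len(ivs) and ivs[i][0] < end:
            continue
        result.append((chrom, start, end))
    return result
```

For instance, with DMRs at chr1:100-200 and chr1:300-400, a DHMR at chr1:200-300 is reported as non-overlapping, while one at chr1:250-310 is not.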
To understand why the majority of DHMRs did not overlap with DMRs, we analyzed the methylation density on the Watson and Crick strands at the 24,904 DMRs and 6864 DHMRs of the eight liver tumor samples compared with their Adj-NT tissues. At the 6864 DHMRs, methylation density on only one strand (either Watson or Crick) was markedly increased or reduced in tumor samples compared with Adj-NT controls (Supplemental Fig. 3a–d). In contrast, at DMRs, the methylation density of both the Watson and Crick strands was increased or reduced to a similar degree in liver cancer samples compared with the same Adj-NT controls (Supplemental Fig. 3e–h). These results suggest that DHMRs arise from changes in DNA methylation on one strand, whereas DMRs arise from changes on both strands. Taken together, these results indicate that liver tumor DHMRs and DMRs are most likely independent biomarkers.
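The strand-level comparison above suggests a simple way to label a region by how its per-strand methylation changes. The sketch below computes a log2 fold change for each strand (tumor versus Adj-NT) and calls the change one-strand (DHMR-like) or both-strand (DMR-like). The pseudocount of 0.5 and the |log2FC| ≥ 1 threshold are hypothetical illustration values, not parameters from the study.

```python
from math import log2


def strand_log2fc(tumor, normal, pseudo=0.5):
    """Per-strand log2 fold change; tumor and normal are (watson, crick) densities."""
    return tuple(log2((t + pseudo) / (n + pseudo)) for t, n in zip(tumor, normal))


def classify_change(tumor, normal, min_lfc=1.0, pseudo=0.5):
    """Label a region by which strands change (hypothetical thresholds)."""
    lfc_w, lfc_c = strand_log2fc(tumor, normal, pseudo)
    w_changed = abs(lfc_w) >= min_lfc
    c_changed = abs(lfc_c) >= min_lfc
    if w_changed and c_changed and lfc_w * lfc_c > 0:
        return "both strands"   # concordant change on both strands: DMR-like
    if w_changed != c_changed:
        return "one strand"     # change confined to one strand: DHMR-like
    return "unclassified"       # no change, or opposite changes on the two strands
```

For example, `classify_change((8, 8), (2, 2))` returns `"both strands"`, whereas `classify_change((8, 2), (2, 2))` returns `"one strand"`.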
Develop the sscf-MeDIP-Seq method for analyzing cfDNA methylation and hemi-methylation
There is tremendous interest in analyzing plasma cfDNA methylomes for tumor detection20. Unlike long genomic dsDNA, plasma cfDNA is a mixture of dsDNA and ssDNA with predominant fragment sizes of about 160–170 nucleotides, and some of these molecules are nicked or damaged36. To develop procedures for analyzing cfDNA methylomes, we drew on our extensive experience in preparing single-stranded DNA libraries for next-generation sequencing40,41, methods that originated from sequencing ancient DNA samples, which likewise consist of dsDNA, ssDNA and damaged DNA42. Briefly, after denaturing cfDNA into ssDNA, we ligated an adaptor to the 3’ end of the cfDNA using an ssDNA ligase and then converted the ssDNA into dsDNA with a DNA polymerase. After ligation of the second adaptor, a small fraction of the DNA (10%) was saved as the input sample, and the remaining DNA was denatured again and subjected to immunoprecipitation using antibodies against 5-mC. The immunoprecipitated DNA and the input DNA were then amplified by PCR for library preparation and sequencing (Fig. 3a). In this way, all DNA molecules, including dsDNA, ssDNA and damaged DNA, are used for methylome analysis (Fig. 3a). Importantly, this method can analyze both SM and HM. We termed this method single-stranded (ss) cf-MeDIP-Seq (sscf-MeDIP-Seq).
Using this method, we first performed an in-depth analysis of 20 sscf-MeDIP-Seq datasets generated from cfDNA of 10 individuals with liver tumors and 10 controls with similar age and gender distributions, to assess the performance of sscf-MeDIP-Seq and the properties of cfDNA DMRs and DHMRs. As with ssg-MeDIP-seq, sscf-MeDIP-seq signals were depleted at the promoters of genes with CGIs compared with those without CGIs for all three sample groups (Supplemental Fig. 1b–d). Using the same 2,002,724 methylation blocks38, we identified 2229 hypermethylated and 5002 hypomethylated cfDNA DMRs in the 10 liver cancer cfDNA samples compared with the 10 controls (Fig. 3b, c), with hypermethylated cfDNA DMRs enriched at CGIs, promoters and exons and hypomethylated ones enriched at satellite DNA, intergenic regions, CGIs and SINEs (Supplemental Fig. 4a). We then asked whether these liver cancer cfDNA DMRs overlapped with the liver tumor DNA DMRs identified in Fig. 1. Both hypermethylated and hypomethylated cfDNA DMRs overlapped significantly with the hypermethylated and hypomethylated liver tumor DNA DMRs identified by ssg-MeDIP-Seq, respectively (groups “A” and “D” in Fig. 3d). In contrast, the overlaps between hypomethylated cfDNA DMRs and hypermethylated liver tumor DNA DMRs, and vice versa (groups “B” and “C” in Fig. 3d), were much less significant. Fisher's exact test of the overlaps between cfDNA DMRs and liver tumor DNA DMRs led to the same conclusion (Fig. 3e). Together, these results show that plasma cfDNA DMRs of patients with liver tumors identified by sscf-MeDIP-Seq most likely reflect DNA methylation changes in liver cancer cells.
The majority of plasma cfDNA DHMRs also do not overlap with cfDNA DMRs
To identify cfDNA DHMRs, we first analyzed eight input samples prepared following the same procedure as sscf-MeDIP-seq except that the DNA was not subjected to methylated DNA immunoprecipitation. As with the two genomic DNA input samples (Supplemental Fig. 2a, b), the number of blocks exhibiting strand bias was markedly reduced with RPM>1 at each block as the cutoff compared with RPM>0.5 (Supplemental Fig. 2e, f). We therefore applied the same cutoffs used for the ssg-MeDIP-Seq datasets to identify cfDNA DHMRs in the 10 liver cancer samples compared with the 10 controls, and identified 1179 and 988 DHMRs with increased and reduced HM, respectively, on either the Watson or Crick strand (Fig. 3f). cfDNA HMRs from both liver cancer and control samples were enriched at SINEs, satellites, promoters and exons (Supplemental Fig. 4b). In contrast, liver cancer-specific cfDNA DHMRs with increased HM were enriched at CpG islands, promoters and exons, whereas those with reduced HM were enriched at SINEs, exons and intergenic regions (Supplemental Fig. 4c). We next asked whether liver cancer cfDNA DHMRs also overlapped significantly with liver tumor DNA DHMRs when compared with the same control cfDNA samples. cfDNA DHMRs with increased and reduced HM overlapped significantly with tumor DNA DHMRs with increased and reduced HM, respectively (Supplementary Fig. 4d, e), indicating that cfDNA DHMRs likely also reflect tumor DNA DHMRs. Importantly, as for liver tumor genomic DNA, the vast majority of plasma cfDNA DHMRs from liver cancer samples did not overlap with cfDNA DMRs of the same samples (Fig. 3g), indicating that cfDNA DHMRs could also serve as independent biomarkers for tumor detection.
Identification of cancer types using machine learning models trained using DMRs, DHMRs and DMRs+DHMRs as inputs
It has been shown that cfDNA methylation can be used to identify tumor origins18. To determine whether sscf-MeDIP-Seq could be used for tumor prediction, we analyzed cfDNA methylomes of three groups of plasma samples, from patients with liver cancer (73 samples) or brain cancer (97 samples) and from controls (101 samples) (Table 1), generating a total of 271 sscf-MeDIP-Seq datasets. Of these, 215 datasets (58 liver cancer, 77 brain cancer and 80 control samples) were randomly selected as the training cohort to train GLMnet, random forest and deep neural network (DNN) models (Fig. 4a). All three types of models accurately predicted samples in the validation cohort (56 samples: 20 brain cancer, 15 liver cancer and 21 control samples), with the GLMnet models showing the best performance (Fig. 4b–e and Supplemental Fig. 5a–f), highlighting the robustness of our predictions and sscf-MeDIP-Seq datasets. Because the general procedures for model training and sample validation are similar for all three models, we focus on the GLMnet models below.
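The text states that the training cohort was selected randomly, and the per-group counts are fixed by Table 1. One way to reproduce a split with these counts is to draw validation samples separately within each group. The sketch below is an assumption about how this could be done, not the authors' procedure; the sample identifiers and seed are hypothetical.

```python
import random


def stratified_split(labels, n_validation, seed=0):
    """Split samples into training and validation sets within each group.

    labels: dict sample_id -> group name.
    n_validation: dict group name -> number of validation samples.
    Returns (training_ids, validation_ids).
    """
    rng = random.Random(seed)
    by_group = {}
    for sample_id, group in labels.items():
        by_group.setdefault(group, []).append(sample_id)
    training, validation = [], []
    for group in sorted(by_group):
        ids = sorted(by_group[group])   # fixed order so the seed is reproducible
        rng.shuffle(ids)
        k = n_validation[group]
        validation.extend(ids[:k])
        training.extend(ids[k:])
    return training, validation


# Group sizes from the text: 73 liver cancer, 97 brain cancer, 101 controls.
labels = {f"{g}_{i}": g for g, n in (("liver", 73), ("brain", 97), ("control", 101)) for i in range(n)}
train, valid = stratified_split(labels, {"liver": 15, "brain": 20, "control": 21})
```

With these inputs, `train` holds 215 samples (58 liver, 77 brain, 80 control) and `valid` holds 56, matching Table 1.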
Table 1
| Characteristic | Training cohort (N=215) | Validation cohort (N=56) |
|---|---|---|
| Sex | | |
| Male | 127 | 34 |
| Female | 87 | 22 |
| Unknown | 1 | 0 |
| Age at diagnosis/recruitment | | |
| Young (<30 years) | 13 | 6 |
| Middle age (30–60 years) | 107 | 25 |
| Old age (>60 years) | 95 | 25 |
| Unknown | 0 | 0 |
| Sample group | | |
| Brain cancer | 77 | 20 |
| Brain cancer, IDH WT | 34 | 9 |
| Brain cancer, IDH mutant | 43 | 11 |
| Liver cancer | 58 | 15 |
| Controls | 80 | 21 |
To reduce the influence of individual-sample variability on model training, we randomly sampled 90% of the training cohort 10 times in a balanced way across the three groups (control, brain cancer and liver cancer), identified cfDNA DMRs and DHMRs specific to each sample group in a one-versus-rest manner, and selected the top DMRs and DHMRs based on feature importance determined by the GLMnet models. Initially, we trained these models using different sets of DMRs and DHMRs for each sample group, with DMRs selected by p value and log fold change (LFC) of DNA methylation density and DHMRs selected by feature importance as defined by the GLMnet models. Model performance increased when more stringent parameters were used for DMR and DHMR selection (Supplemental Fig. 5g–i). In the end, we selected the top 200 DMRs and 200 DHMRs from the three sample groups for each of the 10 rounds of training using either DMRs or DHMRs as inputs (Fig. 4a). We then combined the DMR and DHMR models by training a calibration model for the final prediction of each sample in the training cohort. Briefly, to predict sample identity in the 56-sample validation cohort, we first predicted each sample using the 10 models trained with DMRs and the 10 trained with DHMRs as inputs, and then used the combined prediction results as inputs to the calibration model to obtain the final prediction probability for each sample. In general, models based on DMRs alone were slightly better predictors than models based on DHMRs alone (Fig. 4b–d). Furthermore, combined DMR+DHMR-based models yielded slightly more accurate predictions than models based on either DMRs or DHMRs alone (Fig. 4b–d), with AUROCs of 0.983 (95% confidence interval, 0.96–1), 0.990 (95% confidence interval, 0.97–1) and 0.978 (95% confidence interval, 0.95–1) for brain cancer, liver cancer and controls, respectively.
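The resampling and stacking scheme described above can be outlined in Python. The sketch below draws 10 balanced 90% subsamples of the training cohort and assembles, for one sample, the input vector of a calibration model from the per-round DMR- and DHMR-model probabilities. Averaging each model type's probabilities over rounds before calibration is an assumption (the text says only that the predictions were combined), and feature selection, GLMnet fitting and the calibration model itself are left out.

```python
import random
from statistics import mean

GROUPS = ("control", "brain", "liver")


def balanced_subsamples(ids_by_group, n_rounds=10, frac=0.9, seed=0):
    """Draw n_rounds subsamples, each containing `frac` of every group."""
    rng = random.Random(seed)
    rounds = []
    for _ in range(n_rounds):
        chosen = []
        for group in GROUPS:
            ids = ids_by_group[group]
            chosen.extend(rng.sample(ids, round(frac * len(ids))))
        rounds.append(chosen)
    return rounds


def calibration_features(dmr_probs, dhmr_probs):
    """Build one sample's calibration-model input.

    dmr_probs, dhmr_probs: one dict per training round, mapping
    group -> predicted probability from that round's DMR or DHMR model.
    ASSUMPTION: probabilities are averaged over rounds per group, giving
    3 DMR features followed by 3 DHMR features.
    """
    features = []
    for per_round in (dmr_probs, dhmr_probs):
        features.extend(mean(p[group] for p in per_round) for group in GROUPS)
    return features
```

With the training-cohort sizes (80 controls, 77 brain and 58 liver cancer samples), each round keeps 72, 69 and 52 samples, respectively. A calibration model, for example a multinomial logistic regression, would then be fit on these feature vectors for the training samples and applied to the validation samples.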
The average prediction probabilities for brain cancer, liver cancer and control samples using DMR+DHMR-based models were 0.72, 0.75 and 0.76, respectively (Fig. 4e). The models also predicted early-stage and late-stage liver cancer samples in the validation cohort equally well (Supplemental Fig. 6). Finally, two other machine learning models (random forest and DNN) using both DMRs and DHMRs as inputs also consistently outperformed models using DMRs or DHMRs alone (Supplementary Fig. 5a–f). Together, these studies indicate that the sscf-MeDIP-Seq method developed here provides a unique way to analyze both cfDNA DMRs and DHMRs, the latter of which had not previously been used for tumor detection.
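The one-versus-rest AUROC values reported in this section can be computed from predicted probabilities via the Mann-Whitney formulation: the AUROC equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. This pure-Python sketch is illustrative; the paper does not say which software was used, and the confidence intervals (commonly obtained by bootstrapping or DeLong's method) are not reproduced here.

```python
def auroc(scores, is_positive):
    """AUROC via the Mann-Whitney U statistic (ties count one half)."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auroc(prob_rows, labels, group):
    """AUROC for one class against all others.

    prob_rows: one dict per sample mapping class -> predicted probability.
    labels: true class of each sample, in the same order.
    """
    return auroc([row[group] for row in prob_rows], [label == group for label in labels])
```

The pairwise loop is quadratic, which is fine for a 56-sample validation cohort; a rank-based formula scales better for large cohorts.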
Evaluate the sensitivity of the sscf-MeDIP-Seq method
The amount of cfDNA in plasma differs from sample to sample, and early-stage tumors generally release less circulating tumor DNA into the blood than late-stage tumors43,44. We therefore normally used 1/3 to 1/2 of the cfDNA purified from a 1–1.5mL plasma sample for sscf-MeDIP-Seq experiments. To test the amount of cfDNA needed to generate high-quality sscf-MeDIP-Seq datasets for tumor prediction, we chose two samples with high cfDNA concentrations, one from an individual with a liver tumor and one from an individual with a brain tumor, and generated three sscf-MeDIP-Seq datasets from each using three different amounts of cfDNA. When generating these libraries, we used defined fractions of each sample's cfDNA rather than exact cfDNA amounts: 3.5ng (1/48), 10ng (1/16) and 24ng (1/7 of the sample) for the brain cancer sample, and 3ng (1/20), 7ng (1/8) and 15ng (1/4) for the liver cancer sample (Supplemental Fig. 7). We then applied the GLMnet models trained in Fig. 4 to predict these samples from the sscf-MeDIP-seq datasets generated with different amounts of input DNA. The DMR+DHMR-based models predicted the brain and liver cancer samples at all three input amounts (Supplemental Fig. 7a, d). In contrast, the DMR- and DHMR-based models reliably predicted brain or liver cancer only from datasets generated with two of the three cfDNA amounts (Supplemental Fig. 7b, c and e, f). These results are consistent with the idea that DMR+DHMR-based models are likely more robust in predicting tumor types. Furthermore, these results indicate that, in general, a higher amount of input cfDNA yields sscf-MeDIP-Seq datasets of better quality for prediction. We therefore generated all 271 sscf-MeDIP-Seq datasets using cfDNA purified from 300μl to 500μl of plasma, equivalent to 1/3 to 1/2 of the cfDNA purified from 1–1.5mL plasma for the majority of samples analyzed in this study.
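As a quick consistency check on the dilution series, dividing each cfDNA amount by its stated fraction gives the implied total cfDNA of each sample. This sketch assumes the listed fractions are exact, although they are presumably rounded; exact `Fraction` arithmetic avoids floating-point noise.

```python
from fractions import Fraction

# (amount used in ng, fraction of the sample's cfDNA) as listed in the text
BRAIN = [("3.5", Fraction(1, 48)), ("10", Fraction(1, 16)), ("24", Fraction(1, 7))]
LIVER = [("3", Fraction(1, 20)), ("7", Fraction(1, 8)), ("15", Fraction(1, 4))]


def implied_total_ng(amount_ng: str, fraction: Fraction) -> Fraction:
    """Total cfDNA implied by using `amount_ng` as `fraction` of a sample."""
    return Fraction(amount_ng) / fraction


for name, series in (("brain", BRAIN), ("liver", LIVER)):
    totals = [implied_total_ng(a, f) for a, f in series]
    print(name, [float(t) for t in totals])
```

This gives about 160–168 ng of total cfDNA for the brain cancer sample and 56–60 ng for the liver cancer sample; the small spread within each series is consistent with rounded fractions.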
Differentiate glioma subtypes by cfDNA methylomes
We also tested whether cfDNA methylome analysis can differentiate brain tumor subtypes. Of the 77 cfDNA samples from brain tumor patients in the training cohort, 43 were from patients with IDH mutations and 34 from patients with IDH wild-type (WT) tumors. To train brain tumor subtype models, we separated these 77 samples into IDH mutant (43 samples) and IDH WT (34 samples) groups and followed the same procedures outlined above to train GLMnet models using either DMRs or DHMRs as inputs. These subtype models were then combined with the three-class model (brain cancer, liver cancer and control) based on Bayes’ theorem, expanding it to four sample groups (IDH WT brain cancer, IDH mutant brain cancer, liver cancer and control) (Fig. 5a). Using this four-class model, we calculated the prediction probability of each sample in the validation cohort. As shown in Fig. 5b, c, IDH mutant and IDH WT brain tumor subtypes were identified accurately, with the DMR+DHMR-based models performing best (AUROC of 0.947 (95% confidence interval, 0.88–1) and 0.955 (95% confidence interval, 0.9–1) for IDH mutant and IDH WT, respectively). Finally, the average prediction probabilities for IDH mutant gliomas, IDH WT gliomas, liver cancer and control groups were 0.55, 0.40, 0.72 and 0.74, respectively (Fig. 5d). Together, these studies indicate that models using both DMRs and DHMRs as inputs can also be used to identify glioma subtypes accurately.
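One natural way to combine the three-class model with the subtype models, consistent with the description above, is the product rule: P(IDH mutant) = P(brain cancer) × P(IDH mutant | brain cancer), and likewise for IDH WT, while the liver cancer and control probabilities carry over unchanged. The sketch below assumes that the subtype model's output can be read as this conditional probability; the exact combination used in Fig. 5a may differ.

```python
def expand_to_four_classes(p3, p_mutant_given_brain):
    """Split the brain-cancer probability into IDH subtypes.

    p3: dict with keys "brain", "liver", "control" (probabilities summing to 1).
    p_mutant_given_brain: subtype-model probability of IDH mutant,
    treated as P(IDH mutant | brain cancer).
    """
    if not 0.0 <= p_mutant_given_brain <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    p_brain = p3["brain"]
    return {
        "IDH mutant": p_brain * p_mutant_given_brain,
        "IDH WT": p_brain * (1.0 - p_mutant_given_brain),
        "liver": p3["liver"],
        "control": p3["control"],
    }
```

For example, `expand_to_four_classes({"brain": 0.8, "liver": 0.1, "control": 0.1}, 0.75)` assigns about 0.6 to IDH mutant and 0.2 to IDH WT, and the four probabilities still sum to 1.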
cfDNA DMRs are associated with genes whose expression in tumor tissue samples predicts patient survival
Promoter and enhancer DNA methylation is associated with gene transcription45,46. To probe the potential relationship between cfDNA DMRs and gene expression in tumor samples, we first identified 10,051 liver cancer-specific cfDNA DMRs by comparing the cfDNA methylomes of all 58 liver cancer samples in the training cohort with those of the control and brain tumor samples in the training cohort. We annotated each DMR to its closest gene and identified 1689 genes whose promoters were within 20Kb of at least one of these DMRs. We then asked whether the expression of each of these 1689 genes in 371 liver tumor samples in the TCGA database was associated with patient survival (Fig. 6a). For instance, a hypomethylated DMR at the SOX14 locus was identified as specific to liver cancer compared with control and brain tumor samples (Supplementary Fig. 8a), and high SOX14 expression in the 371 TCGA liver cancer samples was associated with poorer survival than lower expression (Supplementary Fig. 8b). Through this analysis, we found that, of the 1689 genes with at least one liver cancer-specific cfDNA DMR nearby, the expression of 150 genes in liver cancer tissues in the TCGA database was associated with patient survival. Of these 150 genes, 62 were associated with hypermethylated cfDNA DMRs and 88 with hypomethylated cfDNA DMRs (Fig. 6b). Next, we asked whether the expression of these 150 genes could be used to cluster the 371 TCGA liver cancer samples by unsupervised clustering, and found that the samples separated into two clusters. Interestingly, genes close to hypomethylated cfDNA DMRs were more highly expressed in “Cluster 2” liver tumor samples than in “Cluster 1” (Fig. 6c). In contrast, genes close to hypermethylated cfDNA DMRs were more highly expressed in “Cluster 1” samples.
Importantly, patients in these two clusters showed dramatically different survival times, with the median survival of patients in Cluster 1 and Cluster 2 being ~80 and ~30 months, respectively (Fig. 6d).
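The DMR-to-gene annotation step (nearest gene whose promoter lies within 20 kb) can be sketched as follows. Promoters are approximated here by transcription start sites (TSSs), which is an assumption, and the gene names and coordinates used in the example are hypothetical. A genome-scale implementation would use sorted arrays or a dedicated interval tool rather than a linear scan.

```python
def distance_to_region(start: int, end: int, pos: int) -> int:
    """Distance from a position to the half-open region [start, end); 0 if inside."""
    if pos < start:
        return start - pos
    if pos >= end:
        return pos - end + 1
    return 0


def nearest_gene(region, tss_by_chrom, max_dist=20_000):
    """Return the gene whose TSS is closest to the region, or None if none lies within max_dist.

    region: (chrom, start, end); tss_by_chrom: chrom -> list of (tss_position, gene_name).
    """
    chrom, start, end = region
    best = None
    for tss, gene in tss_by_chrom.get(chrom, []):
        d = distance_to_region(start, end, tss)
        if d <= max_dist and (best is None or d < best[0]):
            best = (d, gene)
    return None if best is None else best[1]
```

For example, with hypothetical TSSs for `GENE_A` at chr1:10,000 and `GENE_B` at chr1:45,000, a DMR at chr1:30,000-31,000 is assigned to `GENE_B` (about 14 kb away) rather than `GENE_A` (20 kb away).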
We applied the same approach to brain tumors and identified 37 genes that have at least one brain tumor-specific cfDNA DMR nearby and whose expression in primary brain tumor tissue samples was associated with patient survival (Supplemental Fig. 8c, d). The expression of these 37 genes in tumor tissues also separated 156 brain tumor samples from the TCGA database into two clusters, with patients in “Cluster 2” showing better survival than those in “Cluster 1” (Supplementary Fig. 8e–g). Interestingly, samples from patients with IDH mutations were enriched in “Cluster 2” (Fisher's exact test, OR=6.2, p=0.01). It is known that glioma patients with IDH mutations have a more favorable outcome than those with IDH wild-type gliomas47. Together, these studies indicate that some cfDNA DMRs of both liver and brain tumor patients are likely associated with expression changes of nearby genes involved in tumorigenesis.
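Median survival times such as those compared in Fig. 6d are usually read from Kaplan-Meier curves. A minimal Kaplan-Meier median estimator for right-censored follow-up data is sketched below; exact fractions avoid rounding problems at the 0.5 threshold. The paper does not state which survival software was used, so this is an illustration rather than the authors' analysis.

```python
from fractions import Fraction


def km_median(times, events):
    """Kaplan-Meier median survival time.

    times: follow-up times; events: True if death was observed, False if censored.
    Returns the earliest time at which the survival estimate is <= 1/2,
    or None if the curve never reaches 1/2.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = Fraction(1)
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all observations that share the same time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= Fraction(at_risk - deaths, at_risk)
            if survival <= Fraction(1, 2):
                return t
        # Deaths and censored cases at time t leave the risk set afterwards.
        at_risk -= deaths + censored
    return None
```

Censored patients shrink the risk set without lowering the survival estimate, which is why censoring shifts the median later compared with naively treating censored times as deaths.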
Discussion
DNA cytosine methylation plays an important role in gene regulation, chromatin maintenance and genomic stability25. Aberrant DNA methylation occurs in a variety of cancers, and DNA methylomes of cancer tissues have therefore been used for tumor classification and detection. In this study, we developed ssg-MeDIP-seq procedures for analyzing genomic DNA methylomes as well as sscf-MeDIP-Seq for plasma cfDNA methylomes. These markedly simplified MeDIP-seq procedures greatly reduce the amount of DNA and the time needed to generate MeDIP-seq datasets. Importantly, these methods allow us to analyze both symmetric DNA methylation and hemi-methylation, a recently described epigenetic mark that has not previously been used for tumor detection27,28. Below, we discuss the implications of our findings for tumor detection and tumor classification.
To optimize and simplify MeDIP-seq procedures for analyzing, in a strand-specific manner, the methylomes of genomic DNA isolated from normal or tumor tissues, we utilized pA-Tn5 loaded with one adaptor, which tagments dsDNA into small fragments. Because pA-Tn5 covalently attaches the adaptor only to the 5’ end of each strand, a second adaptor is used to replace the first adaptor at the 3’ end following tagmentation, which makes it possible to detect DNA methylation on the Watson and Crick strands separately. This approach eliminates the sonication step used to shear DNA into small fragments, which is the first step in previously published MeDIP-seq procedures48. Moreover, because the adaptors are introduced during tagmentation, MeDIP-seq libraries can be generated through a simple PCR step. These modifications allow us to generate high-quality MeDIP-seq libraries from 100ng tumor DNA in less than two days. Importantly, ssg-MeDIP-Seq can measure both DNA methylation density and hemi-methylated regions at the same time.
DNA hemi-methylation was previously considered a replication intermediate that becomes fully methylated following DNA replication. Recently, it has been shown that about 10% of the ~3 million CpG sites are hemi-methylated and that this hemi-methylation is heritable, indicating that DNA hemi-methylation is an epigenetic mark27,28. We identified 190,106 and 228,575 HMRs in 8 liver tumor samples and 8 adjacent control tissues, respectively, which is about 10% of the ~2M potential methylated blocks38 used for analysis. We also identified 6864 differentially hemi-methylated regions (DHMRs) by comparing HMRs in liver tumor samples with those of adjacent non-tumor tissues. Importantly, we found that the majority of liver tumor DNA DHMRs do not overlap with DMRs in the same samples. The limited overlap likely reflects the fact that DNA methylation density at liver tumor DMRs is altered equally on both strands, whereas DNA methylation density at DHMRs is markedly altered on one strand relative to control samples. Together, these data further support the idea that HMRs are epigenetic marks and that DHMRs could be used as independent biomarkers.
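As a quick arithmetic check of the “about 10%” figure, dividing the HMR counts by the 2,002,724 blocks used for analysis gives:

```latex
\frac{190{,}106}{2{,}002{,}724} \approx 0.0949 \;(9.5\%,\ \text{liver tumor}), \qquad
\frac{228{,}575}{2{,}002{,}724} \approx 0.1141 \;(11.4\%,\ \text{adjacent tissue})
```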
Interestingly, we found that liver tumor DNA DHMRs are close to genes involved in metabolism. Previous studies have shown that a fraction of HMRs is associated with CTCF sites27,28 and that hemi-methylation on different strands has opposite effects on CTCF binding29. CTCF is important for genome organization. It would therefore be interesting to determine how DHMRs regulate the expression of these metabolism-related genes in liver cancer samples, and whether DHMRs in other tumor types are also close to genes involved in metabolism.
Tumor detection by methylation and hemi-methylation of plasma cell free DNA
In contrast to genomic DNA, which consists of large dsDNA fragments, cfDNA is a mixture of dsDNA, ssDNA and damaged DNA, resembling ancient DNA samples. We therefore drew on our previous experience generating sequencing libraries from ssDNA to optimize and develop the sscf-MeDIP-Seq procedure49. It has previously been shown that single-stranded DNA library preparation for next-generation sequencing retains the information provided by dsDNA preparation methods while also capturing ssDNA molecules that dsDNA preparation methods exclude36. These modifications therefore allowed us to include double-stranded, single-stranded and damaged cfDNA in the methylation analysis, so that we could generate MeDIP-seq libraries from cfDNA purified from 300–500μl plasma. Importantly, the sscf-MeDIP-Seq method can measure cfDNA methylation on the Watson and Crick strands separately, which makes it possible to analyze cfDNA hemi-methylation. Like liver tumor DNA DHMRs, most liver cancer cfDNA DHMRs do not overlap with cfDNA DMRs, indicating that liver cancer cfDNA DHMRs are independent biomarkers. Indeed, by analyzing the cfDNA methylomes of 271 plasma samples from three groups of individuals, we found that machine learning models trained with DMRs or DHMRs alone performed well in predicting sample groups in the validation cohort, and that models trained with DMRs+DHMRs showed a slight improvement over models trained with DMRs or DHMRs alone. The improvement is likely slight because the DMR- and DHMR-based models already perform very well, leaving little room for further gains given the sample size of our validation cohort. For models trained with DMRs+DHMRs, the AUROC was 0.978, 0.990 and 0.983 for predicting control, liver cancer and brain cancer samples versus the others, respectively, in the validation cohort.
This performance is similar to, if not better than, that reported for tumor detection by cfDNA methylome analysis using other methods. For instance, analysis of cfDNA methylomes by an improved MeDIP-Seq method yielded AUROCs of 0.971, 0.980, 0.918 and 0.969 for predicting lung cancer, acute myeloid leukaemia, pancreatic ductal adenocarcinoma and controls, respectively, in the validation cohorts20. In large-cohort studies covering over 50 cancer types, analysis of DNA methylation by bisulfite sequencing achieved a specificity of about 99.5% across all cancer types, reported with 95% confidence intervals21,22. None of these studies used DHMRs for analysis or tumor detection.
In clinical settings, we envision at least two benefits of using both DMRs and DHMRs for tumor detection. First, the slight improvement achieved by the DMR+DHMR models would yield a real benefit when a large number of samples is analyzed. Second, and importantly, evaluating each sample with three different models (DMRs, DHMRs and DMRs+DHMRs) could reduce false positives during cancer screening, a major issue for early tumor detection using current assays. Because each sample is predicted three times by models trained on different features, a sample could in principle be flagged for additional tests if the predictions of the three models are discordant. Conversely, concordant predictions from the three models would increase confidence in the prediction. We therefore anticipate that tumor prediction using different models trained with independent features will reduce the false-positive rate and increase prediction accuracy. In the future, it would be interesting to test these ideas in a large cohort of samples and in clinical settings.
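The discordance-flagging idea above can be expressed as a simple decision rule. The sketch below is hypothetical and not part of the study: it assumes each of the three models returns a single class label per sample, reports the shared label when all three agree, and otherwise flags the sample for follow-up testing.

```python
from typing import Dict, Sequence


def triage(calls: Sequence[str]) -> str:
    """Combine the DMR, DHMR and DMR+DHMR model calls for one sample.

    Returns the agreed label if all three models concur, otherwise "flag"
    to mark the sample for additional tests.
    """
    if len(calls) != 3:
        raise ValueError("expected one call from each of the three models")
    return calls[0] if len(set(calls)) == 1 else "flag"


def triage_batch(calls_by_sample: Dict[str, Sequence[str]]) -> Dict[str, str]:
    """Apply triage() to every sample in a batch."""
    return {sample: triage(calls) for sample, calls in calls_by_sample.items()}


# Hypothetical calls for two samples.
batch = {
    "s1": ("liver", "liver", "liver"),
    "s2": ("control", "liver", "control"),
}
print(triage_batch(batch))  # {'s1': 'liver', 's2': 'flag'}
```

Requiring unanimity is the strictest version of this rule; whether a looser rule such as majority voting works better is exactly the kind of question the text proposes to test in a large cohort.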
Methods
This research project was approved by the Columbia University Institutional Review Board.
Biospecimens
Hepatocellular carcinoma patient samples were from an IRB-approved, hospital-based prospective study conducted at Columbia University Irving Medical Center (CUIMC) that recruited liver cancer patients (>18 years old) from Oct. 2008 to July 2014. Brain cancer patient samples were collected under an IRB-approved protocol to collect, bank and distribute de-identified samples from brain tumor patients at CUIMC. Subjects without cancer were recruited through advertisements around CUIMC, also with IRB approval. All subjects provided blood samples, which were rapidly centrifuged to obtain plasma; the plasma was aliquoted and frozen at −80°C until use. Basic epidemiologic variables were obtained with a structured questionnaire, and clinical information on patients was obtained from medical records. Written informed consent was obtained from all participants. We did not perform analyses based on sex and gender in this study because we aimed to develop models for tumor detection that are independent of sex and gender. However, because there are likely sex- and gender-specific DNA methylation patterns, we attempted to include equal numbers of female and male samples in this study. Sex and gender were based on self-report.
Protein, antibody and reagents
Purification of pA-Tn5 and assembly of the pA-Tn5-oligo complex used for analysis of tumor DNA methylation were performed as described previously50,51. Antibodies against 5-mC (33D3) were purchased from Diagenode (C15200081). Mouse IgG used to bridge the anti-5-mC antibody and pA-Tn5 was purchased from Active Motif (53017), and tRNA was purchased from Sigma (R1753).
Preparation of genomic DNA
Genomic DNA was extracted from frozen tumor and adjacent tissues by standard proteinase K and RNase treatment followed by phenol and chloroform extraction.
Tagmentation of genomic DNA was performed as previously described with minor modifications50,51. In brief, 100ng of DNA and 1.5μl of pA-Tn5-AA complex were mixed in tagmentation buffer (5mM TAPS-NaOH pH 8.5, 5mM MgCl2, 10% DMF) and incubated at 37°C with gentle shaking for 30min. DNA was then purified with the ChIP DNA Clean & Concentrator kit (Zymo 11-379C), and oligo replacement and gap repair were performed as described50,51. The processed DNA was then subjected to immunoprecipitation with anti-5-mC antibodies as described below.
Methylated DNA immunoprecipitation (MeDIP) and library preparation
The processed DNAs were diluted to 200μl with binding buffer (50mM Tris pH 8, 350mM NaCl, 0.05% Triton X-100, 1mM EDTA), heated to 98°C for 10min, and then immediately cooled on ice for 5min. 5μg tRNA (Sigma R1753) and 0.6μg anti-5-mC monoclonal antibody 33D3 (Diagenode C15200081) were added, and the mixture was rotated at 4°C for 1h. After addition of 1μl bridge antibody (Active Motif 53017) and 10μl pre-washed Protein G beads (Invitrogen 10004D), the reaction mixtures were incubated at 4°C for 16h. The Protein G beads were then washed twice with binding buffer, twice with wash buffer (50mM Tris pH 8, 140mM NaCl, 0.05% Triton X-100, 1mM EDTA) and twice with TE buffer. DNA on the beads was eluted twice at 65°C for 15min with 15μl elution buffer (10mM Tris-HCl pH 8.0, 10mM EDTA, 150mM NaCl, 5mM DTT, 1% SDS). The eluates were combined and purified with the ChIP DNA Clean & Concentrator kit (Zymo 11-379C), and the purified DNA was eluted in 20μl low EDTA TE (Swift 90296). For genomic DNA, Illumina Nextera Dual Index primers were used for library amplification. Briefly, PCR reactions consisting of 20μl eluted DNA, 1.5μl 10mM N7 primer, 1.5μl 10mM N5 primer and 23μl 2X PCR master mix (NEB 0541S) were assembled for library preparation.
Plasma cell-free DNA was purified following the procedures of the QIAGEN QIAamp MinElute ccfDNA Mini Kit. For methylated DNA immunoprecipitation of plasma cfDNA samples, 1S Plus Set Indexing kits (Swift 16024) were used for sample indexing. Briefly, reaction mixtures consisting of 20μl eluted DNA, 5μl R1 and 25μl 2X PCR master mix were assembled in PCR tubes and amplified (98°C for 1min; 10-11 cycles of 98°C for 10s and 63°C for 20s; 72°C for 1min). After PCR amplification, the reaction mixtures were mixed with 25μl AMPure XP beads (Beckman A63880) for 5min at room temperature. The supernatant was then transferred to a new tube containing 25μl AMPure XP beads. After a 10min incubation at room temperature, the DNA on the beads was washed twice with 200μl 80% ethanol and eluted with 14μl low EDTA TE.
MeDIP-Seq data analysis
MeDIP-Seq libraries were sequenced in paired-end mode on Illumina NextSeq 500/550 or NovaSeq platforms. Adaptor sequences were removed from all raw reads with Cutadapt52, and reads shorter than 10nt were discarded. Reads that passed these cleanup steps were mapped to the human reference genome (hg19) with Bowtie253. Duplicate reads were removed with Sambamba54. Read coverage in 1bp bins was calculated from the filtered BAM files with deepTools255 and normalized by the total number of filtered reads to reads per million (RPM).
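The RPM normalization step amounts to rescaling raw coverage by library size. A minimal sketch, assuming per-bin raw read counts and the total number of reads remaining after filtering and deduplication (the function name is hypothetical; in the study this step was done with deepTools):

```python
from typing import List, Sequence


def to_rpm(coverage: Sequence[float], total_filtered_reads: int) -> List[float]:
    """Convert raw per-bin coverage to reads per million (RPM).

    RPM = raw coverage * 1,000,000 / total filtered reads, so libraries
    sequenced to different depths become directly comparable.
    """
    if total_filtered_reads <= 0:
        raise ValueError("total_filtered_reads must be positive")
    return [c * 1_000_000 / total_filtered_reads for c in coverage]


# 5 reads at a position in a library of 10 million filtered reads -> 0.5 RPM.
print(to_rpm([0, 5, 20], 10_000_000))  # [0.0, 0.5, 2.0]
```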
Protein-coding gene annotation was downloaded from GENCODE (v28)56, and CpG island annotation was downloaded from the UCSC Table Browser57. Protein-coding genes were then classified into genes with and without CpG islands based on whether a CpG island overlapped their promoter ([−3kb, 3kb] surrounding the TSS). Normalized MeDIP-Seq read density (RPM) from the transcription start site (TSS) to the transcription termination site (TTS) was then calculated for each class of genes with deepTools255.
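The promoter-based classification can be sketched as an interval-overlap test. This minimal illustration assumes half-open (BED-style) coordinates and a single TSS per gene; the gene names and coordinates are hypothetical, and a linear scan stands in for the interval index a genome-scale analysis would use.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def promoter(chrom: str, tss: int, flank: int = 3000) -> Interval:
    """Promoter window [TSS - 3kb, TSS + 3kb], clipped at position 0."""
    return (chrom, max(0, tss - flank), tss + flank)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify_genes(
    tss_by_gene: Dict[str, Tuple[str, int]], cpg_islands: List[Interval]
) -> Tuple[List[str], List[str]]:
    """Split genes into (CpG island promoter, non-CpG island promoter) lists."""
    with_cgi: List[str] = []
    without_cgi: List[str] = []
    for gene, (chrom, tss) in tss_by_gene.items():
        prom = promoter(chrom, tss)
        # Linear scan for clarity; real data would use an interval index.
        if any(overlaps(prom, island) for island in cpg_islands):
            with_cgi.append(gene)
        else:
            without_cgi.append(gene)
    return with_cgi, without_cgi


genes = {"GENE_A": ("chr1", 10_000), "GENE_B": ("chr1", 50_000), "GENE_C": ("chr2", 12_200)}
islands = [("chr1", 12_000, 12_500)]
print(classify_genes(genes, islands))  # (['GENE_A'], ['GENE_B', 'GENE_C'])
```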
Differentially methylated region (DMR) identification for genomic DNA and plasma cfDNA
Recently, it has been shown that 2,002,724 blocks, each containing at least four CpG dinucleotides, can be used to monitor DNA methylation across 205 tissues under multiple conditions38. We therefore used these 2,002,724 blocks to identify DMRs and DHMRs in our study. To identify DMRs between liver tumor tissues and adjacent non-tumor tissues, we compared ssg-MeDIP-seq datasets from eight liver tumor samples with those of their corresponding adjacent non-tumor tissues and identified 11,930 hyper-methylated and 12,974 hypo-methylated DMRs with QSEA58 (cutoff p<0.01 and |log2(fold change)|>1). For comparison, TCGA LIHC59 methylation datasets from the Illumina 450K CpG array were downloaded. Methylation β values of 50 liver tumor samples were compared with those of their corresponding adjacent samples, and 10,362 hyper-methylated and 46,969 hypo-methylated DMRs were identified by t-test (cutoff: Bonferroni-adjusted p<0.05). Comparing the liver cancer DMRs found in this study with the TCGA 450K-array DMRs identified 5207 concordantly hyper-methylated and 1472 concordantly hypo-methylated DMRs. To estimate the significance of these overlaps, 100 permutations were performed with bedtools60 using the command “bedtools shuffle -incl regulation.bed”. The observed numbers of overlapping DMRs (5207 and 1472) were compared with the null distributions generated from the corresponding 100 permutations and standardized to Z scores, with P values calculated from the null distribution in a one-sided manner. Both the concordant hyper-methylated and hypo-methylated DMRs showed significantly higher enrichment than the random permutation distributions (p=0 for the 5207 hyper-methylated DMRs and p=0 for the 1472 hypo-methylated DMRs) (Fig. 1d).
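The significance calculation for an overlap can be sketched as follows, assuming the null overlap counts have already been produced (for example, by intersecting the DMRs with 100 sets of regions generated by `bedtools shuffle`). The Z score and the one-sided normal p-value are one reading of the procedure described above. The empirical p-value uses the common (1 + count)/(1 + N) convention, whose smallest attainable value with 100 permutations is 1/101. The function name and the toy numbers are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def permutation_enrichment(observed: int, null_counts: Sequence[int]) -> Tuple[float, float, float]:
    """Compare an observed overlap count with overlap counts from shuffled regions.

    Returns (z, p_normal, p_empirical):
      z: (observed - mean(null)) / sd(null)
      p_normal: one-sided upper-tail p-value of z under a standard normal
      p_empirical: one-sided permutation p-value, (1 + #null >= observed) / (1 + N)
    """
    mu = statistics.mean(null_counts)
    sd = statistics.stdev(null_counts)  # sample standard deviation of the null
    z = (observed - mu) / sd
    p_normal = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)
    n_as_extreme = sum(c >= observed for c in null_counts)
    p_empirical = (1 + n_as_extreme) / (1 + len(null_counts))
    return z, p_normal, p_empirical


# Hypothetical null counts from five shuffles (the study used 100).
print(permutation_enrichment(20, [10, 12, 8, 10, 10]))
```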
To identify cfDNA DMRs, we compared sscf-MeDIP-Seq datasets of cfDNA from 10 liver cancer patients with those from 10 control individuals without cancer using QSEA58 (cutoff p<0.01 and |log2(fold change)|>1). To evaluate whether these cfDNA DMRs were also found among liver tumor DNA DMRs, we first determined the overlap between the cfDNA DMRs and the tumor DNA DMRs identified by ssg-MeDIP-seq in this study. We likewise determined discordantly methylated regions (hyper-methylated in one setting and hypo-methylated in the other) between the cfDNA DMRs and the tumor DNA DMRs from this study. The observed numbers of concordantly and discordantly overlapping DMRs were then compared with the null distributions generated from the corresponding 100 random permutations: the observed numbers were normalized to Z scores using the null distributions, and one-sided p values were calculated to evaluate the significance of the overlaps.
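Because DMRs in this study were called on the same fixed set of 2,002,724 blocks, the concordant/discordant bookkeeping can be sketched with block identifiers. The sketch below assumes each DMR set is a mapping from a (hypothetical) block ID to its direction, “hyper” or “hypo”; the resulting counts are the observed values that would then be compared against permutation-derived null distributions.

```python
from typing import Dict


def overlap_by_direction(cfdna: Dict[str, str], tumor: Dict[str, str]) -> Dict[str, int]:
    """Count shared blocks by methylation direction.

    Both inputs map a block identifier to "hyper" or "hypo". Because both
    DMR sets are called on the same fixed blocks, two DMRs overlap exactly
    when they share a block identifier.
    """
    shared = cfdna.keys() & tumor.keys()
    counts = {"concordant_hyper": 0, "concordant_hypo": 0, "discordant": 0}
    for block in shared:
        if cfdna[block] != tumor[block]:
            counts["discordant"] += 1
        elif cfdna[block] == "hyper":
            counts["concordant_hyper"] += 1
        else:
            counts["concordant_hypo"] += 1
    return counts


# Hypothetical block-level calls.
cfdna = {"b1": "hyper", "b2": "hypo", "b3": "hyper", "b4": "hypo", "b5": "hypo"}
tumor = {"b1": "hyper", "b2": "hyper", "b3": "hyper", "b4": "hypo", "b6": "hyper"}
print(overlap_by_direction(cfdna, tumor))
# {'concordant_hyper': 2, 'concordant_hypo': 1, 'discordant': 1}
```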
Differentially hemi-methylated region (DHMR) identification for genomic DNA and plasma cfDNA
The same 2,002,724 blocks from Loyfer et al.38 were also used to identify hemi-methylated regions (HMRs) and differentially hemi-methylated regions (DHMR). Briefly, hemi-methylation level (HM) at each block was calculated as “bias” =
Machine learning models trained using DMRs, DHMRs, or DMRs+DHMRs
To detect and classify tumors by cfDNA methylomes, we used a regularized regression model, the Lasso and Elastic-Net Regularized Generalized Linear Model (GLMnet), as the final model, and also tested two other machine learning models, a Deep Neural Network (DNN) and Random Forest (RF). To evaluate the performance of the machine learning models, the receiver operating characteristic (ROC) curve was plotted as sensitivity against (1 − specificity) for each class, where sensitivity = true positives/total positives and specificity = true negatives/total negatives. The area under the ROC curve (AUROC) was calculated for comparison with the “pROC” package in R61.
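For a single class, the AUROC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (the Mann-Whitney formulation), which is the area under the empirical ROC curve. The sketch below computes it one-vs-rest for each class from predicted class probabilities. It is a plain-Python illustration, not the `pROC` implementation used in the study, and the example probabilities are made up.

```python
from typing import Dict, Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via pairwise comparison; a tie between a positive and a negative counts 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auroc(prob_by_class: Dict[str, Sequence[float]], true_labels: Sequence[str]) -> Dict[str, float]:
    """AUROC of each class vs all others, scoring samples by that class's predicted probability."""
    return {
        cls: auroc(probs, [1 if t == cls else 0 for t in true_labels])
        for cls, probs in prob_by_class.items()
    }


# Hypothetical predictions for four samples (each sample's probabilities sum to 1).
truth = ["control", "liver", "brain", "liver"]
probs = {
    "control": [0.7, 0.2, 0.1, 0.5],
    "liver": [0.2, 0.6, 0.3, 0.25],
    "brain": [0.1, 0.2, 0.6, 0.25],
}
print(one_vs_rest_auroc(probs, truth))  # {'control': 1.0, 'liver': 0.75, 'brain': 1.0}
```

The pairwise loop is quadratic in sample size, which is fine for a few hundred samples; rank-based formulas give the same value faster.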
For the GLMnet model, the elastic net penalty is controlled by α, which bridges lasso regression (α=1) and ridge regression (α=0). In our study, α and λ were tuned over a grid of values from 0 to 1 in increments of 0.1 to optimize the model, and the family function was set to “glmnet” for regression. For the random forest model, the number of features randomly sampled at each split was tuned over a grid from 2 to the square root of the total number of features (DMRs or DHMRs), and 1000 trees were generated in each round. Model training was performed with 10-fold cross-validation using the “caret” package in R62. The DNN models consisted of an input layer, two hidden layers with 64 and 32 nodes, and an output layer of three nodes giving the predicted probability of each of the three sample types. The activation function of the hidden layers was linear, and the output layer used the “softmax” function for multinomial classification. An L2 penalty of 0.1 was applied to the hidden layers for regularization to reduce the risk of overfitting. The DNN models were implemented with the “Keras” package in R63.
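For reference, glmnet's elastic-net penalty on the coefficient vector β is λ·[(1 − α)/2·‖β‖₂² + α·‖β‖₁], so α = 1 gives the lasso and α = 0 gives ridge. The sketch below evaluates this penalty and builds the α grid from 0 to 1 in steps of 0.1 described above. It only illustrates the penalty term; it is not a reimplementation of glmnet's coordinate-descent fitting or of caret's tuning, and the names are hypothetical.

```python
from typing import List, Sequence


def elastic_net_penalty(beta: Sequence[float], alpha: float, lam: float) -> float:
    """glmnet-style penalty: lam * ((1 - alpha) / 2 * ||beta||_2^2 + alpha * ||beta||_1)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    l1 = sum(abs(b) for b in beta)
    l2_sq = sum(b * b for b in beta)
    return lam * ((1.0 - alpha) / 2.0 * l2_sq + alpha * l1)


# alpha grid 0.0, 0.1, ..., 1.0; rounding removes float noise such as 0.30000000000000004.
ALPHA_GRID: List[float] = [round(0.1 * i, 1) for i in range(11)]

beta = [1.0, -2.0, 0.0]
for a in (0.0, 0.5, 1.0):
    # ridge (2.5), an even mix (2.75), lasso (3.0)
    print(a, elastic_net_penalty(beta, alpha=a, lam=1.0))
```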
cfDNA DMRs are associated with patient survival
We annotated cancer-specific DMRs to their closest genes within 20kb. We then downloaded RNA expression values (RSEM, RNA-Seq by Expectation Maximization) and patients’ clinical data from the TCGA-LIHC project (371 liver cancer patient samples) and the TCGA-GBM project (156 brain cancer patients)59,64. Based on the median RNA expression of each of these nearby genes in these patient cohorts, we separated liver or brain cancer samples into two groups, high and low expression. A Cox proportional hazards model was fitted for each of the nearby genes to evaluate the hazard ratio for patient survival65. Genes whose expression in these cancer samples was associated with patient survival were chosen for further analysis.
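A minimal sketch of the annotation and dichotomization steps, under stated assumptions: each chromosome's TSSs are pre-sorted, the DMR-to-gene distance is measured from the TSS to the nearest edge of the DMR (0 if the TSS lies inside it), and samples whose expression equals the median are assigned to the low group. These conventions, the function names and the toy data are hypothetical, and the per-gene Cox model is not shown.

```python
import bisect
import statistics
from typing import Dict, List, Optional, Tuple


def nearest_gene(
    chrom: str, start: int, end: int,
    tss_index: Dict[str, List[Tuple[int, str]]], max_dist: int = 20_000,
) -> Optional[str]:
    """Closest gene whose TSS lies within max_dist of the DMR [start, end], or None.

    tss_index maps a chromosome to a list of (tss, gene) sorted by tss.
    """
    entries = tss_index.get(chrom, [])
    positions = [tss for tss, _ in entries]
    i = bisect.bisect_left(positions, (start + end) // 2)
    best: Optional[Tuple[int, str]] = None
    # The closest TSS is one of the two neighbours of the DMR midpoint.
    for j in (i - 1, i):
        if 0 <= j < len(entries):
            tss, gene = entries[j]
            dist = max(start - tss, 0, tss - end)
            if dist <= max_dist and (best is None or dist < best[0]):
                best = (dist, gene)
    return None if best is None else best[1]


def median_split(expression: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split samples into (high, low) groups at the median expression value."""
    med = statistics.median(expression.values())
    high = [s for s, v in expression.items() if v > med]
    low = [s for s, v in expression.items() if v <= med]
    return high, low


index = {"chr1": [(1_000, "GENE_A"), (50_000, "GENE_B")]}
print(nearest_gene("chr1", 30_000, 30_500, index))  # GENE_B (19,500 bp away)
print(median_split({"s1": 1.0, "s2": 2.0, "s3": 3.0, "s4": 4.0}))  # (['s3', 's4'], ['s1', 's2'])
```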
Statistics & reproducibility
Statistical methods used for analysis are described in the figure legends. No statistical method was used to predetermine sample size. No data were excluded from the analyses. The experiments were not randomized, and the investigators were not blinded to allocation during experiments and outcome assessment.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Supplementary information
Acknowledgements
This work is supported in part by a pilot grant from the Herbert Irving Comprehensive Cancer Center at Columbia University supported by P30CA013696, and by NIH grants (R35 GM118015, R01 NS132344-01 and R01 CA277605-01A1 (ZZ), and R35 CA253126 (RR)). Biospecimens were processed and stored in the Biomarkers Laboratory supported by P30 ES009089 and P30 CA013696. We thank Dr. Wei Li for discussion and suggestions on methylome analysis and Dr. Zhonghua Liu for discussion of machine learning models. The brain tumor study is supported by the William Rhodes and Louise Tilzer Rhodes Center for Glioblastoma. We thank the Herbert Irving Comprehensive Cancer Center Database Shared Resource and genome sequencing resources for providing clinical data, biospecimens, or both, and for facilitating this project.
Author contributions
H.Z. and Z.Z. designed MeDIP-Seq methods. H.Z. developed MeDIP-Seq methods and performed all MeDIP-Seq experiments on samples used in this study. X.H. performed all bioinformatic analyses in this study with the guidance of R.R. H.W., R.M.S. and Z.Z. secured the initial funding for this project, and H.W. and R.M.S. provided plasma samples from liver cancer patients and healthy individuals, as well as genomic DNA samples from liver cancer and adjacent regions, with help from J.M.G. J.F., C.K., J.B. and P.C. provided plasma samples from gliomas and discussions. X.H. and Z.Z. wrote the manuscript with the help of H.Z. All authors read and edited the manuscript.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.
Data availability
All sequencing data and full datasets were deposited in dbGaP, with summary information available at https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs003462. dbGaP study accession: phs003462.v1.p1. To access these datasets, users first need to apply to NIH for access to the dbGaP database. Once access is granted, please contact Dr. Zhang at [email protected] to request access to the datasets generated in this study. No restrictions will be placed on non-profit organizations. We will approve access as soon as possible, normally within 6 business days. Source data are provided with this paper.
Code availability
The custom code was uploaded to github: https://github.com/clouds-drift/plasma_MCD.
Competing interests
A patent application describing the MeDIP-Seq procedures for analysis of DNA methylation of genomic DNA and cfDNA for tumor detection was filed by Columbia University with Z.Z., H.Z., H.W. and R.M.S. listed as inventors. U.S. Provisional Application PCT/US2022/070998. Title: methods to analyze DNA methylomes in tumor and plasma cell free DNA. All other authors declare no conflict of interest.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Xu Hua, Hui Zhou.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-024-50471-1.
References
Articles from Nature Communications are provided here courtesy of Nature Publishing Group
Funding
Funders who supported this work.
NCI NIH HHS (3)
Grant ID: R01 CA277605
Grant ID: R35 CA253126
Grant ID: P30 CA013696
NIEHS NIH HHS (1)
Grant ID: P30 ES009089
NIGMS NIH HHS (1)
Grant ID: R35 GM118015
NINDS NIH HHS (1)
Grant ID: R01 NS132344
U.S. Department of Health & Human Services | NIH | Center for Information Technology (Center for Information Technology, National Institutes of Health) (1)
Grant ID: GM118015, R01 NS132344-01, R01 CA277605-01A1, and R35 CA253126, P30CA013696