Gene expression profiling of 1200 pancreatic ductal adenocarcinoma reveals novel subtypes.

Zhao L; Zhao H; Yan H

doi:10.1186/s12885-018-4546-8

Zhao L¹, Zhao H¹, Yan H¹

Affiliations

1. Department of Electronic Engineering, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Hong Kong.

ORCIDs linked to this article

Zhao L | 0000-0003-2681-7695

BMC Cancer, 29 May 2018, 18(1):603
https://doi.org/10.1186/s12885-018-4546-8 PMID: 29843660 PMCID: PMC5975421

This article is in the Europe PMC Open access subset. Refer to the copyright information in the article for licensing details.

Abstract

Background

Pancreatic ductal adenocarcinoma (PDAC) is the fourth leading cause of cancer-related death in the world, with a five-year survival rate of less than 5%. Not all PDAC tumors are alike: heterogeneity among PDAC tumors poses a great challenge to personalized treatment.

Methods

To dissect the molecular heterogeneity of PDAC, we performed a retrospective meta-analysis of whole-transcriptome data from more than 1200 PDAC patients. Subtypes were identified using a non-negative matrix factorization (NMF) biclustering method. We used gene set enrichment analysis (GSEA) and survival analysis to characterize the identified subtypes molecularly and clinically, respectively.
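To make the NMF step concrete, the following is a minimal sketch of NMF-based sample clustering using the standard Lee-Seung multiplicative updates. It is an illustration only, not the authors' pipeline: the toy matrix `V`, the rank of 2, the iteration count, and the helper names `nmf` and `assign_samples` are all assumptions introduced here.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Approximate non-negative V (genes x samples) as W * H using
    Lee-Seung multiplicative updates for the squared-error objective."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return W, H

def assign_samples(W, H):
    """Assign each sample to the factor contributing most to it.
    H rows are rescaled by the column sums of W so factors are comparable."""
    scale = [sum(W[i][k] for i in range(len(W))) for k in range(len(H))]
    return [max(range(len(H)), key=lambda k: H[k][j] * scale[k])
            for j in range(len(H[0]))]

# Hypothetical toy data: 4 genes x 6 samples with two expression blocks.
V = [
    [5.0, 6.0, 5.5, 0.1, 0.2, 0.1],
    [4.0, 5.0, 4.5, 0.2, 0.1, 0.3],
    [0.1, 0.2, 0.1, 6.0, 5.0, 5.5],
    [0.3, 0.1, 0.2, 5.0, 4.0, 4.5],
]
W, H = nmf(V, rank=2)
labels = assign_samples(W, H)
```

In this sketch the columns of `W` group genes and the rows of `H` group samples, which is the sense in which NMF can act as a biclustering method. Which factor receives which index depends on the random initialization.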

Results

Six molecularly and clinically distinct subtypes of PDAC, L1-L6, were identified and grouped into tumor-specific (L1, L2 and L6) and stroma-specific (L3, L4 and L5) subtypes. Among the tumor-specific subtypes, L1 (~22%) is enriched for carbohydrate metabolism-related gene sets and has intermediate survival. L2 (~22%) has the worst clinical outcomes and is enriched for cell proliferation-related gene sets. About 23% of patients are classified as L6, which has intermediate survival and is enriched for lipid and protein metabolism-related gene sets. The stroma-specific subtypes may have high non-epithelial content: collagen (L3), immune cells (L4) and islet cells (L5). L3 (~12%) has poor survival and is enriched for collagen-associated gene sets. L4 (~14%) is enriched for various immune-related gene sets and has relatively good survival. L5 (~7%) has good clinical outcomes and is enriched for neurotransmitter and insulin secretion-related gene sets. We also identified 160 subtype-specific markers and built a deep learning-based classifier for PDAC. Applying our classification system to validation datasets, we observed highly similar molecular and clinical characteristics for the subtypes.
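The survival comparisons reported here rest on survival analysis. As a hedged illustration of the basic estimator behind such comparisons, the sketch below computes a Kaplan-Meier curve for one group. The follow-up times and event flags are hypothetical, and the function name `kaplan_meier` is introduced here for illustration; the abstract does not state which software the authors used.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each time with at least one observed death.
    times: follow-up times; events: 1 = death observed, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical follow-up times (months) for a single subtype.
times = [5, 8, 8, 12, 20, 24]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
```

Computing one such curve per subtype and comparing them, for example with a log-rank test, is a common way to obtain statements such as "worst" or "intermediate" survival.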

Conclusions

Our study investigates the largest cohort of PDAC gene expression profiles to date, which greatly increases statistical power and provides more robust results. We identified six molecularly and clinically distinct subtypes that give a more complete picture of PDAC heterogeneity. The 160 subtype-specific markers and the deep learning-based classification system may be used to better stratify PDAC patients for personalized treatment.

Lan Zhao, Hongya Zhao, and Hong Yan

Associated Data

Supplementary Materials: Additional file 1: Figure S1. PCA before and after batch effect correction for training and validation datasets via ComBat. (a) PCA on training dataset (n=796) prior to batch effect correction. (b) PCA on training dataset (n=796) after batch effect correction. (c) PCA on validation dataset (n=472) prior to batch effect correction. (d) PCA on validation dataset (n=472) after batch effect correction. (PDF 157 kb)
12885_2018_4546_MOESM1_ESM.pdf (157K)
Additional file 2: Table S1. Clinical data with patient characteristics and statistical associations of six subtypes with clinical outcome. (DOCX 17 kb)
12885_2018_4546_MOESM2_ESM.docx (18K)
Additional file 3: Figure S2. Heatmap of consensus matrices from 30 runs for each rank (2 to 10) on the training dataset. (PDF 12 kb)
12885_2018_4546_MOESM3_ESM.pdf (13K)
Additional file 4: Table S2. Confusion matrices for the internal training and validation sets. (XLS 25 kb)
12885_2018_4546_MOESM4_ESM.xls (25K)
Additional file 5: Figure S3. Boxplots showing mean gene expression patterns of selected biomarkers across the six subtypes (L1 gene list: ALDOB, CA2, NPC1L1 and PGC. L2 gene list: CCNB2, CDKN2A, SFN, UBE2C, SPRR3, DHRS9 and CRABP2. L3 gene list: GREM1, MFAP5, COL12A1, COL10A1 and COL8A1. L4 gene list: CCL, CCR7 and CD gene families. L5 gene list: PAX6, IAPP, G6PC2, ABCC8 and ZBTB16. L6 gene list: CLPS, PLA2G1B, CEL, ALB, CPA1, CPB1, CTRL, SLC3A1, PRSS3 and ANPEP). X-axis: six subtypes; y-axis: gene expression values. A paired t-test was used to determine whether mean gene expression differed significantly between subtypes; all six comparisons are significant (p-value < 2.2e-16). (PDF 62 kb)
12885_2018_4546_MOESM5_ESM.pdf (63K)
Additional file 6: Table S3. Significantly enriched gene sets for each subtype. (XLS 883 kb)
12885_2018_4546_MOESM6_ESM.xls (884K)
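Figure S2 in the list above refers to consensus matrices built from 30 runs at each rank. As a rough sketch of what such a matrix contains, the code below computes, for every pair of samples, the fraction of clustering runs in which the two samples share a label. The three label vectors are hypothetical, and `consensus_matrix` is an illustrative helper, not code from the paper.

```python
def consensus_matrix(runs):
    """runs: list of label lists, one per clustering run, over the same samples.
    Entry (i, j) of the result is the fraction of runs in which
    samples i and j were assigned to the same cluster."""
    n = len(runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(runs) for c in row] for row in counts]

# Hypothetical cluster labels for 4 samples from 3 runs.
# Label values need not match across runs; only co-membership matters.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
C = consensus_matrix(runs)
```

Entries near 0 or 1 indicate stable pairings across runs, so a rank whose consensus heatmap shows crisp blocks is usually preferred over one with many intermediate values.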

Data Availability Statement: All data used in the study can be downloaded from multiple data repositories, including the International Cancer Genome Consortium (ICGC, www.icgc.org), The Cancer Genome Atlas (TCGA, http://cancergenome.nih.gov/), Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) and ArrayExpress (https://www.ebi.ac.uk/arrayexpress/).
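Pooling expression data from several repositories typically introduces batch effects, which is why Figure S1 compares PCA before and after ComBat correction. The sketch below is deliberately simpler than ComBat, which uses an empirical Bayes model: it only removes per-batch mean shifts for a single gene. The values, batch labels, and the function name `center_by_batch` are assumptions made for illustration.

```python
from collections import defaultdict

def center_by_batch(values, batches):
    """Remove per-batch mean shifts for one gene.
    Each value has its batch mean subtracted and the grand mean added back.
    This is plain mean-centering, NOT the ComBat algorithm."""
    grand_mean = sum(values) / len(values)
    groups = defaultdict(list)
    for value, batch in zip(values, batches):
        groups[batch].append(value)
    batch_means = {b: sum(vs) / len(vs) for b, vs in groups.items()}
    return [v - batch_means[b] + grand_mean for v, b in zip(values, batches)]

# Hypothetical expression values of one gene in two source datasets.
values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
batches = ["A", "A", "A", "B", "B", "B"]
corrected = center_by_batch(values, batches)
```

After correction both batches share the same mean, so a PCA of the corrected data would no longer separate samples by source dataset on the basis of this shift alone.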
