Testing calibration of risk models at extremes of disease risk

Minsun Song; Peter Kraft; Amit D Joshi; Myrto Barrdahl; Nilanjan Chatterjee

doi:10.1093/biostatistics/kxu034

Biostatistics. 2015 Jan;16(1):143-54. doi: 10.1093/biostatistics/kxu034. Epub 2014 Jul 14.

Authors

Minsun Song¹, Peter Kraft², Amit D Joshi², Myrto Barrdahl³, Nilanjan Chatterjee⁴

Affiliations

¹ Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD 20850, USA.
² Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA.
³ Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany.
⁴ Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD 20850, USA.

Abstract

Risk-prediction models need careful calibration to ensure they produce unbiased estimates of risk for subjects in the underlying population given their risk-factor profiles. As subjects with extremely high or low risk may be the most affected by knowledge of their risk estimates, checking the adequacy of risk models at the extremes of risk is particularly important for clinical applications. We propose a new approach to test model calibration targeted toward the extremes of the disease-risk distribution, where standard goodness-of-fit tests may lack power due to sparseness of data. We construct a test statistic based on model residuals summed over only those individuals who pass high and/or low risk thresholds and then maximize the test statistic over different risk thresholds. We derive an asymptotic distribution for the max-test statistic based on analytic derivation of the variance-covariance function of the underlying Gaussian process. The method is applied to a large case-control study of breast cancer to examine joint effects of common single nucleotide polymorphisms (SNPs) discovered through recent genome-wide association studies. The analysis clearly indicates a non-additive effect of the SNPs on the scale of absolute risk, but an excellent fit for the linear-logistic model even at the extremes of risk.
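The general idea of a thresholded, maximized residual statistic can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a simple cohort setting with fixed, known predicted risks `p`, considers only the upper tail, and replaces the analytic Gaussian-process null distribution described in the abstract with a plain parametric simulation. The function names `max_tail_statistic` and `simulated_p_value` and all parameters are hypothetical.

```python
import numpy as np

def max_tail_statistic(y, p, thresholds):
    """Largest |standardized residual sum| over upper-tail sets {i : p_i >= c}.

    Assumption: p holds fixed predicted risks and y holds 0/1 outcomes.
    """
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    best = 0.0
    for c in thresholds:
        tail = p >= c
        # Variance of the residual sum if y_i ~ Bernoulli(p_i) independently.
        var = np.sum(p[tail] * (1.0 - p[tail]))
        if var <= 0.0:
            continue  # empty or degenerate tail: skip this threshold
        z = np.sum(y[tail] - p[tail]) / np.sqrt(var)
        best = max(best, abs(z))
    return best

def simulated_p_value(y, p, thresholds, n_sim=2000, seed=0):
    """Monte Carlo p-value under the null that the model is calibrated.

    Assumption: outcomes are redrawn as Bernoulli(p); this stands in for
    the asymptotic distribution, which is not reproduced here.
    """
    p = np.asarray(p, dtype=float)
    rng = np.random.default_rng(seed)
    observed = max_tail_statistic(y, p, thresholds)
    exceed = 0
    for _ in range(n_sim):
        y_sim = rng.binomial(1, p)
        if max_tail_statistic(y_sim, p, thresholds) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_sim + 1)
```

Taking the maximum over thresholds avoids committing to a single cutoff in advance, and the null distribution of the maximum has to account for the correlation between overlapping tail sets. The simulation does so implicitly by recomputing the whole maximum for each simulated data set. The sketch ignores case-control sampling and the estimation of model parameters, both of which a real analysis would need to handle.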

Keywords: Case–control studies; Gene–gene and gene–environment interactions; Genome-wide association studies; Goodness-of-fit tests; Polygenic score; Risk stratification.

Published by Oxford University Press 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.

Publication types

Research Support, N.I.H., Extramural
Research Support, N.I.H., Intramural

MeSH terms

Breast Neoplasms / genetics
Calibration
Genetic Predisposition to Disease*
Genome-Wide Association Study / statistics & numerical data*
Humans
Models, Genetic*
Models, Statistical*
Risk Assessment
