Europe PMC

Abstract 


The proliferation of biobanks and large public clinical data sets enables their integration with a smaller amount of locally gathered data for the purposes of parameter estimation and model prediction. However, public data sets may be subject to context-dependent confounders and the protocols behind their generation are often opaque; naively integrating all external data sets equally can bias estimates and lead to spurious conclusions. Weighted data integration is a potential solution, but current methods still require subjective specifications of weights and can become computationally intractable. Under the assumption that local data are generated from the set of unknown true parameters, we propose a novel weighted integration method based upon using the external data to minimize the local data leave-one-out cross validation (LOOCV) error. We demonstrate how the optimization of LOOCV errors for linear and Cox proportional hazards models can be rewritten as functions of external data set integration weights. Significant reductions in estimation error and prediction error are shown using simulation studies mimicking the heterogeneity of clinical data as well as a real-world example using kidney transplant patients from the Scientific Registry of Transplant Recipients.
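The weighting idea in the abstract can be illustrated for the linear-model case with a minimal sketch. This is not the authors' implementation: it assumes ordinary least squares in which every local observation has weight 1 and every observation from external data set k has weight w_k, and it picks the weights by a coarse grid search over [0, 1]. The function names (`loocv_error`, `select_weights`), the simulated data, and the grid are all hypothetical. The leave-one-out residuals are obtained from the standard leverage identity for least squares rather than by refitting.

```python
import itertools
import numpy as np

def loocv_error(weights, X_local, y_local, externals):
    """Mean squared leave-one-out error on the local data for given external weights.

    Assumption: pooled weighted least squares, local rows weighted 1,
    rows of external data set k weighted weights[k].
    """
    A = X_local.T @ X_local
    b = X_local.T @ y_local
    for w, (X_ext, y_ext) in zip(weights, externals):
        A = A + w * (X_ext.T @ X_ext)
        b = b + w * (X_ext.T @ y_ext)
    A_inv = np.linalg.inv(A)
    beta = A_inv @ b
    resid = y_local - X_local @ beta
    # Leverage of each local row in the pooled fit: h_i = x_i^T A^{-1} x_i.
    leverage = np.einsum("ij,jk,ik->i", X_local, A_inv, X_local)
    # Leave-one-out residual via the leverage identity e_i / (1 - h_i).
    return float(np.mean((resid / (1.0 - leverage)) ** 2))

def select_weights(X_local, y_local, externals, grid):
    """Grid search for the external weights with the smallest local LOOCV error."""
    candidates = itertools.product(grid, repeat=len(externals))
    return min(candidates, key=lambda w: loocv_error(w, X_local, y_local, externals))

# Hypothetical simulated data: one external set shares the local parameters,
# the other is generated from shifted parameters.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])

def simulate(n, beta):
    X = rng.normal(size=(n, beta.size))
    y = X @ beta + rng.normal(size=n)
    return X, y

X_local, y_local = simulate(30, beta_true)
externals = [
    simulate(200, beta_true),
    simulate(200, beta_true + np.array([1.5, 0.0, -1.0])),
]
grid = np.linspace(0.0, 1.0, 11)

best_weights = select_weights(X_local, y_local, externals, grid)
best_error = loocv_error(best_weights, X_local, y_local, externals)
print("selected weights:", best_weights)
print("local LOOCV error:", best_error)
```

Because the grid contains both 0 and 1, the selected weights can never do worse, in local LOOCV error, than ignoring the external data entirely or pooling all of it equally. A grid search scales poorly with the number of external data sets, so a continuous optimizer would be a natural substitute in practice.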

Funding 


Funders who supported this work.

HRSA HHS (1)

Health Resources and Services Administration (2)

NCI NIH HHS (1)

NHGRI NIH HHS (1)

NIDDK NIH HHS (1)

National Institutes of Health (3)