The availability of smartphone and wearable sensor technology is leading to a rapid accumulation of human subject data, and machine learning is emerging as a technique for mapping those data onto clinical predictions. As machine learning algorithms are increasingly used to support clinical decision making, it is vital to quantify their prediction accuracy reliably. Cross-validation (CV) is the standard approach, in which the accuracy of such algorithms is evaluated on a portion of the data that the algorithm has not seen during training. However, for this procedure to be meaningful, the relationship between the training and the validation set should mimic the relationship between the training set and the data expected in clinical use. Here we compared two popular CV methods: record-wise and subject-wise. While the subject-wise method mirrors the clinically relevant use-case scenario of diagnosis in newly recruited subjects, the record-wise strategy has no such interpretation. Using both a publicly available dataset and a simulation, we found that record-wise CV often massively overestimates the prediction accuracy of the algorithms. We also conducted a systematic review of the relevant literature and found that this overly optimistic method was used by almost half of the retrieved studies that used accelerometers, wearable sensors, or smartphones to predict clinical outcomes. As we move towards an era of machine learning-based diagnosis and treatment, evaluating the accuracy of these algorithms with proper methods is crucial, as inaccurate results can mislead both clinicians and data scientists.
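The distinction between the two CV schemes can be made concrete with a small sketch (not the paper's code, and using hypothetical synthetic data): record-wise CV splits individual records at random, so records from the same subject can land in both the training and validation folds, whereas subject-wise CV keeps all records of a subject together, mimicking prediction for newly recruited subjects. In scikit-learn terms, this is roughly the difference between KFold and GroupKFold with subject IDs as groups.

```python
# Minimal sketch contrasting record-wise and subject-wise CV (assumed scikit-learn usage,
# synthetic data only). Records from the same subject are correlated but carry little
# information about the label, which is what inflates record-wise estimates.
import numpy as np
from sklearn.model_selection import KFold, GroupKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_subjects, records_per_subject = 40, 10
subjects = np.repeat(np.arange(n_subjects), records_per_subject)   # subject ID per record
y = np.repeat(rng.integers(0, 2, n_subjects), records_per_subject)  # one label per subject

# Subject-specific offsets make records from one subject look alike,
# independently of the label.
subject_effect = rng.normal(size=(n_subjects, 5))[subjects]
X = subject_effect + rng.normal(scale=1.0, size=(len(subjects), 5))

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# Record-wise CV: records of the same subject appear in both training and
# validation folds, so the classifier can effectively "recognize" subjects.
record_wise = cross_val_score(clf, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))

# Subject-wise CV: all records of a subject stay in the same fold,
# mirroring diagnosis in subjects never seen during training.
subject_wise = cross_val_score(clf, X, y, cv=GroupKFold(n_splits=5), groups=subjects)

print(f"record-wise accuracy:  {record_wise.mean():.2f}")
print(f"subject-wise accuracy: {subject_wise.mean():.2f}")
```

In this toy setting the record-wise estimate is far above chance while the subject-wise estimate hovers near 0.5, illustrating the kind of overestimation described above; the exact numbers depend on the simulation assumptions, not on the paper's data.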
Keywords: Machine learning; clinical outcomes; cross-validation; diagnosis; prediction accuracy; rehabilitation outcomes; smartphones; wearable technology.