A Comparison of Three Cross-Validation Methods for Measuring Learner Performance
Abstract
Background: Measuring the performance of a classifier is crucial when searching for the best machine-learning algorithm with optimal parameters. Several methods are available for this purpose, the most common being k-fold cross-validation. Similar methods include the bootstrap method and repeated k-fold cross-validation.
Objective: This paper compares three such methods: k-fold cross-validation, repeated k-fold cross-validation, and the bootstrap method. In our experimental set-up, the latter two require 20 times the computational effort of plain k-fold cross-validation. The objective was to determine experimentally which cross-validation method is best with respect to both accuracy and computational effort.
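The abstract does not name the software or the exact fold, repeat, and bootstrap settings; the following is a minimal sketch, assuming a Python/scikit-learn set-up, with parameter choices picked only to reproduce the stated 20-fold increase in computational effort.

    # Illustrative sketch of the three resampling schemes (Python/scikit-learn
    # assumed; the paper does not specify its tooling or exact parameters).
    import numpy as np
    from sklearn.model_selection import KFold, RepeatedKFold

    X = np.arange(200).reshape(100, 2)  # toy feature matrix

    # Plain k-fold CV: every sample is held out exactly once (k = 5 here).
    kfold = KFold(n_splits=5, shuffle=True, random_state=0)

    # Repeated k-fold CV: the k-fold split is redrawn n_repeats times, so
    # n_repeats = 20 reproduces the 20-fold increase in effort noted above.
    rkfold = RepeatedKFold(n_splits=5, n_repeats=20, random_state=0)
    print(kfold.get_n_splits(X), rkfold.get_n_splits(X))  # 5 vs. 100 model fits

    # Bootstrap: train on a sample drawn with replacement and test on the
    # out-of-bag samples; 100 rounds match the same 20-fold budget here.
    rng = np.random.default_rng(0)
    for _ in range(100):
        train_idx = rng.integers(0, len(X), size=len(X))
        test_idx = np.setdiff1d(np.arange(len(X)), train_idx)
        # fit a classifier on X[train_idx], evaluate on X[test_idx]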
Methods: Four classification algorithms were selected and applied to datasets from the life sciences (n = 35) using each of the three cross-validation methods. Statistical comparisons used the paired (dependent) Student's t-test at the standard 95% confidence level.
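As a minimal sketch of this statistical comparison, assuming Python with SciPy; the accuracy values below are random placeholders, not results from the paper.

    # Hypothetical sketch of the paired comparison; placeholder data only.
    import numpy as np
    from scipy.stats import ttest_rel

    rng = np.random.default_rng(42)
    # One accuracy estimate per dataset (n = 35) for two CV methods.
    acc_kfold = rng.uniform(0.70, 0.90, size=35)
    acc_repeated = acc_kfold + rng.normal(0.0, 0.01, size=35)

    # Paired (dependent) Student's t-test at the 95% confidence level.
    t_stat, p_value = ttest_rel(acc_kfold, acc_repeated)
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
    # p >= 0.05 would indicate no statistically significant difference
    # between the two methods' accuracy estimates.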
Results: The statistical comparisons between the cross-validation methods yielded the following. Despite requiring 20 times less computational effort, k-fold cross-validation was statistically equivalent to repeated k-fold cross-validation. The bootstrap method produced overly pessimistic performance estimates and was therefore judged inferior to the other two methods.
Conclusion: Of the selected cross-validation methods, k-fold cross-validation proved to be the best choice with respect to both accuracy and computational effort.