Sample Efficiency of Data Augmentation Consistency Regularization

Yang, Shuo; Dong, Yijun; Ward, Rachel; Dhillon, Inderjit S.; Sanghavi, Sujay; Lei, Qi


arXiv:2202.12230 (cs)

[Submitted on 24 Feb 2022 (v1), last revised 16 Jun 2022 (this version, v2)]

Abstract: Data augmentation is popular in the training of large neural networks; however, there is currently no clear theoretical comparison between different algorithmic choices for how to use augmented data. In this paper, we take a step in this direction. We first present a simple and novel analysis for linear regression with label-invariant augmentations, demonstrating that data augmentation consistency (DAC) regularization is intrinsically more efficient than empirical risk minimization on augmented data (DA-ERM). The analysis is then extended to misspecified augmentations (i.e., augmentations that change the labels), which again demonstrates the merit of DAC over DA-ERM. Further, we extend our analysis to non-linear models (e.g., neural networks) and present generalization bounds. Finally, we perform experiments that make a clean, apples-to-apples comparison (i.e., with no extra modeling or data tweaks) between DAC and DA-ERM using CIFAR-100 and WideResNet; together, these results demonstrate the superior efficacy of DAC.
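As a rough illustration of the two ways of using augmented data contrasted above (a sketch under stated assumptions, not the paper's construction or code), the block below assumes a toy linear regression where labels depend only on a few "signal" coordinates and each augmentation redraws the remaining "nuisance" coordinates, so augmentations are label invariant by construction. The names `make_data`, `da_erm`, and `dac` are hypothetical. DA-ERM is plain least squares on original plus augmented samples; DAC is encoded here as least squares on the original samples plus a soft penalty `lam * ||(x_aug - x)^T w||^2`, one common way to express prediction consistency, which may differ from the paper's exact formulation.

```python
import numpy as np


def make_data(n, d_signal, d_nuisance, n_aug, noise, rng):
    """Hypothetical toy setup: labels depend only on the first d_signal
    coordinates; each augmented copy keeps those coordinates and redraws
    the nuisance ones, so augmentations are label invariant by design."""
    d = d_signal + d_nuisance
    w_true = np.zeros(d)
    w_true[:d_signal] = rng.standard_normal(d_signal)
    X = rng.standard_normal((n, d))
    y = X @ w_true + noise * rng.standard_normal(n)
    # np.repeat returns a new array: rows are x0 (n_aug times), x1 (n_aug times), ...
    X_aug = np.repeat(X, n_aug, axis=0)
    X_aug[:, d_signal:] = rng.standard_normal((n * n_aug, d_nuisance))
    return X, y, X_aug, w_true


def da_erm(X, y, X_aug, n_aug):
    """DA-ERM sketch: ordinary least squares on original plus augmented
    samples, each augmented sample inheriting its source sample's label."""
    A = np.vstack([X, X_aug])
    b = np.concatenate([y, np.repeat(y, n_aug)])
    return np.linalg.lstsq(A, b, rcond=None)[0]


def dac(X, y, X_aug, n_aug, lam=1e6):
    """DAC sketch (assumed soft-penalty form): least squares on the original
    samples plus lam * sum ||(x_aug - x)^T w||^2, solved as one stacked
    least-squares system [X; sqrt(lam) * D] w ~ [y; 0]."""
    D = X_aug - np.repeat(X, n_aug, axis=0)  # prediction differences per augmentation
    A = np.vstack([X, np.sqrt(lam) * D])
    b = np.concatenate([y, np.zeros(D.shape[0])])
    return np.linalg.lstsq(A, b, rcond=None)[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y, X_aug, w_true = make_data(n=6, d_signal=3, d_nuisance=10,
                                    n_aug=4, noise=0.1, rng=rng)
    w_erm = da_erm(X, y, X_aug, n_aug=4)
    w_dac = dac(X, y, X_aug, n_aug=4)
    print("DA-ERM error:", np.linalg.norm(w_erm - w_true))
    print("DAC error:   ", np.linalg.norm(w_dac - w_true))
```

With a large `lam`, the consistency penalty drives the nuisance weights toward zero, so the DAC fit effectively estimates only the signal coordinates from the original samples, while DA-ERM must fit every coordinate from the pooled augmented data. Which estimator has lower error on a single draw depends on the seed and dimensions; the sketch is not meant to reproduce the paper's bounds.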

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2202.12230 [cs.LG]
	(or arXiv:2202.12230v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2202.12230

Submission history

From: Shuo Yang
[v1] Thu, 24 Feb 2022 17:50:31 UTC (309 KB)
[v2] Thu, 16 Jun 2022 04:26:27 UTC (307 KB)