Knowledge Distillation Thrives on Data Augmentation

Wang, Huan; Lohit, Suhas; Jones, Michael; Fu, Yun

arXiv:2012.02909v1 (cs)

[Submitted on 5 Dec 2020 (this version), latest version 21 Feb 2023 (v3)]

Abstract: Knowledge distillation (KD) is a general deep neural network training framework that uses a teacher model to guide a student model. Many works have explored the rationale for its success; however, its interplay with data augmentation (DA) has not been well recognized so far. In this paper, we are motivated by an interesting observation in classification: the KD loss can benefit from extended training iterations, while the cross-entropy loss does not. We show that this disparity arises because of data augmentation: the KD loss can tap into the extra information from the different input views brought by DA. Based on this explanation, we propose to enhance KD via a stronger data augmentation scheme (e.g., mixup, CutMix). Furthermore, an even stronger new DA approach is developed specifically for KD, based on the idea of active learning. The findings and merits of the proposed method are validated by extensive experiments on the CIFAR-100, Tiny ImageNet, and ImageNet datasets. Simply by using the original KD loss combined with stronger augmentation schemes, we can achieve improved performance compared to existing state-of-the-art methods, which employ more advanced distillation losses. In addition, when our approaches are combined with more advanced distillation losses, we can advance the state-of-the-art performance even more. On top of the encouraging performance, this paper also sheds some light on explaining the success of knowledge distillation. The discovered interplay between KD and DA may inspire more advanced KD algorithms.
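To make the idea of pairing the original KD loss with a stronger augmentation concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the commonly used temperature-scaled KD loss (T^2 times the KL divergence from teacher to student soft outputs) and mixup as the augmentation. The helper `linear_model`, the toy weights, the temperature of 4.0, and the Beta(1, 1) mixing distribution are all hypothetical choices made for illustration. The point it shows is that the teacher is queried on the mixed input itself, so each new mixed view yields a fresh soft target for the student.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Assumed form of the original KD loss: T^2 * KL(teacher || student)."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student soft predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def mixup(x_a, x_b, lam):
    """mixup: convex combination of two inputs with weight lam in [0, 1]."""
    return [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]


def linear_model(weights, x):
    """Hypothetical stand-in for a network: one logit per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


# Toy teacher and student (illustrative weights, 2 features -> 3 classes).
teacher_w = [[1.0, -0.5], [-0.3, 0.8], [0.2, 0.1]]
student_w = [[0.5, 0.0], [0.0, 0.5], [0.1, 0.1]]

rng = random.Random(0)
x_a, x_b = [1.0, 2.0], [-1.0, 0.5]
lam = rng.betavariate(1.0, 1.0)  # mixing weight, assumed Beta(1, 1)
x_mix = mixup(x_a, x_b, lam)

# The teacher labels the mixed view directly; no ground-truth label is mixed.
loss = kd_loss(linear_model(student_w, x_mix), linear_model(teacher_w, x_mix))
print(f"lam={lam:.3f}  KD loss on mixed input={loss:.4f}")
```

Because the soft target comes from the teacher's output on `x_mix`, drawing a new `lam` or a new input pair gives the student a different supervision signal each time, which is one way to read the abstract's claim that the KD loss can exploit the extra views produced by DA.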

Comments:	Code will be updated soon
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Cite as:	arXiv:2012.02909 [cs.CV]
	(or arXiv:2012.02909v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2012.02909

Submission history

From: Huan Wang
[v1] Sat, 5 Dec 2020 00:32:04 UTC (1,774 KB)
[v2] Tue, 18 Oct 2022 23:20:41 UTC (9,797 KB)
[v3] Tue, 21 Feb 2023 21:59:57 UTC (4,451 KB)
