Gradient Ascent Post-training Enhances Language Model Generalization

Dongkeun Yoon; Joel Jang; Sungdong Kim; Minjoon Seo

doi:10.18653/v1/2023.acl-short.74

Abstract

In this work, we empirically show that updating pretrained LMs (350M, 1.3B, 2.7B) with just a few steps of Gradient Ascent Post-training (GAP) on random, unlabeled text corpora enhances their zero-shot generalization capabilities across diverse NLP tasks. Specifically, we show that GAP can make LMs comparable to 2-3x larger LMs across 12 different NLP tasks. We also show that applying GAP on out-of-distribution corpora leads to the most reliable performance improvements. Our findings indicate that GAP can be a promising method for improving the generalization capability of LMs without any task-specific fine-tuning.
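As a rough illustration of what "a few steps of gradient ascent" on unlabeled text means, the sketch below uses a toy unigram softmax model in place of a pretrained LM. It is not the authors' implementation: the model, the token sequence, the learning rate, the step count, and all function names are assumptions made for this example. The only point it shows is the sign of the update: the parameters move along the gradient of the language-modeling loss, so the loss on the sampled text goes up instead of down.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logits, tokens):
    # Mean negative log-likelihood of the tokens under the toy unigram model.
    probs = softmax(logits)
    return -sum(math.log(probs[t]) for t in tokens) / len(tokens)

def nll_grad(logits, tokens):
    # Gradient of the mean NLL w.r.t. the logits: softmax probs minus empirical frequencies.
    probs = softmax(logits)
    freqs = [0.0] * len(logits)
    for t in tokens:
        freqs[t] += 1.0 / len(tokens)
    return [p - f for p, f in zip(probs, freqs)]

def gradient_ascent_steps(logits, tokens, lr=0.1, steps=3):
    # Gradient ASCENT: add the loss gradient (gradient descent would subtract it).
    # lr and steps are illustrative values, not taken from the paper.
    logits = list(logits)
    for _ in range(steps):
        grad = nll_grad(logits, tokens)
        logits = [w + lr * g for w, g in zip(logits, grad)]
    return logits

# Hypothetical "pretrained" parameters and a short unlabeled token sequence.
pretrained_logits = [2.0, 1.0, 0.5, 0.0]
random_text = [0, 0, 1, 2, 0, 3, 1, 0]

before = nll(pretrained_logits, random_text)
updated_logits = gradient_ascent_steps(pretrained_logits, random_text)
after = nll(updated_logits, random_text)
print(f"loss before: {before:.4f}, loss after ascent: {after:.4f}")
```

In a real setting the same sign flip would be applied to the optimizer update of a full LM; whether that improves downstream zero-shot performance is the empirical claim of the paper, which this toy model does not test.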

Anthology ID: 2023.acl-short.74
Volume: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 851–864
URL: https://aclanthology.org/2023.acl-short.74
DOI: 10.18653/v1/2023.acl-short.74
Cite (ACL): Dongkeun Yoon, Joel Jang, Sungdong Kim, and Minjoon Seo. 2023. Gradient Ascent Post-training Enhances Language Model Generalization. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 851–864, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Gradient Ascent Post-training Enhances Language Model Generalization (Yoon et al., ACL 2023)
PDF: https://aclanthology.org/2023.acl-short.74.pdf
Video: https://aclanthology.org/2023.acl-short.74.mp4
