Gradient Ascent Post-training Enhances Language Model Generalization

Dongkeun Yoon, Joel Jang, Sungdong Kim, Minjoon Seo


Abstract
In this work, we empirically show that updating pretrained LMs (350M, 1.3B, 2.7B) with just a few steps of Gradient Ascent Post-training (GAP) on random, unlabeled text corpora enhances their zero-shot generalization capabilities across diverse NLP tasks. Specifically, we show that GAP can allow LMs to become comparable to LMs 2-3x their size across 12 different NLP tasks. We also show that applying GAP on out-of-distribution corpora leads to the most reliable performance improvements. Our findings indicate that GAP can be a promising method for improving the generalization capability of LMs without any task-specific fine-tuning.
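
The abstract describes GAP as a handful of gradient ascent steps on unlabeled text, i.e. optimizer steps that maximize rather than minimize the standard language-modeling loss. Below is a minimal sketch of that idea in PyTorch with Hugging Face Transformers; the model name, optimizer, learning rate, step count, and corpus placeholder are illustrative assumptions, not the authors' exact setup.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-350m"  # assumption: any pretrained causal LM of comparable size
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()

# Illustrative hyperparameters, not the paper's configuration.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-5)

# Placeholder for random, unlabeled text sampled from a (preferably out-of-distribution) corpus.
unlabeled_texts = ["replace with raw text sampled from an unlabeled corpus"]

for text in unlabeled_texts:  # "just a few steps" of post-training
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    outputs = model(**batch, labels=batch["input_ids"])
    loss = -outputs.loss  # negate the LM loss so each optimizer step performs gradient ascent
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
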
Anthology ID:
2023.acl-short.74
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
851–864
URL:
https://aclanthology.org/2023.acl-short.74
DOI:
10.18653/v1/2023.acl-short.74
Cite (ACL):
Dongkeun Yoon, Joel Jang, Sungdong Kim, and Minjoon Seo. 2023. Gradient Ascent Post-training Enhances Language Model Generalization. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 851–864, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Gradient Ascent Post-training Enhances Language Model Generalization (Yoon et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-short.74.pdf
Video:
https://aclanthology.org/2023.acl-short.74.mp4