Second-Order Fine-Tuning without Pain for LLMs: A Hessian Informed Zeroth-Order Optimizer

Zhao, Yanjun; Dang, Sizhe; Ye, Haishan; Dai, Guang; Qian, Yi; Tsang, Ivor W.

arXiv:2402.15173 (cs)

[Submitted on 23 Feb 2024 (v1), last revised 31 Aug 2024 (this version, v2)]

Abstract: Fine-tuning large language models (LLMs) with classic first-order optimizers requires prohibitive GPU memory because of backpropagation. Recent works have turned to zeroth-order optimizers for fine-tuning, which save substantial memory by using only two forward passes. However, these optimizers are hampered by the heterogeneity of parameter curvature across dimensions. In this work, we propose HiZOO, a diagonal-Hessian-informed zeroth-order optimizer, which is the first work to leverage the diagonal Hessian to enhance zeroth-order optimizers for fine-tuning LLMs. HiZOO also avoids the expensive memory cost and adds only one forward pass per step. Extensive experiments on various models (350M to 66B parameters) indicate that HiZOO improves model convergence, significantly reducing the number of training steps and improving model accuracy. Moreover, we visualize the optimization trajectories of HiZOO on test functions, illustrating its effectiveness in handling heterogeneous curvatures. Lastly, we provide theoretical proofs of convergence for HiZOO. Code is publicly available at this https URL.
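As a rough illustration of the idea in the abstract, the sketch below performs one zeroth-order step on a list of parameters using three loss evaluations (the usual two-sided pair plus one extra at the current point) and rescales the perturbation by a running estimate of the Hessian diagonal. It is not taken from the paper or its released code: the function name `zo_hessian_step`, the parameters `hess_diag`, `mu`, `lr` and `alpha`, the moving-average curvature update, and the toy `quadratic` loss are all assumptions made for illustration.

```python
import math
import random


def zo_hessian_step(loss, theta, hess_diag, u, mu=1e-3, lr=0.1, alpha=0.1):
    """One zeroth-order step scaled by a running diagonal curvature estimate.

    loss      : callable mapping a list of floats to a float
    theta     : current parameters
    hess_diag : current positive estimate of the Hessian diagonal (assumed name)
    u         : random perturbation direction, e.g. standard normal draws
    Returns (new_theta, new_hess_diag).
    """
    # Perturb less along coordinates believed to be sharply curved.
    scale = [1.0 / math.sqrt(h) for h in hess_diag]
    delta = [mu * s * ui for s, ui in zip(scale, u)]

    # Three forward passes: the two-sided pair plus one at the current point.
    base = loss(theta)
    plus = loss([t + d for t, d in zip(theta, delta)])
    minus = loss([t - d for t, d in zip(theta, delta)])

    # Central difference: directional derivative along scale * u.
    proj = (plus - minus) / (2.0 * mu)
    # Second difference: curvature along the same direction.
    curv = (plus + minus - 2.0 * base) / (mu * mu)

    # Assumed rule: exponential moving average of a per-coordinate
    # curvature estimate, kept positive with abs().
    new_hess = [
        (1.0 - alpha) * h + alpha * abs(0.5 * curv * ui * ui * h)
        for h, ui in zip(hess_diag, u)
    ]

    # Descent step along the scaled perturbation direction.
    new_theta = [t - lr * proj * s * ui for t, s, ui in zip(theta, scale, u)]
    return new_theta, new_hess


# Toy usage: a quadratic whose two coordinates have very different curvature.
def quadratic(x, h=(100.0, 1.0)):
    return 0.5 * sum(hi * xi * xi for hi, xi in zip(h, x))


rng = random.Random(0)
theta, hess_diag = [1.0, 1.0], [1.0, 1.0]
u = [rng.gauss(0.0, 1.0) for _ in theta]
theta, hess_diag = zo_hessian_step(quadratic, theta, hess_diag, u, lr=1e-3)
```

No gradients are stored or backpropagated here; the only state beyond the parameters is one extra value per parameter for the curvature estimate, which is the kind of overhead a diagonal (rather than full) Hessian estimate implies.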

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2402.15173 [cs.LG]
	(or arXiv:2402.15173v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2402.15173

Submission history

From: Yanjun Zhao
[v1] Fri, 23 Feb 2024 08:11:55 UTC (11,099 KB)
[v2] Sat, 31 Aug 2024 15:36:32 UTC (26,236 KB)
