Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture

Liu, Peiyu; Gao, Ze-Feng; Chen, Yushuo; Zhao, Wayne Xin; Wen, Ji-Rong

Computer Science > Computation and Language

arXiv:2303.16753 (cs)

[Submitted on 27 Mar 2023 (v1), last revised 11 Apr 2023 (this version, v2)]

Abstract: In this paper, we propose a highly parameter-efficient approach to scaling pre-trained language models (PLMs) to greater depth. Unlike prior work that shares all parameters or adds extra blocks, we design a more capable parameter-sharing architecture based on the matrix product operator (MPO). MPO decomposition reorganizes and factorizes the information of a parameter matrix into two parts: a major part that holds most of the information (the central tensor) and a supplementary part that has only a small proportion of the parameters (the auxiliary tensors). Building on this decomposition, our architecture shares the central tensor across all layers to reduce model size, while keeping layer-specific auxiliary tensors (together with adapters) to enhance adaptation flexibility. To improve model training, we further propose a stable initialization algorithm tailored to the MPO-based architecture. Extensive experiments demonstrate that the proposed model reduces model size while achieving highly competitive performance.
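The abstract describes MPO decomposition only in words. The sketch below assumes the standard sequential-SVD (tensor-train) construction of an MPO with three local tensors, treating the middle one as the central tensor. It shows how a weight matrix factors into a larger central tensor and smaller auxiliary tensors, and how the parameters would be counted if one central tensor were shared by every layer. The function names, the choice of three tensors, and the toy shapes are hypothetical. This is not the authors' code, and it omits the adapters and the proposed initialization algorithm.

```python
import numpy as np


def mpo_decompose(W, in_dims, out_dims, max_bond=None):
    """Factor W of shape (prod(in_dims), prod(out_dims)) into len(in_dims) local
    tensors of shape (d_prev, i_k, j_k, d_next) by sequential (truncated) SVD."""
    n = len(in_dims)
    assert W.shape == (int(np.prod(in_dims)), int(np.prod(out_dims)))
    # Split row and column indices, then interleave them as (i1, j1, i2, j2, ...).
    T = W.reshape(*in_dims, *out_dims)
    T = T.transpose([a for k in range(n) for a in (k, n + k)])
    tensors, d_prev = [], 1
    rest = T.reshape(1, -1)
    for k in range(n - 1):
        # Group (bond, i_k, j_k) as rows and everything to the right as columns.
        M = rest.reshape(d_prev * in_dims[k] * out_dims[k], -1)
        U, S, Vt = np.linalg.svd(M, full_matrices=False)
        d = len(S) if max_bond is None else min(len(S), max_bond)
        tensors.append(U[:, :d].reshape(d_prev, in_dims[k], out_dims[k], d))
        rest = S[:d, None] * Vt[:d]  # carry the singular values to the right
        d_prev = d
    tensors.append(rest.reshape(d_prev, in_dims[-1], out_dims[-1], 1))
    return tensors


def mpo_contract(tensors):
    """Contract local tensors along their bond indices back into a matrix."""
    out = tensors[0]
    for T in tensors[1:]:
        out = np.tensordot(out, T, axes=([-1], [0]))
    out = out.squeeze(0).squeeze(-1)  # drop the two boundary bonds of size 1
    n = len(tensors)
    # Undo the interleaving: (i1, j1, ..., in, jn) -> (i1..in, j1..jn).
    out = out.transpose([2 * k for k in range(n)] + [2 * k + 1 for k in range(n)])
    rows = int(np.prod([T.shape[1] for T in tensors]))
    cols = int(np.prod([T.shape[2] for T in tensors]))
    return out.reshape(rows, cols)


def shared_param_count(tensors, num_layers, central_idx):
    """Hypothetical count: one central tensor shared by all layers, plus
    per-layer copies of every auxiliary tensor."""
    aux = sum(t.size for k, t in enumerate(tensors) if k != central_idx)
    return tensors[central_idx].size + num_layers * aux


rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
tensors = mpo_decompose(W, (2, 2, 2), (2, 2, 2))
print([t.shape for t in tensors])  # [(1, 2, 2, 4), (4, 2, 2, 4), (4, 2, 2, 1)]
print(np.allclose(mpo_contract(tensors), W))  # True (no truncation)
print(12 * W.size, shared_param_count(tensors, 12, central_idx=1))  # 768 448
```

Without truncation the contraction recovers W exactly. Passing `max_bond` trades reconstruction accuracy for fewer parameters. In the architecture the abstract describes, the shared central tensor and the layer-specific auxiliary tensors are trained rather than read off a single fixed matrix, so this toy example illustrates only the tensor shapes and the parameter accounting.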

Comments:	14 pages, 4 figures, 6 tables
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2303.16753 [cs.CL]
	(or arXiv:2303.16753v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2303.16753

Submission history

From: Peiyu Liu
[v1] Mon, 27 Mar 2023 02:34:09 UTC (8,594 KB)
[v2] Tue, 11 Apr 2023 02:45:10 UTC (8,594 KB)