REST: Retrieval-Based Speculative Decoding

He, Zhenyu; Zhong, Zexuan; Cai, Tianle; Lee, Jason D.; He, Di


arXiv:2311.08252 (cs)

[Submitted on 14 Nov 2023 (v1), last revised 4 Apr 2024 (this version, v2)]



Abstract: We introduce Retrieval-Based Speculative Decoding (REST), a novel algorithm designed to speed up language model generation. The key insight driving the development of REST is the observation that the process of text generation often includes certain common phases and patterns. Unlike previous methods that rely on a draft language model for speculative decoding, REST uses retrieval to generate draft tokens: it draws on a datastore of existing text, retrieving and employing relevant tokens based on the current context. Its plug-and-play nature allows for seamless integration with, and acceleration of, any language model without additional training. When benchmarked on 7B and 13B language models in a single-batch setting, REST achieves a significant speedup of 1.62X to 2.36X on code or text generation. The code of REST is available at this https URL.
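
As a rough illustration of the idea in the abstract, the following Python sketch drafts tokens by looking up the longest matching context suffix in a small datastore and then verifies them against the model. It is a minimal toy under stated assumptions, not the authors' implementation: the names `retrieve_draft`, `verify`, `generate`, `TARGET`, `DATASTORE`, and `toy_next_token` are hypothetical, the datastore is scanned linearly, acceptance is plain greedy matching, and verification uses sequential model calls where a real system would score all draft tokens in one forward pass.

```python
from typing import Callable, List, Tuple

Tokens = List[str]


def retrieve_draft(context: Tokens, datastore: List[Tokens],
                   max_suffix: int = 4, draft_len: int = 3) -> Tokens:
    """Find the longest suffix of `context` in the datastore and return
    the tokens that followed it there (empty list if nothing matches)."""
    for n in range(min(max_suffix, len(context)), 0, -1):
        suffix = context[-n:]
        for doc in datastore:
            # Stop early enough that at least one continuation token exists.
            for i in range(len(doc) - n):
                if doc[i:i + n] == suffix:
                    return doc[i + n:i + n + draft_len]
    return []


def verify(context: Tokens, draft: Tokens,
           next_token: Callable[[Tokens], str]) -> Tokens:
    """Accept draft tokens while they agree with the model's greedy choice.
    Always returns at least one token, so decoding makes progress."""
    accepted: Tokens = []
    for tok in draft:
        target = next_token(context + accepted)
        if tok != target:
            accepted.append(target)  # first mismatch: keep the model's token
            return accepted
        accepted.append(tok)
    # Whole draft accepted (or draft empty): add the model's next token.
    accepted.append(next_token(context + accepted))
    return accepted


def generate(prompt: Tokens, datastore: List[Tokens],
             next_token: Callable[[Tokens], str],
             max_new_tokens: int) -> Tuple[Tokens, int]:
    """Decode with retrieved drafts; returns the tokens and the number of
    draft-and-verify steps taken."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < max_new_tokens:
        draft = retrieve_draft(out, datastore)
        out.extend(verify(out, draft, next_token))
        steps += 1
    return out[:len(prompt) + max_new_tokens], steps


# Toy stand-in for a language model (assumption): it deterministically
# continues one fixed target sequence.
TARGET = "for i in range ( n ) : print ( i )".split()
DATASTORE = ["for i in range ( n ) : pass".split()]


def toy_next_token(context: Tokens) -> str:
    return TARGET[len(context)] if len(context) < len(TARGET) else "<eos>"


tokens, steps = generate(["for"], DATASTORE, toy_next_token,
                         max_new_tokens=len(TARGET) - 1)
print(" ".join(tokens), "| steps:", steps)
```

Because every accepted token is checked against the model, the output in this sketch is the same as plain greedy decoding; any saving comes from steps in which several retrieved tokens are accepted at once.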

Comments:	NAACL 2024, camera ready
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as:	arXiv:2311.08252 [cs.CL]
	(or arXiv:2311.08252v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2311.08252

Submission history

From: Zhenyu He
[v1] Tue, 14 Nov 2023 15:43:47 UTC (317 KB)
[v2] Thu, 4 Apr 2024 11:37:01 UTC (456 KB)
