Related papers: REST: Retrieval-Based Speculative Decoding

REST: Retrieval-Based Speculative Decoding

URL: http://arxiv.org/abs/2311.08252v2
Date: Thu, 4 Apr 2024 11:37:01 GMT
Title: REST: Retrieval-Based Speculative Decoding
Authors: Zhenyu He, Zexuan Zhong, Tianle Cai, Jason D. Lee, Di He,
Abstract summary: We introduce Retrieval-Based Speculative Decoding (REST), a novel algorithm designed to speed up language model generation. Unlike previous methods that rely on a draft language model for speculative decoding, REST harnesses the power of retrieval to generate draft tokens. When benchmarked on 7B and 13B language models in a single-batch setting, REST achieves a significant speedup of 1.62X to 2.36X on code or text generation.
Score: 69.06115086237207
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce Retrieval-Based Speculative Decoding (REST), a novel algorithm designed to speed up language model generation. The key insight driving the development of REST is the observation that the process of text generation often includes certain common phases and patterns. Unlike previous methods that rely on a draft language model for speculative decoding, REST harnesses the power of retrieval to generate draft tokens. This method draws from the reservoir of existing knowledge, retrieving and employing relevant tokens based on the current context. Its plug-and-play nature allows for seamless integration and acceleration of any language models, all without necessitating additional training. When benchmarked on 7B and 13B language models in a single-batch setting, REST achieves a significant speedup of 1.62X to 2.36X on code or text generation. The code of REST is available at https://github.com/FasterDecoding/REST.

Related papers

Test Amplification for REST APIs Using "Out-of-the-box" Large Language Models [1.8024397171920885]
We report our experience with usingChatGPT and GitHub's Copilot to amplify REST API test suites. We derive a series of guidelines and lessons learned concerning the prompts that result in the strongest test suite.
arXiv Detail & Related papers (2025-03-13T12:30:14Z)
Tokenization as Finite-State Transduction [24.19959327497118]
We introduce a finite-state framework which can efficiently encode all possible tokenizations of a regular language. We show that Byte-Pair. Match (BPE) and MaxPiece (WordPiece) fit within this framework. An application of this is to guided generation, where the outputs of a language model are constrained to match some pattern.
arXiv Detail & Related papers (2024-10-21T07:10:07Z)
Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models. Our base generative subgraph retrieval model, consisting of only 220M parameters, competitive retrieval performance compared to state-of-the-art models. Our largest 3B model, when plugged with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z)
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution [87.3259169631789]
Nearest Speculative Decoding (NEST) is capable of incorporating real-world text spans of arbitrary length into the LM generations and providing attribution to their sources. NEST significantly enhances the generation quality and attribution rate of the base LM across a variety of knowledge-intensive tasks. In addition, NEST substantially improves the generation speed, achieving a 1.8x speedup in inference time when applied to Llama-2-Chat 70B.
arXiv Detail & Related papers (2024-05-29T17:55:03Z)
Leveraging Large Language Models to Improve REST API Testing [51.284096009803406]
RESTGPT takes as input an API specification, extracts machine-interpretable rules, and generates example parameter values from natural-language descriptions in the specification. Our evaluations indicate that RESTGPT outperforms existing techniques in both rule extraction and value generation.
arXiv Detail & Related papers (2023-12-01T19:53:23Z)
CAT-LM: Training Language Models on Aligned Code And Tests [19.526181671936243]
Testing is an integral part of the software development process. Yet, writing tests is time-consuming and therefore often neglected. We propose the Aligned Code And Tests Language Model (CAT-LM), a GPT-style language model with 2.7 Billion parameters, trained on a corpus of Python and Java projects.
arXiv Detail & Related papers (2023-10-02T19:52:22Z)
Collocation2Text: Controllable Text Generation from Guide Phrases in Russian [0.0]
Collocation2Text is a plug-and-play method for automatic controllable text generation in Russian. The method is based on two interacting models: the autoregressive language ruGPT-3 model and the autoencoding language ruRoBERTa model. Experiments on generating news articles using the proposed method showed its effectiveness for automatically generated fluent texts.
arXiv Detail & Related papers (2022-06-18T17:10:08Z)
Classifiers are Better Experts for Controllable Text Generation [63.17266060165098]
We show that the proposed method significantly outperforms recent PPLM, GeDi, and DExperts on PPL and sentiment accuracy based on the external classifier of generated texts. The same time, it is also easier to implement and tune, and has significantly fewer restrictions and requirements.
arXiv Detail & Related papers (2022-05-15T12:58:35Z)
Controllable Generation from Pre-trained Language Models via Inverse Prompting [47.23315683944257]
We propose an innovative method, inverse prompting, to better control text generation. Inverse prompting uses generated text to inversely predict the prompt during beam search. Our results show that our proposed method substantially outperforms the baselines.
arXiv Detail & Related papers (2021-03-19T08:36:52Z)
POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation. The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner. The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
arXiv Detail & Related papers (2020-05-01T18:11:54Z)
PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation [92.7366819044397]
Self-supervised pre-training has emerged as a powerful technique for natural language understanding and generation. This work presents PALM with a novel scheme that jointly pre-trains an autoencoding and autoregressive language model on a large unlabeled corpus. An extensive set of experiments show that PALM achieves new state-of-the-art results on a variety of language generation benchmarks.
arXiv Detail & Related papers (2020-04-14T06:25:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.