Speculative Contrastive Decoding
- URL: http://arxiv.org/abs/2311.08981v2
- Date: Wed, 13 Mar 2024 17:32:50 GMT
- Title: Speculative Contrastive Decoding
- Authors: Hongyi Yuan, Keming Lu, Fei Huang, Zheng Yuan, Chang Zhou
- Abstract summary: Large language models (LLMs) exhibit exceptional performance in language tasks, yet their auto-regressive inference is limited by high computational requirements and is sub-optimal due to exposure bias.
Inspired by speculative decoding and contrastive decoding, we introduce Speculative Contrastive Decoding (SCD), a straightforward yet powerful decoding approach.
- Score: 55.378200871224074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) exhibit exceptional performance in language tasks, yet their auto-regressive inference is limited due to high computational requirements and is sub-optimal due to the exposure bias. Inspired by speculative decoding and contrastive decoding, we introduce Speculative Contrastive Decoding (SCD), a straightforward yet powerful decoding approach that leverages predictions from smaller language models (LMs) to achieve both decoding acceleration and quality improvement. Extensive evaluations and analyses on four diverse language tasks demonstrate the effectiveness of SCD, showing that decoding efficiency and quality can compatibly benefit from one smaller LM.
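The abstract describes SCD only at a high level, so the following is a minimal, hedged sketch of how speculative drafting and contrastive scoring might be combined. It assumes hypothetical `amateur_step` and `expert_batch` callables that return next-token probability distributions; the acceptance rule and contrastive weighting here are illustrative simplifications, not the paper's exact algorithm.

```python
import numpy as np

def contrastive_dist(p_expert, p_amateur, alpha=0.1, beta=0.5):
    """Contrastive adjustment of the expert distribution (simplified CD-style)."""
    # Plausibility constraint: keep only tokens the expert itself finds likely enough.
    mask = p_expert >= alpha * p_expert.max()
    scores = np.where(mask,
                      np.log(p_expert + 1e-12) - beta * np.log(p_amateur + 1e-12),
                      -np.inf)
    scores -= scores.max()
    probs = np.exp(scores)
    return probs / probs.sum()

def scd_step(prefix, amateur_step, expert_batch, gamma=4, rng=np.random):
    """Draft `gamma` tokens with the small LM, then verify them with the large LM."""
    draft, amateur_ps, ctx = [], [], list(prefix)
    for _ in range(gamma):                        # 1) cheap auto-regressive drafting
        p_a = amateur_step(ctx)
        tok = int(rng.choice(len(p_a), p=p_a))
        draft.append(tok); amateur_ps.append(p_a); ctx.append(tok)
    # 2) one parallel expert pass; expert_ps[i] is the expert's distribution
    #    conditioned on prefix + draft[:i] (hypothetical helper).
    expert_ps = expert_batch(prefix, draft)
    accepted = []
    for i, tok in enumerate(draft):               # 3) speculative accept/reject
        q = contrastive_dist(expert_ps[i], amateur_ps[i])
        if rng.random() < min(1.0, q[tok] / max(amateur_ps[i][tok], 1e-12)):
            accepted.append(tok)
        else:                                     # reject: resample from the residual
            resid = np.maximum(q - amateur_ps[i], 0.0)
            resid = resid / resid.sum() if resid.sum() > 0 else q
            accepted.append(int(rng.choice(len(resid), p=resid)))
            break
    return accepted                               # tokens to append to the prefix
```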
Related papers
- E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning [20.660297311025417]
We introduce E2LLM (Encoder Elongated Large Language Models), a novel approach that effectively navigates the "impossible triangle" of long-context performance, efficiency, and compatibility with pretrained models.
The method involves splitting long contexts into chunks, compressing each into embedding vectors via a pretrained text encoder, and utilizing an adapter to align these representations with a decoder-only LLM.
Experimental results demonstrate that E2LLM achieves superior performance in long-context scenarios while balancing efficiency, performance, and compatibility with pretrained models.
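As a concrete illustration of the chunk-compress-and-align pipeline described above, here is a minimal sketch. The `text_encoder` module, its output shape, the mean pooling, the chunk length, and the linear adapter are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ChunkCompressor(nn.Module):
    """Compress one context chunk into a single decoder-space embedding."""
    def __init__(self, text_encoder: nn.Module, enc_dim: int, llm_dim: int):
        super().__init__()
        self.text_encoder = text_encoder              # pretrained text encoder (assumed frozen)
        self.adapter = nn.Linear(enc_dim, llm_dim)    # aligns encoder space with the LLM space

    def forward(self, chunk_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.text_encoder(chunk_ids)         # (chunk_len, enc_dim), hypothetical signature
        pooled = hidden.mean(dim=0)                   # mean-pool the chunk into one vector
        return self.adapter(pooled)                   # (llm_dim,) "soft token" for the decoder

def compress_long_context(token_ids, compressor, chunk_len=512):
    """Split a long context into chunks and map each chunk to one soft token."""
    chunks = [token_ids[i:i + chunk_len] for i in range(0, len(token_ids), chunk_len)]
    soft_tokens = [compressor(torch.tensor(chunk)) for chunk in chunks]
    # The resulting soft tokens would be prepended to the decoder-only LLM's
    # input embeddings in place of the raw long context.
    return torch.stack(soft_tokens)                   # (num_chunks, llm_dim)
```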
arXiv Detail & Related papers (2024-09-10T17:44:35Z)
- Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs [57.27982780697922]
Large language models have demonstrated exceptional capability in natural language understanding and generation.
However, their generation speed is limited by the inherently sequential nature of their decoding process.
This paper introduces Lexical Unit Decoding, a novel decoding methodology implemented in a data-driven manner.
arXiv Detail & Related papers (2024-05-24T04:35:13Z)
- Chimera: A Lossless Decoding Method for Accelerating Large Language Models Inference by Fusing all Tokens [15.566726645722657]
We propose a novel framework specifically designed for speculative sampling.
Within this framework, we introduce a lightweight draft model that effectively utilizes previously generated tokens to predict subsequent words.
We demonstrate impressive results, achieving an average latency speedup ratio of 2.7x compared to the vanilla auto-regressive decoding approach.
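The summary above only hints at the draft model, so the following is a loose sketch of a lightweight draft head that reuses previously generated token embeddings to propose candidates for speculative sampling. The pooled-window MLP design, the `window` size, and the reuse of the base model's embedding table are assumptions for illustration, not Chimera's actual architecture.

```python
import torch
import torch.nn as nn

class LightweightDraftHead(nn.Module):
    """Cheap next-token proposer conditioned only on recently generated tokens."""
    def __init__(self, embed: nn.Embedding, hidden_dim: int, vocab_size: int, window: int = 4):
        super().__init__()
        self.embed, self.window = embed, window       # reuse the base LLM's embedding table
        self.mlp = nn.Sequential(
            nn.Linear(embed.embedding_dim, hidden_dim), nn.GELU(),
            nn.Linear(hidden_dim, vocab_size),
        )

    def forward(self, generated_ids: torch.Tensor) -> torch.Tensor:
        ctx = generated_ids[-self.window:]            # last few generated tokens
        pooled = self.embed(ctx).mean(dim=0)          # cheap summary of the recent context
        return self.mlp(pooled).softmax(dim=-1)       # next-token distribution, (vocab_size,)

# Tokens drafted by such a head would then be verified in parallel by the full
# LLM with a lossless accept/reject rule, as in standard speculative sampling.
```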
arXiv Detail & Related papers (2024-02-24T08:10:39Z)
- A Thorough Examination of Decoding Methods in the Era of LLMs [72.65956436513241]
Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers.
This paper provides a comprehensive and multifaceted analysis of various decoding methods within the context of large language models.
Our findings reveal that decoding method performance is notably task-dependent and influenced by factors such as alignment, model size, and quantization.
arXiv Detail & Related papers (2024-02-10T11:14:53Z)
- Contrastive Decoding Improves Reasoning in Large Language Models [55.16503283583076]
We show that Contrastive Decoding achieves large out-of-the-box improvements over greedy decoding on a variety of reasoning tasks.
We show that Contrastive Decoding leads LLaMA-65B to outperform LLaMA 2, GPT-3.5 and PaLM 2-L on the HellaSwag commonsense reasoning benchmark.
arXiv Detail & Related papers (2023-09-17T00:29:32Z)
- ContraCLM: Contrastive Learning For Causal Language Model [54.828635613501376]
We present ContraCLM, a novel contrastive learning framework at both token-level and sequence-level.
We show that ContraCLM enhances discrimination of the representations and bridges the gap with the encoder-only models.
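Since the summary only names token-level and sequence-level contrastive learning, here is a hedged, InfoNCE-style sketch of what such objectives could look like over hidden states from two dropout-perturbed forward passes. The two-view setup, pairing scheme, and temperature are assumptions, not the paper's exact losses.

```python
import torch
import torch.nn.functional as F

def token_level_contrastive_loss(h1: torch.Tensor, h2: torch.Tensor,
                                 temperature: float = 0.05) -> torch.Tensor:
    """h1, h2: (seq_len, dim) hidden states of the same tokens from two views."""
    z1 = F.normalize(h1, dim=-1)
    z2 = F.normalize(h2, dim=-1)
    logits = z1 @ z2.t() / temperature        # (seq_len, seq_len) cosine similarities
    # Positive pair = the same token position in the other view; all other
    # token positions in the sequence act as negatives.
    targets = torch.arange(h1.size(0), device=h1.device)
    return F.cross_entropy(logits, targets)

def sequence_level_contrastive_loss(s1: torch.Tensor, s2: torch.Tensor,
                                    temperature: float = 0.05) -> torch.Tensor:
    """s1, s2: (batch, dim) pooled sequence representations from two views."""
    z1 = F.normalize(s1, dim=-1)
    z2 = F.normalize(s2, dim=-1)
    logits = z1 @ z2.t() / temperature        # (batch, batch) cross-view similarities
    targets = torch.arange(s1.size(0), device=s1.device)
    return F.cross_entropy(logits, targets)
```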
arXiv Detail & Related papers (2022-10-03T18:56:35Z)
- Language-specific Characteristic Assistance for Code-switching Speech Recognition [42.32330582682405]
The dual-encoder structure successfully utilizes two language-specific encoders (LSEs) for code-switching speech recognition.
Existing methods have no language constraints on LSEs and underutilize language-specific knowledge of LSMs.
We propose a language-specific characteristic assistance (LSCA) method to mitigate the above problems.
arXiv Detail & Related papers (2022-06-29T13:39:51Z)
- Examining Scaling and Transfer of Language Model Architectures for Machine Translation [51.69212730675345]
Language models (LMs) process sequences in a single stack of layers, whereas encoder-decoder models (EncDec) utilize separate layer stacks for input and output processing.
In machine translation, EncDec has long been the favoured approach, but few studies have investigated the performance of LMs.
arXiv Detail & Related papers (2022-02-01T16:20:15Z)