A Frustratingly Simple Decoding Method for Neural Text Generation
- URL: http://arxiv.org/abs/2305.12675v2
- Date: Tue, 27 Feb 2024 06:57:36 GMT
- Title: A Frustratingly Simple Decoding Method for Neural Text Generation
- Authors: Haoran Yang, Deng Cai, Huayang Li, Wei Bi, Wai Lam, Shuming Shi
- Abstract summary: We introduce a frustratingly simple, super efficient and surprisingly effective decoding method, which we call Frustratingly Simple Decoding (FSD).
The idea behind FSD is straightforward: we build an anti-LM based on previously generated text and use this anti-LM to penalize the future generation of text that has already been generated.
Despite its simplicity, FSD is surprisingly effective; experiments show that FSD can outperform the canonical methods to date.
- Score: 96.10656449120165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a frustratingly simple, super efficient and surprisingly
effective decoding method, which we call Frustratingly Simple Decoding (FSD),
for neural text generation. The idea behind FSD is straightforward: we build an
anti-LM based on previously generated text and use this anti-LM to penalize
the future generation of text that has already been generated. The anti-LM can be
implemented as simply as an n-gram language model or a vectorized variant. In this
way, FSD introduces no extra model parameters and negligible computational overhead
(FSD can be as fast as greedy search). Despite its simplicity, FSD is surprisingly
effective; experiments show that FSD can outperform the canonical methods to date
(i.e., nucleus sampling) as well as several strong baselines that were proposed
recently.
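To make the mechanism concrete, here is a minimal, self-contained sketch of FSD-style decoding with a bigram anti-LM; the toy LM, the penalty weight, and the (1-alpha)/alpha score combination are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np
from collections import defaultdict

VOCAB = 100   # toy vocabulary size
ALPHA = 0.5   # penalty strength; a tunable hyperparameter, value here is arbitrary

def toy_lm_probs(prefix):
    """Stand-in for the neural LM: returns a probability distribution over the
    vocabulary. In practice this is one forward pass of the model."""
    rng = np.random.default_rng(len(prefix))
    logits = rng.normal(size=VOCAB)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def anti_lm_probs(prefix):
    """Bigram anti-LM built only from the text generated so far: it assigns high
    probability to exactly those tokens that would repeat an already-generated
    pattern."""
    counts = defaultdict(lambda: np.zeros(VOCAB))
    for prev, nxt in zip(prefix, prefix[1:]):
        counts[prev][nxt] += 1.0
    row = counts[prefix[-1]] if prefix else np.zeros(VOCAB)
    total = row.sum()
    return row / total if total > 0 else row   # zero penalty for unseen contexts

def fsd_decode(prompt, steps=20):
    tokens = list(prompt)
    for _ in range(steps):
        # illustrative score combination: reward the LM, penalize the anti-LM
        scores = (1 - ALPHA) * toy_lm_probs(tokens) - ALPHA * anti_lm_probs(tokens)
        tokens.append(int(np.argmax(scores)))   # greedy pick, like greedy search
    return tokens

print(fsd_decode([1, 2, 3]))
```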
Related papers
- DReSD: Dense Retrieval for Speculative Decoding [8.220217498103315]
Speculative decoding (SD) accelerates Large Language Model (LLM) generation by using an efficient draft model.
We focus on retrieval-based SD where the draft model retrieves the next tokens from a non-parametric datastore.
Dense Retrieval for Speculative Decoding (DReSD) is a novel framework that uses approximate nearest neighbour search with contextualised token embeddings.
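A rough sketch of the retrieval-based drafting loop described above; the datastore layout, embedding stub, and verification step are illustrative assumptions (DReSD uses approximate nearest-neighbour search, whereas the lookup below is exact and brute-force).

```python
import numpy as np

DIM, DRAFT_LEN = 8, 4

# Toy datastore: each key is a contextualised embedding, each value the tokens
# that followed that context in the datastore corpus.
keys = np.random.randn(1000, DIM)
values = np.random.randint(0, 100, size=(1000, DRAFT_LEN))

def embed(context_tokens):
    """Stand-in for the LLM's contextual embedding of the current prefix."""
    rng = np.random.default_rng(sum(context_tokens))
    return rng.normal(size=DIM)

def retrieve_draft(context_tokens):
    """Dense retrieval of a draft: nearest datastore key by cosine similarity."""
    q = embed(context_tokens)
    sims = keys @ q / (np.linalg.norm(keys, axis=1) * np.linalg.norm(q) + 1e-9)
    return values[int(np.argmax(sims))]

def verify(context_tokens, draft):
    """The target LLM would score the draft and keep the longest agreeing prefix;
    a random acceptance length stands in for that check here."""
    return list(draft[: np.random.randint(0, DRAFT_LEN + 1)])

context = [5, 17, 3]
for _ in range(3):                      # speculative decoding loop
    draft = retrieve_draft(context)
    accepted = verify(context, draft)
    if not accepted:                    # real SD emits one corrected target-model token
        accepted = [int(draft[0])]
    context += accepted
print(context)
```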
arXiv Detail & Related papers (2025-02-21T16:32:28Z)
- SAM Decoding: Speculative Decoding via Suffix Automaton [22.289906743980445]
This paper presents a novel retrieval-based speculative decoding method.
It adapts suffix automaton for efficient and accurate draft generation by utilizing common text corpus and dynamic text sequence.
Experiments on Spec-Bench show that our method is 18%+ faster than other retrieval-based SD methods.
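As an illustration of suffix-based draft retrieval, the simplified sketch below finds the longest suffix of the generated text that occurs in a reference corpus and proposes the following tokens as a draft; the paper builds a suffix automaton so this query is far cheaper, and all names here are hypothetical.

```python
def longest_suffix_match_draft(generated, corpus, max_draft=5):
    """Find the longest suffix of `generated` that occurs in `corpus` and propose
    the tokens that follow it as a draft. A suffix automaton answers this query in
    near-constant time per step; the linear scan below only shows the semantics."""
    for length in range(min(len(generated), 10), 0, -1):   # try longest suffix first
        suffix = generated[-length:]
        for i in range(len(corpus) - length):
            if corpus[i:i + length] == suffix:
                return corpus[i + length:i + length + max_draft]
    return []

corpus = list("the quick brown fox jumps over the lazy dog")
generated = list("a lazy ")
print("draft:", "".join(longest_suffix_match_draft(generated, corpus)))
```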
arXiv Detail & Related papers (2024-11-16T02:02:49Z)
- FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping [49.66872823080736]
Autoregressive Large Language Models (e.g., LLaMa, GPTs) are omnipresent, achieving remarkable success in language understanding and generation.
To mitigate the overhead incurred during generation, several early-exit and layer-dropping strategies have been proposed.
We propose FFN-SkipLLM, an input-adaptive feed-forward skipping strategy.
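A minimal sketch of adaptive FFN skipping inside one decoder block; the specific skip criterion (hidden state barely changed by attention) and the threshold are assumptions for illustration, not the paper's policy.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def block_forward(h, attn, ffn, skip_threshold=0.99):
    """One decoder block with input-adaptive FFN skipping. The rule used here
    (skip when the hidden state is already 'saturated' after attention) is only
    an illustrative criterion."""
    h_attn = h + attn(h)
    if cosine(h, h_attn) > skip_threshold:   # token barely changed: skip the FFN
        return h_attn
    return h_attn + ffn(h_attn)

# toy sub-layers standing in for trained attention and FFN modules
attn = lambda x: 0.01 * x
ffn = lambda x: np.tanh(x)
h = np.ones(16)
print(block_forward(h, attn, ffn).shape)
```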
arXiv Detail & Related papers (2024-04-05T02:35:43Z)
- Hierarchical Skip Decoding for Efficient Autoregressive Text Generation [9.16858904192541]
We propose a novel decoding strategy named Hierarchical Skip Decoding (HSD) for efficient autoregressive text generation.
With almost half of the layers skipped, HSD can sustain 90% of the text quality compared to vanilla autoregressive decoding.
arXiv Detail & Related papers (2024-03-22T02:44:05Z)
- Diffusion Language Models Generation Can Be Halted Early [4.726777092009553]
Diffusion Language Models (DLMs) are a promising avenue for text generation due to their practical properties, such as tractable controllable generation.
One way to reduce the performance gap between DLMs and autoregressive language models is to speed up the generation of DLMs.
In this work, we propose a novel methodology that addresses this issue by enabling the execution of more generation steps within a given time frame.
arXiv Detail & Related papers (2023-05-18T08:56:05Z)
- Memorization for Good: Encryption with Autoregressive Language Models [8.645826579841692]
We propose the first symmetric encryption algorithm with autoregressive language models (SELM).
We show that autoregressive LMs can encode arbitrary data into a compact real-valued vector (i.e., encryption) and then losslessly decode the vector back to the original message (i.e., decryption) via random subspace optimization and greedy decoding.
arXiv Detail & Related papers (2023-05-15T05:42:34Z)
- DiffusionRet: Generative Text-Video Retrieval with Diffusion Model [56.03464169048182]
Existing text-video retrieval solutions focus on maximizing the conditional likelihood, i.e., p(candidates|query).
We creatively tackle this task from a generative viewpoint and model the correlation between the text and the video as their joint probability p(candidates, query).
This is accomplished through a diffusion-based text-video retrieval framework (DiffusionRet), which models the retrieval task as a process of gradually generating joint distribution from noise.
arXiv Detail & Related papers (2023-03-17T10:07:19Z)
- Contrastive Decoding: Open-ended Text Generation as Optimization [153.35961722855686]
We propose contrastive decoding (CD), a reliable decoding approach.
It is inspired by the observation that the failure modes of larger LMs are even more prevalent in smaller LMs.
CD requires zero additional training and produces higher quality text than decoding from the larger LM alone.
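The contrastive objective can be sketched in a few lines; the plausibility cutoff alpha and the toy distributions below are illustrative values.

```python
import numpy as np

def contrastive_decode_step(expert_logprobs, amateur_logprobs, alpha=0.1):
    """One CD step: keep only tokens the expert (larger LM) finds plausible, then
    pick the token maximising expert log-prob minus amateur (smaller LM) log-prob."""
    cutoff = np.log(alpha) + expert_logprobs.max()    # plausibility constraint
    scores = expert_logprobs - amateur_logprobs       # contrastive objective
    scores[expert_logprobs < cutoff] = -np.inf        # mask implausible tokens
    return int(np.argmax(scores))

# toy distributions over a 5-token vocabulary
expert = np.log(np.array([0.50, 0.30, 0.15, 0.04, 0.01]))
amateur = np.log(np.array([0.60, 0.10, 0.20, 0.05, 0.05]))
print(contrastive_decode_step(expert, amateur))   # favours token 1 (expert >> amateur)
```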
arXiv Detail & Related papers (2022-10-27T00:58:21Z)
- RaP: Redundancy-aware Video-language Pre-training for Text-Video Retrieval [61.77760317554826]
We propose Redundancy-aware Video-language Pre-training (RaP).
We design a redundancy measurement for video patches and text tokens by calculating the cross-modal minimum dissimilarity.
We evaluate our method on four benchmark datasets, MSRVTT, MSVD, DiDeMo, and LSMDC.
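A small sketch of one way to compute a cross-modal minimum-dissimilarity score per video patch; the exact redundancy measure and how it is used during pre-training are the paper's, which may differ from this toy version.

```python
import numpy as np

def min_cross_modal_dissimilarity(patch_embs, token_embs):
    """For each video patch, its dissimilarity (1 - cosine similarity) to the
    closest text token. Patches whose minimum dissimilarity is large have no
    textual counterpart and can be treated as redundant."""
    p = patch_embs / np.linalg.norm(patch_embs, axis=1, keepdims=True)
    t = token_embs / np.linalg.norm(token_embs, axis=1, keepdims=True)
    dissim = 1.0 - p @ t.T                 # (num_patches, num_tokens)
    return dissim.min(axis=1)              # min over text tokens, per patch

patches = np.random.randn(6, 32)           # toy patch embeddings
tokens = np.random.randn(4, 32)            # toy token embeddings
print(min_cross_modal_dissimilarity(patches, tokens))
```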
arXiv Detail & Related papers (2022-10-13T10:11:41Z)
- An Empirical Study of Language Model Integration for Transducer based Speech Recognition [23.759084092602517]
Methods such as density ratio (DR) and internal language model (ILM) estimation (ILME) have been developed, outperforming the classic shallow fusion (SF) method.
We propose a low-order density ratio method (LODR) by training a low-order weak ILM for DR.
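A short sketch of how the fusion scores differ; the weights and the interface are illustrative, and in practice these terms are combined per hypothesis inside beam search.

```python
def fused_scores(log_p_asr, log_p_ext_lm, log_p_low_order_ilm,
                 lm_weight=0.6, ilm_weight=0.4):
    """Per-token scores illustrating two fusion styles. Weights are illustrative;
    the papers tune them on held-out data."""
    shallow_fusion = log_p_asr + lm_weight * log_p_ext_lm
    # Density-ratio-style methods additionally subtract an estimate of the ASR
    # model's internal LM; LODR uses a cheap low-order (e.g., bi-gram) LM for it.
    lodr = log_p_asr + lm_weight * log_p_ext_lm - ilm_weight * log_p_low_order_ilm
    return shallow_fusion, lodr

print(fused_scores(-1.2, -2.0, -3.0))
```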
arXiv Detail & Related papers (2022-03-31T03:33:50Z)
- POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
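The progressive insertion process can be illustrated with a toy loop; the stand-in proposal function below is hypothetical and only shows the parallel, coarse-to-fine control flow.

```python
def pointer_style_generate(keywords, propose, max_rounds=3):
    """Coarse-to-fine insertion loop: each round, a predictor proposes, for every
    gap between adjacent tokens, either a token to insert or None. All gaps are
    filled in parallel, so the sequence roughly doubles per round.
    `propose` stands in for the trained insertion model."""
    seq = list(keywords)
    for _ in range(max_rounds):
        proposals = [propose(seq, i) for i in range(len(seq) + 1)]  # one per gap
        if all(p is None for p in proposals):
            break                                   # no more insertions needed
        new_seq = []
        for i, tok in enumerate(seq):
            if proposals[i] is not None:
                new_seq.append(proposals[i])
            new_seq.append(tok)
        if proposals[-1] is not None:
            new_seq.append(proposals[-1])
        seq = new_seq
    return seq

# toy "model": insert a filler word into the middle gap only on the first round
toy = lambda seq, i: "really" if (i == len(seq) // 2 and "really" not in seq) else None
print(pointer_style_generate(["cake", "delicious"], toy))
```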
arXiv Detail & Related papers (2020-05-01T18:11:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.