A Call for Clarity in Beam Search: How It Works and When It Stops
- URL: http://arxiv.org/abs/2204.05424v3
- Date: Wed, 28 Feb 2024 08:03:28 GMT
- Title: A Call for Clarity in Beam Search: How It Works and When It Stops
- Authors: Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir Radev, Yejin
Choi, and Noah A. Smith
- Abstract summary: We introduce a patience factor, a simple modification to this beam decoding implementation, that generalizes the stopping criterion and provides flexibility to the depth of search.
Empirical results demonstrate that adjusting this patience factor improves decoding performance of strong pretrained models on news text summarization and machine translation over diverse language pairs.
- Score: 125.55175954381991
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text generation with beam search has proven successful in a wide range of
applications. We point out that, though largely overlooked in the literature,
the commonly-used implementation of beam decoding (e.g., Hugging Face
Transformers and fairseq) uses a first come, first served heuristic: it keeps a
set of already completed sequences over time steps and stops when the size of
this set reaches the beam size. Based on this finding, we introduce a patience
factor, a simple modification to this beam decoding implementation, that
generalizes the stopping criterion and provides flexibility to the depth of
search. Empirical results demonstrate that adjusting this patience factor
improves decoding performance of strong pretrained models on news text
summarization and machine translation over diverse language pairs, with a
negligible inference slowdown. Our approach only modifies one line of code and
can thus be readily incorporated into any implementation. Further, we find that
different versions of beam decoding result in large performance differences in
summarization, demonstrating the need for clarity in specifying the beam search
implementation in research work. Our code will be available upon publication.
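The abstract's description of the commonly-used stopping rule, and the one-line patience generalization, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the actual Hugging Face or fairseq code: the tiny transition table and all function and variable names are hypothetical stand-ins for a real decoder.

```python
# Toy sketch of the first-come-first-served stopping rule and the patience
# factor described in the abstract. The "model" is a fixed table of
# next-token log-probabilities -- a hypothetical stand-in for a real decoder.
import math

EOS = "</s>"

# Hypothetical next-token log-probabilities, keyed by the last token.
NEXT_LOGPROBS = {
    "<s>": {"a": math.log(0.6), "b": math.log(0.4)},
    "a":   {EOS: math.log(0.7), "b": math.log(0.3)},
    "b":   {EOS: math.log(0.5), "a": math.log(0.5)},
}

def beam_search(beam_size=2, patience=1.0, max_steps=10):
    """Beam decoding with the commonly implemented stopping criterion:
    completed sequences accumulate in `finished` over time steps, and
    decoding stops once len(finished) >= patience * beam_size.
    patience = 1.0 recovers the standard first-come-first-served rule."""
    beams = [(["<s>"], 0.0)]          # (tokens, cumulative log-prob)
    finished = []
    for _ in range(max_steps):
        # Expand every live beam by every possible next token.
        candidates = []
        for toks, score in beams:
            for tok, logp in NEXT_LOGPROBS[toks[-1]].items():
                candidates.append((toks + [tok], score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for toks, score in candidates:
            if toks[-1] == EOS:
                finished.append((toks, score))   # first come, first served
            else:
                beams.append((toks, score))
            if len(beams) == beam_size:
                break
        # The one-line change: the standard check is len(finished) >= beam_size;
        # the patience factor generalizes the threshold.
        if len(finished) >= patience * beam_size or not beams:
            break
    return max(finished, key=lambda f: f[1])
```

With patience = 1.0 the loop stops as soon as `beam_size` hypotheses have finished; larger values keep the search running deeper before committing, at the cost of extra steps.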
Related papers
- Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference [35.730941605490194]
Large language models (LLMs) have shown outstanding performance across numerous real-world tasks.
Speculative decoding has emerged as a promising solution, leveraging a smaller auxiliary model to draft future tokens.
This paper explores the novel integration of speculative decoding with beam sampling.
arXiv Detail & Related papers (2024-09-25T02:20:42Z)
- LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network [63.554061288184165]
We propose a novel parameterized text shape method based on low-rank approximation.
By exploring the shape correlation among different text contours, our method achieves consistency, compactness, simplicity, and robustness in shape representation.
We implement an accurate and efficient arbitrary-shaped text detector named LRANet.
arXiv Detail & Related papers (2023-06-27T02:03:46Z)
- Improved Beam Search for Hallucination Mitigation in Abstractive Summarization [1.2328446298523066]
In this paper, we investigate the use of the Natural Language Inference (NLI) entailment metric to detect and prevent hallucinations in summary generation.
We propose an NLI-assisted beam re-ranking mechanism by computing entailment probability scores between the input context and summarization model-generated beams.
Our proposed algorithm significantly outperforms vanilla beam decoding on XSum and CNN/DM datasets.
arXiv Detail & Related papers (2022-12-06T02:33:47Z)
- Determinantal Beam Search [75.84501052642361]
Beam search is a go-to strategy for decoding neural sequence models.
In use-cases that call for multiple solutions, a diverse or representative set is often desired.
By posing iterations in beam search as a series of subdeterminant problems, we can turn the algorithm into a diverse subset selection process.
arXiv Detail & Related papers (2021-06-14T13:01:46Z)
- TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval [103.85002875155551]
We propose a novel generalized distillation method, TeachText, for exploiting large-scale language pretraining.
We extend our method to video side modalities and show that we can effectively reduce the number of used modalities at test time.
Our approach advances the state of the art on several video retrieval benchmarks by a significant margin and adds no computational overhead at test time.
arXiv Detail & Related papers (2021-04-16T17:55:28Z)
- If beam search is the answer, what was the question? [78.71330480725668]
We find that beam search enforces uniform information density in text, a property motivated by cognitive science.
We suggest a set of decoding objectives that explicitly enforce this property and find that exact decoding with these objectives alleviates the problems encountered when decoding poorly calibrated language generation models.
arXiv Detail & Related papers (2020-10-06T11:57:03Z)
- Best-First Beam Search [78.71330480725668]
We show that the standard implementation of beam search can be made up to 10x faster in practice.
We propose a memory-reduced variant of Best-First Beam Search, which has a similar beneficial search bias in terms of downstream performance.
arXiv Detail & Related papers (2020-07-08T05:56:01Z)
- Investigating Label Bias in Beam Search for Open-ended Text Generation [8.331919991368366]
In open-ended text generation, beam search is often found to produce repetitive and generic texts.
Standard seq2seq models suffer from label bias due to their locally normalized probability formulation.
By combining locally normalized maximum likelihood estimation and globally normalized sequence-level training, label bias can be reduced with almost no sacrifice in perplexity.
arXiv Detail & Related papers (2020-05-22T05:17:53Z)
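The locally- versus globally-normalized contrast behind label bias, which the last entry above refers to, can be illustrated with toy numbers. The scores below are our own hypothetical example, not drawn from that paper.

```python
# Toy illustration of label bias: a prefix with a single (bad) continuation
# is over-rewarded under per-step (local) normalization, because normalizing
# each step separately forces that continuation's probability to 1.
# All scores below are made-up numbers for illustration.
import math

def softmax(logits):
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}

# Step-1 choices tie; prefix "a" has one very poor continuation (-5),
# while prefix "b" has two good ones (+5, +4).
STEP1 = {"a": 0.0, "b": 0.0}
STEP2 = {"a": {"x": -5.0},
         "b": {"x": 5.0, "y": 4.0}}

def locally_normalized():
    """Product of per-step softmaxes: the lone continuation of "a" gets
    probability 1, so its bad score never penalizes the sequence."""
    p1 = softmax(STEP1)
    probs = {}
    for y1, cont in STEP2.items():
        for y2, p2 in softmax(cont).items():
            probs[(y1, y2)] = p1[y1] * p2
    return probs

def globally_normalized():
    """One softmax over whole-sequence scores: the -5 is fully counted."""
    scores = {(y1, y2): STEP1[y1] + s
              for y1, cont in STEP2.items() for y2, s in cont.items()}
    return softmax(scores)

local, glob = locally_normalized(), globally_normalized()
```

Under local normalization the argmax sequence is ("a", "x") despite its score of -5, while global normalization correctly prefers ("b", "x") -- the bias that mixing in globally normalized sequence-level training aims to reduce.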
This list is automatically generated from the titles and abstracts of the papers in this site.