Investigating Label Bias in Beam Search for Open-ended Text Generation
- URL: http://arxiv.org/abs/2005.11009v1
- Date: Fri, 22 May 2020 05:17:53 GMT
- Title: Investigating Label Bias in Beam Search for Open-ended Text Generation
- Authors: Liang Wang, Jinlong Liu, Jingming Liu
- Abstract summary: In open-ended text generation, beam search is often found to produce repetitive and generic texts.
Standard seq2seq models suffer from label bias due to their locally normalized probability formulation.
By combining locally normalized maximum likelihood estimation and globally normalized sequence-level training, label bias can be reduced with almost no sacrifice in perplexity.
- Score: 8.331919991368366
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Beam search is an effective and widely used decoding algorithm in many
sequence-to-sequence (seq2seq) text generation tasks. However, in open-ended
text generation, beam search is often found to produce repetitive and generic
texts, so sampling-based decoding algorithms such as top-k sampling and nucleus
sampling are often preferred instead. Standard seq2seq models suffer from label
bias due to their locally normalized probability formulation. This paper
provides a body
of empirical evidence that label bias is a major reason for such degenerate
behaviors of beam search. By combining locally normalized maximum likelihood
estimation and globally normalized sequence-level training, label bias can be
reduced with almost no sacrifice in perplexity. To quantitatively measure label
bias, we test the model's ability to discriminate the groundtruth text and a
set of context-agnostic distractors. We conduct experiments on large-scale
response generation datasets. Results show that beam search can produce more
diverse and meaningful texts with our approach, in terms of both automatic and
human evaluation metrics. Our analysis also suggests several directions for
future work towards the grand challenge of open-ended text generation.
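The probe and the training recipe are only sketched in the abstract above, so the following is a minimal, hypothetical illustration in Python (interface and function names such as `token_logprob`, `discrimination_accuracy`, and `mixed_loss` are invented here, not taken from the paper): it scores candidates with a locally normalized model, checks whether the groundtruth response outranks a set of context-agnostic distractors, and mixes the token-level NLL with a globally normalized sequence-level term.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface: log p(token | context, prefix) under a
# locally normalized seq2seq model.
TokenLogProb = Callable[[str, Sequence[str], str], float]


def sequence_logprob(model: TokenLogProb, context: str, tokens: Sequence[str]) -> float:
    """Locally normalized sequence score: sum of per-token log-probabilities."""
    return sum(model(context, tokens[:i], tok) for i, tok in enumerate(tokens))


def discrimination_accuracy(
    model: TokenLogProb,
    cases: List[Tuple[str, Sequence[str], List[Sequence[str]]]],
) -> float:
    """Fraction of (context, groundtruth, distractors) cases where the
    groundtruth outscores every context-agnostic distractor; low accuracy
    is one symptom of label bias."""
    hits = 0
    for context, groundtruth, distractors in cases:
        gt = sequence_logprob(model, context, groundtruth)
        if all(gt > sequence_logprob(model, context, d) for d in distractors):
            hits += 1
    return hits / len(cases)


def mixed_loss(gt_score: float, distractor_scores: List[float],
               token_nll: float, lam: float = 1.0) -> float:
    """Token-level NLL plus a globally normalized sequence-level term: a
    softmax over the groundtruth and distractor scores that pushes the model
    to rank the groundtruth first (one plausible instantiation, not
    necessarily the paper's exact objective)."""
    log_z = math.log(sum(math.exp(s) for s in [gt_score] + distractor_scores))
    return token_nll + lam * (log_z - gt_score)
```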
Related papers
- On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approximates human-like quality, the sample size needed for reliable detection increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors, including RoBERTa-Large/Base-Detector and GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z)
- Hard Nominal Example-aware Template Mutual Matching for Industrial Anomaly Detection [74.9262846410559]
Hard Nominal Example-aware Template Mutual Matching (HETMM) aims to construct a robust prototype-based decision boundary, which can precisely distinguish between hard-nominal examples and anomalies.
arXiv Detail & Related papers (2023-03-28T17:54:56Z)
- Challenges in Measuring Bias via Open-Ended Language Generation [1.5552869983952944]
We analyze how specific choices of prompt sets, metrics, automatic tools and sampling strategies affect bias results.
We provide recommendations for reporting biases in open-ended language generation.
arXiv Detail & Related papers (2022-05-23T19:57:15Z)
- A Call for Clarity in Beam Search: How It Works and When It Stops [125.55175954381991]
We introduce a patience factor, a simple modification to this beam decoding implementation, that generalizes the stopping criterion and provides flexibility to the depth of search.
Empirical results demonstrate that adjusting this patience factor improves decoding performance of strong pretrained models on news text summarization and machine translation over diverse language pairs.
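As a rough illustration of what such a patience factor might look like (a sketch based only on the summary above; the paper's actual implementation may differ), the common "stop once beam_size hypotheses are finished" rule can be generalized as follows:

```python
def should_stop(num_finished: int, beam_size: int, patience: float = 1.0) -> bool:
    """Generalized beam-search stopping criterion with a patience factor.

    patience = 1.0 recovers the usual rule of stopping as soon as
    `beam_size` finished hypotheses have been collected; patience > 1.0
    lets the search run deeper before committing.
    """
    return num_finished >= patience * beam_size


# e.g. with beam_size=5 and patience=2.0, decoding continues until
# 10 finished hypotheses have been collected (or a length limit is hit).
```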
arXiv Detail & Related papers (2022-04-11T22:03:44Z)
- Massive-scale Decoding for Text Generation using Lattices [34.2658286826597]
We present a search algorithm to construct lattices encoding a massive number of generation options.
We show that our algorithm encodes hundreds to thousands of diverse options that remain grammatical and high-quality into one linear-sized lattice.
arXiv Detail & Related papers (2021-12-14T18:56:11Z)
- Determinantal Beam Search [75.84501052642361]
Beam search is a go-to strategy for decoding neural sequence models.
In use-cases that call for multiple solutions, a diverse or representative set is often desired.
By posing iterations in beam search as a series of subdeterminant problems, we can turn the algorithm into a diverse subset selection process.
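The summary above only gestures at the mechanism; below is a toy sketch of the generic subset-selection step it alludes to (greedy determinant maximization over a quality-times-similarity kernel), offered as an illustration of the DPP-style idea rather than the paper's exact algorithm:

```python
import numpy as np


def greedy_subdeterminant_select(quality, similarity, k):
    """Greedily pick k candidate indices that approximately maximize the
    determinant of the kernel L = diag(q) @ S @ diag(q), which rewards
    candidates that are both high-quality and mutually dissimilar."""
    q = np.asarray(quality, dtype=float)
    S = np.asarray(similarity, dtype=float)
    L = np.outer(q, q) * S
    selected, remaining = [], list(range(len(q)))
    for _ in range(min(k, len(q))):
        # add the candidate whose inclusion yields the largest subdeterminant
        best = max(remaining,
                   key=lambda i: np.linalg.det(L[np.ix_(selected + [i], selected + [i])]))
        selected.append(best)
        remaining.remove(best)
    return selected


# e.g. 4 candidate continuations: the two near-duplicates (indices 0 and 1)
# are unlikely to be selected together, so the result is [0, 2].
print(greedy_subdeterminant_select(
    quality=[0.9, 0.85, 0.5, 0.4],
    similarity=np.array([[1.0, 0.95, 0.1, 0.2],
                         [0.95, 1.0, 0.1, 0.2],
                         [0.1, 0.1, 1.0, 0.3],
                         [0.2, 0.2, 0.3, 1.0]]),
    k=2))
```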
arXiv Detail & Related papers (2021-06-14T13:01:46Z)
- A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation [50.55448707570669]
We propose a novel token-level, reference-free hallucination detection task and an associated annotated dataset named HaDes.
To create this dataset, we first perturb a large number of text segments extracted from English language Wikipedia, and then verify these with crowd-sourced annotations.
arXiv Detail & Related papers (2021-04-18T04:09:48Z)
- Controlling Hallucinations at Word Level in Data-to-Text Generation [10.59137381324694]
State-of-the-art neural models include misleading statements in their outputs.
We propose a Multi-Branch Decoder which is able to leverage word-level labels to learn the relevant parts of each training instance.
Our model is able to reduce and control hallucinations, while keeping fluency and coherence in generated texts.
arXiv Detail & Related papers (2021-02-04T18:58:28Z)
- If beam search is the answer, what was the question? [78.71330480725668]
We find that beam search enforces uniform information density in text, a property motivated by cognitive science.
We suggest a set of decoding objectives that explicitly enforce this property and find that exact decoding with these objectives alleviates the problems encountered when decoding poorly calibrated language generation models.
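One concrete, hypothetical instance of such an objective is to score hypotheses by log-probability minus a penalty on the variance of per-token surprisals, a simple way to encourage uniform information density (a sketch, not necessarily the exact regularizer used in the paper):

```python
from typing import List


def uid_regularized_score(token_logprobs: List[float], lam: float = 1.0) -> float:
    """Sequence log-probability minus a uniform-information-density penalty
    (the variance of per-token surprisals)."""
    surprisals = [-lp for lp in token_logprobs]
    mean = sum(surprisals) / len(surprisals)
    variance = sum((s - mean) ** 2 for s in surprisals) / len(surprisals)
    return sum(token_logprobs) - lam * variance


# A hypothesis with evenly spread surprisal beats an equally likely one
# whose surprisal is concentrated on a few tokens.
print(uid_regularized_score([-1.0, -1.0, -1.0]))   # -3.0 (zero variance)
print(uid_regularized_score([-0.1, -0.1, -2.8]))   # -4.62 (penalized)
```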
arXiv Detail & Related papers (2020-10-06T11:57:03Z)
- Mirostat: A Neural Text Decoding Algorithm that Directly Controls Perplexity [22.15683400807154]
We use a theoretical analysis of perplexity in top-k, top-p, and temperature sampling to design a feedback-based adaptive top-k text decoding algorithm called mirostat.
Experiments show that for low values of k and p in top-k and top-p sampling, perplexity drops significantly with generated text length.
For large values of k and p, perplexity increases with generated text length, which is correlated with incoherence in the text.
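A heavily simplified feedback loop in the spirit of this idea is sketched below (helper names are hypothetical; the published mirostat algorithm additionally derives an adaptive k from Zipf statistics of the distribution): after each token, the truncation threshold is nudged so that the observed surprise tracks a target value.

```python
import math
import random
from typing import Callable, Dict, List


def sample_below_surprise(probs: Dict[str, float], mu: float):
    """Sample among tokens whose surprise (-log2 p) is below the threshold mu,
    falling back to the single most probable token if none qualifies.
    Returns the token and its observed surprise."""
    allowed = {t: p for t, p in probs.items() if -math.log2(p) < mu}
    if not allowed:
        tok = max(probs, key=probs.get)
        return tok, -math.log2(probs[tok])
    r, acc = random.random() * sum(allowed.values()), 0.0
    for tok, p in allowed.items():
        acc += p
        if acc >= r:
            return tok, -math.log2(p)
    return tok, -math.log2(p)  # numerical edge case: last allowed token


def feedback_decode(next_probs: Callable[[List[str]], Dict[str, float]],
                    steps: int, target_surprise: float, lr: float = 0.1) -> List[str]:
    """Feedback-controlled decoding: after each token, adjust the surprise
    threshold mu so that observed surprise tracks the target (keeping the
    perplexity of the generated text roughly constant)."""
    mu, out = 2.0 * target_surprise, []
    for _ in range(steps):
        tok, surprise = sample_below_surprise(next_probs(out), mu)
        mu -= lr * (surprise - target_surprise)  # feedback update
        out.append(tok)
    return out
```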
arXiv Detail & Related papers (2020-07-29T17:22:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.