On the Limits of Language Generation: Trade-Offs Between Hallucination and Mode Collapse
- URL: http://arxiv.org/abs/2411.09642v1
- Date: Thu, 14 Nov 2024 18:06:55 GMT
- Title: On the Limits of Language Generation: Trade-Offs Between Hallucination and Mode Collapse
- Authors: Alkis Kalavasis, Anay Mehrotra, Grigoris Velegkas
- Abstract summary: Given samples from an unknown language, a language model should produce valid strings not seen in training.
Otherwise, outputting invalid strings constitutes "hallucination," and failing to capture the full range leads to "mode collapse."
We investigate this within a statistical language generation setting building on Gold and Angluin.
- Abstract: Specifying all desirable properties of a language model is challenging, but certain requirements seem essential. Given samples from an unknown language, the trained model should produce valid strings not seen in training and be expressive enough to capture the language's full richness. Otherwise, outputting invalid strings constitutes "hallucination," and failing to capture the full range leads to "mode collapse." We ask if a language model can meet both requirements. We investigate this within a statistical language generation setting building on Gold and Angluin. Here, the model receives random samples from a distribution over an unknown language K, which belongs to a possibly infinite collection of languages. The goal is to generate unseen strings from K. We say the model generates from K with consistency and breadth if, as training size increases, its output converges to all unseen strings in K. Kleinberg and Mullainathan [KM24] asked if consistency and breadth in language generation are possible. We answer this negatively: for a large class of language models, including next-token prediction models, this is impossible for most collections of candidate languages. This contrasts with [KM24]'s result, showing consistent generation without breadth is possible for any countable collection of languages. Our finding highlights that generation with breadth fundamentally differs from generation without breadth. As a byproduct, we establish near-tight bounds on the number of samples needed for generation with or without breadth. Finally, our results offer hope: consistent generation with breadth is achievable for any countable collection of languages when negative examples (strings outside K) are available alongside positive ones. This suggests that post-training feedback, which encodes negative examples, can be crucial in reducing hallucinations while limiting mode collapse.
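To pin down the setting the abstract describes, here is a hedged LaTeX sketch of the two requirements. The notation ($\mathcal{L}$, $S_n$, $G_n$) and the exact quantifiers are reconstructed from the informal description above, not copied from the paper, whose formal definitions may differ in details.
```latex
% Sketch of consistency and breadth, reconstructed from the abstract;
% the symbols and exact quantifiers are ours, not necessarily the paper's.
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
Let $\mathcal{L}=\{L_1,L_2,\dots\}$ be a countable collection of languages
over strings $\Sigma^\ast$, and let $K\in\mathcal{L}$ be unknown. After $n$
samples $S_n$ drawn from a distribution supported on $K$, the model outputs
a set $G_n\subseteq\Sigma^\ast$ of strings.
\begin{itemize}
  \item \textbf{Consistency} (no hallucination): with probability $1$,
        $G_n\subseteq K\setminus S_n$ for all sufficiently large $n$.
  \item \textbf{Breadth} (no mode collapse): with probability $1$,
        $K\setminus S_n\subseteq G_n$ for all sufficiently large $n$.
\end{itemize}
The paper shows that, for a large class of models and most collections
$\mathcal{L}$, no generator achieves both, while consistency alone is
achievable for every countable $\mathcal{L}$ [KM24].
\end{document}
```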
Related papers
- Exploring Facets of Language Generation in the Limit
We show that every countable language collection has a generator which has the stronger property of non-uniform generation in the limit.
We formalize the tension between validity and breadth in the generation algorithm of [KM24] by introducing a definition of exhaustive generation.
Our result shows that a tradeoff between validity and breadth is inherent for generation in the limit.
arXiv Detail & Related papers (2024-11-22T22:13:40Z)
- A Probability–Quality Trade-off in Aligned Language Models and its Relation to Sampling Adaptors
We show that when sampling corpora from an aligned language model, there exists a trade-off between the strings' average reward and average log-likelihood.
We provide a formal treatment of this phenomenon and demonstrate how a choice of sampling adaptor allows for a selection of how much likelihood we exchange for the reward.
arXiv Detail & Related papers (2024-06-14T17:38:21Z)
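For the probability-quality trade-off above, the following toy sketch uses made-up probabilities and rewards (anti-correlated by construction) and a simple temperature adaptor standing in for the paper's general sampling adaptors; it shows how sharpening buys average log-likelihood at the cost of average reward, and flattening does the opposite.
```python
import numpy as np

# Made-up next-token distribution over five strings and their rewards;
# reward is anti-correlated with probability by construction.
p = np.array([0.40, 0.30, 0.15, 0.10, 0.05])   # base (aligned) model
reward = np.array([0.2, 0.5, 0.9, 1.0, 1.5])   # hypothetical reward scores

def temperature_adaptor(p, tau):
    """A simple sampling adaptor: raise probabilities to 1/tau, renormalize."""
    q = p ** (1.0 / tau)
    return q / q.sum()

for tau in (0.5, 1.0, 2.0):
    q = temperature_adaptor(p, tau)
    avg_loglik = float(q @ np.log(p))   # expected log-likelihood under base model
    avg_reward = float(q @ reward)      # expected reward of sampled strings
    print(f"tau={tau}: avg log-lik={avg_loglik:.3f}, avg reward={avg_reward:.3f}")
```
Running this, tau=0.5 gives roughly (-1.15, 0.39), tau=1.0 gives (-1.39, 0.54), and tau=2.0 gives (-1.60, 0.67): likelihood and reward move in opposite directions as the adaptor varies.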
- Language Generation in the Limit
We show that there is an agent that is able to generate in the limit for every countable list of candidate languages.
This contrasts dramatically with negative results due to Gold and Angluin in a well-studied model of language learning.
arXiv Detail & Related papers (2024-04-10T05:53:25Z)
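To make "generation in the limit" concrete, here is a toy in the Gold-style setting. The divisibility languages and the "most specific consistent guess" rule are ours for illustration only; the paper's generator is different and works for every countable collection.
```python
from itertools import count

# Toy collection: language i = decimal strings of multiples of (i + 1).
CANDIDATES = [i + 1 for i in range(10)]  # divisors 1..10

def generate_unseen(samples):
    """Guess the most specific candidate consistent with the samples and
    output one of its strings not seen in training."""
    for d in sorted(CANDIDATES, reverse=True):
        if all(int(s) % d == 0 for s in samples):
            for n in count(1):
                s = str(n * d)
                if s not in samples:
                    return s

# The unknown language K is "multiples of 3"; feed an enumeration of it.
samples = set()
for x in ["0", "3", "6", "9"]:
    samples.add(x)
    print(sorted(samples, key=int), "->", generate_unseen(samples))
```
Note that the first guess ("10", since 0 is divisible by everything) is a hallucination, but the guesses stabilize inside the target language once enough samples arrive; eventual correctness is all the "in the limit" guarantee asks.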
- A Measure-Theoretic Characterization of Tight Language Models
In some pathological cases, probability mass can "leak" onto the set of infinite sequences.
This paper offers a measure-theoretic treatment of language modeling.
We prove that many popular language model families are in fact tight, meaning that they will not leak in this sense.
arXiv Detail & Related papers (2022-12-20T18:17:11Z)
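The leakage phenomenon is easy to see numerically. In this small demo (our construction, not the paper's), a model whose end-of-sequence probability decays like 2^(-t) places only about 71% of its mass on finite strings, while a constant EOS probability is tight.
```python
def finite_mass(eos_prob_at_step, horizon=10_000):
    """Total probability the model assigns to *finite* strings, i.e. the
    probability that EOS is ever emitted, truncated at `horizon` steps."""
    survive = 1.0           # probability no EOS emitted so far
    total = 0.0
    for t in range(1, horizon + 1):
        p = eos_prob_at_step(t)
        total += survive * p
        survive *= 1.0 - p
    return total

# Tight model: constant EOS probability, so finite strings get all the mass.
print(finite_mass(lambda t: 0.1))        # ~1.0

# Non-tight model: EOS probability decays like 2^(-t); the product of
# (1 - 2^(-t)) converges to ~0.2888, so ~29% of the mass "leaks"
# onto infinite sequences.
print(finite_mass(lambda t: 0.5 ** t))   # ~0.7112
```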
- Quark: Controllable Text Generation with Reinforced Unlearning
Large-scale language models often learn behaviors that are misaligned with user expectations.
We introduce Quantized Reward Konditioning (Quark), an algorithm for optimizing a reward function that quantifies an (un)wanted property.
For unlearning toxicity, negative sentiment, and repetition, our experiments show that Quark outperforms both strong baselines and state-of-the-art reinforcement learning methods.
arXiv Detail & Related papers (2022-05-26T21:11:51Z)
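As a sketch of Quark's reward-quantization idea: score samples with a reward, bucket the scores into quantiles, and tag each example with a quantile control token; training conditions on these tokens and inference conditions on the best one. The data below is invented, and the full method also alternates sampling, quantization, and training with a KL penalty to the original model.
```python
import numpy as np

# Hypothetical (text, reward) pairs; rewards would come from e.g. a
# toxicity classifier in the actual Quark setup.
texts   = ["a", "b", "c", "d", "e", "f", "g", "h"]
rewards = np.array([0.1, 0.9, 0.4, 0.7, 0.2, 0.95, 0.5, 0.3])

K = 4  # number of reward quantiles / control tokens
# Assign each example to a reward quantile...
edges = np.quantile(rewards, np.linspace(0, 1, K + 1)[1:-1])
buckets = np.searchsorted(edges, rewards)

# ...and prepend the matching control token, as in reward-conditioned
# training; at inference one conditions on the best token, <RQ_3> here.
tagged = [f"<RQ_{b}> {t}" for t, b in zip(texts, buckets)]
print(tagged)
```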
- Typical Decoding for Natural Language Generation
We study why high-probability texts can be dull or repetitive.
We show that typical sampling offers competitive performance in terms of quality.
arXiv Detail & Related papers (2022-02-01T18:58:45Z)
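Typical sampling itself is easy to sketch: keep the tokens whose surprisal is closest to the conditional distribution's entropy until a target mass tau is covered, then renormalize. The re-implementation below is ours; the toy distribution and parameter values are illustrative.
```python
import numpy as np

def typical_filter(probs, tau=0.9):
    """Locally typical sampling: keep the tokens whose surprisal is
    closest to the distribution's entropy until their mass reaches tau,
    then renormalize over the kept set."""
    logp = np.log(probs)
    entropy = -(probs * logp).sum()
    # Rank tokens by |surprisal - entropy|, smallest deviation first.
    order = np.argsort(np.abs(-logp - entropy))
    cum = np.cumsum(probs[order])
    k = int(np.searchsorted(cum, tau)) + 1   # smallest prefix with mass >= tau
    keep = order[:k]
    out = np.zeros_like(probs)
    out[keep] = probs[keep] / probs[keep].sum()
    return out

p = np.array([0.55, 0.2, 0.15, 0.07, 0.03])
print(typical_filter(p, tau=0.8))
```
Note the filter differs from top-p: a very high-probability token can be dropped if its surprisal sits far below the entropy, which is exactly how the method counters dull, repetitive continuations.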
- Understanding by Understanding Not: Modeling Negation in Language Models
Negation is a core construction in natural language.
We propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences.
We reduce the mean top-1 error rate to 4% on the negated LAMA dataset.
arXiv Detail & Related papers (2021-05-07T21:58:35Z)
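A minimal sketch of the unlikelihood idea on scalar probabilities (the paper applies it token-wise to negated generic sentences): the usual likelihood term rewards high probability on true statements, while the unlikelihood term -log(1 - p) makes high probability on negated statements expensive.
```python
import math

def nll(p_target):
    """Standard likelihood term: push probability of true continuations up."""
    return -math.log(p_target)

def unlikelihood(p_target):
    """Unlikelihood term: push probability of continuations of *negated*
    sentences down via -log(1 - p)."""
    return -math.log(1.0 - p_target)

# High probability on a true statement is cheap:
print(nll(0.8))            # ~0.22
# The same high probability on its negation is expensive:
print(unlikelihood(0.8))   # ~1.61
# ...and becomes cheap once the model assigns it low probability:
print(unlikelihood(0.1))   # ~0.11
```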
- Limits of Detecting Text Generated by Large-Scale Language Models
Some consider large-scale language models that can generate long and coherent pieces of text as dangerous, since they may be used in misinformation campaigns.
Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated.
arXiv Detail & Related papers (2020-02-09T19:53:23Z)
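A toy version of the hypothesis-testing formulation, with invented unigram models standing in for the genuine and generated text distributions; the paper's analysis concerns when such likelihood-ratio tests can succeed at all.
```python
import math

def log_likelihood(text, model):
    """Sum of per-token log-probabilities under `model` (a toy unigram
    table here; a real detector would use the LM's own scores)."""
    return sum(math.log(model.get(tok, 1e-9)) for tok in text.split())

# Hypothetical unigram models for genuine vs machine-generated text.
p_genuine   = {"the": 0.05, "moon": 0.001, "is": 0.04, "bright": 0.002}
p_generated = {"the": 0.06, "moon": 0.004, "is": 0.05, "bright": 0.004}

def classify(text, threshold=0.0):
    # Neyman-Pearson style test: compare the log-likelihood ratio
    # log P(text | generated) - log P(text | genuine) to a threshold.
    llr = log_likelihood(text, p_generated) - log_likelihood(text, p_genuine)
    return ("generated" if llr > threshold else "genuine"), llr

print(classify("the moon is bright"))
```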