A Measure-Theoretic Characterization of Tight Language Models
- URL: http://arxiv.org/abs/2212.10502v2
- Date: Mon, 21 Aug 2023 18:01:57 GMT
- Title: A Measure-Theoretic Characterization of Tight Language Models
- Authors: Li Du, Lucas Torroba Hennigen, Tiago Pimentel, Clara Meister, Jason
Eisner, Ryan Cotterell
- Abstract summary: In some pathological cases, probability mass can ``leak'' onto the set of infinite sequences.
This paper offers a measure-theoretic treatment of language modeling.
We prove that many popular language model families are in fact tight, meaning that they will not leak in this sense.
- Score: 105.16477132329416
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language modeling, a central task in natural language processing, involves
estimating a probability distribution over strings. In most cases, the
estimated distribution sums to 1 over all finite strings. However, in some
pathological cases, probability mass can ``leak'' onto the set of infinite
sequences. In order to characterize the notion of leakage more precisely, this
paper offers a measure-theoretic treatment of language modeling. We prove that
many popular language model families are in fact tight, meaning that they will
not leak in this sense. We also generalize characterizations of tightness
proposed in previous works.
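To make the notion of leakage concrete, here is a minimal sketch (not from the paper; the decay rate is an illustrative choice) of an autoregressive model viewed only through its per-step EOS probabilities. When those probabilities are summable, the probability of never emitting EOS is positive, so mass leaks onto infinite sequences and the model is not tight:

```python
def finite_mass(eos_prob, horizon=100_000):
    """Probability the model assigns to finite strings, i.e. P(terminate):
    sum over t of P(no EOS in steps 1..t-1) * P(EOS at step t)."""
    survive, total = 1.0, 0.0
    for t in range(1, horizon + 1):
        p = eos_prob(t)
        total += survive * p
        survive *= 1.0 - p
    return total

# Tight model: constant EOS probability at every step.
print(finite_mass(lambda t: 0.1))                 # ~1.0

# Leaky (non-tight) model: EOS probability decays like 1/(t+1)^2, so the
# per-step EOS probabilities are summable and P(never terminating) > 0.
print(finite_mass(lambda t: 1.0 / (t + 1) ** 2))  # ~0.5 (< 1: mass leaks)
```

With a constant EOS probability the termination events exhaust all probability mass, while under the 1/(t+1)^2 decay roughly half the mass remains on infinite sequences.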
Related papers
- Are LLMs Models of Distributional Semantics? A Case Study on Quantifiers [14.797001158310092]
We argue that distributional semantics models struggle with truth-conditional reasoning and symbolic processing.
Contrary to expectations, we find that LLMs align more closely with human judgements on exact quantifiers than on vague ones.
arXiv Detail & Related papers (2024-10-17T19:28:35Z)
- Slaves to the Law of Large Numbers: An Asymptotic Equipartition Property for Perplexity in Generative Language Models [0.0]
We show that the perplexity of any long text produced by a language model must converge to the average entropy of its token distributions (a toy numerical check follows this entry).
This work has possible practical applications for understanding and improving "AI detection" tools.
arXiv Detail & Related papers (2024-05-22T16:23:40Z)
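As a sanity check of the convergence claim above, here is a toy sketch (my own, using an i.i.d. token model rather than a trained LM; the vocabulary and probabilities are made up): for text sampled from the model itself, the empirical log-perplexity concentrates around the entropy of the per-token distribution.

```python
import math
import random

random.seed(0)

# A toy i.i.d. "language model": every step emits a token from this
# fixed distribution (hypothetical numbers, chosen for illustration).
vocab_probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

entropy = -sum(p * math.log2(p) for p in vocab_probs.values())  # 1.75 bits

tokens = random.choices(
    list(vocab_probs), weights=list(vocab_probs.values()), k=200_000
)
log_perplexity = -sum(math.log2(vocab_probs[t]) for t in tokens) / len(tokens)

print(f"average entropy : {entropy:.4f} bits/token")
print(f"log2 perplexity : {log_perplexity:.4f} bits/token")  # ~= entropy
```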
- Evaluating Distributional Distortion in Neural Language Modeling [81.83408583979745]
A heavy tail of rare events accounts for a significant share of the total probability mass in distributions over language.
Standard language modeling metrics such as perplexity quantify the performance of language models (LMs) in aggregate.
We develop a controlled evaluation scheme which uses generative models trained on natural data as artificial languages.
arXiv Detail & Related papers (2022-03-24T01:09:46Z)
- Typical Decoding for Natural Language Generation [76.69397802617064]
We study why high-probability texts can be dull or repetitive.
We show that typical sampling offers competitive performance in terms of generation quality (a sketch of the sampling rule follows this entry).
arXiv Detail & Related papers (2022-02-01T18:58:45Z)
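For concreteness, here is a sketch of one step of locally typical sampling as I read it from the abstract (the threshold tau=0.95 and the toy distribution are illustrative assumptions, not the paper's settings):

```python
import math
import random

def typical_sampling_step(probs, tau=0.95, rng=random):
    """One decoding step of locally typical sampling (simplified sketch).

    Keeps the tokens whose surprisal -log p(x) is closest to the entropy
    of the step's conditional distribution, up to cumulative mass tau,
    then samples from the kept set (renormalizing via weighted sampling).
    """
    items = [(t, p) for t, p in probs.items() if p > 0]
    entropy = -sum(p * math.log(p) for _, p in items)
    # Rank tokens by how far their surprisal is from the entropy.
    ranked = sorted(items, key=lambda tp: abs(-math.log(tp[1]) - entropy))
    kept, mass = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        mass += p
        if mass >= tau:
            break
    tokens, weights = zip(*kept)
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical next-token distribution for illustration.
step_probs = {"the": 0.6, "a": 0.2, "cat": 0.1, "runs": 0.05, "zebra": 0.05}
print(typical_sampling_step(step_probs))
```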
- Chain of Thought Prompting Elicits Reasoning in Large Language Models [56.811278668446825]
This paper explores the ability of language models to generate a coherent chain of thought.
Experiments show that inducing a chain of thought via prompting enables sufficiently large language models to better perform reasoning tasks (an example prompt follows this entry).
arXiv Detail & Related papers (2022-01-28T02:33:07Z)
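Chain-of-thought prompting is purely a prompting protocol, so a single illustrative prompt suffices. The exemplar below is a reconstruction in the style popularized by this line of work, not copied from the paper: the few-shot example demonstrates intermediate reasoning steps that the model is induced to imitate.

```python
# A few-shot prompt whose exemplar spells out intermediate reasoning steps;
# the model is expected to continue the final answer in the same style.
# (Hypothetical reconstruction for illustration.)
cot_prompt = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 balls.
5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought
6 more, how many apples do they have?
A:"""

print(cot_prompt)
```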
- Are Some Words Worth More than Others? [3.5598388686985354]
We propose two new intrinsic evaluation measures within the framework of a simple word prediction task.
We evaluate several commonly-used large English language models using our proposed metrics.
arXiv Detail & Related papers (2020-10-12T23:12:11Z)
- Limits of Detecting Text Generated by Large-Scale Language Models [65.46403462928319]
Some consider large-scale language models dangerous because they can generate long, coherent text that may be used in misinformation campaigns.
Here we formulate the detection of large-scale language model output as a hypothesis testing problem: classify a text as genuine or generated (a likelihood-ratio sketch follows this entry).
arXiv Detail & Related papers (2020-02-09T19:53:23Z)
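The hypothesis-testing framing admits a compact sketch. The following is my own illustration (toy unigram models with made-up probabilities), not the paper's construction: when both distributions are known exactly, the Neyman-Pearson lemma says thresholding the log-likelihood ratio is the optimal test.

```python
import math

def log_likelihood(text_tokens, token_logprob):
    """Sum of per-token log-probabilities under a model (a callable)."""
    return sum(token_logprob(t) for t in text_tokens)

def detect_generated(text_tokens, machine_logprob, human_logprob, threshold=0.0):
    """Likelihood-ratio test: decide 'generated' when the text is more
    probable under the machine model than under the human model."""
    llr = log_likelihood(text_tokens, machine_logprob) - \
          log_likelihood(text_tokens, human_logprob)
    return ("generated", llr) if llr > threshold else ("genuine", llr)

# Hypothetical unigram models, for illustration only.
machine = {"the": 0.5, "cat": 0.3, "sat": 0.2}
human = {"the": 0.4, "cat": 0.3, "sat": 0.3}
print(detect_generated(
    ["the", "the", "cat", "sat"],
    lambda t: math.log(machine[t]),
    lambda t: math.log(human[t]),
))
```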
- Consistency of a Recurrent Language Model With Respect to Incomplete Decoding [67.54760086239514]
We study the issue of recurrent language models producing infinite-length sequences under incomplete decoding.
We propose two remedies that address inconsistency: consistent variants of top-k and nucleus sampling, and a self-terminating recurrent language model (a sketch of the consistent top-k variant follows).
arXiv Detail & Related papers (2020-02-06T19:56:15Z)
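A sketch of the first remedy, based only on the abstract (token names and probabilities are hypothetical): standard top-k truncation can exclude EOS at every step, so decoding may never terminate; the consistent variant keeps EOS in the candidate set so termination always has nonzero probability.

```python
import random

EOS = "<eos>"

def consistent_top_k_step(probs, k=2, rng=random):
    """One step of consistent top-k sampling (sketch based on the abstract).

    Standard top-k keeps only the k most probable tokens, which can exclude
    EOS at every step and make decoding inconsistent (it may never halt).
    The consistent variant forces EOS into the candidate set before sampling.
    """
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    if EOS not in dict(top):
        top.append((EOS, probs[EOS]))
    tokens, weights = zip(*top)
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical next-token distribution where EOS falls outside the top 2.
step_probs = {"very": 0.5, "long": 0.45, EOS: 0.05}
print(consistent_top_k_step(step_probs, k=2))
```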