How to Compute the Probability of a Word
- URL: http://arxiv.org/abs/2406.14561v1
- Date: Thu, 20 Jun 2024 17:59:42 GMT
- Title: How to Compute the Probability of a Word
- Authors: Tiago Pimentel, Clara Meister
- Abstract summary: This paper derives the correct methods for computing word probabilities.
We show that correcting the widespread bug in probability computations affects measured outcomes in sentence comprehension and lexical optimisation analyses.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models (LMs) estimate the probability distribution over sequences of natural language; these distributions are crucial for computing perplexity and surprisal in linguistics research. While we are usually concerned with measuring these values for words, most LMs operate over subwords. Although it seems straightforward, accurately computing probabilities over one unit given probabilities over the other requires care. Indeed, we show here that many recent linguistic studies have been computing these values incorrectly. This paper derives the correct methods for computing word probabilities, highlighting issues when relying on language models that use beginning-of-word (bow)-marking tokenisers, e.g., the GPT family. Empirically, we show that correcting the widespread bug in probability computations affects measured outcomes in sentence comprehension and lexical optimisation analyses.
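The abstract does not spell out the corrected formula, so the following is only a minimal sketch of the kind of issue it describes, under stated assumptions. With a bow-marking tokeniser, the subwords of a word do not say where the word ends: the probability of `_cat` also covers continuations such as `_cat` + `s`. One plausible correction, assumed here and not quoted from the paper, multiplies the subword product by the probability that the next token starts a new word (or ends the sequence), and divides by the same quantity for the context. The marker `_`, the token `<eos>`, the table `TOY_LM`, and all function names are hypothetical, and the numbers are made up.

```python
import math

BOW = "_"      # hypothetical beginning-of-word marker
EOS = "<eos>"  # hypothetical end-of-sequence token

# Toy next-token distributions keyed by subword prefix (made-up numbers).
TOY_LM = {
    ("_the",): {"_cat": 0.5, "_dog": 0.3, "s": 0.2},
    ("_the", "_cat"): {"_sat": 0.5, "s": 0.4, EOS: 0.1},
}


def next_token_probs(prefix):
    """Look up the toy model's next-token distribution for a subword prefix."""
    return TOY_LM[tuple(prefix)]


def boundary_mass(prefix):
    """Probability that the next token begins a new word or ends the sequence."""
    probs = next_token_probs(prefix)
    return sum(p for tok, p in probs.items() if tok.startswith(BOW) or tok == EOS)


def naive_word_prob(context, word_tokens):
    """Chain-rule product over the word's subwords, ignoring the word boundary."""
    prefix = list(context)
    factors = []
    for tok in word_tokens:
        factors.append(next_token_probs(prefix).get(tok, 0.0))
        prefix.append(tok)
    return math.prod(factors)


def boundary_aware_word_prob(context, word_tokens):
    """Assumed correction: require that the word ends right after its subwords."""
    after_word = list(context) + list(word_tokens)
    return (
        naive_word_prob(context, word_tokens)
        * boundary_mass(after_word)
        / boundary_mass(context)
    )


if __name__ == "__main__":
    context, word = ["_the"], ["_cat"]
    print("naive:", naive_word_prob(context, word))
    print("boundary-aware:", boundary_aware_word_prob(context, word))
```

In this toy setup the naive value for "cat" after "the" is 0.5, while the boundary-aware value is roughly 0.375, because part of the naive mass belongs to words that merely start with `_cat`.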
Related papers
- Evaluating language models as risk scores [23.779329697527054]
We focus on the use of language models as risk scores for unrealizable prediction tasks.
We introduce folktexts, a software package to systematically generate risk scores using large language models.
We demonstrate the utility of folktexts through a sweep of empirical insights on 16 recent large language models.
arXiv Detail & Related papers (2024-07-19T18:13:37Z) - Leading Whitespaces of Language Models' Subword Vocabulary Poses a Confound for Calculating Word Probabilities [15.073507986272027]
We argue that there is a confound posed by the subword tokenization scheme of language models.
We present a simple decoding technique to reaccount the probability of the trailing whitespace into that of the current word.
arXiv Detail & Related papers (2024-06-16T08:44:56Z) - Conformal Language Modeling [61.94417935386489]
We propose a novel approach to conformal prediction for generative language models (LMs).
Standard conformal prediction produces prediction sets with rigorous, statistical guarantees.
We demonstrate the promise of our approach on multiple tasks in open-domain question answering, text summarization, and radiology report generation.
arXiv Detail & Related papers (2023-06-16T21:55:08Z) - A Heavy-Tailed Algebra for Probabilistic Programming [53.32246823168763]
We propose a systematic approach for analyzing the tails of random variables.
We show how this approach can be used during the static analysis (before drawing samples) pass of a probabilistic programming language compiler.
Our empirical results confirm that inference algorithms that leverage our heavy-tailed algebra attain superior performance across a number of density modeling and variational inference tasks.
arXiv Detail & Related papers (2023-06-15T16:37:36Z) - Prompting is not a substitute for probability measurements in large language models [22.790531588072245]
We compare metalinguistic prompting and direct probability measurements as ways of measuring models' linguistic knowledge.
Our findings suggest that negative results relying on metalinguistic prompts cannot be taken as conclusive evidence that an LLM lacks a particular linguistic generalization.
Our results also highlight the value that is lost with the move to closed APIs where access to probability distributions is limited.
arXiv Detail & Related papers (2023-05-22T17:33:17Z) - Token-wise Decomposition of Autoregressive Language Model Hidden States for Analyzing Model Predictions [9.909170013118775]
This work presents a linear decomposition of final hidden states from autoregressive language models based on each initial input token.
Using the change in next-word probability as a measure of importance, this work first examines which context words make the biggest contribution to language model predictions.
arXiv Detail & Related papers (2023-05-17T23:55:32Z) - Evaluating Distributional Distortion in Neural Language Modeling [81.83408583979745]
A heavy-tail of rare events accounts for a significant amount of the total probability mass of distributions in language.
Standard language modeling metrics such as perplexity quantify the performance of language models (LM) in aggregate.
We develop a controlled evaluation scheme which uses generative models trained on natural data as artificial languages.
arXiv Detail & Related papers (2022-03-24T01:09:46Z) - Automatically Identifying Semantic Bias in Crowdsourced Natural Language Inference Datasets [78.6856732729301]
We introduce a model-driven, unsupervised technique to find "bias clusters" in a learned embedding space of hypotheses in NLI datasets.
interventions and additional rounds of labeling can be performed to ameliorate the semantic bias of the hypothesis distribution of a dataset.
arXiv Detail & Related papers (2021-12-16T22:49:01Z) - Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)