How to Compute the Probability of a Word
- URL: http://arxiv.org/abs/2406.14561v2
- Date: Sat, 12 Oct 2024 16:04:53 GMT
- Title: How to Compute the Probability of a Word
- Authors: Tiago Pimentel, Clara Meister
- Abstract summary: This paper derives the correct methods for computing word probabilities.
We show that correcting the widespread bug in probability computations affects measured outcomes in sentence comprehension and lexical optimisation analyses.
- Score: 45.23856093235994
- Abstract: Language models (LMs) estimate a probability distribution over strings in a natural language; these distributions are crucial for computing perplexity and surprisal in linguistics research. While we are usually concerned with measuring these values for words, most LMs operate over subwords. Though this seems straightforward to handle, accurately computing probabilities over one unit given probabilities over the other requires care. Indeed, we show here that many recent linguistic studies have been incorrectly computing these values. This paper derives the correct methods for computing word probabilities, highlighting issues when relying on language models that use beginning-of-word (bow)-marking tokenisers, e.g., the GPT family. Empirically, we show that correcting the widespread bug in probability computations affects measured outcomes in sentence comprehension and lexical optimisation analyses.
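The aggregation the abstract refers to is usually implemented by summing a word's subword log-probabilities. Below is a minimal sketch of that widespread (naive) computation, assuming GPT-2 and the Hugging Face transformers library purely for illustration; the paper's corrected method for bow-marking tokenisers is not reproduced here.

```python
# Minimal sketch of the widespread (naive) word-probability computation the
# abstract critiques: a word's log-probability is taken as the sum of its
# subword-token log-probabilities. The model/tokeniser choice (GPT-2 via
# Hugging Face transformers) is an illustrative assumption, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def naive_word_surprisals(sentence: str):
    """Return (word, surprisal in nats) pairs via subword log-prob summation."""
    words = sentence.split()
    context_ids = [tokenizer.bos_token_id]  # condition the first word on BOS
    results = []
    for i, word in enumerate(words):
        # GPT-2's bow-marking tokeniser attaches the preceding space to the word,
        # so every non-initial word is encoded together with its leading space.
        piece = word if i == 0 else " " + word
        word_ids = tokenizer.encode(piece)
        input_ids = torch.tensor([context_ids + word_ids])
        with torch.no_grad():
            log_probs = torch.log_softmax(model(input_ids).logits[0], dim=-1)
        # Sum log p(subword_j | context, subword_<j) over the word's subwords.
        start = len(context_ids)
        lp = sum(log_probs[start + j - 1, word_ids[j]].item()
                 for j in range(len(word_ids)))
        results.append((word, -lp))
        context_ids += word_ids
    return results

print(naive_word_surprisals("The cat sat on the mat"))
```

This subword-summation step is exactly where the bug the abstract mentions arises for bow-marking tokenisers such as GPT-2's; the paper derives the corrected expression, which this sketch does not attempt.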
Related papers
- Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method [108.56493934296687]
We introduce a divergence-based calibration method, inspired by the divergence-from-randomness concept, to calibrate token probabilities for pretraining data detection.
We have developed a Chinese-language benchmark, PatentMIA, to assess the performance of detection approaches for LLMs on Chinese text.
arXiv Detail & Related papers (2024-09-23T07:55:35Z) - Statistical Uncertainty in Word Embeddings: GloVe-V [35.04183792123882]
We introduce a method to obtain approximate, easy-to-use, and scalable reconstruction error variance estimates for GloVe.
To demonstrate the value of embeddings with variance (GloVe-V), we illustrate how our approach enables principled hypothesis testing in core word embedding tasks.
arXiv Detail & Related papers (2024-06-18T00:35:02Z) - Leading Whitespaces of Language Models' Subword Vocabulary Pose a Confound for Calculating Word Probabilities [15.073507986272027]
We argue that there is a confound posed by the most common method of aggregating subword probabilities into word probabilities.
This is because tokens in the subword vocabulary of most language models have leading whitespaces (a short tokeniser illustration follows this list).
We present a simple decoding technique that reallocates the probability of the trailing whitespace to that of the current word.
arXiv Detail & Related papers (2024-06-16T08:44:56Z) - Probabilistic Method of Measuring Linguistic Productivity [0.0]
I propose a new way of measuring linguistic productivity that objectively assesses the ability of an affix to be used to coin new complex words.
Token frequency does not dominate the productivity measure but naturally influences the sampling of bases.
A corpus-based approach and randomised design ensure that true neologisms and words coined long ago have equal chances of being selected.
arXiv Detail & Related papers (2023-08-24T08:36:28Z) - Conformal Language Modeling [61.94417935386489]
We propose a novel approach to conformal prediction for generative language models (LMs).
Standard conformal prediction produces prediction sets with rigorous statistical guarantees.
We demonstrate the promise of our approach on multiple tasks in open-domain question answering, text summarization, and radiology report generation.
arXiv Detail & Related papers (2023-06-16T21:55:08Z) - Prompting is not a substitute for probability measurements in large language models [22.790531588072245]
We compare metalinguistic prompting and direct probability measurements as ways of measuring models' linguistic knowledge.
Our findings suggest that negative results relying on metalinguistic prompts cannot be taken as conclusive evidence that an LLM lacks a particular linguistic generalization.
Our results also highlight the value that is lost with the move to closed APIs where access to probability distributions is limited.
arXiv Detail & Related papers (2023-05-22T17:33:17Z) - Evaluating Distributional Distortion in Neural Language Modeling [81.83408583979745]
A heavy tail of rare events accounts for a significant amount of the total probability mass of distributions in language.
Standard language modeling metrics such as perplexity quantify the performance of language models (LMs) in aggregate.
We develop a controlled evaluation scheme which uses generative models trained on natural data as artificial languages.
arXiv Detail & Related papers (2022-03-24T01:09:46Z) - How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering [80.82194311274694]
We examine the question "how can we know when language models know, with confidence, the answer to a particular query?"
We examine three strong generative models -- T5, BART, and GPT-2 -- and study whether their probabilities on QA tasks are well calibrated.
We then examine methods to calibrate such models to make their confidence scores correlate better with the likelihood of correctness.
arXiv Detail & Related papers (2020-12-02T03:53:13Z) - Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)
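As referenced in the summary of "Leading Whitespaces of Language Models' Subword Vocabulary" above, the following quick check makes the confound concrete: most subword vocabularies mark word-initial tokens with an encoded leading space, so a word's tokens also encode the boundary before it. GPT-2's tokeniser (via Hugging Face transformers) is used here only as an illustrative example.

```python
# Quick illustration of the leading-whitespace confound: GPT-2's byte-level BPE
# encodes a leading space as 'Ġ', so word-initial tokens carry the boundary marker.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
print(tok.tokenize("the cat sat"))  # ['the', 'Ġcat', 'Ġsat'] -- 'Ġ' marks the leading space
print(tok.tokenize("cat"))          # ['cat'] -- no boundary marker without the space
```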