Incoherent Probability Judgments in Large Language Models
- URL: http://arxiv.org/abs/2401.16646v1
- Date: Tue, 30 Jan 2024 00:40:49 GMT
- Title: Incoherent Probability Judgments in Large Language Models
- Authors: Jian-Qiao Zhu and Thomas L. Griffiths
- Abstract summary: We assess the coherence of probability judgments made by autoregressive Large Language Models (LLMs).
Our results show that the judgments produced by these models are often incoherent, displaying human-like systematic deviations from the rules of probability theory.
- Score: 5.088721610298991
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autoregressive Large Language Models (LLMs) trained for next-word prediction
have demonstrated remarkable proficiency at producing coherent text. But are
they equally adept at forming coherent probability judgments? We use
probabilistic identities and repeated judgments to assess the coherence of
probability judgments made by LLMs. Our results show that the judgments
produced by these models are often incoherent, displaying human-like systematic
deviations from the rules of probability theory. Moreover, when prompted to
judge the same event, the mean-variance relationship of probability judgments
produced by LLMs shows an inverted-U shape like that seen in humans. We
propose that these deviations from rationality can be explained by linking
autoregressive LLMs to implicit Bayesian inference and drawing parallels with
the Bayesian Sampler model of human probability judgments.
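The probe described in the abstract can be made concrete with a short sketch. The snippet below is a minimal illustration under assumptions (the prompt wording, the `judge` callable, and the particular identities chosen are placeholders, not the authors' exact protocol): it elicits repeated judgments for an event, its complement, and two conjunctions, then measures deviations from the identities P(A) + P(not A) = 1 and P(A) = P(A and B) + P(A and not B), along with the mean and variance of the repeated judgments.

```python
import random
import statistics

def coherence_check(judge, event_a: str, event_b: str, n_repeats: int = 20):
    """Probe a probability judge for coherence using simple probabilistic identities.

    `judge` is any callable mapping a question string to a probability in [0, 1],
    e.g. a wrapper that prompts an LLM and parses a numeric answer.
    """
    ask = lambda q: [judge(q) for _ in range(n_repeats)]  # repeated judgments of the same event
    p_a       = ask(f"What is the probability that {event_a}?")
    p_not_a   = ask(f"What is the probability that it is not the case that {event_a}?")
    p_a_and_b = ask(f"What is the probability that {event_a} and {event_b}?")
    p_a_not_b = ask(f"What is the probability that {event_a} and not {event_b}?")

    m = statistics.mean
    return {
        # P(A) + P(not A) should equal 1 for a coherent judge.
        "complement_gap": m(p_a) + m(p_not_a) - 1.0,
        # P(A) should equal P(A and B) + P(A and not B).
        "partition_gap": m(p_a) - (m(p_a_and_b) + m(p_a_not_b)),
        # Mean and variance of repeated judgments of the same event; plotted
        # across many events, humans show an inverted-U mean-variance pattern.
        "mean_P(A)": m(p_a),
        "var_P(A)": statistics.variance(p_a),
    }

# Trivial dummy judge (answers about 0.5 regardless of the question);
# an LLM-backed judge would replace it in an actual experiment.
dummy = lambda q: min(1.0, max(0.0, random.gauss(0.5, 0.1)))
print(coherence_check(dummy, "it rains tomorrow", "the wind is strong"))
```

A coherent judge drives both gaps to zero up to noise; the paper's finding is that LLM judgments show systematic, human-like deviations, and that the variance of repeated judgments traces an inverted-U as a function of the mean, consistent with sample-based accounts such as the Bayesian Sampler model.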
Related papers
- CONTESTS: a Framework for Consistency Testing of Span Probabilities in Language Models [16.436592723426305]
It is unclear whether language models produce the same value when joint probabilities of word spans are computed in different ways.
Our work introduces a novel framework, ConTestS, involving statistical tests to assess score consistency across interchangeable completion and conditioning orders.
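As a rough picture of the kind of consistency being tested, the generic chain-rule check below must come out at zero for any probabilistically coherent model. It is a sketch under assumptions (the `log_prob` scorer is a placeholder, and ConTestS itself applies statistical tests across many spans and orders rather than a single comparison):

```python
def order_consistency_gap(log_prob, span_a: str, span_b: str) -> float:
    """Chain-rule check: log p(A,B) = log p(A) + log p(B|A) = log p(B) + log p(A|B).

    `log_prob(text, given)` is any callable returning log p(text | given)
    under the language model being tested; how spans are actually scored
    is an assumption here, not taken from the paper.
    """
    ab = log_prob(span_a, "") + log_prob(span_b, span_a)
    ba = log_prob(span_b, "") + log_prob(span_a, span_b)
    return abs(ab - ba)  # zero for a consistent scorer, positive otherwise
```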
arXiv Detail & Related papers (2024-09-30T06:24:43Z)
- Estimating Causal Effects from Learned Causal Networks [56.14597641617531]
We propose an alternative paradigm for answering causal-effect queries over discrete observable variables.
We learn the causal Bayesian network and its confounding latent variables directly from the observational data.
We show that this "model completion" learning approach can be more effective than estimand approaches.
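For intuition only, once a discrete causal Bayesian network is available, an interventional query can be answered by marginalizing over an adjustment set. The toy sketch below hard-codes a three-variable network (Z -> X, Z -> Y, X -> Y) with illustrative tables and applies the backdoor adjustment P(y | do(x)) = sum_z P(y | x, z) P(z); it is a generic textbook example, not the paper's model-completion estimator.

```python
# Toy backdoor adjustment on a discrete network Z -> X, Z -> Y, X -> Y.
# The probability tables are illustrative, not learned from data.
P_z = {0: 0.6, 1: 0.4}            # P(Z = z)
P_y1_given_xz = {                  # P(Y = 1 | X = x, Z = z)
    (0, 0): 0.2, (0, 1): 0.5,
    (1, 0): 0.6, (1, 1): 0.9,
}

def p_y1_do_x(x: int) -> float:
    """P(Y = 1 | do(X = x)) via backdoor adjustment over the confounder Z."""
    return sum(P_y1_given_xz[(x, z)] * P_z[z] for z in P_z)

# Average causal effect of setting X = 1 versus X = 0 on Y.
print(p_y1_do_x(1) - p_y1_do_x(0))
```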
arXiv Detail & Related papers (2024-08-26T08:39:09Z)
- Calibrated Large Language Models for Binary Question Answering [49.1574468325115]
A well-calibrated model should produce probabilities that accurately reflect the likelihood of its predictions being correct.
We propose a novel approach that utilizes the inductive Venn-Abers predictor (IVAP) to calibrate the probabilities associated with the output tokens corresponding to the binary labels.
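A minimal sketch of the inductive Venn-Abers idea, assuming scikit-learn's isotonic regression; the choice of calibration score and the single-number merge of the (p0, p1) interval are illustrative and may differ from the paper's setup:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def inductive_venn_abers(cal_scores, cal_labels, test_score):
    """Return (p0, p1) and a merged probability for a single test score.

    cal_scores: uncalibrated scores on a held-out calibration set
    cal_labels: binary labels (0/1) for the calibration set
    """
    def isotonic_prob(appended_label):
        # Refit isotonic regression with the test point appended under each label.
        scores = np.append(cal_scores, test_score)
        labels = np.append(cal_labels, appended_label)
        iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
        iso.fit(scores, labels)
        return float(iso.predict([test_score])[0])

    p0 = isotonic_prob(0)          # lower probability (test point labelled 0)
    p1 = isotonic_prob(1)          # upper probability (test point labelled 1)
    merged = p1 / (1.0 - p0 + p1)  # common single-value summary of the interval
    return p0, p1, merged
```

In a binary question-answering setting the calibration score could be, for example, the model's log-probability margin between the two answer tokens; that choice is an assumption here rather than a detail taken from the paper.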
arXiv Detail & Related papers (2024-07-01T09:31:03Z)
- Generative vs. Discriminative modeling under the lens of uncertainty quantification [0.929965561686354]
In this paper, we undertake a comparative analysis of generative and discriminative approaches.
We compare the ability of both approaches to leverage information from various sources in an uncertainty-aware inference setting.
We propose a general sampling scheme enabling supervised learning for both approaches, as well as semi-supervised learning when compatible with the considered modeling approach.
arXiv Detail & Related papers (2024-06-13T14:32:43Z)
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation [73.58618024960968]
An increasing number of studies are employing large language models (LLMs) as agents to emulate the sequential decision-making processes of humans.
This raises the question of whether LLM agents can comprehend probability distributions.
Our analysis indicates that LLM agents can understand probabilities, but they struggle with probability sampling.
arXiv Detail & Related papers (2024-04-13T16:59:28Z)
- Can You Learn Semantics Through Next-Word Prediction? The Case of Entailment [36.82878715850013]
Merrill et al. argue that, in theory, sentence co-occurrence probabilities predicted by an optimal LM should reflect the entailment relationship of the constituent sentences.
We investigate whether their theory can be used to decode entailment relations from neural LMs.
We find that a test similar to theirs can decode entailment relations between natural sentences, well above random chance, though not perfectly.
arXiv Detail & Related papers (2024-02-21T17:36:07Z)
- Invariant Probabilistic Prediction [45.90606906307022]
We show that arbitrary distribution shifts do not, in general, admit invariant and robust probabilistic predictions.
We propose a method to yield invariant probabilistic predictions, called IPP, and study the consistency of the underlying parameters.
arXiv Detail & Related papers (2023-09-18T18:50:24Z)
- Evaluating Distributional Distortion in Neural Language Modeling [81.83408583979745]
A heavy tail of rare events accounts for a significant amount of the total probability mass of distributions in language.
Standard language modeling metrics such as perplexity quantify the performance of language models (LMs) in aggregate.
We develop a controlled evaluation scheme which uses generative models trained on natural data as artificial languages.
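For reference, the aggregate metric mentioned above is the standard token-level perplexity of a language model over a held-out sequence of N tokens:

```latex
\mathrm{PPL} = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta(w_i \mid w_{<i}) \right)
```

Because every token contributes equally to the sum, a model can score well in aggregate while still misallocating probability mass in the rare-event tail, which is one reason aggregate metrics can mask the distributional distortion that the controlled evaluation above targets.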
arXiv Detail & Related papers (2022-03-24T01:09:46Z)
- Score Matched Conditional Exponential Families for Likelihood-Free Inference [0.0]
Likelihood-Free Inference (LFI) relies on simulations from the model.
We generate parameter-simulation pairs from the model independently of the observation.
We use Neural Networks whose weights are tuned with Score Matching to learn a conditional exponential family likelihood approximation.
arXiv Detail & Related papers (2020-12-20T11:57:30Z)
- Decision-Making with Auto-Encoding Variational Bayes [71.44735417472043]
We show that a posterior approximation distinct from the variational distribution should be used for making decisions.
Motivated by these theoretical results, we propose learning several approximate proposals for the best model.
In addition to toy examples, we present a full-fledged case study of single-cell RNA sequencing.
arXiv Detail & Related papers (2020-02-17T19:23:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.