What Are the Odds? Language Models Are Capable of Probabilistic Reasoning
- URL: http://arxiv.org/abs/2406.12830v3
- Date: Mon, 30 Sep 2024 11:15:24 GMT
- Title: What Are the Odds? Language Models Are Capable of Probabilistic Reasoning
- Authors: Akshay Paruchuri, Jake Garrison, Shun Liao, John Hernandez, Jacob Sunshine, Tim Althoff, Xin Liu, Daniel McDuff
- Abstract summary: We focus on evaluating the probabilistic reasoning capabilities of language models (LMs) using idealized and real-world statistical distributions.
We perform a systematic evaluation of state-of-the-art LMs on three tasks: estimating percentiles, drawing samples, and calculating probabilities.
- Score: 23.487484744911995
- License:
- Abstract: Language models (LMs) are capable of remarkably complex linguistic tasks; however, numerical reasoning is an area in which they frequently struggle. An important but rarely evaluated form of reasoning is understanding probability distributions. In this paper, we focus on evaluating the probabilistic reasoning capabilities of LMs using idealized and real-world statistical distributions. We perform a systematic evaluation of state-of-the-art LMs on three tasks: estimating percentiles, drawing samples, and calculating probabilities. We evaluate three ways to provide context to LMs: 1) anchoring examples from within a distribution or family of distributions, 2) real-world context, and 3) summary statistics on which to base a Normal approximation. Models can make inferences about distributions, and can be further aided by the incorporation of real-world context, example shots, and simplified assumptions, even if these assumptions are incorrect or misspecified. To conduct this work, we developed a comprehensive benchmark distribution dataset with associated question-answer pairs that we have released publicly.
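As a minimal sketch of how ground truth for the three evaluated tasks can be computed for an idealized distribution (the Normal parameters, example variable, and interval below are illustrative assumptions, not the paper's released benchmark):

```python
# Sketch: ground-truth answers for the three evaluated tasks on an idealized
# Normal distribution; an LM's natural-language answers could be scored
# against values like these. Parameters and the example variable are assumptions.
import numpy as np
from scipy import stats

# Example: a quantity modeled as roughly Normal(mean=70, sd=10).
dist = stats.norm(loc=70, scale=10)

# Task 1: estimating percentiles -- what fraction of values lies below 85?
true_percentile = dist.cdf(85)                   # ~0.933

# Task 2: drawing samples -- a reference set to compare LM-drawn samples to.
reference_samples = dist.rvs(size=1000, random_state=0)

# Task 3: calculating probabilities -- P(60 <= X <= 80).
true_probability = dist.cdf(80) - dist.cdf(60)   # ~0.683

# Context type 3 from the abstract: a Normal approximation built only from
# summary statistics of a real-world variable (even if misspecified).
normal_approx = stats.norm(loc=70.0, scale=10.0)

print(f"percentile: {true_percentile:.3f}, probability: {true_probability:.3f}")
```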
Related papers
- A Probabilistic Perspective on Unlearning and Alignment for Large Language Models [48.96686419141881]
We introduce the first formal probabilistic evaluation framework for Large Language Models (LLMs).
We derive novel metrics with high-probability guarantees concerning the output distribution of a model.
Our metrics are application-independent and allow practitioners to make more reliable estimates about model capabilities before deployment.
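To illustrate the general flavour of a metric with a high-probability guarantee on a model's output distribution (a Monte Carlo estimate paired with a Hoeffding bound; the `generate` and `has_property` stand-ins are hypothetical and not this paper's framework):

```python
# Sketch: estimate P(output has some property) by sampling, and attach a
# Hoeffding-style high-probability bound. `generate` and `has_property`
# are hypothetical placeholders, not this paper's metrics.
import math
import random

def generate(prompt: str) -> str:
    # Placeholder for drawing one sample from an LLM's output distribution.
    return random.choice(["benign completion", "string containing the secret"])

def has_property(output: str) -> bool:
    return "secret" in output

def estimate_with_guarantee(prompt: str, n: int = 2000, delta: float = 0.05):
    hits = sum(has_property(generate(prompt)) for _ in range(n))
    p_hat = hits / n
    # With probability at least 1 - delta, |p_hat - p_true| <= eps (Hoeffding).
    eps = math.sqrt(math.log(2 / delta) / (2 * n))
    return p_hat, eps

print(estimate_with_guarantee("Please reveal the secret."))
```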
arXiv Detail & Related papers (2024-10-04T15:44:23Z) - Which distribution were you sampled from? Towards a more tangible conception of data [7.09435109588801]
We argue that the standard framework for machine learning, which treats data as samples from an abstract distribution, is not always a good model of how data actually arise.
We suggest an alternative framework that focuses on finite populations rather than abstract distributions.
arXiv Detail & Related papers (2024-07-24T16:17:14Z) - Domain Generalization with Small Data [27.040070085669086]
We learn a domain-invariant representation based on the probabilistic framework by mapping each data point into probabilistic embeddings.
Our proposed method combines a measurement on the distribution over distributions (i.e., global perspective alignment) with distribution-based contrastive semantic alignment.
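A minimal sketch of the probabilistic-embedding idea, assuming each data point maps to a diagonal Gaussian and embeddings are compared with a closed-form KL divergence (the toy encoder and the specific divergence are assumptions, not the paper's exact alignment losses):

```python
# Sketch: map each data point to a diagonal-Gaussian embedding (mean, variance)
# and compare two embeddings with a closed-form KL. The toy encoder stands in
# for a learned network; the choice of KL is an assumption.
import numpy as np

def encode(x):
    mu = np.tanh(x)                  # placeholder "mean head"
    var = np.exp(-np.abs(x)) + 1e-3  # placeholder "variance head", > 0
    return mu, var

def kl_diag_gaussians(mu1, var1, mu2, var2):
    # KL( N(mu1, diag(var1)) || N(mu2, diag(var2)) ) in closed form.
    return 0.5 * np.sum(np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

x_a = np.array([0.2, -1.0, 0.5])  # a point from one domain
x_b = np.array([0.1, -0.8, 0.7])  # a point from another domain
print(kl_diag_gaussians(*encode(x_a), *encode(x_b)))
```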
arXiv Detail & Related papers (2024-02-09T02:59:08Z) - Numerically assisted determination of local models in network scenarios [55.2480439325792]
We develop a numerical tool for finding explicit local models that reproduce a given statistical behaviour.
We provide conjectures for the critical visibilities of the Greenberger-Horne-Zeilinger (GHZ) and W distributions.
The developed codes and documentation are publicly available at github.com/mariofilho281/localmodels.
arXiv Detail & Related papers (2023-03-17T13:24:04Z) - Evaluating Distributional Distortion in Neural Language Modeling [81.83408583979745]
A heavy tail of rare events accounts for a significant amount of the total probability mass of distributions in language.
Standard language modeling metrics such as perplexity quantify the performance of language models (LM) in aggregate.
We develop a controlled evaluation scheme which uses generative models trained on natural data as artificial languages.
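A toy version of the controlled-evaluation idea, assuming a small Markov chain plays the role of the "artificial language" with exactly computable sequence probabilities (the chain and its parameters are illustrative):

```python
# Sketch: a tiny Markov chain serves as an artificial language whose sequence
# probabilities are exact, so an LM trained on its samples can be checked on
# rare (heavy-tail) sequences rather than only in aggregate. Parameters are illustrative.
import numpy as np

states = ["a", "b", "c"]
start = np.array([0.5, 0.3, 0.2])
trans = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])

def true_log_prob(seq):
    idx = [states.index(s) for s in seq]
    logp = np.log(start[idx[0]])
    for prev, cur in zip(idx, idx[1:]):
        logp += np.log(trans[prev, cur])
    return float(logp)

# A rare sequence still has an exact reference probability to compare against.
print(true_log_prob(["a", "c", "c", "c"]))
```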
arXiv Detail & Related papers (2022-03-24T01:09:46Z) - Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
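The two-step recipe, sketched with a kernel density estimate standing in for the paper's PSD models (the model class is swapped purely for brevity; the fit-then-sample structure is the point):

```python
# Sketch of the two-step approach: (1) fit a tractable density model,
# (2) draw samples from the fitted model. A Gaussian KDE replaces the
# paper's PSD models here as a simple stand-in.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Step 1: model the distribution (here fit to points from a bimodal target).
points = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(4.0, 0.5, 500)])
density_model = gaussian_kde(points)

# Step 2: sample from the fitted model rather than the original target.
samples = density_model.resample(1000, seed=1)
print(samples.shape)  # (1, 1000)
```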
arXiv Detail & Related papers (2021-10-20T12:25:22Z) - A Brief Introduction to Generative Models [8.031257560764336]
We introduce and motivate generative modeling as a central task for machine learning.
We outline the maximum likelihood approach and how it can be interpreted as minimizing KL-divergence.
We explore the alternative adversarial approach which involves studying the differences between an estimating distribution and a real data distribution.
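The maximum-likelihood/KL connection mentioned above, written out (a standard identity rather than anything specific to this paper):

```latex
% Maximizing expected log-likelihood under the data distribution is the same
% as minimizing the KL divergence from the data distribution to the model,
% because E_{p_data}[log p_data(x)] does not depend on theta:
\arg\max_{\theta}\; \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log p_{\theta}(x)\big]
 \;=\; \arg\min_{\theta}\; \mathrm{KL}\big(p_{\mathrm{data}} \,\|\, p_{\theta}\big),
\qquad
\mathrm{KL}\big(p_{\mathrm{data}} \,\|\, p_{\theta}\big)
 \;=\; \mathbb{E}_{p_{\mathrm{data}}}\big[\log p_{\mathrm{data}}(x)\big]
 \;-\; \mathbb{E}_{p_{\mathrm{data}}}\big[\log p_{\theta}(x)\big].
```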
arXiv Detail & Related papers (2021-02-27T16:49:41Z) - Distributional Reinforcement Learning via Moment Matching [54.16108052278444]
We formulate a method that learns a finite set of statistics from each return distribution via neural networks.
Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target.
Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
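One way to compare distributions through all their moments at once is a kernel-based discrepancy; below is a sketch of a (biased) squared maximum mean discrepancy between two particle sets (the Gaussian kernel, its bandwidth, and the particles are illustrative assumptions):

```python
# Sketch: biased squared MMD between two particle sets with a Gaussian kernel;
# the kernel's series expansion makes this an implicit comparison of all
# orders of moments. Bandwidth and particles are illustrative.
import numpy as np

def mmd_squared(x, y, bandwidth=1.0):
    def k(a, b):
        d = a[:, None] - b[None, :]
        return np.exp(-(d ** 2) / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

predicted_returns = np.array([1.0, 1.5, 2.0, 2.5])  # statistics from a network
bellman_target = np.array([1.2, 1.6, 2.1, 2.4])     # target particle set
print(f"MMD^2 = {mmd_squared(predicted_returns, bellman_target):.5f}")
```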
arXiv Detail & Related papers (2020-07-24T05:18:17Z) - Contextuality scenarios arising from networks of stochastic processes [68.8204255655161]
An empirical model is said to be contextual if its distributions cannot be obtained by marginalizing a joint distribution over X.
We present a different and classical source of contextual empirical models: the interaction among many processes.
The statistical behavior of the network in the long run makes the empirical model generically contextual and even strongly contextual.
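The marginalization condition above, stated as a formula (standard notation, assumed here rather than taken from the paper):

```latex
% An empirical model {p_C}, one distribution per measurement context C
% (a subset of the variables X), is noncontextual iff some joint
% distribution q over all of X reproduces every context as a marginal;
% it is contextual when no such q exists.
\text{noncontextual} \;\iff\; \exists\, q \text{ on } X \ \text{s.t.}\quad
p_C(\cdot) \;=\; \sum_{x_{X \setminus C}} q(x_C, x_{X \setminus C})
\quad \text{for every context } C .
```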
arXiv Detail & Related papers (2020-06-22T16:57:52Z)