Exploring the Potential for Large Language Models to Demonstrate Rational Probabilistic Beliefs
- URL: http://arxiv.org/abs/2504.13644v1
- Date: Fri, 18 Apr 2025 11:50:30 GMT
- Title: Exploring the Potential for Large Language Models to Demonstrate Rational Probabilistic Beliefs
- Authors: Gabriel Freedman, Francesca Toni
- Abstract summary: We show that current versions of large language models (LLMs) lack the ability to provide rational and coherent representations of probabilistic beliefs. We apply well-established techniques for uncertainty quantification to measure the ability of LLMs to adhere to fundamental properties of probabilistic reasoning.
- Score: 12.489784979345654
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Advances in the general capabilities of large language models (LLMs) have led to their use for information retrieval, and as components in automated decision systems. A faithful representation of probabilistic reasoning in these models may be essential to ensure trustworthy, explainable and effective performance in these tasks. Despite previous work suggesting that LLMs can perform complex reasoning and well-calibrated uncertainty quantification, we find that current versions of this class of model lack the ability to provide rational and coherent representations of probabilistic beliefs. To demonstrate this, we introduce a novel dataset of claims with indeterminate truth values and apply a number of well-established techniques for uncertainty quantification to measure the ability of LLMs to adhere to fundamental properties of probabilistic reasoning.
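To make the coherence criterion concrete, here is a minimal sketch (ours, not the paper's code) of two such checks, assuming claim probabilities have already been elicited from a model by some method such as verbalized confidence or token-level logits; all claim texts and numbers are invented.

```python
# Sketch of coherence checks over LLM-elicited probabilities. Assumes the
# probabilities were already obtained by some elicitation method; the
# example values below are invented.

def violates_complementarity(p_claim: float, p_negation: float,
                             tol: float = 0.05) -> bool:
    # A coherent agent assigns P(A) + P(not A) = 1.
    return abs((p_claim + p_negation) - 1.0) > tol

def violates_monotonicity(p_conjunction: float, p_conjunct: float) -> bool:
    # P(A and B) can never exceed P(A).
    return p_conjunction > p_conjunct

# Invented elicited values for a claim and its negation:
print(violates_complementarity(0.7, 0.4))  # True: 0.7 + 0.4 != 1
print(violates_monotonicity(0.6, 0.5))     # True: P(A and B) > P(A)
```

Violations of either check would count as evidence of incoherent probabilistic beliefs in the sense described above.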
Related papers
- Can LLMs Assist Expert Elicitation for Probabilistic Causal Modeling? [0.0]
This study investigates the potential of Large Language Models (LLMs) as an alternative to human expert elicitation for extracting structured causal knowledge. LLM-generated causal structures, specifically Bayesian networks (BNs), were benchmarked against traditional statistical methods. LLM-generated BNs demonstrated lower entropy than expert-elicited and statistically generated BNs, suggesting higher confidence and precision in predictions.
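As a rough illustration of the entropy comparison (not the study's actual code), one can average the Shannon entropy of each network's conditional probability tables; lower values indicate sharper, more confident predictions. The tables below are invented.

```python
import math

def shannon_entropy(dist):
    """Entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def mean_cpt_entropy(cpts):
    """Average entropy across all rows of all conditional probability tables."""
    rows = [row for table in cpts for row in table]
    return sum(shannon_entropy(r) for r in rows) / len(rows)

# Invented example: an LLM-elicited network with sharper (lower-entropy)
# tables than an expert-elicited one, mirroring the finding summarized above.
llm_cpts = [[[0.9, 0.1], [0.2, 0.8]]]
expert_cpts = [[[0.7, 0.3], [0.4, 0.6]]]
print(mean_cpt_entropy(llm_cpts) < mean_cpt_entropy(expert_cpts))  # True
```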
arXiv Detail & Related papers (2025-04-14T16:45:52Z)
- FactReasoner: A Probabilistic Approach to Long-Form Factuality Assessment for Large Language Models [59.171510592986735]
We propose FactReasoner, a new factuality assessor that relies on probabilistic reasoning to assess the factuality of a long-form generated response.
Our experiments on labeled and unlabeled benchmark datasets clearly demonstrate that FactReasoner improves considerably over state-of-the-art prompt-based approaches.
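The abstract does not spell out the mechanics, but the general shape of a probabilistic factuality assessor can be sketched as follows; the claim decomposition and support probabilities are hypothetical placeholders, not FactReasoner's pipeline.

```python
# Hypothetical sketch of probabilistic factuality scoring: split a long-form
# answer into atomic claims, attach a support probability to each (e.g. from
# retrieved evidence), and aggregate. Claims and numbers are invented.

def factuality_score(claim_support: dict[str, float]) -> float:
    """Mean probability that each atomic claim is supported by evidence."""
    return sum(claim_support.values()) / len(claim_support)

answer_claims = {
    "Paris is the capital of France": 0.99,
    "Paris has a population of 12 million": 0.30,
}
print(round(factuality_score(answer_claims), 3))  # 0.645
```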
arXiv Detail & Related papers (2025-02-25T19:01:48Z)
- An Overview of Large Language Models for Statisticians [109.38601458831545]
Large Language Models (LLMs) have emerged as transformative tools in artificial intelligence (AI).
This paper explores potential areas where statisticians can make important contributions to the development of LLMs.
We focus on issues such as uncertainty quantification, interpretability, fairness, privacy, watermarking and model adaptation.
arXiv Detail & Related papers (2025-02-25T03:40:36Z)
- A Probabilistic Perspective on Unlearning and Alignment for Large Language Models [48.96686419141881]
We introduce the first formal probabilistic evaluation framework for Large Language Models (LLMs). Namely, we propose novel metrics with high-probability guarantees concerning the output distribution of a model. Our metrics are application-independent and allow practitioners to make more reliable estimates about model capabilities before deployment.
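One standard route to such high-probability guarantees is a Monte Carlo estimate combined with a Hoeffding-style confidence bound; the sketch below illustrates that generic construction and is not the paper's specific metric.

```python
import math

def upper_bound_bad_probability(bad_flags: list[bool], delta: float = 0.05) -> float:
    """Hoeffding upper confidence bound on P(output is 'bad').

    With probability at least 1 - delta over the sampled outputs, the true
    probability lies below the returned value.
    """
    n = len(bad_flags)
    empirical = sum(bad_flags) / n
    slack = math.sqrt(math.log(1 / delta) / (2 * n))
    return min(1.0, empirical + slack)

# Invented example: 3 flagged outputs out of 1000 samples.
flags = [True] * 3 + [False] * 997
print(round(upper_bound_bad_probability(flags), 4))
```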
arXiv Detail & Related papers (2024-10-04T15:44:23Z)
- Verbalized Probabilistic Graphical Modeling [8.524824578426962]
We propose Verbalized Probabilistic Graphical Modeling (vPGM) to simulate key principles of Probabilistic Graphical Models (PGMs) in natural language. vPGM bypasses expert-driven model design, making it well-suited for scenarios with limited assumptions or scarce data. Our results indicate that the model effectively enhances confidence calibration and text generation quality.
arXiv Detail & Related papers (2024-06-08T16:35:31Z)
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
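A simplified version of the idea: sample several explanation-answer pairs, group them by final answer, and use the entropy of the induced answer distribution as an uncertainty score. This omits the paper's stability analysis of explanations; the sampled answers are invented.

```python
import math
from collections import Counter

def answer_entropy(answers: list[str]) -> float:
    """Entropy (bits) of the empirical answer distribution over samples."""
    counts = Counter(answers)
    n = len(answers)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Invented example: 8 sampled chains of thought, grouped by final answer.
sampled_answers = ["B", "B", "B", "A", "B", "B", "A", "B"]
print(round(answer_entropy(sampled_answers), 3))  # low entropy = confident
```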
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language [35.84181171987974]
Our goal is to build a regression model that can process numerical data and make probabilistic predictions at arbitrary locations.
We start by exploring strategies for eliciting explicit, coherent numerical predictive distributions from Large Language Models.
We demonstrate the ability to usefully incorporate text into numerical predictions, improving predictive performance and giving quantitative structure that reflects qualitative descriptions.
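One simple way to elicit such a distribution, sketched below under our own assumptions rather than the paper's exact protocol, is to ask the model for a few quantiles and interpolate a piecewise-linear CDF between them; the quantile values are invented.

```python
# Sketch: turn a handful of LLM-elicited quantiles into a piecewise-linear
# CDF. `elicited` maps quantile levels to values the model produced; the
# numbers are invented for illustration.

def cdf_from_quantiles(elicited: dict[float, float], x: float) -> float:
    """Piecewise-linear CDF through (value, level) points; flat tails."""
    pts = sorted((v, q) for q, v in elicited.items())
    if x <= pts[0][0]:
        return pts[0][1]
    for (v0, q0), (v1, q1) in zip(pts, pts[1:]):
        if x <= v1:
            return q0 + (q1 - q0) * (x - v0) / (v1 - v0)
    return pts[-1][1]

elicited = {0.1: 12.0, 0.5: 20.0, 0.9: 31.0}  # invented quantiles
print(round(cdf_from_quantiles(elicited, 25.0), 3))  # 0.682
```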
arXiv Detail & Related papers (2024-05-21T15:13:12Z)
- BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models [52.46248487458641]
Predictive models often need to work with incomplete information in real-world tasks. Current large language models (LLMs) are insufficient for accurate estimation. We propose BIRD, a novel probabilistic inference framework.
arXiv Detail & Related papers (2024-04-18T20:17:23Z)
- Reasoning over Uncertain Text by Generative Large Language Models [18.983753573277596]
This paper considers the challenges Large Language Models (LLMs) face when reasoning over text that includes information involving uncertainty explicitly quantified via probability values. We introduce the Bayesian Linguistic Inference dataset (BLInD), a new dataset designed to test the probabilistic reasoning capabilities of LLMs. We present several prompting strategies that map the problem to different formal representations, including Python code, probabilistic algorithms, and probabilistic logical programming.
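To illustrate the Python-code strategy mentioned above, here is a toy uncertain-text problem (invented by us) translated into executable probability arithmetic via the law of total probability.

```python
# Toy instance of the "map to Python code" strategy: the uncertain text
# "There is a 30% chance of rain. If it rains, the game is cancelled with
# probability 0.8; otherwise with probability 0.1. What is P(cancelled)?"
# becomes straightforward probability arithmetic. Example invented by us.

p_rain = 0.30
p_cancel_given_rain = 0.80
p_cancel_given_dry = 0.10

# Law of total probability.
p_cancel = p_rain * p_cancel_given_rain + (1 - p_rain) * p_cancel_given_dry
print(round(p_cancel, 2))  # 0.31
```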
arXiv Detail & Related papers (2024-02-14T23:05:44Z)
- Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability.
In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling.
Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
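Ensembles of this kind typically support an entropy-based decomposition: total uncertainty splits into aleatoric (mean per-clarification entropy) and epistemic (disagreement across clarifications). The sketch below applies that standard decomposition to hypothetical per-clarification answer distributions; it is not the paper's implementation.

```python
import math

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist if p > 0)

def decompose_uncertainty(per_clarification: list[list[float]]):
    """total = entropy of the mixture; aleatoric = mean per-clarification
    entropy; epistemic = total - aleatoric (cross-clarification disagreement)."""
    k = len(per_clarification[0])
    n = len(per_clarification)
    mixture = [sum(d[i] for d in per_clarification) / n for i in range(k)]
    total = entropy(mixture)
    aleatoric = sum(entropy(d) for d in per_clarification) / n
    return total, aleatoric, total - aleatoric

# Invented example: three clarifications that disagree about the answer.
dists = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
print(decompose_uncertainty(dists))
```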
arXiv Detail & Related papers (2023-11-15T05:58:35Z)
- LaPLACE: Probabilistic Local Model-Agnostic Causal Explanations [1.0370398945228227]
We introduce LaPLACE-Explainer, designed to provide probabilistic cause-and-effect explanations for machine learning models.
The LaPLACE-Explainer component leverages the concept of a Markov blanket to establish statistical boundaries between relevant and non-relevant features.
Our approach offers causal explanations and outperforms LIME and SHAP in terms of local accuracy and consistency of explained features.
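For intuition about the Markov blanket used above: in a directed graphical model, a node's blanket is its parents, its children, and the children's other parents; conditioned on the blanket, the node is independent of all other variables. A self-contained sketch with an invented graph:

```python
# Markov blanket of a node in a DAG: its parents, its children, and the
# children's other parents. The graph below is invented for illustration.

def markov_blanket(parents: dict[str, set[str]], node: str) -> set[str]:
    children = {c for c, ps in parents.items() if node in ps}
    co_parents = {p for c in children for p in parents[c]}
    return (parents.get(node, set()) | children | co_parents) - {node}

# Edges: A -> C, B -> C, C -> D, E -> D
dag = {"C": {"A", "B"}, "D": {"C", "E"}}
print(sorted(markov_blanket(dag, "C")))  # ['A', 'B', 'D', 'E']
```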
arXiv Detail & Related papers (2023-10-01T04:09:59Z)
- Model-free generalized fiducial inference [0.0]
I propose and develop ideas for a model-free statistical framework for imprecise probabilistic prediction inference.
This framework facilitates uncertainty quantification in the form of prediction sets that offer finite-sample control of type 1 errors.
I consider the theoretical and empirical properties of a precise probabilistic approximation to the model-free imprecise framework.
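Finite-sample control of type 1 errors for prediction sets is the same guarantee that split-conformal prediction offers, so that construction serves as one concrete, simpler instance of the idea (it is not the fiducial machinery developed in the paper); the calibration residuals below are invented.

```python
import math

def split_conformal_interval(calib_residuals: list[float], y_pred: float,
                             alpha: float = 0.1) -> tuple[float, float]:
    """Split-conformal prediction interval around a point prediction.

    Under exchangeability, the returned interval covers the true value with
    probability at least 1 - alpha in finite samples.
    """
    n = len(calib_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))  # conservative quantile index
    q = sorted(calib_residuals)[min(rank, n) - 1]
    return (y_pred - q, y_pred + q)

# Invented calibration residuals |y - y_hat| from a held-out split.
residuals = [0.4, 1.1, 0.7, 0.2, 0.9, 1.3, 0.5, 0.8, 0.6, 1.0]
print(split_conformal_interval(residuals, y_pred=5.0, alpha=0.2))  # (3.9, 6.1)
```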
arXiv Detail & Related papers (2023-07-24T01:58:48Z)
- Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models [51.3422222472898]
We document the capability of large language models (LLMs) like ChatGPT to predict stock price movements using news headlines.
We develop a theoretical model incorporating information capacity constraints, underreaction, limits-to-arbitrage, and LLMs.
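Evaluations in this line of work often reduce to a toy long-short exercise: label each headline as good, bad, or neutral for the stock and compare subsequent returns across labels. The sketch below (all labels and returns invented) shows that bookkeeping, not the paper's theoretical model.

```python
# Toy evaluation of headline-based predictability: group next-day returns by
# an LLM's headline label (+1 good, -1 bad, 0 unknown) and compute the
# long-short spread. Labels and returns below are invented.

def long_short_return(labeled: list[tuple[int, float]]) -> float:
    longs = [r for s, r in labeled if s == 1]
    shorts = [r for s, r in labeled if s == -1]
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(longs) - mean(shorts)

sample = [(1, 0.012), (1, 0.004), (-1, -0.009), (0, 0.001), (-1, 0.002)]
print(round(long_short_return(sample), 4))  # positive spread = predictability
```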
arXiv Detail & Related papers (2023-04-15T19:22:37Z)