BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models
- URL: http://arxiv.org/abs/2404.12494v2
- Date: Wed, 16 Oct 2024 17:45:10 GMT
- Title: BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models
- Authors: Yu Feng, Ben Zhou, Weidong Lin, Dan Roth
- Abstract summary: Predictive models often need to work with incomplete information in real-world tasks.
Current large language models (LLMs) are insufficient for such accurate estimations.
We propose BIRD, a novel probabilistic inference framework.
- Score: 52.46248487458641
- License:
- Abstract: Predictive models often need to work with incomplete information in real-world tasks. Consequently, they must provide reliable probability or confidence estimation, especially in large-scale decision making and planning tasks. Current large language models (LLMs) are insufficient for such accurate estimations, but they can generate relevant factors that may affect the probabilities, produce coarse-grained probabilities when the information is more complete, and help determine which factors are relevant to specific downstream contexts. In this paper, we make use of these capabilities of LLMs to provide a significantly more accurate probabilistic estimation. We propose BIRD, a novel probabilistic inference framework that aligns a Bayesian network with LLM abductions and then estimates more accurate probabilities in a deduction step. We show BIRD provides reliable probability estimations that are 30% better than those provided directly by LLM baselines. These estimates can further contribute to better and more trustworthy decision-making.
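To make the abduction-then-deduction idea concrete, here is a minimal, self-contained sketch of that style of estimation, not BIRD's actual implementation: the factor names, the probability values, and the independence of factors are all illustrative assumptions standing in for quantities an LLM would supply.

```python
from itertools import product

# Illustrative sketch only: a toy Bayesian-network-style deduction step.
# Factor names and probabilities are hypothetical placeholders for values
# an LLM would propose, and the factors are treated as independent for brevity.

# Factors an abduction step might propose for a decision query,
# with marginal probabilities P(factor = True) given the context.
factor_priors = {
    "road_is_wet": 0.7,
    "driver_is_experienced": 0.4,
}

# Coarse conditional probabilities P(outcome = True | factor assignment),
# e.g. elicited once each factor's value is assumed known.
outcome_given = {
    (True, True): 0.30,   # wet road, experienced driver
    (True, False): 0.60,  # wet road, inexperienced driver
    (False, True): 0.10,
    (False, False): 0.25,
}

def deduce_outcome_probability(priors, conditionals):
    """Marginalize over all factor assignments: sum_a P(outcome | a) * P(a)."""
    names = list(priors)
    total = 0.0
    for assignment in product([True, False], repeat=len(names)):
        weight = 1.0
        for name, value in zip(names, assignment):
            weight *= priors[name] if value else 1.0 - priors[name]
        total += conditionals[assignment] * weight
    return total

print(deduce_outcome_probability(factor_priors, outcome_given))  # ~0.393
```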
Related papers
- A Probabilistic Perspective on Unlearning and Alignment for Large Language Models [48.96686419141881]
We introduce the first formal probabilistic evaluation framework for Large Language Models (LLMs).
We derive novel metrics with high-probability guarantees concerning the output distribution of a model.
Our metrics are application-independent and allow practitioners to make more reliable estimates about model capabilities before deployment.
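As a loose illustration of what a high-probability guarantee over an output distribution can look like (not the paper's metrics), the sketch below turns Monte Carlo samples of model behavior into a one-sided Hoeffding upper bound; the leak-checking function and its 2% rate are hypothetical.

```python
import math
import random

# Illustrative only: one standard way to turn Monte Carlo samples of a model's
# output distribution into a bound that holds with high probability
# (Hoeffding's inequality), not the metrics derived in the paper.

def upper_confidence_bound(hits, n, delta=0.05):
    """One-sided upper bound on P(event) holding with probability >= 1 - delta."""
    empirical = hits / n
    return min(1.0, empirical + math.sqrt(math.log(1.0 / delta) / (2.0 * n)))

# Hypothetical stand-in for sampling from an LLM and checking whether the
# output still reveals supposedly unlearned information.
def sample_leaks_information():
    return random.random() < 0.02  # assume a 2% true leak rate for the demo

n = 5000
hits = sum(sample_leaks_information() for _ in range(n))
print(f"empirical leak rate: {hits / n:.4f}")
print(f"95%-confidence upper bound: {upper_confidence_bound(hits, n):.4f}")
```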
arXiv Detail & Related papers (2024-10-04T15:44:23Z)
- Probabilistic Medical Predictions of Large Language Models [4.825666689707888]
Large Language Models (LLMs) have demonstrated significant potential in clinical applications through prompt engineering.
LLMs' limitations in numerical reasoning raise concerns about the reliability of these text-generated probabilities.
Experimenting with six advanced open-source LLMs across five medical datasets, we found that the performance of explicit probabilities was consistently lower than that of implicit probabilities.
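The distinction between the two kinds of probabilities can be sketched as follows; the logits, generated text, and `ANSWER_TOKENS` mapping are made-up stand-ins for real model outputs, so this illustrates the contrast rather than the paper's evaluation code.

```python
import math
import re

# Illustrative contrast between implicit (token-likelihood) and explicit
# (verbalized) probabilities; all inputs below are hypothetical.

ANSWER_TOKENS = {"yes": 0, "no": 1}  # hypothetical token ids for the options

def implicit_probability(next_token_logits, option="yes"):
    """Implicit: renormalize the model's next-token logits over the answer options."""
    logits = [next_token_logits[i] for i in ANSWER_TOKENS.values()]
    exps = [math.exp(x - max(logits)) for x in logits]
    probs = [e / sum(exps) for e in exps]
    return probs[ANSWER_TOKENS[option]]

def explicit_probability(generated_text):
    """Explicit: parse a verbalized probability from the model's own text."""
    match = re.search(r"(\d{1,3})\s*%", generated_text)
    return float(match.group(1)) / 100.0 if match else None

# Hypothetical outputs for the same clinical question.
print(implicit_probability({0: 2.1, 1: 0.3}))                               # ~0.86
print(explicit_probability("I estimate about a 70% chance of diabetes."))   # 0.70
```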
arXiv Detail & Related papers (2024-08-21T03:47:17Z)
- Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models [79.76293901420146]
Large Language Models (LLMs) are employed across various high-stakes domains, where the reliability of their outputs is crucial.
Our research investigates the fragility of uncertainty estimation and explores potential attacks.
We demonstrate that an attacker can embed a backdoor in LLMs, which, when activated by a specific trigger in the input, manipulates the model's uncertainty without affecting the final output.
arXiv Detail & Related papers (2024-07-15T23:41:11Z)
- Quantifying Prediction Consistency Under Model Multiplicity in Tabular LLMs [10.494477811252034]
Fine-tuning large language models can lead to fine-tuning multiplicity, where equally well-performing models make conflicting predictions on the same inputs.
This raises critical concerns about the robustness and reliability of Tabular LLMs.
This work proposes a novel metric to quantify the robustness of individual predictions without expensive model retraining.
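A rough sketch of the general idea of scoring the stability of a single prediction without any retraining (not the paper's proposed metric); the `prediction` function and the perturbation scheme are hypothetical.

```python
import random

# Rough sketch only: gauge how stable one prediction is by checking agreement
# under small perturbations of the input, avoiding model retraining.
# The predictor and noise model below are hypothetical stand-ins.

def prediction(features):
    # Hypothetical stand-in for one fine-tuned tabular LLM's binary prediction.
    return int(sum(features) > 1.0)

def consistency_score(features, n_perturbations=200, noise=0.05):
    base = prediction(features)
    agree = 0
    for _ in range(n_perturbations):
        perturbed = [x + random.gauss(0.0, noise) for x in features]
        agree += prediction(perturbed) == base
    return agree / n_perturbations  # 1.0 = fully stable, near 0.5 = fragile

print(consistency_score([0.4, 0.7]))  # prints a stability score in [0, 1]
```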
arXiv Detail & Related papers (2024-07-04T22:22:09Z)
- Large Language Models Must Be Taught to Know What They Don't Know [97.90008709512921]
We show that fine-tuning on a small dataset of correct and incorrect answers can create an uncertainty estimate with good generalization and small computational overhead.
We also investigate the mechanisms that enable reliable uncertainty estimation, finding that many models can be used as general-purpose uncertainty estimators.
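A loose sketch of that recipe using a simple logistic-regression probe trained on answers labeled correct or incorrect; the random features are hypothetical placeholders for whatever representation of (question, answer) pairs one extracts from an LLM, and this is not the paper's training setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Loose illustration of learning an uncertainty estimator from labeled
# correct/incorrect answers; the features are synthetic stand-ins.

rng = np.random.default_rng(0)
n, d = 500, 16
features = rng.normal(size=(n, d))            # hypothetical answer representations
is_correct = (features[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

probe = LogisticRegression(max_iter=1000).fit(features, is_correct)

new_answer = rng.normal(size=(1, d))
print("P(answer is correct):", probe.predict_proba(new_answer)[0, 1])
```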
arXiv Detail & Related papers (2024-06-12T16:41:31Z)
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
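As a crude proxy for this idea (not the paper's estimator), one can sample several explanation-answer pairs for the same question and read the majority-vote agreement as a confidence score; the sampled answers below are hypothetical.

```python
from collections import Counter

# Crude proxy only: agreement across sampled explanation-answer pairs as a
# confidence score. The answers below are hypothetical samples.

sampled_answers = ["B", "B", "A", "B", "B"]   # answers attached to five explanations

counts = Counter(sampled_answers)
answer, votes = counts.most_common(1)[0]
confidence = votes / len(sampled_answers)
print(f"answer={answer}, confidence={confidence:.2f}")   # answer=B, confidence=0.80
```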
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability.
In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling.
Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
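A generic sketch of ensemble-based uncertainty decomposition in that spirit (an assumed framing, not necessarily the paper's exact formulation): each clarification yields an answer distribution, the ensemble averages them, and comparing entropies splits the uncertainty; the per-clarification distributions are hypothetical.

```python
import math

# Generic sketch: average per-clarification answer distributions, then split
# total uncertainty into a within-clarification part and a between-clarification
# part. Distributions below are hypothetical.

def entropy(dist):
    return -sum(p * math.log(p) for p in dist if p > 0)

# Hypothetical answer distributions over {A, B} for three clarifications
# of the same ambiguous question.
per_clarification = [
    [0.9, 0.1],   # under clarification 1 the model is confident in A
    [0.2, 0.8],   # under clarification 2 it prefers B
    [0.5, 0.5],   # under clarification 3 it is unsure
]

ensemble = [sum(d[i] for d in per_clarification) / len(per_clarification)
            for i in range(2)]

total = entropy(ensemble)                                    # overall uncertainty
within = sum(entropy(d) for d in per_clarification) / len(per_clarification)
between = total - within          # disagreement across clarifications (input ambiguity)

print(f"ensemble={ensemble}, total={total:.3f}, within={within:.3f}, between={between:.3f}")
```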
arXiv Detail & Related papers (2023-11-15T05:58:35Z)
- Handling Epistemic and Aleatory Uncertainties in Probabilistic Circuits [18.740781076082044]
We propose an approach that overcomes the independence assumption behind most existing approaches to a large class of probabilistic reasoning problems.
We provide an algorithm for Bayesian learning from sparse, albeit complete, observations.
Each leaf of such circuits is labelled with a beta-distributed random variable that provides us with an elegant framework for representing uncertain probabilities.
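A minimal sketch of that leaf representation: a probability that is itself uncertain, modelled as a Beta distribution and updated by Bayesian counting from sparse observations (a toy example only, not the paper's full circuit algorithm).

```python
# Toy sketch of a beta-distributed leaf: the leaf's probability is uncertain,
# and Bayesian counting from sparse observations sharpens it over time.

class BetaLeaf:
    def __init__(self, alpha=1.0, beta=1.0):   # uniform prior over the probability
        self.alpha, self.beta = alpha, beta

    def observe(self, outcome: bool):
        if outcome:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self):
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self):   # shrinks as evidence accumulates
        n = self.alpha + self.beta
        return (self.alpha * self.beta) / (n * n * (n + 1))

leaf = BetaLeaf()
for outcome in [True, True, False, True]:     # sparse observations
    leaf.observe(outcome)
print(f"P estimate: {leaf.mean:.2f}, epistemic variance: {leaf.variance:.4f}")
```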
arXiv Detail & Related papers (2021-02-22T10:03:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.