Efficient Non-Parametric Uncertainty Quantification for Black-Box Large
Language Models and Decision Planning
- URL: http://arxiv.org/abs/2402.00251v1
- Date: Thu, 1 Feb 2024 00:23:31 GMT
- Title: Efficient Non-Parametric Uncertainty Quantification for Black-Box Large
Language Models and Decision Planning
- Authors: Yao-Hung Hubert Tsai, Walter Talbott, Jian Zhang
- Abstract summary: This paper focuses on decision planning with uncertainty estimation to address the hallucination problem in language models.
Our uncertainty estimation and decision-making agent design offer a cost-efficient approach for AI agent development.
- Score: 17.752461521448236
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Step-by-step decision planning with large language models (LLMs) is gaining
attention in AI agent development. This paper focuses on decision planning with
uncertainty estimation to address the hallucination problem in language models.
Existing approaches are either white-box or computationally demanding, which
limits the use of black-box proprietary LLMs within a budget. The paper's first
contribution is a non-parametric uncertainty quantification method for LLMs,
efficiently estimating point-wise dependencies between the input and the decision
on the fly with a single inference, without access to token logits. This estimator
informs the statistical interpretation of decision trustworthiness. The second
contribution outlines a systematic design for a decision-making agent,
generating actions like "turn on the bathroom light" based on user prompts
such as "take a bath". Users will be asked to provide preferences when more
than one action has high estimated point-wise dependencies. In conclusion, our
uncertainty estimation and decision-making agent design offer a cost-efficient
approach for AI agent development.
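
To make the two contributions concrete, here is a minimal Python sketch of how such a pipeline might be wired: a non-parametric kernel-ratio stand-in for the point-wise dependency estimator, plus the preference-asking control flow described above. The `embed` placeholder, the reference-pair corpus, the kernel choice, and the threshold are all hypothetical illustration choices, not the authors' actual estimator or prompts.

```python
# Sketch of a black-box decision-planning loop gated by a point-wise dependency
# score. NOT the paper's estimator: embed(), the reference pairs, and the
# threshold are placeholders used only to illustrate the control flow.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder encoder so the sketch runs; swap in a real sentence encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=64)

def rbf(u: np.ndarray, v: np.ndarray, gamma: float = 0.05) -> float:
    """Gaussian kernel similarity between two embeddings."""
    return float(np.exp(-gamma * np.sum((u - v) ** 2)))

def pointwise_dependency(prompt: str, action: str,
                         reference_pairs: list[tuple[str, str]]) -> float:
    """Crude kernel estimate of p(prompt, action) / (p(prompt) * p(action)).

    The joint density is approximated by kernel similarity to reference
    (prompt, action) pairs; the marginals by averaging each side separately.
    """
    x, y = embed(prompt), embed(action)
    joint = np.mean([rbf(x, embed(p)) * rbf(y, embed(a)) for p, a in reference_pairs])
    p_x = np.mean([rbf(x, embed(p)) for p, _ in reference_pairs])
    p_y = np.mean([rbf(y, embed(a)) for _, a in reference_pairs])
    return float(joint / (p_x * p_y + 1e-12))

def plan(prompt: str, candidate_actions: list[str],
         reference_pairs: list[tuple[str, str]], threshold: float = 1.0) -> str:
    """Execute the single trustworthy action, ask the user if several qualify,
    and abstain if none do."""
    scores = {a: pointwise_dependency(prompt, a, reference_pairs)
              for a in candidate_actions}
    confident = [a for a, s in scores.items() if s >= threshold]
    if len(confident) == 1:
        return f"execute: {confident[0]}"
    if confident:
        return "ask user to choose among: " + ", ".join(confident)
    return "abstain: no action passed the dependency threshold"

if __name__ == "__main__":
    # Candidate actions are assumed to come from ONE black-box LLM call,
    # matching the single-inference constraint stated in the abstract.
    refs = [("take a bath", "turn on the bathroom light"),
            ("make coffee", "start the coffee machine")]
    actions = ["turn on the bathroom light", "start the coffee machine"]
    print(plan("take a bath", actions, refs))
```

In this sketch the kernel-ratio score plays the role of a point-wise dependency between prompt and action; only the routing logic (execute / ask the user / abstain) is taken directly from the abstract.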
Related papers
- MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present a process-based benchmark MR-Ben that demands a meta-reasoning skill.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z) - Modeling Boundedly Rational Agents with Latent Inference Budgets [56.24971011281947]
We introduce a latent inference budget model (L-IBM) that models agents' computational constraints explicitly.
L-IBMs make it possible to learn agent models using data from diverse populations of suboptimal actors.
We show that L-IBMs match or outperform Boltzmann models of decision-making under uncertainty.
arXiv Detail & Related papers (2023-12-07T03:55:51Z) - Uncertainty-aware Language Modeling for Selective Question Answering [107.47864420630923]
We present an automatic large language model (LLM) conversion approach that produces uncertainty-aware LLMs.
Our approach is model- and data-agnostic, is computationally-efficient, and does not rely on external models or systems.
arXiv Detail & Related papers (2023-11-26T22:47:54Z) - Rational Decision-Making Agent with Internalized Utility Judgment [91.80700126895927]
Large language models (LLMs) have demonstrated remarkable advancements and have attracted significant efforts to develop LLMs into agents capable of executing intricate multi-step decision-making tasks beyond traditional NLP applications.
This paper proposes RadAgent, which fosters the development of its rationality through an iterative framework involving Experience Exploration and Utility Learning.
Experimental results on the ToolBench dataset demonstrate RadAgent's superiority over baselines, achieving over 10% improvement in Pass Rate on diverse tasks.
arXiv Detail & Related papers (2023-08-24T03:11:45Z) - A Meta-heuristic Approach to Estimate and Explain Classifier Uncertainty [0.4264192013842096]
This work proposes a set of class-independent meta-heuristics that can characterize the complexity of an instance in terms of factors that are mutually relevant to both human and machine learning decision-making.
The proposed measures and framework hold promise for improving model development for more complex instances, as well as providing a new means of model abstention and explanation.
arXiv Detail & Related papers (2023-04-20T13:09:28Z) - Double Fuzzy Probabilistic Interval Linguistic Term Set and a Dynamic
Fuzzy Decision Making Model based on Markov Process with Its Application in
Multiple Criteria Group Decision Making [0.0]
Probabilistic linguistic term sets have been proposed to deal with probability distributions over provided linguistic evaluations.
Weight information plays a significant role in dynamic information fusion and decision making process.
The paper proposes the concept of the double fuzzy probability interval linguistic term set (DFPILTS).
arXiv Detail & Related papers (2021-11-30T10:17:08Z) - Ensemble Quantile Networks: Uncertainty-Aware Reinforcement Learning
with Applications in Autonomous Driving [1.6758573326215689]
Reinforcement learning can be used to create a decision-making agent for autonomous driving.
Previous approaches provide only black-box solutions, which do not offer information on how confident the agent is about its decisions.
This paper introduces the Ensemble Quantile Networks (EQN) method, which combines distributional RL with an ensemble approach to obtain a complete uncertainty estimate (a toy sketch of this decomposition appears after this list).
arXiv Detail & Related papers (2021-05-21T10:36:16Z) - Leveraging Expert Consistency to Improve Algorithmic Decision Support [62.61153549123407]
We explore the use of historical expert decisions as a rich source of information that can be combined with observed outcomes to narrow the construct gap.
We propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert.
Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap.
arXiv Detail & Related papers (2021-01-24T05:40:29Z) - Uncertainty as a Form of Transparency: Measuring, Communicating, and
Using Uncertainty [66.17147341354577]
We argue for considering a complementary form of transparency by estimating and communicating the uncertainty associated with model predictions.
We describe how uncertainty can be used to mitigate model unfairness, augment decision-making, and build trustworthy systems.
This work constitutes an interdisciplinary review drawn from literature spanning machine learning, visualization/HCI, design, decision-making, and fairness.
arXiv Detail & Related papers (2020-11-15T17:26:14Z) - Value of Information Analysis via Active Learning and Knowledge Sharing
in Error-Controlled Adaptive Kriging [7.148732567427574]
This paper proposes the first surrogate-based framework for value of information (VoI) analysis.
It affords sharing equality-type information from observations among surrogate models to update likelihoods of multiple events of interest.
The proposed VoI analysis framework is applied for an optimal decision-making problem involving load testing of a truss bridge.
arXiv Detail & Related papers (2020-02-06T16:58:27Z) - Dirichlet uncertainty wrappers for actionable algorithm accuracy
accountability and auditability [0.5156484100374058]
We propose a wrapper that enriches a black-box model's output prediction with a measure of uncertainty.
Based on the resulting uncertainty measure, we advocate for a rejection system that selects the more confident predictions (a minimal sketch of such a rule appears after this list).
Results demonstrate the effectiveness of the uncertainty computed by the wrapper.
arXiv Detail & Related papers (2019-12-29T11:05:47Z)
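
The EQN entry above splits uncertainty between an ensemble (epistemic) and the learned return distribution (aleatoric). The toy sketch below, using fabricated quantile outputs in place of trained quantile networks, shows one plausible way to compute that decomposition; it illustrates the idea rather than the paper's implementation.

```python
# Toy ensemble-plus-quantile uncertainty decomposition in the spirit of EQN.
# Quantile values are fabricated; a real system would train K quantile
# networks per action.
import numpy as np

def decompose_uncertainty(quantiles: np.ndarray) -> tuple[float, float, float]:
    """quantiles: shape (K_ensemble, N_quantiles) for a single action.

    Returns (mean return, aleatoric spread, epistemic spread):
    - aleatoric: average spread of the return distribution within a member,
    - epistemic: disagreement between ensemble members' mean estimates.
    """
    member_means = quantiles.mean(axis=1)            # (K,)
    mean_return = float(member_means.mean())
    aleatoric = float(quantiles.std(axis=1).mean())  # within-member spread
    epistemic = float(member_means.std())            # across-member spread
    return mean_return, aleatoric, epistemic

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fabricated quantile outputs for two candidate driving actions.
    actions = {
        "keep_lane":   rng.normal(1.0, 0.2, size=(5, 32)),
        "change_lane": rng.normal(1.2, 0.8, size=(5, 32)),
    }
    for name, q in actions.items():
        m, alea, epi = decompose_uncertainty(q)
        print(f"{name}: mean={m:.2f} aleatoric={alea:.2f} epistemic={epi:.2f}")
    # A safety rule in the spirit of EQN: fall back to a safe action when the
    # greedy action's epistemic estimate exceeds a chosen threshold.
```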
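
The Dirichlet-wrapper entry above pairs an uncertainty measure with a rejection rule. Below is a minimal sketch of that pattern: the black-box probability vector is treated as the mean of a Dirichlet with a fixed, hand-picked concentration (the paper learns its wrapper, so this scaling is only a stand-in), and the least confident fraction of predictions is rejected.

```python
# Minimal abstention rule driven by a Dirichlet-style uncertainty score
# computed from a black-box classifier's probability outputs.
import numpy as np

def dirichlet_uncertainty(probs: np.ndarray, precision: float = 10.0) -> float:
    """Treat `probs` as the mean of a Dirichlet with total concentration
    `precision`; return the standard deviation of the predicted class's
    probability (higher = less trustworthy)."""
    alpha = probs * precision
    a0 = alpha.sum()
    k = int(np.argmax(probs))
    var_k = alpha[k] * (a0 - alpha[k]) / (a0**2 * (a0 + 1.0))
    return float(np.sqrt(var_k))

def reject_least_confident(prob_batch: np.ndarray, keep_fraction: float = 0.8):
    """Keep the most confident predictions; flag the rest for human review."""
    scores = np.array([dirichlet_uncertainty(p) for p in prob_batch])
    cutoff = np.quantile(scores, keep_fraction)
    keep = scores <= cutoff
    return keep, scores

if __name__ == "__main__":
    batch = np.array([[0.9, 0.05, 0.05],
                      [0.4, 0.35, 0.25],
                      [0.7, 0.20, 0.10]])
    keep, scores = reject_least_confident(batch, keep_fraction=0.67)
    print("uncertainty:", np.round(scores, 3), "kept:", keep)
```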
This list is automatically generated from the titles and abstracts of the papers on this site.