Answer, Refuse, or Guess? Investigating Risk-Aware Decision Making in Language Models
- URL: http://arxiv.org/abs/2503.01332v2
- Date: Tue, 30 Sep 2025 05:18:59 GMT
- Title: Answer, Refuse, or Guess? Investigating Risk-Aware Decision Making in Language Models
- Authors: Cheng-Kuang Wu, Zhi Rui Tam, Chieh-Yen Lin, Yun-Nung Chen, Hung-yi Lee
- Abstract summary: Language models (LMs) are increasingly used to build agents that can act autonomously to achieve goals. We study this "answer-or-defer" problem with an evaluation framework that systematically varies human-specified risk structures. We find that a simple skill-decomposition method, which isolates the independent skills required for answer-or-defer decision making, can consistently improve LMs' decision policies.
- Score: 63.559461750135334
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Language models (LMs) are increasingly used to build agents that can act autonomously to achieve goals. During this automatic process, agents need to take a series of actions, some of which might lead to severe consequences if incorrect actions are taken. Therefore, such agents must sometimes defer, refusing to act when their confidence is insufficient, to avoid the potential cost of incorrect actions. Because the severity of consequences varies across applications, the tendency to defer should also vary: in low-risk settings agents should answer more freely, while in high-risk settings their decisions should be more conservative. We study this "answer-or-defer" problem with an evaluation framework that systematically varies human-specified risk structures, i.e., the rewards and penalties for correct answers, incorrect answers, and refusals $(r_{\mathrm{cor}}, r_{\mathrm{inc}}, r_{\mathrm{ref}})$, while keeping tasks fixed. This design evaluates LMs' risk-aware decision policies by measuring their ability to maximize expected reward. Across multiple datasets and models, we identify flaws in their decision policies: LMs tend to over-answer in high-risk settings and over-defer in low-risk settings. After analyzing the potential causes of such flaws, we find that a simple skill-decomposition method, which isolates the independent skills required for answer-or-defer decision making, can consistently improve LMs' decision policies. Our results highlight the current limitations of LMs in risk-conditioned decision making and provide practical guidance for deploying more reliable LM-based agents across applications of varying risk levels.
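The decision rule the abstract implies can be sketched directly: an expected-reward-maximizing agent should answer only when its confidence-weighted reward for answering beats the refusal reward $r_{\mathrm{ref}}$. A minimal sketch, assuming the model reports a calibrated confidence (the confidence value and reward numbers below are illustrative, not taken from the paper):

```python
def expected_rewards(p_correct, r_cor, r_inc, r_ref):
    """Expected reward of answering vs. deferring, given the model's
    confidence p_correct that its answer would be right."""
    answer = p_correct * r_cor + (1.0 - p_correct) * r_inc
    defer = r_ref
    return answer, defer

def decide(p_correct, r_cor, r_inc, r_ref):
    """An expected-reward-maximizing policy answers iff it beats deferring."""
    answer, defer = expected_rewards(p_correct, r_cor, r_inc, r_ref)
    return "answer" if answer >= defer else "defer"

# Same confidence, different risk structures:
print(decide(0.7, r_cor=1.0, r_inc=-10.0, r_ref=0.0))  # defer (high risk)
print(decide(0.7, r_cor=1.0, r_inc=-1.0, r_ref=0.0))   # answer (low risk)
```

The same 70% confidence flips the optimal action once the penalty for a wrong answer grows, which is exactly the risk sensitivity the paper's evaluation framework probes.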
Related papers
- FROC: A Unified Framework with Risk-Optimized Control for Machine Unlearning in LLMs [28.687949604557986]
We propose FROC, a unified framework with Risk-Optimized Control for machine unlearning in large language models (LLMs). FROC is built around a conformal-style risk-control formulation that expresses a user-specified risk budget on unlearning behavior. Experiments across multiple LLM MU methods demonstrate that FROC produces stable, interpretable risk landscapes.
arXiv Detail & Related papers (2025-12-15T13:53:12Z) - Toward Quantitative Modeling of Cybersecurity Risks Due to AI Misuse [50.87630846876635]
We develop nine detailed cyber risk models. Each model decomposes attacks into steps using the MITRE ATT&CK framework. Individual estimates are aggregated through Monte Carlo simulation.
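The aggregation step described above can be illustrated with a minimal Monte Carlo sketch; the step names and probabilities are hypothetical placeholders, not the paper's estimates, and real models would draw each step's probability from an uncertainty distribution rather than a point value:

```python
import random

def attack_success_probability(step_success_probs, n_trials=100_000, seed=0):
    """Monte Carlo estimate of the probability that a multi-step attack
    chain succeeds end to end, assuming independent per-step success
    probabilities (hypothetical numbers)."""
    rng = random.Random(seed)
    successes = sum(
        all(rng.random() < p for p in step_success_probs)
        for _ in range(n_trials)
    )
    return successes / n_trials

# E.g. initial access -> lateral movement -> exfiltration.
est = attack_success_probability([0.6, 0.4, 0.5])
print(round(est, 2))  # close to 0.6 * 0.4 * 0.5 = 0.12
```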
arXiv Detail & Related papers (2025-12-09T17:54:17Z) - Can Risk-taking AI-Assistants suitably represent entities [0.0]
This study investigates the manipulability of risk aversion in language models (LMs). It focuses on gender-specific attitudes, uncertainty, role-based decision-making, and the manipulability of risk aversion. The results suggest directions for refining AI design to better align human and AI risk preferences.
arXiv Detail & Related papers (2025-10-09T11:55:31Z) - Dive into the Agent Matrix: A Realistic Evaluation of Self-Replication Risk in LLM Agents [30.378925170216835]
Self-replication risk of Large Language Model (LLM) agents driven by objective misalignment has drawn growing attention. We present a comprehensive evaluation framework for quantifying self-replication risks.
arXiv Detail & Related papers (2025-09-29T17:49:50Z) - LM Agents May Fail to Act on Their Own Risk Knowledge [15.60032437959883]
Language model (LM) agents pose a diverse array of potential, severe risks in safety-critical scenarios. While they often answer "Yes" to queries like "Is executing `sudo rm -rf /*` dangerous?", they will likely fail to identify such risks in instantiated trajectories.
arXiv Detail & Related papers (2025-08-19T02:46:08Z) - Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios [1.5367554212163714]
This paper presents a Case-Based Reasoning Augmented Large Language Model (CBR-LLM) framework for evasive maneuver decision-making in complex risk scenarios. Our approach integrates semantic scene understanding from dashcam video inputs with the retrieval of relevant past driving cases. Experiments show that our framework improves decision accuracy, justification quality, and alignment with human expert behavior.
arXiv Detail & Related papers (2025-06-25T15:19:25Z) - Extending Epistemic Uncertainty Beyond Parameters Would Assist in Designing Reliable LLMs [40.7342896954488]
We advocate for the adoption of a framework that provides a coherent foundation to reason about uncertainty and clarify the reducibility of uncertainty. By supporting active resolution rather than passive avoidance, it opens the door to more reliable, transparent, and broadly applicable LLM systems.
arXiv Detail & Related papers (2025-06-09T05:52:03Z) - Are Smarter LLMs Safer? Exploring Safety-Reasoning Trade-offs in Prompting and Fine-Tuning [40.55486479495965]
Large Language Models (LLMs) have demonstrated remarkable success across various NLP benchmarks. In this work, we investigate the interplay between reasoning and safety in LLMs. We highlight the latent safety risks that arise as reasoning capabilities improve, shedding light on previously overlooked vulnerabilities.
arXiv Detail & Related papers (2025-02-13T06:37:28Z) - RACQUET: Unveiling the Dangers of Overlooked Referential Ambiguity in Visual LLMs [29.832360523402592]
We introduce RACQUET, a dataset targeting distinct aspects of ambiguity in image-based question answering.
We reveal significant limitations and overconfidence problems in state-of-the-art large multimodal language models when addressing ambiguity in their responses.
Our results underscore the urgency of equipping models with robust strategies to deal with uncertainty without resorting to undesirable stereotypes.
arXiv Detail & Related papers (2024-12-18T13:25:11Z) - Navigating the Risks: A Survey of Security, Privacy, and Ethics Threats in LLM-Based Agents [67.07177243654485]
This survey collects and analyzes the different threats faced by large language model-based agents.
We identify six key features of LLM-based agents, based on which we summarize the current research progress.
We select four representative agents as case studies to analyze the risks they may face in practical use.
arXiv Detail & Related papers (2024-11-14T15:40:04Z) - DeFine: Decision-Making with Analogical Reasoning over Factor Profiles [35.9909472797192]
DeFine is a modular framework that constructs probabilistic factor profiles from complex scenarios. It then integrates these profiles with analogical reasoning to guide LLMs in making critical decisions in new situations. This approach is particularly useful in areas such as consulting and financial deliberation, where making decisions under uncertainty is vital.
arXiv Detail & Related papers (2024-10-02T17:29:34Z) - Criticality and Safety Margins for Reinforcement Learning [53.10194953873209]
We seek to define a criticality framework with both a quantifiable ground truth and a clear significance to users. We introduce true criticality as the expected drop in reward when an agent deviates from its policy for n consecutive random actions. We also introduce the concept of proxy criticality, a low-overhead metric that has a statistically monotonic relationship to true criticality.
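The true-criticality definition lends itself to a compact simulation: roll out the policy, roll out a variant that acts randomly for the first n steps, and difference the returns. A toy sketch on a hypothetical cliff-walk environment (the environment and numbers are illustrative, not from the paper):

```python
import random

# Toy cliff-walk: states 1..4; moving to state <= 0 falls off the cliff
# (reward -10, episode ends); reaching state 4 is the goal (reward 1).
def step(state, action):
    nxt = state + action
    if nxt <= 0:
        return 0, -10.0, True
    if nxt >= 4:
        return 4, 1.0, True
    return nxt, 0.0, False

def policy(state):
    return +1  # always move toward the goal

def rollout(start, n_random, rng, horizon=10):
    """Return of one episode: n_random random actions first, then the policy."""
    state, total = start, 0.0
    for t in range(horizon):
        a = rng.choice([-1, +1]) if t < n_random else policy(state)
        state, r, done = step(state, a)
        total += r
        if done:
            break
    return total

def true_criticality(start, n_random, n_rollouts=20_000, seed=0):
    """Expected drop in return when the agent takes n_random consecutive
    random actions instead of following its (deterministic) policy."""
    rng = random.Random(seed)
    on_policy = rollout(start, 0, rng)
    deviated = sum(rollout(start, n_random, rng)
                   for _ in range(n_rollouts)) / n_rollouts
    return on_policy - deviated

# Deviating right next to the cliff is far more critical than near the goal.
print(true_criticality(start=1, n_random=1))  # close to 5.5
print(true_criticality(start=3, n_random=1))  # 0.0
```

States adjacent to the cliff yield high criticality while states near the goal yield none, which is the state-dependent safety margin the paper's proxy metric is meant to approximate cheaply.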
arXiv Detail & Related papers (2024-09-26T21:00:45Z) - Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework [77.45983464131977]
We focus on how likely it is that a RAG model's prediction is incorrect, resulting in uncontrollable risks in real-world applications. Our research identifies two critical latent factors affecting RAG's confidence in its predictions. We develop a counterfactual prompting framework that induces the models to alter these factors and analyzes the effect on their answers.
arXiv Detail & Related papers (2024-09-24T14:52:14Z) - Defining and Evaluating Decision and Composite Risk in Language Models Applied to Natural Language Inference [3.422309388045878]
Large language models (LLMs) such as ChatGPT are known to pose important risks.
One such risk is misplaced confidence, arising from the over-confidence or under-confidence that the models have in their inference.
We propose an experimental framework consisting of a two-level inference architecture and appropriate metrics for measuring such risks.
arXiv Detail & Related papers (2024-08-04T05:24:32Z) - Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models [55.332004960574004]
Large language models (LLMs) are widely used in decision-making, but their reliability, especially in critical tasks like healthcare, is not well-established.
This paper investigates how the uncertainty of responses generated by LLMs relates to the information provided in the input prompt.
We propose a prompt-response concept model that explains how LLMs generate responses and helps understand the relationship between prompts and response uncertainty.
arXiv Detail & Related papers (2024-07-20T11:19:58Z) - Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context [5.361970694197912]
This paper proposes a framework, grounded in behavioral economics, to evaluate the decision-making behaviors of large language models (LLMs).
We estimate the degree of risk preference, probability weighting, and loss aversion in a context-free setting for three commercial LLMs: ChatGPT-4.0-Turbo, Claude-3-Opus, and Gemini-1.0-pro.
Our results reveal that LLMs generally exhibit patterns similar to humans, such as risk aversion and loss aversion, with a tendency to overweight small probabilities.
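The human-like patterns listed above (risk aversion, loss aversion, overweighting of small probabilities) have standard functional forms in behavioral economics. A sketch using the classic Tversky-Kahneman parameterization; the parameter values are the well-known human estimates, not the values this paper fits to LLMs:

```python
def prospect_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Kahneman-Tversky value function: concave for gains, convex and
    steeper for losses (loss aversion). Parameters are the classic
    human estimates, not fitted LLM values."""
    return x ** alpha if x >= 0 else -lam * ((-x) ** beta)

def probability_weight(p, gamma=0.61):
    """Tversky-Kahneman weighting function: overweights small
    probabilities and underweights large ones."""
    return p ** gamma / ((p ** gamma + (1 - p) ** gamma) ** (1 / gamma))

print(probability_weight(0.01) > 0.01)                  # True: small p overweighted
print(abs(prospect_value(-100)) > prospect_value(100))  # True: losses loom larger
```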
arXiv Detail & Related papers (2024-06-10T02:14:19Z) - ABI Approach: Automatic Bias Identification in Decision-Making Under Risk based in an Ontology of Behavioral Economics [46.57327530703435]
Risk-seeking preferences for losses, driven by biases such as loss aversion, pose challenges and can result in severe negative consequences.
This research introduces the ABI approach, a novel solution designed to support organizational decision-makers by automatically identifying and explaining risk-seeking preferences.
arXiv Detail & Related papers (2024-05-22T23:53:46Z) - DeLLMa: Decision Making Under Uncertainty with Large Language Models [31.77731889916652]
DeLLMa is a framework designed to enhance decision-making accuracy in uncertain environments.
We show that DeLLMa can consistently enhance the decision-making performance of leading language models, and achieve up to a 40% increase in accuracy over competing methods.
arXiv Detail & Related papers (2024-02-04T08:11:45Z) - Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty [53.336235704123915]
We investigate how LMs incorporate confidence in responses via natural language and how downstream users behave in response to LM-articulated uncertainties.
We find that LMs are reluctant to express uncertainties when answering questions even when they produce incorrect responses.
We test the risks of LM overconfidence by conducting human experiments and show that users rely heavily on LM generations.
Lastly, we investigate the preference-annotated datasets used in post-training alignment and find that humans are biased against texts with uncertainty.
arXiv Detail & Related papers (2024-01-12T18:03:30Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - RiskQ: Risk-sensitive Multi-Agent Reinforcement Learning Value Factorization [49.26510528455664]
We introduce the Risk-sensitive Individual-Global-Max (RIGM) principle as a generalization of the Individual-Global-Max (IGM) and Distributional IGM (DIGM) principles.
We show that RiskQ can obtain promising performance through extensive experiments.
arXiv Detail & Related papers (2023-11-03T07:18:36Z) - Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake
Analysis [127.85293480405082]
The rapid development of large language models (LLMs) has not only provided numerous opportunities but also presented significant challenges.
Existing alignment methods usually direct LLMs toward the favorable outcomes by utilizing human-annotated, flawless instruction-response pairs.
This study proposes a novel alignment technique based on mistake analysis, which deliberately exposes LLMs to erroneous content to learn the reasons for mistakes and how to avoid them.
arXiv Detail & Related papers (2023-10-16T14:59:10Z) - On solving decision and risk management problems subject to uncertainty [91.3755431537592]
Uncertainty is a pervasive challenge in decision and risk management.
This paper develops a systematic understanding of such strategies, determines their range of application, and develops a framework to better employ them.
arXiv Detail & Related papers (2023-01-18T19:16:23Z) - Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies [79.60322329952453]
We show how to develop interpretable representations of how agents make decisions.
By understanding the decision-making processes underlying a set of observed trajectories, we cast the policy inference problem as the inverse to this online learning problem.
We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them.
Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
arXiv Detail & Related papers (2022-03-14T17:40:42Z) - Enhancing Covid-19 Decision-Making by Creating an Assurance Case for Simulation Models [7.241250079741012]
We argue that any COVID-19 simulation model that is used to guide critical policy decisions would benefit from being supported with an assurance case.
This would enable a critical review of the implicit assumptions and inherent uncertainty in modelling, and would give the overall decision-making process greater transparency and accountability.
arXiv Detail & Related papers (2020-05-17T22:07:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.