Answer, Refuse, or Guess? Investigating Risk-Aware Decision Making in Language Models
- URL: http://arxiv.org/abs/2503.01332v1
- Date: Mon, 03 Mar 2025 09:16:26 GMT
- Title: Answer, Refuse, or Guess? Investigating Risk-Aware Decision Making in Language Models
- Authors: Cheng-Kuang Wu, Zhi Rui Tam, Chieh-Yen Lin, Yun-Nung Chen, Hung-yi Lee
- Abstract summary: We formalize the task of risk-aware decision-making, expose critical weaknesses in existing LMs, and propose skill-decomposition solutions. Our findings show that even cutting-edge LMs--both regular and reasoning models--still require explicit prompt chaining to handle the task effectively.
- Score: 63.54557575233165
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Knowing when to answer or refuse is crucial for safe and reliable decision-making language agents. Although prior work has introduced refusal strategies to boost LMs' reliability, how these models adapt their decisions to different risk levels remains underexplored. We formalize the task of risk-aware decision-making, expose critical weaknesses in existing LMs, and propose skill-decomposition solutions to mitigate them. Our findings show that even cutting-edge LMs--both regular and reasoning models--still require explicit prompt chaining to handle the task effectively, revealing the challenges that must be overcome to achieve truly autonomous decision-making agents.
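The answer/refuse/guess framing in the abstract can be illustrated as an expected-score decision under a penalty for wrong answers. The sketch below is a hypothetical illustration of that setup, not the paper's method: the payoff values, function name, and uniform-guess assumption are all invented for the example, and the risk level is modeled simply as the penalty for answering incorrectly.

```python
# Hypothetical sketch of an answer/refuse/guess decision rule under a
# scoring scheme; the payoffs are illustrative, not taken from the paper.

def choose_action(p_correct: float, wrong_penalty: float, n_options: int) -> str:
    """Pick the action with the highest expected score.

    p_correct    : the model's confidence that its answer is right
    wrong_penalty: points lost for a wrong answer (the "risk level")
    n_options    : number of choices if the model guesses uniformly
    """
    expected = {
        # Answering: gain 1 if right, lose the penalty if wrong.
        "answer": p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty,
        # Refusing: always scores zero.
        "refuse": 0.0,
        # Guessing uniformly among n_options: chance-level accuracy.
        "guess": (1.0 / n_options) - (1.0 - 1.0 / n_options) * wrong_penalty,
    }
    return max(expected, key=expected.get)

# Low risk: answering is worth it even at modest confidence.
print(choose_action(0.6, wrong_penalty=0.5, n_options=4))  # answer
# High risk: refusing beats a shaky answer or a random guess.
print(choose_action(0.6, wrong_penalty=3.0, n_options=4))  # refuse
```

A risk-aware model should shift from "answer" to "refuse" as the penalty grows, which is the adaptation to risk levels the paper probes.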
Related papers
- Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios [1.5367554212163714]
This paper presents a Case-Based Reasoning Augmented Large Language Model (CBR-LLM) framework for evasive maneuver decision-making in complex risk scenarios. Our approach integrates semantic scene understanding from dashcam video inputs with the retrieval of relevant past driving cases. Experiments show that our framework improves decision accuracy, justification quality, and alignment with human expert behavior.
arXiv Detail & Related papers (2025-06-25T15:19:25Z) - Extending Epistemic Uncertainty Beyond Parameters Would Assist in Designing Reliable LLMs [40.7342896954488]
We advocate for the adoption of a framework that provides a coherent foundation to reason about uncertainty and clarify the reducibility of uncertainty. By supporting active resolution rather than passive avoidance, it opens the door to more reliable, transparent, and broadly applicable LLM systems.
arXiv Detail & Related papers (2025-06-09T05:52:03Z) - Are Smarter LLMs Safer? Exploring Safety-Reasoning Trade-offs in Prompting and Fine-Tuning [40.55486479495965]
Large Language Models (LLMs) have demonstrated remarkable success across various NLP benchmarks. In this work, we investigate the interplay between reasoning and safety in LLMs. We highlight the latent safety risks that arise as reasoning capabilities improve, shedding light on previously overlooked vulnerabilities.
arXiv Detail & Related papers (2025-02-13T06:37:28Z) - RACQUET: Unveiling the Dangers of Overlooked Referential Ambiguity in Visual LLMs [29.832360523402592]
We introduce RACQUET, a dataset targeting distinct aspects of ambiguity in image-based question answering.
We reveal significant limitations and problems of overconfidence of state-of-the-art large multimodal language models in addressing ambiguity in their responses.
Our results underscore the urgency of equipping models with robust strategies to deal with uncertainty without resorting to undesirable stereotypes.
arXiv Detail & Related papers (2024-12-18T13:25:11Z) - Navigating the Risks: A Survey of Security, Privacy, and Ethics Threats in LLM-Based Agents [67.07177243654485]
This survey collects and analyzes the different threats faced by large language models-based agents.
We identify six key features of LLM-based agents, based on which we summarize the current research progress.
We select four representative agents as case studies to analyze the risks they may face in practical use.
arXiv Detail & Related papers (2024-11-14T15:40:04Z) - DeFine: Decision-Making with Analogical Reasoning over Factor Profiles [35.9909472797192]
DeFine is a modular framework that constructs probabilistic factor profiles from complex scenarios. It then integrates these profiles with analogical reasoning to guide LLMs in making critical decisions in new situations. This approach is particularly useful in areas such as consulting and financial deliberation, where making decisions under uncertainty is vital.
arXiv Detail & Related papers (2024-10-02T17:29:34Z) - Defining and Evaluating Decision and Composite Risk in Language Models Applied to Natural Language Inference [3.422309388045878]
Large language models (LLMs) such as ChatGPT are known to pose important risks.
Misplaced confidence arises from the over-confidence or under-confidence that the models have in their inference.
We propose an experimental framework consisting of a two-level inference architecture and appropriate metrics for measuring such risks.
arXiv Detail & Related papers (2024-08-04T05:24:32Z) - Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models [55.332004960574004]
Large language models (LLMs) are widely used in decision-making, but their reliability, especially in critical tasks like healthcare, is not well-established.
This paper investigates how the uncertainty of responses generated by LLMs relates to the information provided in the input prompt.
We propose a prompt-response concept model that explains how LLMs generate responses and helps understand the relationship between prompts and response uncertainty.
arXiv Detail & Related papers (2024-07-20T11:19:58Z) - ABI Approach: Automatic Bias Identification in Decision-Making Under Risk based in an Ontology of Behavioral Economics [46.57327530703435]
Risk-seeking preferences for losses, driven by biases such as loss aversion, pose challenges and can result in severe negative consequences.
This research introduces the ABI approach, a novel solution designed to support organizational decision-makers by automatically identifying and explaining risk-seeking preferences.
arXiv Detail & Related papers (2024-05-22T23:53:46Z) - DeLLMa: Decision Making Under Uncertainty with Large Language Models [31.77731889916652]
DeLLMa is a framework designed to enhance decision-making accuracy in uncertain environments.
We show that DeLLMa can consistently enhance the decision-making performance of leading language models, and achieve up to a 40% increase in accuracy over competing methods.
arXiv Detail & Related papers (2024-02-04T08:11:45Z) - Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty [53.336235704123915]
We investigate how LMs incorporate confidence in responses via natural language and how downstream users behave in response to LM-articulated uncertainties.
We find that LMs are reluctant to express uncertainties when answering questions even when they produce incorrect responses.
We test the risks of LM overconfidence by conducting human experiments and show that users rely heavily on LM generations.
Lastly, we investigate the preference-annotated datasets used in post-training alignment and find that humans are biased against texts with uncertainty.
arXiv Detail & Related papers (2024-01-12T18:03:30Z) - Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis [127.85293480405082]
The rapid development of large language models (LLMs) has not only provided numerous opportunities but also presented significant challenges.
Existing alignment methods usually direct LLMs toward favorable outcomes by utilizing human-annotated, flawless instruction-response pairs.
This study proposes a novel alignment technique based on mistake analysis, which deliberately exposes LLMs to erroneous content to learn the reasons for mistakes and how to avoid them.
arXiv Detail & Related papers (2023-10-16T14:59:10Z) - On solving decision and risk management problems subject to uncertainty [91.3755431537592]
Uncertainty is a pervasive challenge in decision and risk management.
This paper develops a systematic understanding of such strategies, determines their range of application, and develops a framework to better employ them.
arXiv Detail & Related papers (2023-01-18T19:16:23Z) - Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies [79.60322329952453]
We show how to develop interpretable representations of how agents make decisions.
By understanding the decision-making processes underlying a set of observed trajectories, we cast the policy inference problem as the inverse to this online learning problem.
We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them.
Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
arXiv Detail & Related papers (2022-03-14T17:40:42Z) - Enhancing Covid-19 Decision-Making by Creating an Assurance Case for Simulation Models [7.241250079741012]
We argue that any COVID-19 simulation model that is used to guide critical policy decisions would benefit from being supported with an assurance case.
This would enable a critical review of the implicit assumptions and inherent uncertainty in modelling, and would give the overall decision-making process greater transparency and accountability.
arXiv Detail & Related papers (2020-05-17T22:07:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.