Related papers: Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty

URL: http://arxiv.org/abs/2401.06730v2
Date: Tue, 9 Jul 2024 23:53:06 GMT
Title: Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty
Authors: Kaitlyn Zhou, Jena D. Hwang, Xiang Ren, Maarten Sap
Abstract summary: We investigate how LMs incorporate confidence in responses via natural language and how downstream users behave in response to LM-articulated uncertainties. We find that LMs are reluctant to express uncertainties when answering questions even when they produce incorrect responses. We test the risks of LM overconfidence by conducting human experiments and show that users rely heavily on LM generations. Lastly, we investigate the preference-annotated datasets used in post-training alignment and find that humans are biased against texts with uncertainty.
Score: 53.336235704123915
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As natural language becomes the default interface for human-AI interaction, there is a need for LMs to appropriately communicate uncertainties in downstream applications. In this work, we investigate how LMs incorporate confidence in responses via natural language and how downstream users behave in response to LM-articulated uncertainties. We examine publicly deployed models and find that LMs are reluctant to express uncertainties when answering questions even when they produce incorrect responses. LMs can be explicitly prompted to express confidences, but tend to be overconfident, resulting in high error rates (an average of 47%) among confident responses. We test the risks of LM overconfidence by conducting human experiments and show that users rely heavily on LM generations, whether or not they are marked by certainty. Lastly, we investigate the preference-annotated datasets used in post-training alignment and find that humans are biased against texts with uncertainty. Our work highlights new safety harms facing human-LM interactions and proposes design recommendations and mitigating strategies moving forward.

Related papers

Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies [66.30619782227173]
Large language models (LLMs) can produce erroneous responses that sound fluent and convincing. We identify several features of LLM responses that shape users' reliance. We find that explanations increase reliance on both correct and incorrect responses. We observe less reliance on incorrect responses when sources are provided or when explanations exhibit inconsistencies.
arXiv Detail & Related papers (2025-02-12T16:35:41Z)
Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models [55.332004960574004]
Large language models (LLMs) are widely used in decision-making, but their reliability, especially in critical tasks like healthcare, is not well-established. This paper investigates how the uncertainty of responses generated by LLMs relates to the information provided in the input prompt. We propose a prompt-response concept model that explains how LLMs generate responses and helps understand the relationship between prompts and response uncertainty.
arXiv Detail & Related papers (2024-07-20T11:19:58Z)
Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance [73.19687314438133]
We study how reliance is affected by contextual features of an interaction. We find that contextual characteristics significantly affect human reliance behavior. Our results show that calibration and language quality alone are insufficient in evaluating the risks of human-LM interactions.
arXiv Detail & Related papers (2024-07-10T18:00:05Z)
Self-Recognition in Language Models [10.649471089216489]
We propose a novel approach for assessing self-recognition in LMs using model-generated "security questions". We use our test to examine self-recognition in ten of the most capable open- and closed-source LMs currently publicly available. Our results suggest that given a set of alternatives, LMs seek to pick the "best" answer, regardless of its origin.
arXiv Detail & Related papers (2024-07-09T15:23:28Z)
Can Large Language Models Faithfully Express Their Intrinsic Uncertainty in Words? [21.814007454504978]
We posit that large language models (LLMs) should be capable of expressing their intrinsic uncertainty in natural language. We formalize faithful response uncertainty based on the gap between the model's intrinsic confidence in the assertions it makes and the decisiveness by which they are conveyed.
arXiv Detail & Related papers (2024-05-27T07:56:23Z)
"I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust [51.542856739181474]
We show how different natural language expressions of uncertainty impact participants' reliance, trust, and overall task performance. We find that first-person expressions decrease participants' confidence in the system and tendency to agree with the system's answers, while increasing participants' accuracy. Our findings suggest that using natural language expressions of uncertainty may be an effective approach for reducing overreliance on LLMs, but that the precise language used matters.
arXiv Detail & Related papers (2024-05-01T16:43:55Z)
What Large Language Models Know and What People Think They Know [13.939511057660013]
Large language models (LLMs) are increasingly integrated into decision-making processes. To earn human trust, LLMs must be well calibrated so that they can accurately assess and communicate the likelihood of their predictions being correct. Here we explore the calibration gap, which refers to the difference between human confidence in LLM-generated answers and the models' actual confidence, and the discrimination gap, which reflects how well humans and models can distinguish between correct and incorrect answers.
arXiv Detail & Related papers (2024-01-24T22:21:04Z)
Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis [127.85293480405082]
The rapid development of large language models (LLMs) has not only provided numerous opportunities but also presented significant challenges. Existing alignment methods usually direct LLMs toward the favorable outcomes by utilizing human-annotated, flawless instruction-response pairs. This study proposes a novel alignment technique based on mistake analysis, which deliberately exposes LLMs to erroneous content to learn the reasons for mistakes and how to avoid them.
arXiv Detail & Related papers (2023-10-16T14:59:10Z)
Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools. Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions. Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z)
LM vs LM: Detecting Factual Errors via Cross Examination [22.50837561382647]
We propose a factuality evaluation framework for language models (LMs). Our key idea is that an incorrect claim is likely to result in inconsistency with other claims that the model generates. We empirically evaluate our method on factual claims made by multiple recent LMs on four benchmarks.
arXiv Detail & Related papers (2023-05-22T17:42:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.