Navigating the Grey Area: How Expressions of Uncertainty and
Overconfidence Affect Language Models
- URL: http://arxiv.org/abs/2302.13439v2
- Date: Mon, 13 Nov 2023 18:10:16 GMT
- Title: Navigating the Grey Area: How Expressions of Uncertainty and
Overconfidence Affect Language Models
- Authors: Kaitlyn Zhou, Dan Jurafsky, Tatsunori Hashimoto
- Abstract summary: LMs are highly sensitive to markers of certainty in prompts, with accuracies varying by more than 80%.
We find that expressions of high certainty result in a decrease in accuracy as compared to low certainty expressions; similarly, factive verbs hurt performance, while evidentials benefit performance.
These associations may suggest that the behavior of LMs is based on mimicking observed language use, rather than truly reflecting epistemic uncertainty.
- Score: 74.07684768317705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increased deployment of LMs for real-world tasks involving knowledge and
facts makes it important to understand model epistemology: what LMs think they
know, and how their attitudes toward that knowledge are affected by language
use in their inputs. Here, we study an aspect of model epistemology: how
epistemic markers of certainty, uncertainty, or evidentiality like "I'm sure
it's", "I think it's", or "Wikipedia says it's" affect models, and whether they
contribute to model failures. We develop a typology of epistemic markers and
inject 50 markers into prompts for question answering. We find that LMs are
highly sensitive to epistemic markers in prompts, with accuracies varying more
than 80%. Surprisingly, we find that expressions of high certainty result in a
7% decrease in accuracy as compared to low certainty expressions; similarly,
factive verbs hurt performance, while evidentials benefit performance. Our
analysis of a popular pretraining dataset shows that these markers of
uncertainty are associated with answers on question-answering websites, while
markers of certainty are associated with questions. These associations may
suggest that the behavior of LMs is based on mimicking observed language use,
rather than truly reflecting epistemic uncertainty.
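To make the setup concrete, here is a minimal sketch of the marker-injection experiment described above: prepend an epistemic marker to a question-answering prompt and compare exact-match accuracy across markers. The marker list, prompt template, and answer_fn hook are illustrative assumptions, not the authors' released code or their full 50-marker typology.

```python
# Minimal sketch of the marker-injection setup described in the abstract.
# The markers, prompt format, and answer_fn are illustrative assumptions.

from typing import Callable, Dict, List

# A few example epistemic markers, grouped the way the paper's typology
# distinguishes certainty, uncertainty, and evidentiality.
MARKERS: Dict[str, List[str]] = {
    "high_certainty": ["I'm sure it's", "It's definitely"],
    "low_certainty": ["I think it's", "It could be"],
    "evidential": ["Wikipedia says it's", "According to my textbook, it's"],
}

def build_prompt(question: str, marker: str) -> str:
    """Cue the model to complete the answer after the epistemic marker."""
    return f"Question: {question}\nAnswer: {marker}"

def accuracy_by_marker(
    qa_pairs: List[tuple],            # (question, gold_answer) pairs
    answer_fn: Callable[[str], str],  # any LM completion function
) -> Dict[str, float]:
    """Exact-match accuracy per marker, exposing sensitivity to phrasing."""
    results: Dict[str, float] = {}
    for group, markers in MARKERS.items():
        for marker in markers:
            correct = 0
            for question, gold in qa_pairs:
                prediction = answer_fn(build_prompt(question, marker))
                correct += int(gold.lower() in prediction.lower())
            results[f"{group}:{marker}"] = correct / len(qa_pairs)
    return results
```

Comparing the per-marker accuracies surfaces the paper's headline effect: if prompts with high-certainty markers score lower than those with low-certainty markers, the model is penalizing confident phrasing rather than rewarding it.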
Related papers
- Belief in the Machine: Investigating Epistemological Blind Spots of Language Models [51.63547465454027]
Language models (LMs) are essential for reliable decision-making in fields like healthcare, law, and journalism.
This study systematically evaluates the capabilities of modern LMs, including GPT-4, Claude-3, and Llama-3, using a new dataset, KaBLE.
Our results reveal key limitations. First, while LMs achieve 86% accuracy on factual scenarios, their performance drops significantly with false scenarios.
Second, LMs struggle with recognizing and affirming personal beliefs, especially when those beliefs contradict factual data.
arXiv Detail & Related papers (2024-10-28T16:38:20Z)
- Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models [55.332004960574004]
Large language models (LLMs) are widely used in decision-making, but their reliability, especially in critical tasks like healthcare, is not well-established.
This paper investigates how the uncertainty of responses generated by LLMs relates to the information provided in the input prompt.
We propose a prompt-response concept model that explains how LLMs generate responses and helps understand the relationship between prompts and response uncertainty.
arXiv Detail & Related papers (2024-07-20T11:19:58Z)
- WellDunn: On the Robustness and Explainability of Language Models and Large Language Models in Identifying Wellness Dimensions [46.60244609728416]
Language Models (LMs) are being proposed for mental health applications, where the heightened risk of adverse outcomes means predictive performance may not be a litmus test of a model's utility in clinical practice.
We introduce an evaluation design that focuses on the robustness and explainability of LMs in identifying Wellness Dimensions (WDs).
We reveal four surprising results about LMs/LLMs.
arXiv Detail & Related papers (2024-06-17T19:50:40Z)
- "I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust [51.542856739181474]
We show how different natural language expressions of uncertainty impact participants' reliance, trust, and overall task performance.
We find that first-person expressions decrease participants' confidence in the system and their tendency to agree with the system's answers, while increasing participants' accuracy.
Our findings suggest that using natural language expressions of uncertainty may be an effective approach for reducing overreliance on LLMs, but that the precise language used matters.
arXiv Detail & Related papers (2024-05-01T16:43:55Z)
- Evaluating Consistency and Reasoning Capabilities of Large Language Models [0.0]
Large Language Models (LLMs) are extensively used today across various sectors, including academia, research, business, and finance.
Despite their widespread adoption, these models often produce incorrect and misleading information, exhibiting a tendency to hallucinate.
This paper aims to evaluate and compare the consistency and reasoning capabilities of both public and proprietary LLMs.
arXiv Detail & Related papers (2024-04-25T10:03:14Z)
- LLMs' Reading Comprehension Is Affected by Parametric Knowledge and Struggles with Hypothetical Statements [59.71218039095155]
The task of reading comprehension (RC) provides a primary means to assess language models' natural language understanding (NLU) capabilities.
If the context aligns with the models' internal knowledge, it is hard to discern whether the models' answers stem from context comprehension or from internal information.
To address this issue, we suggest using RC on imaginary data, based on fictitious facts and entities.
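One way to read the imaginary-data idea is as a probe built entirely from invented entities, so that a correct answer can only come from the provided context rather than from parametric knowledge. The entities, prompt format, and scoring below are illustrative assumptions, not that paper's dataset.

```python
# Minimal sketch: a reading-comprehension item built from fictitious facts,
# so the answer cannot be retrieved from a model's parametric knowledge.
# The entities and scoring are illustrative assumptions, not the paper's data.

from typing import Callable

FICTITIOUS_CONTEXT = (
    "Zorvath is a landlocked country whose capital, Brenholm, "
    "sits on the river Quell."
)
QUESTION = "What is the capital of Zorvath?"
GOLD_ANSWER = "Brenholm"

def probe_context_reliance(answer_fn: Callable[[str], str]) -> bool:
    """Return True if the model answers from the context rather than guessing."""
    prompt = f"Context: {FICTITIOUS_CONTEXT}\nQuestion: {QUESTION}\nAnswer:"
    prediction = answer_fn(prompt)
    return GOLD_ANSWER.lower() in prediction.lower()
```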
arXiv Detail & Related papers (2024-04-09T13:08:56Z)
- Explainable Depression Symptom Detection in Social Media [2.677715367737641]
We propose using transformer-based architectures to detect and explain the appearance of depressive symptom markers in the users' writings.
Our natural language explanations enable clinicians to interpret the models' decisions based on validated symptoms.
arXiv Detail & Related papers (2023-10-20T17:05:27Z)
- Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning [76.98542249776257]
Large-scale language models often face the challenge of "hallucination".
We introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
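As a loose illustration of the general idea (not that paper's specific framework), one common way to make generation uncertainty-aware is to sample several answers, treat disagreement as uncertainty, and reject the output when agreement falls below a threshold. The sampling count and threshold here are assumptions.

```python
# Minimal sketch of uncertainty-aware generation via abstention:
# sample several answers, estimate uncertainty from their disagreement,
# and reject (return None) when agreement is too low.
# The sample count and threshold are illustrative assumptions.

from collections import Counter
from typing import Callable, List, Optional

def answer_or_abstain(
    sample_fn: Callable[[str], str],  # LM call returning one sampled answer
    prompt: str,
    n_samples: int = 5,
    min_agreement: float = 0.6,
) -> Optional[str]:
    """Return the majority answer if agreement is high enough, else None."""
    samples: List[str] = [sample_fn(prompt).strip() for _ in range(n_samples)]
    answer, count = Counter(samples).most_common(1)[0]
    return answer if count / n_samples >= min_agreement else None
```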
arXiv Detail & Related papers (2023-10-07T12:06:53Z)