Do Language Models Mirror Human Confidence? Exploring Psychological Insights to Address Overconfidence in LLMs
- URL: http://arxiv.org/abs/2506.00582v2
- Date: Mon, 28 Jul 2025 12:59:13 GMT
- Title: Do Language Models Mirror Human Confidence? Exploring Psychological Insights to Address Overconfidence in LLMs
- Authors: Chenjun Xu, Bingbing Wen, Bin Han, Robert Wolfe, Lucy Lu Wang, Bill Howe,
- Abstract summary: We show that models exhibit subtle differences from human patterns of overconfidence when prompted to answer based on different personas. We propose Answer-Free Confidence Estimation to improve confidence calibration and LLM interpretability.
- Score: 16.635844645949636
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Psychology research has shown that humans are poor at estimating their performance on tasks, tending towards underconfidence on easy tasks and overconfidence on difficult tasks. We examine three LLMs, Llama-3-70B-instruct, Claude-3-Sonnet, and GPT-4o, on a range of QA tasks of varying difficulty, and show that models exhibit subtle differences from human patterns of overconfidence: less sensitive to task difficulty, and when prompted to answer based on different personas -- e.g., expert vs layman, or different race, gender, and ages -- the models will respond with stereotypically biased confidence estimations even though their underlying answer accuracy remains the same. Based on these observations, we propose Answer-Free Confidence Estimation (AFCE) to improve confidence calibration and LLM interpretability in these settings. AFCE is a self-assessment method that employs two stages of prompting, first eliciting only confidence scores on questions, then asking separately for the answer. Experiments on the MMLU and GPQA datasets spanning subjects and difficulty show that this separation of tasks significantly reduces overconfidence and delivers more human-like sensitivity to task difficulty.
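The abstract describes AFCE as two independent prompting stages: confidence first, answer second. A minimal sketch of that two-stage protocol, assuming a generic chat-completion call (`query_llm` below is a hypothetical stand-in, stubbed here so the sketch runs; the exact prompts are illustrative, not the paper's):

```python
def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM API call; returns canned replies."""
    if "confidence" in prompt.lower():
        return "60"
    return "B"

def afce(question: str, options: list[str]) -> tuple[int, str]:
    """Answer-Free Confidence Estimation: elicit confidence without an
    answer, then ask for the answer in a separate, independent prompt."""
    formatted = question + "\n" + "\n".join(options)
    # Stage 1: confidence only -- the model is not allowed to answer yet.
    conf_prompt = (
        "Without answering, state your confidence (0-100) that you could "
        "answer this question correctly:\n" + formatted
    )
    confidence = int(query_llm(conf_prompt))
    # Stage 2: answer only, in a fresh prompt with no confidence request.
    ans_prompt = "Answer with the option letter only:\n" + formatted
    answer = query_llm(ans_prompt).strip()
    return confidence, answer

conf, ans = afce("Which planet is largest?", ["A. Mars", "B. Jupiter"])
print(conf, ans)  # prints: 60 B (from the stub)
```

The key design point, per the abstract, is the separation itself: because the confidence prompt precedes and is isolated from the answer prompt, the stated confidence cannot anchor on a committed answer.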
Related papers
- Judging with Personality and Confidence: A Study on Personality-Conditioned LLM Relevance Assessment [27.57574817687014]
Large language models (LLMs) can simulate specific personality traits and produce behaviors that align with those traits. Few studies have examined how simulated personalities impact confidence calibration, specifically the tendencies toward overconfidence or underconfidence. We show that personalities such as low agreeableness consistently align more closely with human labels than the unprompted condition.
arXiv Detail & Related papers (2026-01-05T07:46:29Z) - BrowseConf: Confidence-Guided Test-Time Scaling for Web Agents [58.05949210993854]
We investigate whether search agents have the ability to communicate their own confidence through verbalized confidence scores after long sequences of actions. We propose Test-Time Scaling (TTS) methods that use confidence scores to determine answer quality, encouraging the model to try again until it reaches a satisfactory confidence level.
arXiv Detail & Related papers (2025-10-27T15:58:51Z) - How Overconfidence in Initial Choices and Underconfidence Under Criticism Modulate Change of Mind in Large Language Models [28.62988505317048]
Large language models (LLMs) exhibit strikingly conflicting behaviors: they can appear steadfastly overconfident in their initial answers whilst being prone to excessive doubt when challenged. We show that LLMs exhibit a pronounced choice-supportive bias that reinforces and boosts their estimate of confidence in their answer.
arXiv Detail & Related papers (2025-07-03T18:57:43Z) - Language Models Prefer What They Know: Relative Confidence Estimation via Confidence Preferences [62.52739672949452]
Language models (LMs) should provide reliable confidence estimates to help users detect mistakes in their outputs and defer to human experts when necessary. We propose relative confidence estimation, where we match up questions against each other and ask the model to make relative judgments of confidence. Treating each question as a "player" in a series of matchups against other questions and the model's preferences as match outcomes, we can use rank aggregation methods like Elo rating and Bradley-Terry to translate the model's confidence preferences into confidence scores.
arXiv Detail & Related papers (2025-02-03T07:43:27Z) - Understanding the Dark Side of LLMs' Intrinsic Self-Correction [58.12627172032851]
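The rank-aggregation step described in the abstract above can be sketched with a plain Elo update, treating questions as players and the model's pairwise confidence preferences as match outcomes. The matchup data here is invented for illustration; the paper's actual aggregation details may differ:

```python
def elo_ratings(matchups, k=32.0, init=1000.0, rounds=20):
    """matchups: list of (winner, loser) question ids, where 'winner' is the
    question the model expressed more confidence on. Returns a ratings dict."""
    ratings = {}
    for w, l in matchups:
        ratings.setdefault(w, init)
        ratings.setdefault(l, init)
    for _ in range(rounds):  # replay matchups until ratings stabilise
        for w, l in matchups:
            # Standard Elo expected score for the winner, then update both.
            expected_w = 1.0 / (1.0 + 10 ** ((ratings[l] - ratings[w]) / 400.0))
            ratings[w] += k * (1.0 - expected_w)
            ratings[l] -= k * (1.0 - expected_w)
    return ratings

# Model is more confident on q1 than q2, q1 than q3, and q2 than q3.
prefs = [("q1", "q2"), ("q1", "q3"), ("q2", "q3")]
scores = elo_ratings(prefs)
assert scores["q1"] > scores["q2"] > scores["q3"]
```

The resulting ratings order questions by the model's relative confidence; normalizing them would yield per-question confidence scores without ever asking for an absolute number.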
Intrinsic self-correction was proposed to improve LLMs' responses via feedback prompts based solely on their inherent capability. Recent works show that LLMs' intrinsic self-correction fails without oracle labels as feedback prompts. We identify that intrinsic self-correction can cause LLMs to waver in both intermediate and final answers and can lead to prompt bias on simple factual questions.
arXiv Detail & Related papers (2024-12-19T15:39:31Z) - Confidence in the Reasoning of Large Language Models [0.0]
Confidence is measured in terms of persistence in keeping their answer when prompted to reconsider. Confidence is only partially explained by the underlying token-level probability.
arXiv Detail & Related papers (2024-12-19T10:04:29Z) - Fact-Level Confidence Calibration and Self-Correction [64.40105513819272]
We propose a Fact-Level framework that calibrates confidence to relevance-weighted correctness at the fact level.
We also develop Confidence-Guided Fact-level Self-Correction (ConFix), which uses high-confidence facts within a response as additional knowledge to improve low-confidence ones.
arXiv Detail & Related papers (2024-11-20T14:15:18Z) - Uncertainty-Based Abstention in LLMs Improves Safety and Reduces Hallucinations [63.330182403615886]
A major barrier towards the practical deployment of large language models (LLMs) is their lack of reliability.
Three situations where this is particularly apparent are correctness, hallucinations when given unanswerable questions, and safety.
In all three cases, models should ideally abstain from responding, much like humans, whose ability to understand uncertainty makes us refrain from answering questions we don't know.
arXiv Detail & Related papers (2024-04-16T23:56:38Z) - Reconfidencing LLMs from the Grouping Loss Perspective [56.801251926946485]
Large Language Models (LLMs) are susceptible to generating hallucinated answers in a confident tone.
Recent findings show that controlling uncertainty must go beyond calibration.
We construct a new evaluation dataset derived from a knowledge base to assess confidence scores given to answers of Mistral and LLaMA.
arXiv Detail & Related papers (2024-02-07T15:40:22Z) - What Large Language Models Know and What People Think They Know [13.939511057660013]
Large language models (LLMs) are increasingly integrated into decision-making processes. To earn human trust, LLMs must be well calibrated so that they can accurately assess and communicate the likelihood of their predictions being correct. Here we explore the calibration gap, which refers to the difference between human confidence in LLM-generated answers and the models' actual confidence, and the discrimination gap, which reflects how well humans and models can distinguish between correct and incorrect answers.
arXiv Detail & Related papers (2024-01-24T22:21:04Z) - A Survey of Confidence Estimation and Calibration in Large Language Models [86.692994151323]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks in various domains.
Despite their impressive performance, they can be unreliable due to factual errors in their generations.
Assessing their confidence and calibrating them across different tasks can help mitigate risks and enable LLMs to produce better generations.
arXiv Detail & Related papers (2023-11-14T16:43:29Z) - On the Intersection of Self-Correction and Trust in Language Models [7.8833421052793256]
Large Language Models (LLMs) have demonstrated remarkable capabilities in performing complex cognitive tasks.
Recent research has explored the self-correction capabilities of LLMs to enhance their performance.
We conduct experiments focusing on two key aspects of trustworthiness: truthfulness and toxicity.
arXiv Detail & Related papers (2023-11-06T00:04:12Z) - The Confidence-Competence Gap in Large Language Models: A Cognitive Study [3.757390057317548]
Large Language Models (LLMs) have acquired ubiquitous attention for their performances across diverse domains.
We exploit these models with diverse sets of questionnaires and real-world scenarios.
Our findings reveal intriguing instances where models demonstrate high confidence even when they answer incorrectly.
arXiv Detail & Related papers (2023-09-28T03:50:09Z) - Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs [60.61002524947733]
Previous confidence elicitation methods rely on white-box access to internal model information or model fine-tuning.
This leads to a growing need to explore the untapped area of black-box approaches for uncertainty estimation.
We define a systematic framework with three components: prompting strategies for eliciting verbalized confidence, sampling methods for generating multiple responses, and aggregation techniques for computing consistency.
arXiv Detail & Related papers (2023-06-22T17:31:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.