The Calibration Gap between Model and Human Confidence in Large Language
Models
- URL: http://arxiv.org/abs/2401.13835v1
- Date: Wed, 24 Jan 2024 22:21:04 GMT
- Title: The Calibration Gap between Model and Human Confidence in Large Language
Models
- Authors: Mark Steyvers, Heliodoro Tejeda, Aakriti Kumar, Catarina Belem, Sheer
Karny, Xinyue Hu, Lukas Mayer, Padhraic Smyth
- Abstract summary: Large language models (LLMs) need to be well-calibrated in the sense that they can accurately assess and communicate how likely it is that their predictions are correct.
Recent work has focused on the quality of internal LLM confidence assessments.
This paper explores the disparity between external human confidence in an LLM's responses and the internal confidence of the model.
- Score: 14.539888672603743
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: For large language models (LLMs) to be trusted by humans they need to be
well-calibrated in the sense that they can accurately assess and communicate
how likely it is that their predictions are correct. Recent work has focused on
the quality of internal LLM confidence assessments, but the question remains of
how well LLMs can communicate this internal model confidence to human users.
This paper explores the disparity between external human confidence in an LLM's
responses and the internal confidence of the model. Through experiments
involving multiple-choice questions, we systematically examine human users'
ability to discern the reliability of LLM outputs. Our study focuses on two key
areas: (1) assessing users' perception of true LLM confidence and (2)
investigating the impact of tailored explanations on this perception. The
research highlights that default explanations from LLMs often lead to user
overestimation of both the model's confidence and its accuracy. By modifying
the explanations to more accurately reflect the LLM's internal confidence, we
observe a significant shift in user perception, aligning it more closely with
the model's actual confidence levels. This adjustment in explanatory approach
demonstrates potential for enhancing user trust and accuracy in assessing LLM
outputs. The findings underscore the importance of transparent communication of
confidence levels in LLMs, particularly in high-stakes applications where
understanding the reliability of AI-generated information is essential.
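To make the notions of internal model confidence, human confidence, and calibration concrete, the sketch below illustrates one way such a gap could be quantified. It is not taken from the paper: the per-question data are hypothetical placeholders and the binned Expected Calibration Error (ECE) is a standard metric used here only for illustration.

```python
# Minimal sketch (not the paper's protocol): quantifying a "calibration gap"
# between a model's internal confidence and human confidence in its answers.
# All per-question data below are hypothetical placeholders.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE: weighted gap between mean confidence and accuracy per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # bin weight = fraction of questions in the bin
    return ece

# Hypothetical data for a small multiple-choice experiment:
model_conf = np.array([0.95, 0.80, 0.60, 0.90, 0.55])  # model's internal confidence per answer
human_conf = np.array([0.90, 0.92, 0.85, 0.95, 0.80])  # human confidence in the same answers
is_correct = np.array([1, 1, 0, 1, 0])                 # whether each answer was actually correct

print("model ECE:", expected_calibration_error(model_conf, is_correct))
print("human ECE:", expected_calibration_error(human_conf, is_correct))
# One simple view of the gap: average human overestimation relative to the model's own confidence.
print("mean human-minus-model confidence gap:", float((human_conf - model_conf).mean()))
```

In this toy setup, a positive human-minus-model gap together with a higher human ECE corresponds to the paper's finding that users tend to overestimate both the model's confidence and its accuracy.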
Related papers
- Learning to Route with Confidence Tokens [43.63392143501436]
We study the extent to which large language models can reliably indicate confidence in their answers.
We propose Self-REF, a lightweight training strategy to teach LLMs to express confidence in a reliable manner.
Compared to conventional approaches such as verbalizing confidence and examining token probabilities, we demonstrate empirically that confidence tokens show significant improvements in downstream routing and rejection learning tasks.
arXiv Detail & Related papers (2024-10-17T07:28:18Z)
- Factual Confidence of LLMs: on Reliability and Robustness of Current Estimators [6.403926452181712]
Large Language Models (LLMs) tend to be unreliable in the factuality of their answers.
We present a survey and empirical comparison of estimators of factual confidence.
Our experiments indicate that trained hidden-state probes provide the most reliable confidence estimates.
arXiv Detail & Related papers (2024-06-19T10:11:37Z)
- SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales [29.33581578047835]
SaySelf is a training framework that teaches large language models to express more accurate fine-grained confidence estimates.
In addition, SaySelf directs LLMs to produce self-reflective rationales that clearly identify gaps in their parametric knowledge.
We show that the generated self-reflective rationales are reasonable and can further contribute to the calibration.
arXiv Detail & Related papers (2024-05-31T16:21:16Z)
- Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models [14.5291643644017]
We introduce the concept of Confidence-Probability Alignment.
We probe the alignment between models' internal and expressed confidence.
Among the models analyzed, OpenAI's GPT-4 showed the strongest confidence-probability alignment.
arXiv Detail & Related papers (2024-05-25T15:42:04Z)
- When to Trust LLMs: Aligning Confidence with Response Quality [49.371218210305656]
We propose a CONfidence-Quality-ORDer-preserving alignment approach (CONQORD).
It integrates quality reward and order-preserving alignment reward functions.
Experiments demonstrate that CONQORD significantly improves the alignment performance between confidence and response accuracy.
arXiv Detail & Related papers (2024-04-26T09:42:46Z)
- Enhancing Confidence Expression in Large Language Models Through Learning from Past Experience [41.06726400259579]
Large Language Models (LLMs) have exhibited remarkable performance across various downstream tasks.
We propose a method of Learning from Past experience (LePe) to enhance the capability for confidence expression.
arXiv Detail & Related papers (2024-04-16T06:47:49Z)
- Fact-and-Reflection (FaR) Improves Confidence Calibration of Large Language Models [84.94220787791389]
We propose Fact-and-Reflection (FaR) prompting, which improves the LLM calibration in two steps.
Experiments show that FaR achieves significantly better calibration; it lowers the Expected Calibration Error by 23.5%.
FaR even elicits the capability of verbally expressing concerns in less confident scenarios.
arXiv Detail & Related papers (2024-02-27T01:37:23Z)
- TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness [58.721012475577716]
Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, prompting a surge in their practical applications.
This paper introduces TrustScore, a framework based on the concept of Behavioral Consistency, which evaluates whether an LLM's response aligns with its intrinsic knowledge.
arXiv Detail & Related papers (2024-02-19T21:12:14Z)
- Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation [71.91287418249688]
Large language models (LLMs) often struggle with factual inaccuracies, even when they hold relevant knowledge.
We leverage the self-evaluation capability of an LLM to provide training signals that steer the model towards factuality.
We show that the proposed self-alignment approach substantially enhances factual accuracy over Llama family models across three key knowledge-intensive tasks.
arXiv Detail & Related papers (2024-02-14T15:52:42Z)
- TrustLLM: Trustworthiness in Large Language Models [446.5640421311468]
This paper introduces TrustLLM, a comprehensive study of trustworthiness in large language models (LLMs).
We first propose a set of principles for trustworthy LLMs that span eight different dimensions.
Based on these principles, we establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics.
arXiv Detail & Related papers (2024-01-10T22:07:21Z)
- Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs [60.61002524947733]
Previous confidence elicitation methods rely on white-box access to internal model information or model fine-tuning.
This leads to a growing need to explore the untapped area of black-box approaches for uncertainty estimation.
We define a systematic framework with three components: prompting strategies for eliciting verbalized confidence, sampling methods for generating multiple responses, and aggregation techniques for computing consistency.
arXiv Detail & Related papers (2023-06-22T17:31:44Z)
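The last entry above describes a black-box framework built from prompting strategies, sampling, and consistency aggregation. Below is a minimal sketch of the sampling-and-aggregation part only; `query_llm` is a hypothetical placeholder rather than any specific API, and the majority-agreement rate is just one of several possible aggregation choices.

```python
# Minimal sketch of a black-box, consistency-based confidence estimate.
# `query_llm` is a hypothetical stand-in for any chat-completion call; no
# specific provider or API signature is assumed.
from collections import Counter

def query_llm(prompt: str, temperature: float = 1.0) -> str:
    """Placeholder: return one sampled answer string from an LLM."""
    raise NotImplementedError("wire this up to your model of choice")

def consistency_confidence(question: str, n_samples: int = 10) -> tuple[str, float]:
    """Sample the model several times and use answer agreement as a confidence proxy."""
    answers = [
        query_llm(f"{question}\nAnswer with a single option letter.")
        for _ in range(n_samples)
    ]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n_samples  # majority answer and its agreement rate

# Usage (once `query_llm` is implemented):
# answer, confidence = consistency_confidence("Which planet is largest? (A) Earth (B) Jupiter")
```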