Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models
- URL: http://arxiv.org/abs/2405.16282v5
- Date: Sat, 15 Jun 2024 22:18:06 GMT
- Title: Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models
- Authors: Abhishek Kumar, Robert Morabito, Sanzhar Umbet, Jad Kabbara, Ali Emami,
- Abstract summary: We introduce the concept of Confidence-Probability Alignment.
We probe the alignment between models' internal and expressed confidence.
Among the models analyzed, OpenAI's GPT-4 showed the strongest confidence-probability alignment.
- Score: 14.5291643644017
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As the use of Large Language Models (LLMs) becomes more widespread, understanding their self-evaluation of confidence in generated responses becomes increasingly important as it is integral to the reliability of the output of these models. We introduce the concept of Confidence-Probability Alignment, that connects an LLM's internal confidence, quantified by token probabilities, to the confidence conveyed in the model's response when explicitly asked about its certainty. Using various datasets and prompting techniques that encourage model introspection, we probe the alignment between models' internal and expressed confidence. These techniques encompass using structured evaluation scales to rate confidence, including answer options when prompting, and eliciting the model's confidence level for outputs it does not recognize as its own. Notably, among the models analyzed, OpenAI's GPT-4 showed the strongest confidence-probability alignment, with an average Spearman's $\hat{\rho}$ of 0.42, across a wide range of tasks. Our work contributes to the ongoing efforts to facilitate risk assessment in the application of LLMs and to further our understanding of model trustworthiness.
Related papers
- Confidence Estimation for LLM-Based Dialogue State Tracking [9.305763502526833]
Estimation of a model's confidence on its outputs is critical for Conversational AI systems based on large language models (LLMs)
We provide an exhaustive exploration of methods, including approaches proposed for open- and closed-weight LLMs.
Our findings suggest that fine-tuning open-weight LLMs can result in enhanced AUC performance, indicating better confidence score calibration.
arXiv Detail & Related papers (2024-09-15T06:44:26Z) - Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z) - Large Language Model Confidence Estimation via Black-Box Access [30.490207799344333]
We propose a simple framework where, we engineer novel features and train a (interpretable) model to estimate the confidence.
We empirically demonstrate that our framework is effective in estimating confidence of Flan-ul2,-13b and Mistral-7b on four benchmark Q&A tasks.
Our interpretable approach provides insight into features that are predictive of confidence, leading to the interesting and useful discovery.
arXiv Detail & Related papers (2024-06-01T02:08:44Z) - When to Trust LLMs: Aligning Confidence with Response Quality [49.371218210305656]
We propose CONfidence-Quality-ORDer-preserving alignment approach (CONQORD)
It integrates quality reward and order-preserving alignment reward functions.
Experiments demonstrate that CONQORD significantly improves the alignment performance between confidence and response accuracy.
arXiv Detail & Related papers (2024-04-26T09:42:46Z) - The Calibration Gap between Model and Human Confidence in Large Language
Models [14.539888672603743]
Large language models (LLMs) need to be well-calibrated in the sense that they can accurately assess and communicate how likely it is that their predictions are correct.
Recent work has focused on the quality of internal LLM confidence assessments.
This paper explores the disparity between external human confidence in an LLM's responses and the internal confidence of the model.
arXiv Detail & Related papers (2024-01-24T22:21:04Z) - Llamas Know What GPTs Don't Show: Surrogate Models for Confidence
Estimation [70.27452774899189]
Large language models (LLMs) should signal low confidence on examples where they are incorrect, instead of misleading the user.
As of November 2023, state-of-the-art LLMs do not provide access to these probabilities.
Our best method composing linguistic confidences and surrogate model probabilities gives state-of-the-art confidence estimates on all 12 datasets.
arXiv Detail & Related papers (2023-11-15T11:27:44Z) - Improving the Reliability of Large Language Models by Leveraging
Uncertainty-Aware In-Context Learning [76.98542249776257]
Large-scale language models often face the challenge of "hallucination"
We introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
arXiv Detail & Related papers (2023-10-07T12:06:53Z) - A Confidence-based Partial Label Learning Model for Crowd-Annotated
Named Entity Recognition [74.79785063365289]
Existing models for named entity recognition (NER) are mainly based on large-scale labeled datasets.
We propose a Confidence-based Partial Label Learning (CPLL) method to integrate the prior confidence (given by annotators) and posterior confidences (learned by models) for crowd-annotated NER.
arXiv Detail & Related papers (2023-05-21T15:31:23Z) - Trust, but Verify: Using Self-Supervised Probing to Improve
Trustworthiness [29.320691367586004]
We introduce a new approach of self-supervised probing, which enables us to check and mitigate the overconfidence issue for a trained model.
We provide a simple yet effective framework, which can be flexibly applied to existing trustworthiness-related methods in a plug-and-play manner.
arXiv Detail & Related papers (2023-02-06T08:57:20Z) - Improving the Reliability for Confidence Estimation [16.952133489480776]
Confidence estimation is a task that aims to evaluate the trustworthiness of the model's prediction output during deployment.
Previous works have outlined two important qualities that a reliable confidence estimation model should possess.
We propose a meta-learning framework that can simultaneously improve upon both qualities in a confidence estimation model.
arXiv Detail & Related papers (2022-10-13T06:34:23Z) - An evaluation of word-level confidence estimation for end-to-end
automatic speech recognition [70.61280174637913]
We investigate confidence estimation for end-to-end automatic speech recognition (ASR)
We provide an extensive benchmark of popular confidence methods on four well-known speech datasets.
Our results suggest a strong baseline can be obtained by scaling the logits by a learnt temperature.
arXiv Detail & Related papers (2021-01-14T09:51:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.