Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence
Scores from Language Models Fine-Tuned with Human Feedback
- URL: http://arxiv.org/abs/2305.14975v2
- Date: Tue, 24 Oct 2023 04:27:42 GMT
- Title: Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence
Scores from Language Models Fine-Tuned with Human Feedback
- Authors: Katherine Tian, Eric Mitchell, Allan Zhou, Archit Sharma, Rafael
Rafailov, Huaxiu Yao, Chelsea Finn, Christopher D. Manning
- Abstract summary: A trustworthy real-world prediction system should produce well-calibrated confidence scores.
We show that verbalized confidences emitted as output tokens are typically better-calibrated than the model's conditional probabilities.
- Score: 91.22679548111127
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A trustworthy real-world prediction system should produce well-calibrated
confidence scores; that is, its confidence in an answer should be indicative of
the likelihood that the answer is correct, enabling deferral to an expert in
cases of low-confidence predictions. Recent studies have shown that
unsupervised pre-training produces large language models (LMs) whose
conditional probabilities are remarkably well-calibrated. However, the most
widely-used LMs are fine-tuned with reinforcement learning from human feedback
(RLHF-LMs), and some studies have suggested that RLHF-LMs produce conditional
probabilities that are very poorly calibrated. In light of this perceived
weakness, we conduct a broad evaluation of methods for extracting confidence
scores from RLHF-LMs. For RLHF-LMs such as ChatGPT, GPT-4, and Claude, we find
that verbalized confidences emitted as output tokens are typically
better-calibrated than the model's conditional probabilities on the TriviaQA,
SciQ, and TruthfulQA benchmarks, often reducing the expected calibration error
by a relative 50%.
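For reference, expected calibration error (ECE), the metric behind the abstract's relative-50% claim, bins predictions by stated confidence and averages the gap between accuracy and mean confidence per bin. A minimal sketch follows; the bin count and the sample values are illustrative assumptions, not data from the paper:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins: sum over bins of
    (bin size / N) * |accuracy(bin) - mean confidence(bin)|."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        acc = correct[mask].mean()        # empirical accuracy in the bin
        conf = confidences[mask].mean()   # mean stated confidence in the bin
        ece += (mask.sum() / len(confidences)) * abs(acc - conf)
    return ece

# Hypothetical verbalized confidences (parsed from the model's own output
# text, e.g. after asking it to append "Confidence: 0.0-1.0") versus
# correctness labels for a handful of TriviaQA-style questions.
verbalized = [0.9, 0.8, 0.95, 0.6, 0.7]
labels = [1, 1, 1, 0, 1]
print(f"ECE: {expected_calibration_error(verbalized, labels):.3f}")
```

In the verbalized setting the confidence is read from the model's generated text rather than from token logits, which is what makes the comparison above possible for closed models.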
Related papers
- Graph-based Confidence Calibration for Large Language Models [22.394717844099684]
We propose a novel method to develop a well-calibrated confidence estimation model.
We use a weighted graph to represent the consistency among a large language model's responses to a question.
We then train a graph neural network to estimate the probability that a given response is correct.
arXiv Detail & Related papers (2024-11-03T20:36:44Z) - Calibrated Large Language Models for Binary Question Answering [49.1574468325115]
- Calibrated Large Language Models for Binary Question Answering [49.1574468325115]
A well-calibrated model should produce probabilities that accurately reflect the likelihood of its predictions being correct.
We propose a novel approach that utilizes the inductive Venn-Abers predictor (IVAP) to calibrate the probabilities associated with the output tokens corresponding to the binary labels.
arXiv Detail & Related papers (2024-07-01T09:31:03Z) - Large Language Models Must Be Taught to Know What They Don't Know [97.90008709512921]
- Large Language Models Must Be Taught to Know What They Don't Know [97.90008709512921]
We show that fine-tuning on a small dataset of correct and incorrect answers can create an uncertainty estimate with good generalization and small computational overhead.
We also investigate the mechanisms that enable reliable uncertainty estimation, finding that many models can be used as general-purpose uncertainty estimators.
arXiv Detail & Related papers (2024-06-12T16:41:31Z) - LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models [69.68379406317682]
- LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models [69.68379406317682]
We introduce a listener-aware finetuning method (LACIE) to calibrate implicit and explicit confidence markers.
We show that LACIE models the listener, considering not only whether an answer is right, but also whether it will be accepted by the listener.
We find that training with LACIE results in 47% fewer incorrect answers being accepted while maintaining the same level of acceptance for correct answers.
arXiv Detail & Related papers (2024-05-31T17:16:38Z) - Few-Shot Recalibration of Language Models [23.829795148520834]
- Few-Shot Recalibration of Language Models [23.829795148520834]
We train a recalibration model that takes in a few unlabeled examples from any given slice and predicts a curve that remaps confidence scores to be more accurate for that slice.
Our trained model can recalibrate for arbitrary new slices, without using any labeled data from that slice.
Experiments show that our few-shot recalibrator consistently outperforms existing calibration methods.
arXiv Detail & Related papers (2024-03-27T06:25:40Z) - Calibrating Large Language Models Using Their Generations Only [44.26441565763495]
- Calibrating Large Language Models Using Their Generations Only [44.26441565763495]
APRICOT is a method to set confidence targets and train an additional model that predicts an LLM's confidence based on its textual input and output alone.
It is conceptually simple, does not require access to the target model beyond its output, does not interfere with the language generation, and has a multitude of potential usages.
We show that our approach performs competitively in terms of calibration error for both white-box and black-box LLMs on closed-book question answering, and that it can detect incorrect LLM answers.
arXiv Detail & Related papers (2024-03-09T17:46:24Z) - Revisiting Confidence Estimation: Towards Reliable Failure Prediction [53.79160907725975]
- Revisiting Confidence Estimation: Towards Reliable Failure Prediction [53.79160907725975]
We identify a general, widespread, but largely neglected phenomenon: most confidence estimation methods are harmful for detecting misclassification errors.
We propose to enlarge the confidence gap by finding flat minima, which yields state-of-the-art failure prediction performance.
arXiv Detail & Related papers (2024-03-05T11:44:14Z) - Llamas Know What GPTs Don't Show: Surrogate Models for Confidence
- Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation [70.27452774899189]
Large language models (LLMs) should signal low confidence on examples where they are incorrect, instead of misleading the user.
As of November 2023, state-of-the-art closed LLMs do not provide access to their internal token probabilities.
Our best method, composing linguistic confidences with surrogate model probabilities, gives state-of-the-art confidence estimates on all 12 datasets.
arXiv Detail & Related papers (2023-11-15T11:27:44Z) - Quantifying Uncertainty in Answers from any Language Model and Enhancing
- Quantifying Uncertainty in Answers from any Language Model and Enhancing their Trustworthiness [16.35655151252159]
We introduce BSDetector, a method for detecting bad and speculative answers from a pretrained large language model.
Our uncertainty quantification technique works for any LLM accessible only through a black-box API.
arXiv Detail & Related papers (2023-08-30T17:53:25Z)