Related papers: LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models

URL: http://arxiv.org/abs/2405.21028v2
Date: Wed, 3 Jul 2024 12:49:23 GMT
Title: LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models
Authors: Elias Stengel-Eskin, Peter Hase, Mohit Bansal
Abstract summary: We introduce a listener-aware finetuning method (LACIE) to calibrate implicit and explicit confidence markers. We show that LACIE models the listener, considering not only whether an answer is right, but whether it will be accepted by a listener. We find that training with LACIE results in 47% fewer incorrect answers being accepted while maintaining the same level of acceptance for correct answers.
Score: 69.68379406317682
License: http://creativecommons.org/licenses/by/4.0/
Abstract: When answering questions, LLMs can convey not only an answer, but a level of confidence about the answer being correct. This includes explicit confidence markers (e.g. giving a numeric score) as well as implicit markers, like an authoritative tone or elaborating with additional knowledge. For LLMs to be trustworthy knowledge sources, the confidence they convey should match their actual expertise; however, most current models tend towards overconfidence. To calibrate both implicit and explicit confidence markers, we introduce a pragmatic, listener-aware finetuning method (LACIE) that models the listener, considering not only whether an answer is right, but whether it will be accepted by a listener. We cast calibration as preference optimization, creating data via a two-agent game, where a speaker model's outputs are judged by a simulated listener. We then finetune three LLMs (Mistral-7B, Llama3-8B, Llama3-70B) with LACIE, and show that the resulting models are better calibrated w.r.t. a simulated listener. Crucially, these trends transfer to human listeners, helping them correctly predict model correctness: we conduct a human evaluation where annotators accept or reject an LLM's answers, finding that training with LACIE results in 47% fewer incorrect answers being accepted while maintaining the same level of acceptance for correct answers. Furthermore, LACIE generalizes to another dataset, resulting in a large increase in truthfulness on TruthfulQA when trained on TriviaQA. Our analysis indicates that LACIE leads to a better confidence separation between correct and incorrect examples. Qualitatively, we find that a LACIE-trained model hedges more and implicitly signals certainty when it is correct by using an authoritative tone or including details. Finally, LACIE finetuning leads to an emergent increase in model abstention (e.g. saying "I don't know") for answers that are likely wrong.
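The two-agent data creation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Candidate` type, the `calibration_score` rule (+1 when the listener's decision matches correctness, -1 otherwise), and the example answers are all assumptions made for illustration. The abstract only states that speaker outputs are judged by a simulated listener and that calibration is cast as preference optimization.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Candidate:
    """One speaker answer plus the two signals the sketch relies on (hypothetical type)."""
    text: str
    is_correct: bool        # does the answer match the gold label?
    listener_accepts: bool  # did the simulated listener accept it?


def calibration_score(c: Candidate) -> int:
    # Assumed scoring rule: reward answers whose acceptance matches correctness
    # (correct and accepted, or incorrect and rejected); penalize the rest.
    return 1 if c.is_correct == c.listener_accepts else -1


def build_preference_pairs(candidates: list[Candidate]) -> list[tuple[str, str]]:
    """Return (chosen, rejected) text pairs for a preference-optimization trainer."""
    pairs = []
    for a, b in combinations(candidates, 2):
        score_a, score_b = calibration_score(a), calibration_score(b)
        if score_a > score_b:
            pairs.append((a.text, b.text))
        elif score_b > score_a:
            pairs.append((b.text, a.text))
        # Equal scores carry no preference signal, so no pair is emitted.
    return pairs


# Made-up speaker samples for a single question.
candidates = [
    Candidate("It is definitely Paris.", is_correct=True, listener_accepts=True),
    Candidate("It is definitely Lyon.", is_correct=False, listener_accepts=True),
    Candidate("I'm not sure, maybe Lyon?", is_correct=False, listener_accepts=False),
]
pairs = build_preference_pairs(candidates)
```

Under this assumed rule, the confidently wrong answer that the listener accepted is dispreferred to both the confident correct answer and the hedged wrong answer, which is the kind of signal that would push a speaker toward hedging when it is likely wrong.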

Related papers

Uncertainty Distillation: Teaching Language Models to Express Semantic Confidence [16.311538811237536]
Large language models (LLMs) are increasingly used for factual question-answering. For these verbalized expressions of uncertainty to be meaningful, they should reflect the error rates at the expressed level of confidence. Many prior methods calculate lexical uncertainty, estimating a model's confidence in the specific string it generated.
arXiv Detail & Related papers (2025-03-18T21:29:29Z)
Are LLMs Really Not Knowledgable? Mining the Submerged Knowledge in LLMs' Memory [15.986679553468989]
Large language models (LLMs) have shown promise as potential knowledge bases. LLMs often struggle with question-answering tasks and are prone to hallucinations. We develop SkipUnsure, a method to improve answer accuracy by leveraging detected but unexpressed knowledge.
arXiv Detail & Related papers (2024-12-30T10:29:18Z)
On Verbalized Confidence Scores for LLMs [25.160810008907397]
Uncertainty quantification for large language models (LLMs) can establish more human trust in their responses. This work focuses on asking the LLM itself to verbalize its uncertainty with a confidence score as part of its output tokens. We assess the reliability of verbalized confidence scores with respect to different datasets, models, and prompt methods.
arXiv Detail & Related papers (2024-12-19T11:10:36Z)
Graph-based Confidence Calibration for Large Language Models [22.394717844099684]
We propose a novel method to develop a well-calibrated confidence estimation model. We use a weighted graph to represent the consistency among the large language models' responses to a question. We then train a graph neural network to estimate the probability of correct responses.
arXiv Detail & Related papers (2024-11-03T20:36:44Z)
SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales [29.33581578047835]
SaySelf is a training framework that teaches large language models to express more accurate fine-grained confidence estimates. In addition, SaySelf directs LLMs to produce self-reflective rationales that clearly identify gaps in their parametric knowledge. We show that the generated self-reflective rationales are reasonable and can further contribute to the calibration.
arXiv Detail & Related papers (2024-05-31T16:21:16Z)
Calibrating Large Language Models Using Their Generations Only [44.26441565763495]
APRICOT is a method to set confidence targets and train an additional model that predicts an LLM's confidence based on its textual input and output alone. It is conceptually simple, does not require access to the target model beyond its output, does not interfere with the language generation, and has a multitude of potential usages. We show how our approach performs competitively in terms of calibration error for white-box and black-box LLMs on closed-book question-answering to detect incorrect LLM answers.
arXiv Detail & Related papers (2024-03-09T17:46:24Z)
What Large Language Models Know and What People Think They Know [13.939511057660013]
Large language models (LLMs) are increasingly integrated into decision-making processes. To earn human trust, LLMs must be well calibrated so that they can accurately assess and communicate the likelihood of their predictions being correct. Here we explore the calibration gap, which refers to the difference between human confidence in LLM-generated answers and the models' actual confidence, and the discrimination gap, which reflects how well humans and models can distinguish between correct and incorrect answers.
arXiv Detail & Related papers (2024-01-24T22:21:04Z)
R-Tuning: Instructing Large Language Models to Say 'I Don't Know' [66.11375475253007]
Large language models (LLMs) have revolutionized numerous domains with their impressive performance but still face challenges. Previous instruction tuning methods force the model to complete a sentence regardless of whether the model has the relevant knowledge. We present a new approach called Refusal-Aware Instruction Tuning (R-Tuning). Experimental results demonstrate that R-Tuning effectively improves a model's ability to answer known questions and refrain from answering unknown questions.
arXiv Detail & Related papers (2023-11-16T08:45:44Z)
Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning [76.98542249776257]
Large-scale language models often face the challenge of "hallucination". We introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
arXiv Detail & Related papers (2023-10-07T12:06:53Z)
Quantifying Uncertainty in Answers from any Language Model and Enhancing their Trustworthiness [16.35655151252159]
We introduce BSDetector, a method for detecting bad and speculative answers from a pretrained Large Language Model. Our uncertainty quantification technique works for any LLM accessible only via a black-box API.
arXiv Detail & Related papers (2023-08-30T17:53:25Z)
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback [91.22679548111127]
A trustworthy real-world prediction system should produce well-calibrated confidence scores. We show that verbalized confidences emitted as output tokens are typically better-calibrated than the model's conditional probabilities.
arXiv Detail & Related papers (2023-05-24T10:12:33Z)
A Close Look into the Calibration of Pre-trained Language Models [56.998539510508515]
Pre-trained language models (PLMs) may fail in giving reliable estimates of their predictive uncertainty. We study the dynamic change in PLMs' calibration performance in training. We extend two recently proposed learnable methods that directly collect data to train models to have reasonable confidence estimations.
arXiv Detail & Related papers (2022-10-31T21:31:07Z)
How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering [80.82194311274694]
We examine the question "how can we know when language models know, with confidence, the answer to a particular query?" We examine three strong generative models -- T5, BART, and GPT-2 -- and study whether their probabilities on QA tasks are well calibrated. We then examine methods to calibrate such models to make their confidence scores correlate better with the likelihood of correctness.
arXiv Detail & Related papers (2020-12-02T03:53:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.