Calibrating Large Language Models Using Their Generations Only
- URL: http://arxiv.org/abs/2403.05973v1
- Date: Sat, 9 Mar 2024 17:46:24 GMT
- Title: Calibrating Large Language Models Using Their Generations Only
- Authors: Dennis Ulmer, Martin Gubri, Hwaran Lee, Sangdoo Yun, Seong Joon Oh
- Abstract summary: APRICOT is a method to set confidence targets and train an additional model that predicts an LLM's confidence based on its textual input and output alone.
It is conceptually simple, does not require access to the target model beyond its output, does not interfere with the language generation, and has a multitude of potential usages.
We show how our approach performs competitively in terms of calibration error for white-box and black-box LLMs on closed-book question-answering to detect incorrect LLM answers.
- Score: 44.26441565763495
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As large language models (LLMs) are increasingly deployed in user-facing
applications, building trust and maintaining safety by accurately quantifying a
model's confidence in its prediction becomes even more important. However,
finding effective ways to calibrate LLMs - especially when the only interface
to the models is their generated text - remains a challenge. We propose APRICOT
(auxiliary prediction of confidence targets): A method to set confidence
targets and train an additional model that predicts an LLM's confidence based
on its textual input and output alone. This approach has several advantages: It
is conceptually simple, does not require access to the target model beyond its
output, does not interfere with the language generation, and has a multitude of
potential usages, for instance by verbalizing the predicted confidence or
adjusting the given answer based on the confidence. We show how our approach
performs competitively in terms of calibration error for white-box and
black-box LLMs on closed-book question-answering to detect incorrect LLM
answers.
Related papers
- On Calibration of LLM-based Guard Models for Reliable Content Moderation [27.611237252584402]
Large language models (LLMs) pose significant risks due to the potential for generating harmful content or users attempting to evade guardrails.
Existing studies have developed LLM-based guard models designed to moderate the input and output of threat LLMs.
However, limited attention has been given to the reliability and calibration of such guard models.
arXiv Detail & Related papers (2024-10-14T12:04:06Z) - Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models [79.76293901420146]
Large Language Models (LLMs) are employed across various high-stakes domains, where the reliability of their outputs is crucial.
Our research investigates the fragility of uncertainty estimation and explores potential attacks.
We demonstrate that an attacker can embed a backdoor in LLMs, which, when activated by a specific trigger in the input, manipulates the model's uncertainty without affecting the final output.
arXiv Detail & Related papers (2024-07-15T23:41:11Z) - Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z) - SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales [29.33581578047835]
SaySelf is a training framework that teaches large language models to express more accurate fine-grained confidence estimates.
In addition, SaySelf directs LLMs to produce self-reflective rationales that clearly identify gaps in their parametric knowledge.
We show that the generated self-reflective rationales are reasonable and can further contribute to the calibration.
arXiv Detail & Related papers (2024-05-31T16:21:16Z) - SPOT: Text Source Prediction from Originality Score Thresholding [6.790905400046194]
countermeasures aim at detecting misinformation, usually involve domain specific models trained to recognize the relevance of any information.
Instead of evaluating the validity of the information, we propose to investigate LLM generated text from the perspective of trust.
arXiv Detail & Related papers (2024-05-30T21:51:01Z) - Calibrating the Confidence of Large Language Models by Eliciting Fidelity [52.47397325111864]
Large language models optimized with techniques like RLHF have achieved good alignment in being helpful and harmless.
Post-alignment, these language models often exhibit overconfidence, where the expressed confidence does not accurately calibrate with their correctness rate.
We propose a plug-and-play method to estimate the confidence of language models.
arXiv Detail & Related papers (2024-04-03T11:36:12Z) - Self-Evaluation Improves Selective Generation in Large Language Models [54.003992911447696]
We reformulate open-ended generation tasks into token-level prediction tasks.
We instruct an LLM to self-evaluate its answers.
We benchmark a range of scoring methods based on self-evaluation.
arXiv Detail & Related papers (2023-12-14T19:09:22Z) - Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs [60.61002524947733]
Previous confidence elicitation methods rely on white-box access to internal model information or model fine-tuning.
This leads to a growing need to explore the untapped area of black-box approaches for uncertainty estimation.
We define a systematic framework with three components: prompting strategies for eliciting verbalized confidence, sampling methods for generating multiple responses, and aggregation techniques for computing consistency.
arXiv Detail & Related papers (2023-06-22T17:31:44Z) - Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models [37.63939774027709]
Large language models (LLMs) specializing in natural language generation (NLG) have recently started exhibiting promising capabilities.
We propose and compare several confidence/uncertainty measures, applying them to *selective NLG* where unreliable results could either be ignored or yielded for further assessment.
Results reveal that a simple measure for the semantic dispersion can be a reliable predictor of the quality of LLM responses.
arXiv Detail & Related papers (2023-05-30T16:31:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.