Large Language Models Must Be Taught to Know What They Don't Know
- URL: http://arxiv.org/abs/2406.08391v1
- Date: Wed, 12 Jun 2024 16:41:31 GMT
- Title: Large Language Models Must Be Taught to Know What They Don't Know
- Authors: Sanyam Kapoor, Nate Gruver, Manley Roberts, Katherine Collins, Arka Pal, Umang Bhatt, Adrian Weller, Samuel Dooley, Micah Goldblum, Andrew Gordon Wilson
- Abstract summary: We show that fine-tuning on a small dataset of correct and incorrect answers can create an uncertainty estimate with good generalization and small computational overhead.
We also investigate the mechanisms that enable reliable uncertainty estimation, finding that many models can be used as general-purpose uncertainty estimators.
- Score: 97.90008709512921
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When using large language models (LLMs) in high-stakes applications, we need to know when we can trust their predictions. Some works argue that prompting high-performance LLMs is sufficient to produce calibrated uncertainties, while others introduce sampling methods that can be prohibitively expensive. In this work, we first argue that prompting on its own is insufficient to achieve good calibration and then show that fine-tuning on a small dataset of correct and incorrect answers can create an uncertainty estimate with good generalization and small computational overhead. We show that a thousand graded examples are sufficient to outperform baseline methods and that training through the features of a model is necessary for good performance and tractable for large open-source models when using LoRA. We also investigate the mechanisms that enable reliable LLM uncertainty estimation, finding that many models can be used as general-purpose uncertainty estimators, applicable not just to their own uncertainties but also to the uncertainty of other models. Lastly, through a user study, we show that uncertainty estimates inform human use of LLMs in human-AI collaborative settings.
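The recipe the abstract describes (grade a small set of the model's answers as correct or incorrect, then fine-tune with LoRA so the model's own features predict correctness) can be illustrated with a minimal sketch. The choices below, namely a small Hugging Face causal LM, the peft library, a linear head on the final token's hidden state, and a hypothetical `graded_examples` iterable, are assumptions for illustration, not the paper's exact implementation.

```python
# Minimal sketch (not the paper's exact recipe): LoRA-fine-tune a causal LM so a
# small head on its features predicts whether a given answer is correct.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "gpt2"  # placeholder; the paper targets much larger open-source LLMs
tok = AutoTokenizer.from_pretrained(model_name)
base = AutoModelForCausalLM.from_pretrained(model_name)

# Train only low-rank adapters; the pretrained weights stay frozen.
model = get_peft_model(base, LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"]))
head = nn.Linear(base.config.hidden_size, 1)  # correctness logit from last-token features

def p_correct(question: str, answer: str) -> torch.Tensor:
    """Estimated probability that `answer` to `question` is correct."""
    ids = tok(f"Q: {question}\nA: {answer}", return_tensors="pt")
    hidden = model(**ids, output_hidden_states=True).hidden_states[-1]
    return torch.sigmoid(head(hidden[:, -1])).squeeze()

# Per the abstract, roughly a thousand graded (question, answer, label) triples suffice.
trainable = [p for p in model.parameters() if p.requires_grad] + list(head.parameters())
opt = torch.optim.AdamW(trainable, lr=1e-4)
for question, answer, label in graded_examples:  # hypothetical iterable of triples
    loss = nn.functional.binary_cross_entropy(p_correct(question, answer),
                                              torch.tensor(float(label)))
    opt.zero_grad(); loss.backward(); opt.step()
```

At inference time, p_correct gives a per-answer confidence that can be thresholded or further calibrated.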
Related papers
- The Reliability Paradox: Exploring How Shortcut Learning Undermines Language Model Calibration [5.616884466478886]
Pre-trained language models (PLMs) have enabled significant performance gains in the field of natural language processing.
Recent studies have found PLMs to suffer from miscalibration, indicating a lack of accuracy in the confidence estimates provided by these models.
This paper investigates whether lower calibration error implies reliable decision rules for a language model.
arXiv Detail & Related papers (2024-12-17T08:04:28Z) - AutoElicit: Using Large Language Models for Expert Prior Elicitation in Predictive Modelling [53.54623137152208]
We introduce AutoElicit to extract knowledge from large language models and construct priors for predictive models.
We show these priors are informative and can be refined using natural language.
We find that AutoElicit yields priors that substantially reduce error relative to uninformative priors while using fewer labels, and that it consistently outperforms in-context learning.
arXiv Detail & Related papers (2024-11-26T10:13:39Z) - Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z) - Just rephrase it! Uncertainty estimation in closed-source language models via multiple rephrased queries [6.249216559519607]
We estimate the uncertainty of closed-source large language models via multiple rephrasings of an original base query.
Our method demonstrates significant improvements in the calibration of uncertainty estimates compared to the baseline.
arXiv Detail & Related papers (2024-05-22T18:28:26Z) - Calibrating Large Language Models with Sample Consistency [76.23956851098598]
We explore the potential of deriving confidence from the distribution of multiple randomly sampled model generations, via three measures of consistency.
Results show that consistency-based calibration methods outperform existing post-hoc approaches.
We offer practical guidance on choosing suitable consistency metrics for calibration, tailored to the characteristics of various LMs; a minimal sketch of consistency-based confidence appears after this list.
arXiv Detail & Related papers (2024-02-21T16:15:20Z) - Distinguishing the Knowable from the Unknowable with Language Models [15.471748481627143]
In the absence of ground-truth probabilities, we explore a setting where, in order to disentangle a given LLM's uncertainty, a significantly larger model stands in as a proxy for the ground truth.
We show that small linear probes trained on the embeddings of frozen, pretrained models accurately predict when larger models will be more confident at the token level; a minimal sketch of such a probe appears after this list.
We propose a fully unsupervised method that achieves non-trivial accuracy on the same task.
arXiv Detail & Related papers (2024-02-05T22:22:49Z) - Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners [85.03486419424647]
KnowNo is a framework for measuring and aligning the uncertainty of large language models.
KnowNo builds on the theory of conformal prediction to provide statistical guarantees on task completion; a minimal sketch of the conformal-prediction step appears after this list.
arXiv Detail & Related papers (2023-07-04T21:25:12Z) - BayesCap: Bayesian Identity Cap for Calibrated Uncertainty in Frozen Neural Networks [50.15201777970128]
We propose BayesCap, which learns a Bayesian identity mapping over a frozen model's outputs, enabling uncertainty estimation.
BayesCap is a memory-efficient method that can be trained on a small fraction of the original dataset.
We show the efficacy of our method on a wide variety of tasks with a diverse set of architectures.
arXiv Detail & Related papers (2022-07-14T12:50:09Z)
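For the consistency-based entries above ("Just rephrase it!" and "Calibrating Large Language Models with Sample Consistency"), the shared idea can be illustrated with a minimal sketch: sample several answers, optionally to rephrased versions of the query, and treat agreement among them as a confidence score. The `generate_answer` callable is a hypothetical stand-in for whatever API produces one sampled answer; the papers study richer consistency measures than simple majority agreement.

```python
from collections import Counter
from typing import Callable, List, Tuple

def consistency_confidence(
    question: str,
    generate_answer: Callable[[str], str],  # hypothetical: returns one sampled answer
    rephrasings: List[str],
    samples_per_query: int = 5,
) -> Tuple[str, float]:
    """Return (majority answer, agreement rate) over sampled generations.

    The agreement rate is a crude confidence score; the cited papers explore
    alternatives such as entropy or semantic-similarity-based consistency.
    """
    answers = []
    for query in [question] + rephrasings:
        for _ in range(samples_per_query):
            answers.append(generate_answer(query).strip().lower())
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer, count / len(answers)
```

Calibration of such a score is then checked against empirical accuracy, for example with reliability diagrams or expected calibration error.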
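The linear-probe result in "Distinguishing the Knowable from the Unknowable with Language Models" can be sketched as fitting a logistic-regression probe on frozen hidden states to predict a binary confidence target. The model choice, feature extraction, and labeling below are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

# Hypothetical small frozen model whose embeddings feed the probe.
tok = AutoTokenizer.from_pretrained("gpt2")
enc = AutoModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def last_token_embedding(text: str) -> np.ndarray:
    """Frozen hidden state of the final token of `text`."""
    ids = tok(text, return_tensors="pt")
    return enc(**ids).last_hidden_state[0, -1].numpy()

def fit_probe(texts, labels):
    """texts: list of contexts; labels: 1 if the larger reference model was
    confident at that position, else 0 (an illustrative target)."""
    X = np.stack([last_token_embedding(t) for t in texts])
    # predict_proba(X)[:, 1] of the fitted probe then serves as a confidence estimate.
    return LogisticRegression(max_iter=1000).fit(X, labels)
```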
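The conformal-prediction step behind KnowNo ("Robots That Ask For Help") can also be sketched briefly: calibrate a score threshold on held-out examples so that, with probability at least 1 - alpha, the prediction set of candidate options contains the correct one; the planner asks a human for help whenever that set is not a singleton. Treating the scores as option probabilities is an assumption here, and the paper's construction differs in detail.

```python
import numpy as np

def conformal_threshold(cal_scores_true: np.ndarray, alpha: float = 0.1) -> float:
    """Split-conformal quantile from scores of the *true* option on a calibration set.

    cal_scores_true[i] is the score (e.g., probability) the model assigned to the
    correct option for calibration example i; higher means more confident.
    """
    n = len(cal_scores_true)
    # Nonconformity = 1 - score; use the ceil((n+1)(1-alpha))/n empirical quantile.
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return float(np.quantile(1.0 - cal_scores_true, q_level, method="higher"))

def prediction_set(option_scores: np.ndarray, qhat: float) -> list:
    """All options whose nonconformity falls below the calibrated threshold."""
    return [i for i, s in enumerate(option_scores) if (1.0 - s) <= qhat]

# Example: qhat = conformal_threshold(cal_scores)
#          ask_for_help = len(prediction_set(option_scores, qhat)) != 1
```

The coverage guarantee says the true option lands in the prediction set with probability at least 1 - alpha over the randomness of the calibration split.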