Related papers: Large Language Models Must Be Taught to Know What They Don't Know

Large Language Models Must Be Taught to Know What They Don't Know

URL: http://arxiv.org/abs/2406.08391v1
Date: Wed, 12 Jun 2024 16:41:31 GMT
Title: Large Language Models Must Be Taught to Know What They Don't Know
Authors: Sanyam Kapoor, Nate Gruver, Manley Roberts, Katherine Collins, Arka Pal, Umang Bhatt, Adrian Weller, Samuel Dooley, Micah Goldblum, Andrew Gordon Wilson,
Abstract summary: We show that fine-tuning on a small dataset of correct and incorrect answers can create an uncertainty estimate with good generalization and small computational overhead. We also investigate the mechanisms that enable reliable uncertainty estimation, finding that many models can be used as general-purpose uncertainty estimators.
Score: 97.90008709512921
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: When using large language models (LLMs) in high-stakes applications, we need to know when we can trust their predictions. Some works argue that prompting high-performance LLMs is sufficient to produce calibrated uncertainties, while others introduce sampling methods that can be prohibitively expensive. In this work, we first argue that prompting on its own is insufficient to achieve good calibration and then show that fine-tuning on a small dataset of correct and incorrect answers can create an uncertainty estimate with good generalization and small computational overhead. We show that a thousand graded examples are sufficient to outperform baseline methods and that training through the features of a model is necessary for good performance and tractable for large open-source models when using LoRA. We also investigate the mechanisms that enable reliable LLM uncertainty estimation, finding that many models can be used as general-purpose uncertainty estimators, applicable not just to their own uncertainties but also the uncertainty of other models. Lastly, we show that uncertainty estimates inform human use of LLMs in human-AI collaborative settings through a user study.

Related papers

The Reliability Paradox: Exploring How Shortcut Learning Undermines Language Model Calibration [5.616884466478886]
Pre-trained language models (PLMs) have enabled significant performance gains in the field of natural language processing. Recent studies have found PLMs to suffer from miscalibration, indicating a lack of accuracy in the confidence estimates provided by these models. This paper investigates whether lower calibration error implies reliable decision rules for a language model.
arXiv Detail & Related papers (2024-12-17T08:04:28Z)
Using Large Language Models for Expert Prior Elicitation in Predictive Modelling [53.54623137152208]
This study proposes using large language models (LLMs) to elicit expert prior distributions for predictive models. We compare LLM-elicited and uninformative priors, evaluate whether LLMs truthfully generate parameter distributions, and propose a model selection strategy for in-context learning and prior elicitation. Our findings show that LLM-elicited prior parameter distributions significantly reduce predictive error compared to uninformative priors in low-data settings.
arXiv Detail & Related papers (2024-11-26T10:13:39Z)
Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode. We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
Just rephrase it! Uncertainty estimation in closed-source language models via multiple rephrased queries [6.249216559519607]
We estimate the uncertainty of closed-source large language models via multiple rephrasings of an original base query. Our method demonstrates significant improvements in the calibration of uncertainty estimates compared to the baseline.
arXiv Detail & Related papers (2024-05-22T18:28:26Z)
Learn When (not) to Trust Language Models: A Privacy-Centric Adaptive Model-Aware Approach [23.34505448257966]
Retrieval-augmented large language models (LLMs) have been remarkably competent in various NLP tasks. Previous work has proposed to determine when to do/skip the retrieval in a data-aware manner by analyzing the LLMs' pretraining data. These data-aware methods pose privacy risks and memory limitations, especially when requiring access to sensitive or extensive pretraining data. We hypothesize that token embeddings are able to capture the model's intrinsic knowledge, which offers a safer and more straightforward way to judge the need for retrieval without the privacy risks associated with accessing pre-training data.
arXiv Detail & Related papers (2024-04-04T15:21:22Z)
Calibrating Large Language Models with Sample Consistency [76.23956851098598]
We explore the potential of deriving confidence from the distribution of multiple randomly sampled model generations, via three measures of consistency. Results show that consistency-based calibration methods outperform existing post-hoc approaches. We offer practical guidance on choosing suitable consistency metrics for calibration, tailored to the characteristics of various LMs.
arXiv Detail & Related papers (2024-02-21T16:15:20Z)
Distinguishing the Knowable from the Unknowable with Language Models [15.471748481627143]
In the absence of ground-truth probabilities, we explore a setting where, in order to disentangle a given uncertainty, a significantly larger model stands in as a proxy for the ground truth. We show that small linear probes trained on the embeddings of frozen, pretrained models accurately predict when larger models will be more confident at the token level. We propose a fully unsupervised method that achieves non-trivial accuracy on the same task.
arXiv Detail & Related papers (2024-02-05T22:22:49Z)
Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners [85.03486419424647]
KnowNo is a framework for measuring and aligning the uncertainty of large language models. KnowNo builds on the theory of conformal prediction to provide statistical guarantees on task completion.
arXiv Detail & Related papers (2023-07-04T21:25:12Z)
BayesCap: Bayesian Identity Cap for Calibrated Uncertainty in Frozen Neural Networks [50.15201777970128]
We propose BayesCap that learns a Bayesian identity mapping for the frozen model, allowing uncertainty estimation. BayesCap is a memory-efficient method that can be trained on a small fraction of the original dataset. We show the efficacy of our method on a wide variety of tasks with a diverse set of architectures.
arXiv Detail & Related papers (2022-07-14T12:50:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.