Efficient Uncertainty Estimation via Distillation of Bayesian Large Language Models
- URL: http://arxiv.org/abs/2505.11731v2
- Date: Fri, 23 May 2025 18:24:43 GMT
- Title: Efficient Uncertainty Estimation via Distillation of Bayesian Large Language Models
- Authors: Harshil Vejendla, Haizhou Shi, Yibin Wang, Tunyu Zhang, Huan Zhang, Hao Wang
- Abstract summary: In this paper, we investigate the possibility of eliminating the need for test-time sampling for uncertainty estimation. We distill an off-the-shelf Bayesian LLM into a non-Bayesian student LLM by minimizing the divergence between their predictive distributions. Our experiments demonstrate that uncertainty estimation capabilities on training data can successfully generalize to unseen test data.
- Score: 12.69571386421462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in uncertainty estimation for Large Language Models (LLMs) during downstream adaptation have addressed key challenges of reliability and simplicity. However, existing Bayesian methods typically require multiple sampling iterations during inference, creating significant efficiency issues that limit practical deployment. In this paper, we investigate the possibility of eliminating the need for test-time sampling for LLM uncertainty estimation. Specifically, when given an off-the-shelf Bayesian LLM, we distill its aligned confidence into a non-Bayesian student LLM by minimizing the divergence between their predictive distributions. Unlike typical calibration methods, our distillation is carried out solely on the training dataset without the need for an additional validation dataset. This simple yet effective approach achieves N-times more efficient uncertainty estimation during testing, where N is the number of samples traditionally required by Bayesian LLMs. Our extensive experiments demonstrate that uncertainty estimation capabilities on training data can successfully generalize to unseen test data through our distillation technique, consistently producing results comparable to (or even better than) state-of-the-art Bayesian LLMs.
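The abstract specifies only that the student is trained by minimizing the divergence between predictive distributions; the sketch below fills in common choices as assumptions: the Bayesian teacher's predictive distribution is approximated by averaging N Monte-Carlo forward passes, and the student is fit with a forward KL term. Tensor names and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def distill_loss(teacher_logits_samples: torch.Tensor,
                 student_logits: torch.Tensor) -> torch.Tensor:
    """KL(teacher || student), computed on training data only.

    teacher_logits_samples: (N, B, V) -- N Monte-Carlo forward passes of the
        Bayesian teacher (e.g., with weights drawn from its posterior).
    student_logits: (B, V) -- a single deterministic student forward pass.
    """
    # Approximate the teacher's marginal predictive distribution by
    # averaging in probability space over the N samples.
    teacher_probs = F.softmax(teacher_logits_samples, dim=-1).mean(dim=0)
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    # Fit the student's one-pass distribution to the teacher's average;
    # at test time only the student runs, hence the N-fold speedup.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```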
Related papers
- Efficient Uncertainty in LLMs through Evidential Knowledge Distillation [3.864321514889099]
We introduce a novel approach enabling efficient and effective uncertainty estimation in LLMs without sacrificing performance. We distill uncertainty-aware teacher models into compact student models sharing the same architecture but fine-tuned using Low-Rank Adaptation (LoRA). Empirical evaluations on classification datasets demonstrate that such students can achieve comparable or superior predictive and uncertainty quantification performance.
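The summary does not describe the student's output head; the sketch below shows one standard evidential parameterization (a Dirichlet over class probabilities) often used in this line of work, so that a single deterministic pass yields both a prediction and an uncertainty. The softplus evidence mapping and function name are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def evidential_head(logits: torch.Tensor):
    """Map class logits to Dirichlet concentrations; prediction and a
    vacuity-style uncertainty follow from one forward pass."""
    evidence = F.softplus(logits)               # non-negative evidence per class
    alpha = evidence + 1.0                      # Dirichlet concentration parameters
    strength = alpha.sum(dim=-1, keepdim=True)  # total evidence
    probs = alpha / strength                    # expected class probabilities
    vacuity = logits.shape[-1] / strength.squeeze(-1)  # high when evidence is low
    return probs, vacuity
```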
arXiv Detail & Related papers (2025-07-24T12:46:40Z)
- An Information-Theoretic Perspective on Multi-LLM Uncertainty Estimation [7.018119896897734]
Large language models (LLMs) often behave inconsistently across inputs, indicating uncertainty and motivating the need for its quantification in high-stakes settings. We propose MUSE (Multi-LLM Uncertainty via Subset Ensembles), a simple information-theoretic method that uses Jensen-Shannon Divergence to identify and aggregate well-calibrated subsets of LLMs. Experiments on binary prediction tasks demonstrate improved calibration and predictive performance compared to single-model and naive ensemble baselines.
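The summary names Jensen-Shannon Divergence as the agreement measure; the snippet below shows that building block, scoring each model by its JS distance to the ensemble mean. How MUSE actually selects and aggregates subsets is not described in the abstract, so treat this as an illustrative fragment.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def js_agreement(model_probs: np.ndarray) -> np.ndarray:
    """model_probs: (M, K), one predictive distribution per LLM.
    Returns each model's Jensen-Shannon distance to the ensemble mean;
    low-distance models are candidates for a well-calibrated subset."""
    mean = model_probs.mean(axis=0)
    return np.array([jensenshannon(p, mean, base=2) for p in model_probs])
```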
arXiv Detail & Related papers (2025-07-09T19:13:25Z)
- Uncertainty Quantification for LLM-Based Survey Simulations [9.303339416902995]
We investigate the use of large language models (LLMs) to simulate human responses to survey questions. Our approach converts imperfect LLM-simulated responses into confidence sets for population parameters. A key innovation lies in determining the optimal number of simulated responses.
arXiv Detail & Related papers (2025-02-25T02:07:29Z)
- BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models [13.953203993774233]
Large Language Models (LLMs) often suffer from overconfidence during inference. We propose Bayesian Low-Rank Adaptation by Backpropagation (BLoB), an algorithm that continuously and jointly adjusts both the mean and covariance of LLM parameters. Our results verify the effectiveness of BLoB in terms of generalization and uncertainty estimation, when evaluated on both in-distribution and out-of-distribution data.
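A rough sketch of what jointly training the mean and (diagonal) covariance of a low-rank update can look like, using mean-field Gaussian weights and the reparameterization trick; BLoB's exact parameterization and regularization are not given in the summary, so this is an assumption-level illustration.

```python
import torch
import torch.nn as nn

class BayesianLoRALinear(nn.Module):
    """Variational LoRA sketch: the low-rank factor A carries a trainable
    mean and log-std; each forward pass samples A via reparameterization."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained weights stay frozen
        self.a_mean = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.a_logstd = nn.Parameter(torch.full((rank, base.in_features), -5.0))
        self.b = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sample the low-rank factor A, then apply the usual LoRA update.
        a = self.a_mean + torch.randn_like(self.a_mean) * self.a_logstd.exp()
        return self.base(x) + x @ a.t() @ self.b.t()
```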
arXiv Detail & Related papers (2024-06-17T15:55:38Z)
- Self-Knowledge Distillation for Learning Ambiguity [11.755814660833549]
Recent language models often over-confidently predict a single label without considering whether it is correct.
We propose a novel self-knowledge distillation method that enables models to learn label distributions more accurately.
We validate our method on diverse NLU benchmark datasets and the experimental results demonstrate its effectiveness in producing better label distributions.
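The summary leaves the distillation target unspecified; one generic self-distillation recipe, shown here as an assumption rather than the paper's exact method, mixes the gold one-hot label with the model's own softened prediction so the training target can express ambiguity.

```python
import torch
import torch.nn.functional as F

def self_kd_loss(logits, targets, prior_logits, beta=0.5, tau=2.0):
    """Cross-entropy against a mixture of the gold label and the model's
    own temperature-softened earlier prediction (hypothetical recipe)."""
    num_classes = logits.shape[-1]
    one_hot = F.one_hot(targets, num_classes).float()
    soft = F.softmax(prior_logits / tau, dim=-1)     # model's own belief
    mixed = (1.0 - beta) * one_hot + beta * soft     # ambiguity-aware target
    return -(mixed * F.log_softmax(logits, dim=-1)).sum(-1).mean()
```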
arXiv Detail & Related papers (2024-06-14T05:11:32Z)
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
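The mechanism is concrete enough to sketch directly: scale the label-smoothing value per sample by its uncertainty. The uncertainty source and the maximum smoothing value below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def ual_loss(logits, targets, uncertainty, max_smooth=0.2):
    """Per-sample adaptive label smoothing: `uncertainty` in [0, 1] with
    shape (B,); more uncertain samples receive softer targets."""
    num_classes = logits.shape[-1]
    eps = (uncertainty * max_smooth).unsqueeze(-1)    # (B, 1) smoothing values
    one_hot = F.one_hot(targets, num_classes).float()
    soft = one_hot * (1.0 - eps) + eps / num_classes  # smoothed per sample
    return -(soft * F.log_softmax(logits, dim=-1)).sum(-1).mean()
```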
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
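As a toy stand-in for this idea: sample several explanations for the same question, read off the answer each explanation implies, and score confidence by agreement. The paper's notion of stable explanations is more refined than this majority-vote sketch.

```python
import math
from collections import Counter

def explanation_confidence(answers: list[str]):
    """answers: the answer implied by each sampled explanation.
    Returns (majority agreement rate, entropy of the answer distribution)."""
    counts = Counter(answers)
    total = len(answers)
    agreement = counts.most_common(1)[0][1] / total
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return agreement, entropy
```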
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting [55.17761802332469]
Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample.
Prior methods perform backpropagation for each test sample, resulting in unbearable optimization costs to many applications.
We propose an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method which develops an active sample selection criterion to identify reliable and non-redundant samples.
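A sketch of the entropy-based reliability gate; EATA additionally filters redundant samples and adds an anti-forgetting regularizer, both omitted here, and the 0.4 coefficient below is a commonly reported choice rather than a universal constant.

```python
import torch

def select_reliable(logits: torch.Tensor, margin_frac: float = 0.4):
    """Keep only low-entropy test samples for the adaptation update.
    The entropy margin is set relative to the maximum entropy ln(C)."""
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    max_entropy = torch.log(torch.tensor(float(logits.shape[-1])))
    mask = entropy < margin_frac * max_entropy  # True = reliable sample
    return mask, entropy
```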
arXiv Detail & Related papers (2024-03-18T05:49:45Z)
- LTAU-FF: Loss Trajectory Analysis for Uncertainty in Atomistic Force Fields [5.396675151318325]
Model ensembles are effective tools for estimating prediction uncertainty in deep learning atomistic force fields.
However, their widespread adoption is hindered by high computational costs and overconfident error estimates.
We address these challenges by leveraging distributions of per-sample errors obtained during training and employing a distance-based similarity search in the model latent space.
Our method, which we call LTAU, efficiently estimates the full probability distribution function (PDF) of errors for any test point using the logged training errors.
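A minimal version of the described pipeline, assuming per-sample errors logged over T epochs are stored as an (N, T) array and that a k-nearest-neighbor search in the model's latent space picks whose logged errors to pool; hyperparameters are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def ltau_error_samples(train_latents, train_error_log, test_latent, k=10):
    """train_latents: (N, d); train_error_log: (N, T); test_latent: (d,).
    Pools the logged training errors of the k nearest training points and
    returns them as an empirical error distribution for the test point."""
    nn = NearestNeighbors(n_neighbors=k).fit(train_latents)
    _, idx = nn.kneighbors(test_latent[None, :])
    return train_error_log[idx[0]].ravel()  # e.g., np.percentile(..., 95)
```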
arXiv Detail & Related papers (2024-02-01T18:50:42Z)
- Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability.
In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling.
Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
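Ensembling over clarifications supports the standard entropy decomposition sketched below: total predictive entropy splits into an expected per-clarification entropy and a disagreement (mutual-information) term attributable to input ambiguity. Whether the paper uses exactly this split is an assumption.

```python
import numpy as np

def decompose_uncertainty(probs_per_clarification: np.ndarray):
    """probs_per_clarification: (C, K) -- the LLM's answer distribution for
    each of C clarified rewrites of one input. Returns the total, expected
    (within-clarification), and disagreement (across-clarification) terms."""
    eps = 1e-12
    mean = probs_per_clarification.mean(axis=0)
    total = -(mean * np.log(mean + eps)).sum()
    expected = -(probs_per_clarification
                 * np.log(probs_per_clarification + eps)).sum(axis=1).mean()
    return total, expected, total - expected
```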
arXiv Detail & Related papers (2023-11-15T05:58:35Z)
- Making Pre-trained Language Models both Task-solvers and Self-calibrators [52.98858650625623]
Pre-trained language models (PLMs) serve as backbones for various real-world systems, yet their confidence estimates are often miscalibrated. Previous work shows that introducing an extra calibration task can mitigate this issue. We propose LM-TOAST, a training algorithm that makes a single PLM serve as both a task solver and a self-calibrator.
arXiv Detail & Related papers (2023-07-21T02:51:41Z)
- Locally Valid and Discriminative Confidence Intervals for Deep Learning Models [37.57296694423751]
Uncertainty information should be valid (guaranteeing coverage) and discriminative (more uncertain when the expected risk is high).
Most existing Bayesian methods lack frequentist coverage guarantees and usually affect model performance.
We propose Locally Valid and Discriminative confidence intervals (LVD), a simple, efficient and lightweight method to construct discriminative confidence intervals (CIs) for almost any deep learning model.
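LVD's exact construction is not given in the summary; the toy stand-in below shows the general locally-adaptive conformal idea it gestures at: interval half-widths come from the residuals of nearby calibration points, so intervals widen where local risk is higher while the quantile step targets coverage.

```python
import numpy as np

def local_interval_halfwidth(cal_feats, cal_resid, test_feat, alpha=0.1, k=50):
    """cal_feats: (N, d) calibration features; cal_resid: (N,) residuals.
    Returns a half-width q; the interval is [f(x) - q, f(x) + q]."""
    dist = np.linalg.norm(cal_feats - test_feat, axis=1)
    neighbors = np.argsort(dist)[:k]  # k nearest calibration points
    return np.quantile(np.abs(cal_resid[neighbors]), 1.0 - alpha)
```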
arXiv Detail & Related papers (2021-06-01T04:39:56Z)
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood-based model selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
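For concreteness, the textbook Laplace approximation is one standard route to tractable marginal-likelihood estimates in deep learning; whether it matches this paper's estimator is an assumption.

```python
import numpy as np

def laplace_log_marginal(log_joint_at_mode: float, hessian: np.ndarray) -> float:
    """log p(D) ~= log p(D, theta*) + (d/2) log(2*pi) - 0.5 * log|H|,
    where H is the negative Hessian of the log joint at the mode theta*."""
    d = hessian.shape[0]
    _, logdet = np.linalg.slogdet(hessian)
    return log_joint_at_mode + 0.5 * d * np.log(2.0 * np.pi) - 0.5 * logdet
```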
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.