SPUQ: Perturbation-Based Uncertainty Quantification for Large Language
Models
- URL: http://arxiv.org/abs/2403.02509v1
- Date: Mon, 4 Mar 2024 21:55:22 GMT
- Title: SPUQ: Perturbation-Based Uncertainty Quantification for Large Language
Models
- Authors: Xiang Gao, Jiaxin Zhang, Lalla Mouatadid, Kamalika Das
- Abstract summary: Large language models (LLMs) have become increasingly prevalent, offering remarkable text generation capabilities.
A pressing challenge is their tendency to make confidently wrong predictions.
We introduce a novel UQ method, sampling with perturbation for UQ (SPUQ), designed to tackle both aleatoric and epistemic uncertainties.
Our findings show a substantial improvement in model uncertainty calibration, with a reduction in Expected Calibration Error (ECE) by 50% on average.
- Score: 9.817185255633758
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, large language models (LLMs) have become increasingly
prevalent, offering remarkable text generation capabilities. However, a
pressing challenge is their tendency to make confidently wrong predictions,
highlighting the critical need for uncertainty quantification (UQ) in LLMs.
While previous works have mainly focused on addressing aleatoric uncertainty,
the full spectrum of uncertainties, including epistemic, remains inadequately
explored. Motivated by this gap, we introduce a novel UQ method, sampling with
perturbation for UQ (SPUQ), designed to tackle both aleatoric and epistemic
uncertainties. The method entails generating a set of perturbations for LLM
inputs, sampling outputs for each perturbation, and incorporating an
aggregation module that generalizes the sampling uncertainty approach for text
generation tasks. Through extensive experiments on various datasets, we
investigated different perturbation and aggregation techniques. Our findings
show a substantial improvement in model uncertainty calibration, with a
reduction in Expected Calibration Error (ECE) by 50% on average. These findings
suggest that our proposed UQ method offers promising steps toward enhancing the
reliability and trustworthiness of LLMs.
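As a rough sketch of the recipe described in the abstract (perturb the input, sample an output per perturbation, aggregate agreement into a confidence score), consider the following; the perturbation and sampling callables and the similarity-based aggregation are illustrative stand-ins, not the authors' exact implementation:

```python
import itertools
from difflib import SequenceMatcher
from typing import Callable, List

def spuq_confidence(
    prompt: str,
    perturb_fn: Callable[[str], List[str]],  # e.g., paraphrases or prompt tweaks
    sample_fn: Callable[[str], str],         # one LLM call: prompt -> generated text
) -> float:
    """Sample one output per perturbed prompt and score inter-sample agreement."""
    variants = [prompt] + perturb_fn(prompt)
    outputs = [sample_fn(v) for v in variants]
    # Aggregation: mean pairwise textual similarity as a confidence proxy.
    # SPUQ's aggregation module generalizes this idea; the metric is a stand-in.
    pairs = list(itertools.combinations(outputs, 2))
    if not pairs:
        return 1.0
    sims = [SequenceMatcher(None, a, b).ratio() for a, b in pairs]
    return sum(sims) / len(sims)

# Toy usage with stubbed functions (replace with real LLM calls):
if __name__ == "__main__":
    perturb = lambda p: [p + " Please answer concisely.", "Q: " + p]
    sample = lambda p: "Paris" if "capital" in p else "unsure"
    print(spuq_confidence("What is the capital of France?", perturb, sample))
```

High agreement across perturbed inputs is read as low combined (aleatoric plus epistemic) uncertainty; in practice the aggregation module would use a stronger text-similarity or entailment measure.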
Related papers
- CLUE: Concept-Level Uncertainty Estimation for Large Language Models [49.92690111618016]
We propose a novel framework for Concept-Level Uncertainty Estimation for Large Language Models (LLMs).
We leverage LLMs to convert output sequences into concept-level representations, breaking down sequences into individual concepts and measuring the uncertainty of each concept separately.
We conduct experiments to demonstrate that CLUE can provide more interpretable uncertainty estimation results compared with sentence-level uncertainty.
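A minimal sketch of the concept-level idea; in CLUE the concepts are extracted by an LLM and support is judged by a model, both of which are stubbed out here with substring matching purely for illustration:

```python
from typing import Dict, List

def concept_uncertainties(samples: List[str], concepts: List[str]) -> Dict[str, float]:
    """Score each concept by how often the sampled responses support it."""
    scores = {}
    for c in concepts:
        support = sum(c.lower() in s.lower() for s in samples) / len(samples)
        scores[c] = 1.0 - support  # low support across samples = high uncertainty
    return scores

samples = ["Paris is the capital and largest city of France.",
           "The capital of France is Paris.",
           "Paris, famous for the Eiffel Tower, is France's capital."]
print(concept_uncertainties(samples, ["Paris is the capital", "largest city"]))
```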
arXiv Detail & Related papers (2024-09-04T18:27:12Z)
- Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models [96.43562963756975]
We train a regression model whose target variable is the gap between the conditional and the unconditional generation confidence.
We use this learned conditional dependency model to modulate the uncertainty of the current generation step based on the uncertainty of the previous step.
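A hedged sketch of how such a learned conditional-dependency model might be used; the features, the linear regressor, and the modulation rule below are assumptions, since the paper's exact formulation is not given here:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: per-step features (e.g., entropy, top-1 probability
# of the previous step) paired with the observed gap between conditional and
# unconditional generation confidence at that step.
rng = np.random.default_rng(0)
features = rng.random((500, 2))  # stand-in features
gap = 0.3 * features[:, 0] - 0.1 * features[:, 1] + 0.05 * rng.standard_normal(500)

gap_model = LinearRegression().fit(features, gap)

def modulated_uncertainty(u_curr: float, u_prev: float, feats: np.ndarray) -> float:
    """One possible modulation rule (an assumption, not the paper's formula):
    shift the current step's uncertainty toward the previous step's by the
    predicted conditional-dependency gap."""
    predicted_gap = float(gap_model.predict(feats.reshape(1, -1))[0])
    return u_curr + predicted_gap * (u_prev - u_curr)

print(modulated_uncertainty(0.4, 0.8, np.array([0.5, 0.2])))
```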
arXiv Detail & Related papers (2024-08-20T09:42:26Z)
- Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach [6.209293868095268]
We study the problem of uncertainty estimation and calibration for LLMs.
We propose a supervised approach that leverages labeled datasets to estimate the uncertainty in LLMs' responses.
Our method is easy to implement and adaptable to different levels of model accessibility, including black box, grey box, and white box.
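One way such a supervised estimator could look, with stand-in features and labels; which signals populate the feature vector depends on the access level named above (white box: hidden states; grey box: token log-probs; black box: text-derived signals such as sampling agreement):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical labeled data: one feature vector per LLM response, plus a binary
# "was the response correct" label.
rng = np.random.default_rng(0)
X = rng.random((1000, 8))  # stand-in response features
y = (X[:, 0] + 0.2 * rng.standard_normal(1000) > 0.5).astype(int)

clf = LogisticRegression().fit(X, y)

def response_confidence(feats: np.ndarray) -> float:
    """Estimated probability that a new response is correct."""
    return float(clf.predict_proba(feats.reshape(1, -1))[0, 1])

print(response_confidence(rng.random(8)))
```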
arXiv Detail & Related papers (2024-04-24T17:10:35Z)
- Language Model Cascades: Token-level uncertainty and beyond [65.38515344964647]
Recent advances in language models (LMs) have led to significant improvements in quality on complex NLP tasks.
Cascading offers a simple strategy to achieve more favorable cost-quality tradeoffs.
We show that incorporating token-level uncertainty through learned post-hoc deferral rules can significantly outperform simple aggregation strategies.
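A sketch of a learned post-hoc deferral rule over token-level uncertainty; the quantile features and stand-in labels are assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def token_features(logprobs: np.ndarray) -> np.ndarray:
    # A learned deferral rule can look at the whole distribution of token-level
    # uncertainty, not just the mean (the "simple aggregation" baseline).
    return np.quantile(logprobs, [0.1, 0.25, 0.5, 0.75, 0.9])

# Hypothetical training set: per-example token log-probs from the small model,
# with a label saying whether deferring to the large model helped.
train = [rng.normal(-1.0, 0.5, size=rng.integers(5, 40)) for _ in range(800)]
X = np.stack([token_features(lp) for lp in train])
y = (X[:, 0] < -1.5).astype(int)  # stand-in label: defer when the worst tokens look bad

defer_rule = LogisticRegression().fit(X, y)

def should_defer(logprobs: np.ndarray, threshold: float = 0.5) -> bool:
    p = defer_rule.predict_proba(token_features(logprobs).reshape(1, -1))[0, 1]
    return bool(p > threshold)  # True -> route the query to the larger model
```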
arXiv Detail & Related papers (2024-04-15T21:02:48Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC), that can be applied for either risk-seeking or risk-averse policy optimization.
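Schematically, uncertainty Bellman equations propagate a local uncertainty term through the dynamics the way an ordinary Bellman equation propagates reward; the generic recursion below is an illustration of the family, not necessarily this paper's exact equation:

u^\pi(s,a) = \nu(s,a) + \gamma^2 \sum_{s',a'} P(s' \mid s,a) \, \pi(a' \mid s') \, u^\pi(s',a')

Here \nu(s,a) is the local uncertainty at (s,a); the paper's UBE is a particular instance whose fixed point equals the true posterior variance over values rather than an upper bound.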
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability.
In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling.
Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
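A minimal sketch of the ensemble step, complementary to the SPUQ sketch above; the clarification generator and predictor are placeholder callables:

```python
from collections import Counter
from typing import Callable, List, Tuple

def clarification_ensemble(
    prompt: str,
    clarify_fn: Callable[[str], List[str]],  # e.g., an LLM asked to rewrite the input unambiguously
    predict_fn: Callable[[str], str],        # LLM answer for one clarified input
) -> Tuple[str, float]:
    """Ensemble predictions over clarified inputs; disagreement across
    clarifications reflects input ambiguity, one face of aleatoric uncertainty."""
    answers = [predict_fn(c) for c in clarify_fn(prompt)]
    answer, freq = Counter(answers).most_common(1)[0]
    return answer, freq / len(answers)  # majority answer and its agreement score
```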
arXiv Detail & Related papers (2023-11-15T05:58:35Z)
- Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as its training objective.
We develop practical bounds to apply the total variation distance (TVD) to language generation.
We introduce the TaiLr objective that balances the tradeoff of estimating TVD.
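For reference, the total variation distance between two discrete distributions p and q, whose estimation the TaiLr objective trades off against MLE-style training, is:

\mathrm{TVD}(p, q) = \frac{1}{2} \sum_x \lvert p(x) - q(x) \rvert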
arXiv Detail & Related papers (2023-02-26T16:32:52Z)
- Uncertainty Quantification for Traffic Forecasting: A Unified Approach [21.556559649467328]
Uncertainty is an essential consideration for time series forecasting tasks.
In this work, we focus on quantifying the uncertainty of traffic forecasting.
We develop Deep Spatio-Temporal Uncertainty Quantification (DeepSTUQ), which can estimate both aleatoric and epistemic uncertainty.
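A common recipe for producing both uncertainty types in a deep forecaster (a generic sketch, not necessarily this paper's exact architecture) combines a heteroscedastic output head for aleatoric variance with MC dropout or ensembling for epistemic variance:

```python
import numpy as np

def combined_uncertainty(mc_means: np.ndarray, mc_variances: np.ndarray):
    """Combine T stochastic forward passes (MC dropout / ensemble members),
    each predicting a mean and an aleatoric variance for the same input.

    mc_means, mc_variances: shape (T,) arrays."""
    aleatoric = mc_variances.mean()  # average predicted noise variance
    epistemic = mc_means.var()       # spread of the means across passes
    return mc_means.mean(), aleatoric + epistemic

means = np.array([10.2, 9.8, 10.5, 10.1])    # e.g., predicted traffic speed
variances = np.array([0.4, 0.5, 0.45, 0.4])  # per-pass predicted variance
print(combined_uncertainty(means, variances))
```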
arXiv Detail & Related papers (2022-08-11T15:21:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.