A statistically consistent measure of semantic uncertainty using Language Models
- URL: http://arxiv.org/abs/2502.00507v3
- Date: Sat, 24 May 2025 13:27:39 GMT
- Title: A statistically consistent measure of semantic uncertainty using Language Models
- Authors: Yi Liu
- Abstract summary: We propose a novel measure of semantic uncertainty, semantic spectral entropy, that is statistically consistent under mild assumptions. This measure is implemented through a straightforward algorithm that relies solely on standard, pretrained language models.
- Score: 3.4933610074113464
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To address the challenge of quantifying uncertainty in the outputs generated by language models, we propose a novel measure of semantic uncertainty, semantic spectral entropy, that is statistically consistent under mild assumptions. This measure is implemented through a straightforward algorithm that relies solely on standard, pretrained language models, without requiring access to the internal generation process. Our approach imposes minimal constraints on the choice of language models, making it broadly applicable across different architectures and settings. Through comprehensive simulation studies, we demonstrate that the proposed method yields an accurate and robust estimate of semantic uncertainty, even in the presence of the inherent randomness characteristic of generative language model outputs.
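The abstract does not spell out the algorithm, but its ingredients (multiple sampled generations, a pretrained language model used as a black box, a spectral notion of entropy) suggest a sketch along the following lines. Everything here is an illustrative assumption, not the authors' exact method: the `entails` stub stands in for any pretrained NLI model, and the graph and Laplacian choices are one plausible reading of "spectral".

```python
import numpy as np

def entails(a: str, b: str) -> float:
    """Placeholder: return P(a entails b) from any pretrained NLI model."""
    return float(a == b)  # trivial stand-in for illustration

def semantic_spectral_entropy(generations: list[str]) -> float:
    """Sketch: spectral entropy of a semantic-similarity graph over generations."""
    m = len(generations)
    # Symmetric affinity from bidirectional entailment scores.
    W = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            if i != j:
                W[i, j] = 0.5 * (entails(generations[i], generations[j])
                                 + entails(generations[j], generations[i]))
    # Normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, d ** -0.5, 0.0)
    L = np.eye(m) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Spectral entropy: Shannon entropy of the normalized eigenvalue spectrum.
    eig = np.clip(np.linalg.eigvalsh(L), 0.0, None)
    if eig.sum() == 0:
        return 0.0  # degenerate graph: nothing to spread entropy over
    p = eig / eig.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())
```

With this construction, mutually entailing generations concentrate the spectrum (low entropy), while semantically scattered generations spread it out (high entropy).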
Related papers
- Improving Uncertainty Quantification in Large Language Models via Semantic Embeddings [11.33157177182775]
Accurately quantifying uncertainty in large language models (LLMs) is crucial for their reliable deployment.
Current state-of-the-art methods for measuring semantic uncertainty in LLMs rely on strict bidirectional entailment criteria.
We propose a novel approach that leverages semantic embeddings to achieve smoother and more robust estimation of semantic uncertainty.
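As a rough illustration of the embedding idea (not the authors' estimator), the hard entailment criterion can be replaced by an average pairwise embedding distance; the `embed` stub below stands in for any sentence encoder.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: return a unit-norm sentence embedding from any encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)

def embedding_uncertainty(responses: list[str]) -> float:
    """Mean pairwise cosine distance: near 0 when responses agree semantically."""
    E = np.stack([embed(r) for r in responses])
    sims = E @ E.T
    n = len(responses)
    off_diag = sims[~np.eye(n, dtype=bool)]
    return float(1.0 - off_diag.mean())
```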
arXiv Detail & Related papers (2024-10-04T14:08:02Z)
- On Uncertainty In Natural Language Processing [2.5076643086429993]
This thesis studies how uncertainty in natural language processing can be characterized from a linguistic, statistical and neural perspective.
We propose a method for calibrated sampling in natural language generation based on non-exchangeable conformal prediction.
Lastly, we develop an approach to quantify confidence in large black-box language models using auxiliary predictors.
arXiv Detail & Related papers (2024-08-20T09:42:26Z)
- Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models [96.43562963756975]
We train a regression model whose target variable is the gap between the conditional and the unconditional generation confidence.
We use this learned conditional dependency model to modulate the uncertainty of the current generation step based on the uncertainty of the previous step.
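A purely schematic version of that idea, with synthetic stand-ins for the paper's features, regressor, and modulation rule:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy stand-ins: per-step confidences from an LM, computed with and without
# the previously generated context (the paper's actual features are richer).
rng = np.random.default_rng(0)
cond_conf = rng.uniform(0.5, 1.0, size=1000)                 # conditional confidence
uncond_conf = np.clip(cond_conf - rng.normal(0.1, 0.05, 1000), 0.0, 1.0)

# Target: the gap between conditional and unconditional generation confidence.
gap = cond_conf - uncond_conf
features = np.column_stack([cond_conf, cond_conf ** 2])      # illustrative features
reg = Ridge().fit(features, gap)

def modulated_uncertainty(curr_conf: float, prev_uncertainty: float) -> float:
    """Inflate the current step's uncertainty by the predicted dependency gap,
    weighted by how uncertain the previous step already was."""
    pred_gap = float(reg.predict([[curr_conf, curr_conf ** 2]])[0])
    return (1.0 - curr_conf) + prev_uncertainty * max(pred_gap, 0.0)
```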
arXiv Detail & Related papers (2024-06-24T04:56:26Z)
- Modelled Multivariate Overlap: A method for measuring vowel merger [0.0]
This paper introduces a novel method for quantifying vowel overlap.
We evaluate this method on corpus speech data targeting the PIN-PEN merger in four dialects of English.
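The paper's model is not reproduced here; one standard way to quantify overlap between two vowel categories fitted as multivariate Gaussians over formant measurements is the Bhattacharyya affinity, sketched below with made-up PIN/PEN numbers.

```python
import numpy as np

def bhattacharyya_affinity(mu1, S1, mu2, S2):
    """Overlap of two multivariate Gaussians: 1 = identical, -> 0 = disjoint."""
    S = 0.5 * (S1 + S2)
    diff = mu1 - mu2
    dist = (0.125 * diff @ np.linalg.solve(S, diff)
            + 0.5 * np.log(np.linalg.det(S)
                           / np.sqrt(np.linalg.det(S1) * np.linalg.det(S2))))
    return float(np.exp(-dist))

# Example: PIN vs PEN vowel classes in (F1, F2) space, arbitrary units.
overlap = bhattacharyya_affinity(np.array([400.0, 2000.0]), np.eye(2) * 900,
                                 np.array([550.0, 1800.0]), np.eye(2) * 900)
```

A merged dialect would show an affinity near 1; a dialect maintaining the contrast would show an affinity near 0.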
arXiv Detail & Related papers (2024-06-07T18:54:40Z)
- On Subjective Uncertainty Quantification and Calibration in Natural Language Generation [2.622066970118316]
Large language models are often used to generate free-form responses, in which case uncertainty quantification becomes challenging.
This work addresses these challenges from the perspective of Bayesian decision theory.
We discuss how this assumption enables principled quantification of the model's subjective uncertainty and its calibration.
The proposed methods can be applied to black-box language models.
arXiv Detail & Related papers (2024-05-30T12:42:05Z)
- Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities [79.9629927171974]
Uncertainty quantification in Large Language Models (LLMs) is crucial for applications where safety and reliability are important.
We propose Kernel Language Entropy (KLE), a novel method for uncertainty estimation in white- and black-box LLMs.
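KLE can be read as a von Neumann entropy of a semantic kernel over sampled answers; here is a minimal sketch under that reading, with a toy kernel in place of the paper's semantic-graph construction.

```python
import numpy as np

def kernel_language_entropy(K: np.ndarray) -> float:
    """von Neumann entropy of a PSD semantic kernel over sampled answers.

    K[i, j] is a semantic similarity between answers i and j; after trace
    normalization its eigenvalues act like a probability distribution."""
    K = K / np.trace(K)                   # unit-trace density-matrix analogue
    lam = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    lam = lam[lam > 1e-12]
    return float(-(lam * np.log(lam)).sum())

# Example: three answers, two of them near-paraphrases.
K = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
print(kernel_language_entropy(K))
```

Unlike hard clustering, graded similarities contribute fractionally, which is what makes the estimate fine-grained.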
arXiv Detail & Related papers (2024-05-17T17:49:44Z)
- Observational Scaling Laws and the Predictability of Language Model Performance [51.2336010244645]
We propose an observational approach that bypasses model training and instead builds scaling laws from 100 publicly available models.
We show that several emergent phenomena exhibit smooth, sigmoidal behavior and are predictable from small models.
We show how to predict the impact of post-training interventions like Chain-of-Thought and Self-Consistency as language model capabilities continue to improve.
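As a toy illustration of fitting such a sigmoidal trend (the paper's capability measures are richer than the single log-compute proxy assumed here):

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, a, b):
    return 1.0 / (1.0 + np.exp(-(a * x + b)))

# Toy data: benchmark accuracy vs. log10(training FLOPs) for public models.
log_flops = np.array([20.0, 21.0, 22.0, 22.5, 23.0, 24.0])
accuracy = np.array([0.05, 0.10, 0.30, 0.45, 0.70, 0.90])

params, _ = curve_fit(sigmoid, log_flops, accuracy, p0=[1.0, -22.0])
predicted = sigmoid(25.0, *params)   # extrapolate to a larger model
```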
arXiv Detail & Related papers (2024-05-17T17:49:44Z)
- Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation [12.921225188504643]
We propose a novel Uncertainty-aware Reward Model (URM) that introduces robust uncertainty estimation for the quality of paired responses.
Empirical results demonstrate significant benefits of incorporating the proposed proxy into language model training.
arXiv Detail & Related papers (2024-05-10T12:14:11Z)
- Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability.
In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling.
Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
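The ensembling step admits a standard entropy decomposition: total predictive entropy splits into the average per-clarification entropy and a remainder that reflects disagreement across clarifications, i.e. uncertainty attributable to input ambiguity. A sketch with toy prediction distributions, not the paper's exact estimator:

```python
import numpy as np

def entropy(p: np.ndarray) -> float:
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Each row: the LLM's answer distribution given one clarified version of an
# ambiguous input (toy numbers; clarifications would come from prompting).
preds_per_clarification = np.array([
    [0.9, 0.1],   # clarification 1 -> confident "A"
    [0.1, 0.9],   # clarification 2 -> confident "B"
])

mean_pred = preds_per_clarification.mean(axis=0)
total = entropy(mean_pred)                                    # total uncertainty
per_clarification = np.mean([entropy(p) for p in preds_per_clarification])
ambiguity = total - per_clarification   # uncertainty from input ambiguity
```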
arXiv Detail & Related papers (2023-11-15T05:58:35Z)
- Quantification of Predictive Uncertainty via Inference-Time Sampling [57.749601811982096]
We propose a post-hoc sampling strategy for estimating predictive uncertainty that accounts for data ambiguity.
The method can generate different plausible outputs for a given input and does not assume parametric forms of predictive distributions.
arXiv Detail & Related papers (2023-08-03T12:43:21Z)
- Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the optimization method.
We develop practical bounds for applying total variation distance (TVD) to language generation.
We introduce the TaiLr objective, which balances the tradeoff in estimating TVD.
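For intuition only: the code below compares total variation distance with the forward KL divergence that MLE minimizes, on a toy pair of distributions. It is not the TaiLr objective itself.

```python
import numpy as np

p = np.array([0.70, 0.25, 0.05, 0.00])   # data distribution
q = np.array([0.60, 0.20, 0.10, 0.10])   # model distribution

tvd = 0.5 * np.abs(p - q).sum()           # bounded in [0, 1]
# Forward KL (the MLE objective) diverges as the model assigns vanishing
# probability to observed tokens; TVD penalizes such mismatches less harshly.
mask = p > 0
kl = float((p[mask] * np.log(p[mask] / q[mask])).sum())
```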
arXiv Detail & Related papers (2023-02-26T16:32:52Z)
- Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation [37.37606905433334]
We show that measuring uncertainty in natural language is challenging because of "semantic equivalence": different sentences can mean the same thing.
We introduce semantic entropy -- an entropy which incorporates linguistic invariances created by shared meanings.
Our method is unsupervised, uses only a single model, and requires no modifications to off-the-shelf language models.
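The recipe is concrete enough to sketch: cluster sampled answers by bidirectional entailment, pool probability mass within each cluster, and take the entropy over clusters. The `entails` stub below stands in for a pretrained NLI model.

```python
import numpy as np

def entails(a: str, b: str) -> bool:
    """Placeholder for a pretrained NLI model's entailment decision."""
    return a.lower() == b.lower()

def semantic_entropy(answers: list[str], probs: list[float]) -> float:
    """Entropy over meaning clusters rather than over surface strings."""
    clusters: list[tuple[str, float]] = []   # (representative, pooled prob)
    for ans, p in zip(answers, probs):
        for i, (rep, mass) in enumerate(clusters):
            if entails(ans, rep) and entails(rep, ans):   # bidirectional
                clusters[i] = (rep, mass + p)
                break
        else:
            clusters.append((ans, p))
    mass = np.array([m for _, m in clusters])
    mass = mass / mass.sum()
    return float(-(mass * np.log(mass)).sum())

# "Paris" and "It is Paris" would share a cluster with a real NLI model.
print(semantic_entropy(["Paris", "paris", "Rome"], [0.5, 0.3, 0.2]))
```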
arXiv Detail & Related papers (2023-02-19T20:10:07Z)
- Uncertainty Quantification for Rule-Based Models [0.03807314298073299]
Rule-based classification models directly predict values, rather than modeling a probability and translating it into a prediction as done in statistical models.
We propose an uncertainty quantification framework in the form of a meta-model: it takes any binary classifier with binary output as a black box and estimates the prediction accuracy of that base model at a given input, along with a level of confidence in that estimate.
arXiv Detail & Related papers (2022-11-03T15:50:09Z)
- Dense Uncertainty Estimation via an Ensemble-based Conditional Latent Variable Model [68.34559610536614]
We argue that the aleatoric uncertainty is an inherent attribute of the data and can only be correctly estimated with an unbiased oracle model.
We propose a new sampling and selection strategy at train time to approximate the oracle model for aleatoric uncertainty estimation.
Our results show that our solution achieves both accurate deterministic results and reliable uncertainty estimation.
arXiv Detail & Related papers (2021-11-22T08:54:10Z)
- Disentangling Generative Factors in Natural Language with Discrete Variational Autoencoders [0.0]
We argue that continuous variables may not be ideal for modelling features of textual data, because most generative factors in text are discrete.
We propose a Variational Autoencoder-based method that models language features as discrete variables and encourages independence between variables in order to learn disentangled representations.
arXiv Detail & Related papers (2021-09-15T09:10:05Z)
- Empowering Language Understanding with Counterfactual Reasoning [141.48592718583245]
We propose a Counterfactual Reasoning Model, which mimics counterfactual thinking by learning from few counterfactual samples.
In particular, we devise a generation module to generate representative counterfactual samples for each factual sample, and a retrospective module to retrospect the model prediction by comparing the counterfactual and factual samples.
arXiv Detail & Related papers (2021-06-06T06:36:52Z)
- Calibrating Over-Parametrized Simulation Models: A Framework via Eligibility Set [3.862247454265944]
We develop a framework for constructing calibration schemes that satisfy rigorous frequentist statistical guarantees.
We demonstrate our methodology on several numerical examples, including an application to calibration of a limit order book market simulator.
arXiv Detail & Related papers (2021-05-27T00:59:29Z)
- Unnatural Language Inference [48.45003475966808]
We find that state-of-the-art NLI models, such as RoBERTa and BART, are invariant to, and sometimes even perform better on, examples with randomly reordered words.
Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.
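The underlying probe is easy to reproduce in outline; `predict` below is a stub for any NLI classifier such as a fine-tuned RoBERTa.

```python
import random

def predict(premise: str, hypothesis: str) -> str:
    """Placeholder for an NLI classifier (e.g. a fine-tuned RoBERTa)."""
    return "entailment"

def shuffle_words(sentence: str, rng: random.Random) -> str:
    words = sentence.split()
    rng.shuffle(words)
    return " ".join(words)

def invariance_rate(pairs, seed: int = 0) -> float:
    """Fraction of examples whose label survives random word reordering."""
    rng = random.Random(seed)
    same = sum(
        predict(p, h) == predict(shuffle_words(p, rng), shuffle_words(h, rng))
        for p, h in pairs
    )
    return same / len(pairs)
```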
arXiv Detail & Related papers (2020-12-30T20:40:48Z)
- Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level and its degree of (in)stability when applied to an empirical object based on n samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z)
- Efficient Ensemble Model Generation for Uncertainty Estimation with Bayesian Approximation in Segmentation [74.06904875527556]
We propose a generic and efficient segmentation framework to construct ensemble segmentation models.
In the proposed method, ensemble models can be efficiently generated by using the layer selection method.
We also devise a new pixel-wise uncertainty loss, which improves the predictive performance.
arXiv Detail & Related papers (2020-05-21T16:08:38Z)
- Limits of Detecting Text Generated by Large-Scale Language Models [65.46403462928319]
Some consider large-scale language models that can generate long and coherent pieces of text as dangerous, since they may be used in misinformation campaigns.
Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated.
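A minimal instance of that framing, with a stub for the language model's token scores: treat "genuine" as the null hypothesis and flag a text whose average log-likelihood under the candidate model is suspiciously high.

```python
import numpy as np

def token_logprobs(text: str) -> np.ndarray:
    """Placeholder: per-token log-probabilities of `text` under the LM."""
    return np.array([-2.1, -0.4, -1.3, -0.8])

def looks_generated(text: str, threshold: float = -1.0) -> bool:
    """Machine text tends to sit at higher average likelihood under the
    generating model; reject the 'genuine' null above the threshold."""
    return float(token_logprobs(text).mean()) > threshold
```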
arXiv Detail & Related papers (2020-02-09T19:53:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.