Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation
- URL: http://arxiv.org/abs/2302.09664v3
- Date: Sat, 15 Apr 2023 12:55:45 GMT
- Title: Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation
- Authors: Lorenz Kuhn, Yarin Gal, Sebastian Farquhar
- Abstract summary: We show that measuring uncertainty in natural language is challenging because of "semantic equivalence"
We introduce semantic entropy -- an entropy which incorporates linguistic invariances created by shared meanings.
Our method is unsupervised, uses only a single model, and requires no modifications to off-the-shelf language models.
- Score: 37.37606905433334
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a method to measure uncertainty in large language models. For
tasks like question answering, it is essential to know when we can trust the
natural language outputs of foundation models. We show that measuring
uncertainty in natural language is challenging because of "semantic
equivalence" -- different sentences can mean the same thing. To overcome these
challenges we introduce semantic entropy -- an entropy which incorporates
linguistic invariances created by shared meanings. Our method is unsupervised,
uses only a single model, and requires no modifications to off-the-shelf
language models. In comprehensive ablation studies we show that the semantic
entropy is more predictive of model accuracy on question answering data sets
than comparable baselines.
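As a concrete illustration, below is a minimal sketch of how semantic entropy can be computed: sample several answers with their sequence log-probabilities, cluster them into meaning-equivalence classes via bidirectional entailment, aggregate probability mass within each cluster, and take the entropy over clusters, SE(x) = -sum_c p(c|x) log p(c|x). The `entails` callable stands in for an off-the-shelf NLI classifier, and details such as the greedy clustering order and the probability normalization are illustrative assumptions, not the authors' exact implementation.

```python
import math

def semantically_equivalent(a: str, b: str, entails) -> bool:
    # Two answers share a meaning if each entails the other (bidirectional entailment).
    # `entails(premise, hypothesis) -> bool` is an assumed helper wrapping an
    # off-the-shelf NLI/entailment classifier.
    return entails(a, b) and entails(b, a)

def cluster_by_meaning(answers, entails):
    # Greedily group sampled answers into semantic-equivalence clusters,
    # comparing each new answer against one representative per existing cluster.
    clusters = []  # each cluster is a list of indices into `answers`
    for i, ans in enumerate(answers):
        for cluster in clusters:
            if semantically_equivalent(ans, answers[cluster[0]], entails):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def semantic_entropy(answers, log_probs, entails):
    # Sum the (renormalized) sequence probabilities within each meaning cluster,
    # then compute the Shannon entropy over the resulting cluster distribution.
    clusters = cluster_by_meaning(answers, entails)
    probs = [math.exp(lp) for lp in log_probs]
    total = sum(probs)
    cluster_mass = [sum(probs[i] for i in c) / total for c in clusters]
    return -sum(p * math.log(p) for p in cluster_mass if p > 0)
```

Given, for example, ten sampled answers to a question together with their (length-normalized) log-probabilities and any entailment predicate, a higher value of `semantic_entropy` flags questions on which the model's sampled meanings, rather than merely its wordings, disagree.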
Related papers
- Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning [84.94709351266557]
We focus on the trustworthiness of language models with respect to retrieval augmentation.
We deem that retrieval-augmented language models have the inherent capability of supplying responses according to both contextual and parametric knowledge.
Inspired by aligning language models with human preferences, we take the first step towards aligning retrieval-augmented language models to a state in which they respond relying solely on external evidence.
arXiv Detail & Related papers (2024-10-22T09:25:21Z)
- QUITE: Quantifying Uncertainty in Natural Language Text in Bayesian Reasoning Scenarios [15.193544498311603]
We present QUITE, a dataset of real-world Bayesian reasoning scenarios with categorical random variables and complex relationships.
We conduct an extensive set of experiments, finding that logic-based models outperform out-of-the-box large language models on all reasoning types.
Our results provide evidence that neuro-symbolic models are a promising direction for improving complex reasoning.
arXiv Detail & Related papers (2024-10-14T12:44:59Z)
- Perceptions of Linguistic Uncertainty by Language Models and Humans [26.69714008538173]
We investigate how language models map linguistic expressions of uncertainty to numerical responses.
We find that 7 out of 10 models are able to map uncertainty expressions to probabilistic responses in a human-like manner.
Their responses, however, are sensitive to the models' prior knowledge of the statements, indicating that language models are substantially more susceptible to bias based on that knowledge.
arXiv Detail & Related papers (2024-07-22T17:26:12Z)
- On Subjective Uncertainty Quantification and Calibration in Natural Language Generation [2.622066970118316]
Large language models often involve the generation of free-form responses, in which case uncertainty quantification becomes challenging.
This work addresses these challenges from the perspective of Bayesian decision theory.
We discuss how this framing enables principled quantification of the model's subjective uncertainty and its calibration.
The proposed methods can be applied to black-box language models.
arXiv Detail & Related papers (2024-06-07T18:54:40Z)
- Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities [79.9629927171974]
Uncertainty quantification in Large Language Models (LLMs) is crucial for applications where safety and reliability are important.
We propose Kernel Language Entropy (KLE), a novel method for uncertainty estimation in white- and black-box LLMs.
arXiv Detail & Related papers (2024-05-30T12:42:05Z)
- Distributional Semantics, Holism, and the Instability of Meaning [0.0]
A standard objection to meaning holism is the charge of instability.
In this article we examine whether the instability objection poses a problem for distributional models of meaning.
arXiv Detail & Related papers (2024-05-20T14:53:25Z)
- How often are errors in natural language reasoning due to paraphrastic variability? [29.079188032623605]
We propose a metric for evaluating the paraphrastic consistency of natural language reasoning models.
We mathematically connect this metric to the proportion of a model's variance in correctness attributable to paraphrasing.
We collect ParaNLU, a dataset of 7,782 human-written and validated paraphrased reasoning problems.
arXiv Detail & Related papers (2024-04-17T20:11:32Z)
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not represent natural language semantics well.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- Provable Limitations of Acquiring Meaning from Ungrounded Form: What will Future Language Models Understand? [87.20342701232869]
We investigate the abilities of ungrounded systems to acquire meaning.
We study whether assertions enable a system to emulate representations preserving semantic relations like equivalence.
We find that assertions enable semantic emulation if all expressions in the language are referentially transparent.
However, if the language uses non-transparent patterns like variable binding, we show that emulation can become an uncomputable problem.
arXiv Detail & Related papers (2021-04-22T01:00:17Z)
- Unnatural Language Inference [48.45003475966808]
We find that state-of-the-art NLI models, such as RoBERTa and BART, are invariant to, and sometimes even perform better on, examples with randomly reordered words.
Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.
arXiv Detail & Related papers (2020-12-30T20:40:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.