Uncertainty Determines the Adequacy of the Mode and the Tractability of
Decoding in Sequence-to-Sequence Models
- URL: http://arxiv.org/abs/2204.00471v1
- Date: Fri, 1 Apr 2022 14:30:19 GMT
- Title: Uncertainty Determines the Adequacy of the Mode and the Tractability of
Decoding in Sequence-to-Sequence Models
- Authors: Felix Stahlberg, Ilia Kulikov and Shankar Kumar
- Abstract summary: We analyze how ambiguity (also known as intrinsic uncertainty) shapes the distribution learned by neural sequence models.
We show that well-known pathologies such as a high number of beam search errors, the inadequacy of the mode, and the drop in system performance with large beam sizes apply to tasks with a high level of ambiguity.
- Score: 11.258630552727432
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many natural language processing (NLP) tasks the same input (e.g. source
sentence) can have multiple possible outputs (e.g. translations). To analyze
how this ambiguity (also known as intrinsic uncertainty) shapes the
distribution learned by neural sequence models we measure sentence-level
uncertainty by computing the degree of overlap between references in
multi-reference test sets from two different NLP tasks: machine translation
(MT) and grammatical error correction (GEC). At both the sentence- and the
task-level, intrinsic uncertainty has major implications for various aspects of
search such as the inductive biases in beam search and the complexity of exact
search. In particular, we show that well-known pathologies such as a high
number of beam search errors, the inadequacy of the mode, and the drop in
system performance with large beam sizes apply to tasks with a high level of
ambiguity, such as MT, but not to less uncertain tasks such as GEC. Furthermore,
we propose a novel exact $n$-best search algorithm for neural sequence models,
and show that intrinsic uncertainty affects model uncertainty as the model
tends to overly spread out the probability mass for uncertain tasks and
sentences.
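The paper measures sentence-level uncertainty as the degree of overlap between references in multi-reference test sets. As a rough illustration only (not the authors' exact statistic), average pairwise Jaccard overlap between reference token sets gives one such agreement score:

```python
from itertools import combinations

def pairwise_overlap(references):
    """Average pairwise token-set (Jaccard) overlap among references.

    A value near 1 suggests low intrinsic uncertainty (references agree,
    as in GEC); a value near 0 suggests high uncertainty (as in MT).
    This is an illustrative proxy, not the paper's exact measure.
    """
    token_sets = [set(ref.split()) for ref in references]
    pairs = list(combinations(token_sets, 2))
    if not pairs:
        return 1.0  # a single reference is trivially self-consistent
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

# GEC-like references (small edits, high agreement)
gec_refs = ["He goes to school .", "He goes to the school ."]
# MT-like references (free paraphrases, low agreement)
mt_refs = ["He heads off to class .", "He is going to school ."]

assert pairwise_overlap(gec_refs) > pairwise_overlap(mt_refs)
```

Any reference-overlap statistic (e.g. inter-reference BLEU) would serve the same diagnostic role; Jaccard is used here only because it needs no external dependencies.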
Related papers
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability.
In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling.
Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
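The ensembling step can be read through the standard entropy-based uncertainty decomposition. The sketch below assumes that aggregation (it is not necessarily the paper's exact implementation): total predictive entropy splits into an aleatoric part (average member entropy) and an epistemic part (their difference):

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

def decompose(predictions):
    """Decompose predictive uncertainty over an ensemble of clarifications.

    `predictions` is a list of probability vectors, one per clarified input.
    Total uncertainty = entropy of the averaged distribution.
    Aleatoric (data)  = average entropy of each member.
    Epistemic (model) = total - aleatoric (a mutual information, >= 0).
    Standard decomposition; the paper's aggregation may differ in detail.
    """
    n, k = len(predictions), len(predictions[0])
    mean = [sum(p[i] for p in predictions) / n for i in range(k)]
    total = entropy(mean)
    aleatoric = sum(entropy(p) for p in predictions) / n
    return total, aleatoric, total - aleatoric

# Clarifications that disagree confidently -> mostly epistemic uncertainty.
total, aleatoric, epistemic = decompose([[0.95, 0.05], [0.05, 0.95]])
assert epistemic > aleatoric
```

When all clarified inputs yield the same distribution, the epistemic term vanishes and only aleatoric uncertainty remains, which is the intuition behind attributing disagreement across clarifications to input ambiguity.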
arXiv Detail & Related papers (2023-11-15T05:58:35Z)
- Towards frugal unsupervised detection of subtle abnormalities in medical imaging [0.0]
Anomaly detection in medical imaging is a challenging task in contexts where abnormalities are not annotated.
We investigate mixtures of probability distributions whose versatility has been widely recognized.
This online approach is illustrated on the challenging detection of subtle abnormalities in MR brain scans for the follow-up of newly diagnosed Parkinsonian patients.
arXiv Detail & Related papers (2023-09-04T07:44:54Z)
- A Non-monotonic Self-terminating Language Model [62.93465126911921]
In this paper, we focus on the problem of non-terminating sequences resulting from an incomplete decoding algorithm.
We first define an incomplete probable decoding algorithm which includes greedy search, top-$k$ sampling, and nucleus sampling.
We then propose a non-monotonic self-terminating language model, which relaxes the constraint of monotonically increasing termination probability.
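Nucleus (top-p) sampling, one of the incomplete decoding algorithms named above, can be sketched as follows (a minimal illustration over a toy distribution; real implementations operate on logits over large vocabularies):

```python
def nucleus_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative mass reaches p.

    Nucleus sampling is 'incomplete' in the paper's sense: tokens outside
    the nucleus (possibly including EOS) receive zero probability, which
    is what can make termination non-guaranteed. Illustrative sketch only.
    """
    ranked = sorted(enumerate(probs), key=lambda t: -t[1])
    kept, mass = [], 0.0
    for idx, pr in ranked:
        kept.append((idx, pr))
        mass += pr
        if mass >= p:
            break
    z = sum(pr for _, pr in kept)  # renormalize over the nucleus
    return {idx: pr / z for idx, pr in kept}

dist = nucleus_filter([0.5, 0.3, 0.15, 0.05], p=0.8)
assert set(dist) == {0, 1}            # nucleus keeps only the top tokens
assert abs(sum(dist.values()) - 1.0) < 1e-9
```

If the EOS token always falls outside the nucleus, the truncated model can assign probability zero to every finite sequence, which is exactly the pathology the self-terminating construction is designed to rule out.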
arXiv Detail & Related papers (2022-10-03T00:28:44Z)
- Marginal Inference queries in Hidden Markov Models under context-free grammar constraints [0.348097307252416]
We address the question of computing the likelihood of context-free grammars (CFGs) in Hidden Markov Models (HMMs).
We show that the problem is NP-Hard, even with the promise that the CFG has a degree of ambiguity less than or equal to 2.
We then propose a fully randomized approximation scheme to approximate the likelihood for the case of ambiguous CFGs.
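The unconstrained building block here, the likelihood of an observation sequence under an HMM, is computed with the classical forward algorithm; it is the added context-free constraint that makes the problem NP-Hard. A minimal sketch of the unconstrained case, with a toy two-state model:

```python
def forward_likelihood(obs, init, trans, emit):
    """P(obs) under an HMM via the forward algorithm.

    init[s]     : prior probability of state s
    trans[s][t] : P(next state t | current state s)
    emit[s][o]  : P(observation o | state s)
    Constraining outputs by a CFG (as in the paper) is the hard part;
    this sketch handles only the unconstrained likelihood.
    """
    n_states = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[s] * trans[s][t] for s in range(n_states)) * emit[t][o]
            for t in range(n_states)
        ]
    return sum(alpha)

# Two-state toy HMM over a binary observation alphabet (made-up numbers).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
p = forward_likelihood([0, 1, 0], init, trans, emit)
assert 0.0 < p < 1.0
```

The forward recursion is linear in sequence length; the NP-Hardness result shows no comparably cheap exact recursion exists once the summation is restricted to strings of a CFG.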
arXiv Detail & Related papers (2022-06-26T12:44:18Z)
- Evaluating Distributional Distortion in Neural Language Modeling [81.83408583979745]
A heavy-tail of rare events accounts for a significant amount of the total probability mass of distributions in language.
Standard language modeling metrics such as perplexity quantify the performance of language models (LM) in aggregate.
We develop a controlled evaluation scheme which uses generative models trained on natural data as artificial languages.
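Perplexity, the aggregate metric in question, is the exponentiated negative mean token log-probability; a minimal sketch (the token scores below are hypothetical):

```python
import math

def perplexity(token_logprobs):
    """Corpus perplexity = exp(-average token log-probability).

    As an aggregate metric, it can hide how probability mass is
    allocated to the heavy tail of rare events discussed above.
    """
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# A model that is uniform over a 10-word vocabulary has perplexity 10.
lp = [math.log(1 / 10)] * 5
assert abs(perplexity(lp) - 10.0) < 1e-9
```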
arXiv Detail & Related papers (2022-03-24T01:09:46Z)
- Acquisition-invariant brain MRI segmentation with informative uncertainties [3.46329153611365]
Post-hoc multi-site correction methods exist but have strong assumptions that often do not hold in real-world scenarios.
This body of work showcases such an algorithm, which can become robust to the physics of acquisition in the context of segmentation tasks.
We demonstrate that our method not only generalises to complete holdout datasets, preserving segmentation quality, but does so while also accounting for site-specific sequence choices.
arXiv Detail & Related papers (2021-11-07T13:58:04Z)
- NADS: Neural Architecture Distribution Search for Uncertainty Awareness [79.18710225716791]
Machine learning (ML) systems often encounter Out-of-Distribution (OoD) errors when dealing with testing data coming from a distribution different from training data.
Existing OoD detection approaches are prone to errors and can even assign higher likelihoods to OoD samples than to in-distribution data.
We propose Neural Architecture Distribution Search (NADS) to identify common building blocks among all uncertainty-aware architectures.
arXiv Detail & Related papers (2020-06-11T17:39:07Z)
- Consistency of a Recurrent Language Model With Respect to Incomplete Decoding [67.54760086239514]
We study the issue of obtaining infinite-length sequences from a recurrent language model under incomplete decoding.
We propose two remedies which address inconsistency: consistent variants of top-k and nucleus sampling, and a self-terminating recurrent language model.
arXiv Detail & Related papers (2020-02-06T19:56:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.