Revisiting Softmax for Uncertainty Approximation in Text Classification
- URL: http://arxiv.org/abs/2210.14037v2
- Date: Wed, 19 Jul 2023 13:43:07 GMT
- Title: Revisiting Softmax for Uncertainty Approximation in Text Classification
- Authors: Andreas Nugaard Holm, Dustin Wright, Isabelle Augenstein
- Abstract summary: Uncertainty approximation in text classification is an important area with applications in domain adaptation and interpretability.
One of the most widely used uncertainty approximation methods is Monte Carlo (MC) Dropout, which is computationally expensive.
We compare softmax and an efficient version of MC Dropout on their uncertainty approximations and downstream text classification performance.
We find that, while MC dropout produces the best uncertainty approximations, using a simple softmax leads to competitive and in some cases better uncertainty estimation for text classification at a much lower computational cost.
- Score: 45.07154956156555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Uncertainty approximation in text classification is an important area with
applications in domain adaptation and interpretability. One of the most widely
used uncertainty approximation methods is Monte Carlo (MC) Dropout, which is
computationally expensive as it requires multiple forward passes through the
model. A cheaper alternative is to simply use the softmax based on a single
forward pass without dropout to estimate model uncertainty. However, prior work
has indicated that these predictions tend to be overconfident. In this paper,
we perform a thorough empirical analysis of these methods on five datasets with
two base neural architectures in order to identify the trade-offs between the
two. We compare both softmax and an efficient version of MC Dropout on their
uncertainty approximations and downstream text classification performance,
while weighing their runtime (cost) against performance (benefit). We find
that, while MC dropout produces the best uncertainty approximations, using a
simple softmax leads to competitive and in some cases better uncertainty
estimation for text classification at a much lower computational cost,
suggesting that softmax can in fact be a sufficient uncertainty estimate when
computational resources are a concern.
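The two uncertainty estimates compared in the abstract can be sketched in a few lines. This is a minimal numpy illustration, not the authors' code: the model is a toy linear classifier, and MC Dropout is simulated with Bernoulli masks on the input features kept active at test time, averaged over multiple stochastic forward passes.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy linear classifier: W maps 8 features to 3 class logits.
W = rng.normal(size=(8, 3))
x = rng.normal(size=8)

# Softmax uncertainty: a single deterministic forward pass.
p = softmax(x @ W)
softmax_conf = p.max()                      # max-probability confidence
softmax_entropy = -(p * np.log(p)).sum()    # predictive entropy

# MC Dropout uncertainty: T stochastic passes, dropout left on at
# test time (here: Bernoulli masks on the inputs, rescaled by keep).
T, keep = 100, 0.9
probs = np.stack([
    softmax(((rng.random(8) < keep) * x / keep) @ W)
    for _ in range(T)
])
mc_mean = probs.mean(axis=0)                      # predictive distribution
mc_entropy = -(mc_mean * np.log(mc_mean)).sum()   # entropy of the mean

print(softmax_conf, softmax_entropy, mc_entropy)
```

The cost asymmetry the paper weighs is visible here: the softmax estimate needs one matrix product, while MC Dropout needs T of them.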
Related papers
- What Does Softmax Probability Tell Us about Classifiers Ranking Across Diverse Test Conditions? [19.939014335673633]
We introduce a new measure called Softmax Correlation (SoftmaxCorr).
It calculates the cosine similarity between a class-class correlation matrix and a predefined reference matrix.
A high resemblance of predictions to the reference matrix signals that the model delivers confident and uniform predictions.
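The SoftmaxCorr idea can be sketched as follows. The abstract only states that a class-class correlation matrix of the predictions is compared to a predefined reference matrix via cosine similarity; the particular constructions below (average outer product of predicted distributions, and a scaled identity as the reference) are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def softmax_corr(probs, ref):
    # Class-class correlation matrix (k x k): average outer product
    # of the predicted distributions over the evaluation set.
    corr = probs.T @ probs / len(probs)
    # Cosine similarity between the flattened matrices.
    a, b = corr.ravel(), ref.ravel()
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(1)
k = 4
probs = softmax(rng.normal(size=(200, k)) * 5)  # fairly confident predictions
ref = np.eye(k) / k  # hypothetical reference: confident, uniform over classes
score = softmax_corr(probs, ref)
print(score)
```

Confident, class-balanced predictions concentrate the correlation matrix on its diagonal, pushing the cosine similarity toward the reference.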
arXiv Detail & Related papers (2024-06-14T10:36:26Z)
- Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts [78.3687645289918]
We show that the sigmoid gating function enjoys a higher sample efficiency than the softmax gating for the statistical task of expert estimation.
We find that experts formulated as feed-forward networks with commonly used activation such as ReLU and GELU enjoy faster convergence rates under the sigmoid gating.
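The structural difference between the two gating functions can be sketched directly. This is a one-token illustration under my own framing, not the paper's setup: softmax gating couples the expert weights (raising one logit lowers every other weight), while sigmoid gating scores each expert from its own logit alone, with an optional normalization afterwards.

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Router logits for 4 experts on a single token.
gate_logits = rng.normal(size=4)

# Softmax gating: weights are coupled and sum to 1 by construction.
w_softmax = softmax(gate_logits)

# Sigmoid gating: each weight depends only on its own logit;
# normalize afterwards if a convex combination is required.
s = sigmoid(gate_logits)
w_sigmoid = s / s.sum()

print(w_softmax, w_sigmoid)
```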
arXiv Detail & Related papers (2024-05-22T21:12:34Z)
- In Defense of Softmax Parametrization for Calibrated and Consistent Learning to Defer [27.025808709031864]
It has been theoretically shown that popular estimators for learning to defer parameterized with softmax provide unbounded estimates for the likelihood of deferring.
We show that the cause of the miscalibrated and unbounded estimator in prior literature is due to the symmetric nature of the surrogate losses used and not due to softmax.
We propose a novel statistically consistent asymmetric softmax-based surrogate loss that can produce valid estimates without the issue of unboundedness.
arXiv Detail & Related papers (2023-11-02T09:15:52Z)
- Revisiting Logistic-softmax Likelihood in Bayesian Meta-Learning for Few-Shot Classification [4.813254903898101]
Logistic-softmax is often employed as an alternative to the softmax likelihood in multi-class Gaussian process classification.
We revisit and redesign the logistic-softmax likelihood, which enables control of the a priori confidence level through a temperature parameter.
Our approach yields well-calibrated uncertainty estimates and achieves comparable or superior results on standard benchmark datasets.
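A temperature-controlled logistic-softmax can be sketched as follows. This assumes the common formulation of logistic-softmax as normalized sigmoids, i.e. softmax applied to log-sigmoid logits; the exact redesign in the paper may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_softmax(f, tau=1.0):
    # Pass each logit through a sigmoid, then normalize --
    # equivalent to softmax(log sigmoid(f / tau)).
    s = sigmoid(np.asarray(f) / tau)
    return s / s.sum()

f = np.array([2.0, 0.5, -1.0])
sharp = logistic_softmax(f, tau=0.5)  # low temperature: more confident
flat = logistic_softmax(f, tau=5.0)   # high temperature: closer to uniform
print(sharp, flat)
```

Lowering the temperature sharpens the distribution, which is the knob the abstract describes for controlling a priori confidence.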
arXiv Detail & Related papers (2023-10-16T13:20:13Z)
- Uncertainty in Extreme Multi-label Classification [81.14232824864787]
eXtreme Multi-label Classification (XMC) is an essential task in the era of big data for web-scale machine learning applications.
In this paper, we aim to investigate general uncertainty quantification approaches for tree-based XMC models with a probabilistic ensemble-based framework.
In particular, we analyze label-level and instance-level uncertainty in XMC, and propose a general approximation framework based on beam search to efficiently estimate the uncertainty with a theoretical guarantee under long-tail XMC predictions.
arXiv Detail & Related papers (2022-10-18T20:54:33Z)
- Understanding Softmax Confidence and Uncertainty [95.71801498763216]
It is often remarked that neural networks fail to increase their uncertainty when predicting on data far from the training distribution.
Yet naively using softmax confidence as a proxy for uncertainty achieves modest success in tasks exclusively testing for this.
This paper investigates this contradiction, identifying two implicit biases that do encourage softmax confidence to correlate with uncertainty.
arXiv Detail & Related papers (2021-06-09T10:37:29Z)
- Improving Deterministic Uncertainty Estimation in Deep Learning for Classification and Regression [30.112634874443494]
We propose a new model that estimates uncertainty in a single forward pass.
Our approach combines a bi-Lipschitz feature extractor with an inducing point approximate Gaussian process, offering robust and principled uncertainty estimation.
arXiv Detail & Related papers (2021-02-22T23:29:12Z)
- Minimax Off-Policy Evaluation for Multi-Armed Bandits [58.7013651350436]
We study the problem of off-policy evaluation in the multi-armed bandit model with bounded rewards.
We develop minimax rate-optimal procedures under three settings.
arXiv Detail & Related papers (2021-01-19T18:55:29Z)
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
- Being Bayesian about Categorical Probability [6.875312133832079]
We consider a random variable of a categorical probability over class labels.
In this framework, the prior distribution explicitly models the presumed noise inherent in the observed label.
Our method can be implemented as a plug-and-play loss function with negligible computational overhead.
arXiv Detail & Related papers (2020-02-19T02:35:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.