Revisiting Softmax for Uncertainty Approximation in Text Classification
- URL: http://arxiv.org/abs/2210.14037v2
- Date: Wed, 19 Jul 2023 13:43:07 GMT
- Title: Revisiting Softmax for Uncertainty Approximation in Text Classification
- Authors: Andreas Nugaard Holm, Dustin Wright, Isabelle Augenstein
- Abstract summary: Uncertainty approximation in text classification is an important area with applications in domain adaptation and interpretability.
One of the most widely used uncertainty approximation methods is Monte Carlo (MC) Dropout, which is computationally expensive.
We compare softmax and an efficient version of MC Dropout on their uncertainty approximations and downstream text classification performance.
We find that, while MC dropout produces the best uncertainty approximations, using a simple softmax leads to competitive and in some cases better uncertainty estimation for text classification at a much lower computational cost.
- Score: 45.07154956156555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Uncertainty approximation in text classification is an important area with
applications in domain adaptation and interpretability. One of the most widely
used uncertainty approximation methods is Monte Carlo (MC) Dropout, which is
computationally expensive as it requires multiple forward passes through the
model. A cheaper alternative is to simply use the softmax based on a single
forward pass without dropout to estimate model uncertainty. However, prior work
has indicated that these predictions tend to be overconfident. In this paper,
we perform a thorough empirical analysis of these methods on five datasets with
two base neural architectures in order to identify the trade-offs between the
two. We compare both softmax and an efficient version of MC Dropout on their
uncertainty approximations and downstream text classification performance,
while weighing their runtime (cost) against performance (benefit). We find
that, while MC dropout produces the best uncertainty approximations, using a
simple softmax leads to competitive and in some cases better uncertainty
estimation for text classification at a much lower computational cost,
suggesting that softmax can in fact be a sufficient uncertainty estimate when
computational resources are a concern.
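The two uncertainty estimates compared in the abstract can be sketched in a few lines. This is a minimal numpy illustration, not the authors' code: the model is a toy linear classifier, and MC Dropout is simulated with Bernoulli masks on the input features kept active at test time, averaged over multiple stochastic forward passes.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy linear classifier: W maps 8 features to 3 class logits.
W = rng.normal(size=(8, 3))
x = rng.normal(size=8)

# Softmax uncertainty: a single deterministic forward pass.
p = softmax(x @ W)
softmax_conf = p.max()                      # max-probability confidence
softmax_entropy = -(p * np.log(p)).sum()    # predictive entropy

# MC Dropout uncertainty: T stochastic passes, dropout left on at
# test time (here: Bernoulli masks on the inputs, rescaled by keep).
T, keep = 100, 0.9
probs = np.stack([
    softmax(((rng.random(8) < keep) * x / keep) @ W)
    for _ in range(T)
])
mc_mean = probs.mean(axis=0)                      # predictive distribution
mc_entropy = -(mc_mean * np.log(mc_mean)).sum()   # entropy of the mean

print(softmax_conf, softmax_entropy, mc_entropy)
```

The cost asymmetry the paper weighs is visible here: the softmax estimate needs one matrix product, while MC Dropout needs T of them.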
Related papers
- What Does Softmax Probability Tell Us about Classifiers Ranking Across Diverse Test Conditions? [19.939014335673633]
We introduce a new measure called Softmax Correlation (SoftmaxCorr).
It calculates the cosine similarity between a class-class correlation matrix and a predefined reference matrix.
A high resemblance of predictions to the reference matrix signals that the model delivers confident and uniform predictions.
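The SoftmaxCorr idea can be sketched as follows. The abstract only states that a class-class correlation matrix of the predictions is compared to a predefined reference matrix via cosine similarity; the particular constructions below (average outer product of predicted distributions, and a scaled identity as the reference) are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def softmax_corr(probs, ref):
    # Class-class correlation matrix (k x k): average outer product
    # of the predicted distributions over the evaluation set.
    corr = probs.T @ probs / len(probs)
    # Cosine similarity between the flattened matrices.
    a, b = corr.ravel(), ref.ravel()
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(1)
k = 4
probs = softmax(rng.normal(size=(200, k)) * 5)  # fairly confident predictions
ref = np.eye(k) / k  # hypothetical reference: confident, uniform over classes
score = softmax_corr(probs, ref)
print(score)
```

Confident, class-balanced predictions concentrate the correlation matrix on its diagonal, pushing the cosine similarity toward the reference.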
arXiv Detail & Related papers (2024-06-14T10:36:26Z)
- Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts [78.3687645289918]
We show that the sigmoid gating function enjoys a higher sample efficiency than the softmax gating for the statistical task of expert estimation.
We find that experts formulated as feed-forward networks with commonly used activation such as ReLU and GELU enjoy faster convergence rates under the sigmoid gating.
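The structural difference between the two gating functions can be sketched directly. This is a one-token illustration under my own framing, not the paper's setup: softmax gating couples the expert weights (raising one logit lowers every other weight), while sigmoid gating scores each expert from its own logit alone, with an optional normalization afterwards.

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Router logits for 4 experts on a single token.
gate_logits = rng.normal(size=4)

# Softmax gating: weights are coupled and sum to 1 by construction.
w_softmax = softmax(gate_logits)

# Sigmoid gating: each weight depends only on its own logit;
# normalize afterwards if a convex combination is required.
s = sigmoid(gate_logits)
w_sigmoid = s / s.sum()

print(w_softmax, w_sigmoid)
```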
arXiv Detail & Related papers (2024-05-22T21:12:34Z)
- In Defense of Softmax Parametrization for Calibrated and Consistent Learning to Defer [27.025808709031864]
It has been theoretically shown that popular estimators for learning to defer parameterized with softmax provide unbounded estimates for the likelihood of deferring.
We show that the cause of the miscalibrated and unbounded estimator in prior literature is due to the symmetric nature of the surrogate losses used and not due to softmax.
We propose a novel statistically consistent asymmetric softmax-based surrogate loss that can produce valid estimates without the issue of unboundedness.
arXiv Detail & Related papers (2023-11-02T09:15:52Z)
- Revisiting Logistic-softmax Likelihood in Bayesian Meta-Learning for Few-Shot Classification [4.813254903898101]
Logistic-softmax is often employed as an alternative to the softmax likelihood in multi-class Gaussian process classification.
We revisit and redesign the logistic-softmax likelihood, which enables control of the a priori confidence level through a temperature parameter.
Our approach yields well-calibrated uncertainty estimates and achieves comparable or superior results on standard benchmark datasets.
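A temperature-controlled logistic-softmax can be sketched as follows. This assumes the common formulation of logistic-softmax as normalized sigmoids, i.e. softmax applied to log-sigmoid logits; the exact redesign in the paper may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_softmax(f, tau=1.0):
    # Pass each logit through a sigmoid, then normalize --
    # equivalent to softmax(log sigmoid(f / tau)).
    s = sigmoid(np.asarray(f) / tau)
    return s / s.sum()

f = np.array([2.0, 0.5, -1.0])
sharp = logistic_softmax(f, tau=0.5)  # low temperature: more confident
flat = logistic_softmax(f, tau=5.0)   # high temperature: closer to uniform
print(sharp, flat)
```

Lowering the temperature sharpens the distribution, which is the knob the abstract describes for controlling a priori confidence.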
arXiv Detail & Related papers (2023-10-16T13:20:13Z)
- Uncertainty in Extreme Multi-label Classification [81.14232824864787]
eXtreme Multi-label Classification (XMC) is an essential task in the era of big data for web-scale machine learning applications.
In this paper, we aim to investigate general uncertainty quantification approaches for tree-based XMC models with a probabilistic ensemble-based framework.
In particular, we analyze label-level and instance-level uncertainty in XMC, and propose a general approximation framework based on beam search to efficiently estimate the uncertainty with a theoretical guarantee under long-tail XMC predictions.
arXiv Detail & Related papers (2022-10-18T20:54:33Z)
- Understanding Softmax Confidence and Uncertainty [95.71801498763216]
It is often remarked that neural networks fail to increase their uncertainty when predicting on data far from the training distribution.
Yet naively using softmax confidence as a proxy for uncertainty achieves modest success in tasks exclusively testing for this.
This paper investigates this contradiction, identifying two implicit biases that do encourage softmax confidence to correlate with uncertainty.
arXiv Detail & Related papers (2021-06-09T10:37:29Z)
- Improving Deterministic Uncertainty Estimation in Deep Learning for Classification and Regression [30.112634874443494]
We propose a new model that estimates uncertainty in a single forward pass.
Our approach combines a bi-Lipschitz feature extractor with an inducing point approximate Gaussian process, offering robust and principled uncertainty estimation.
arXiv Detail & Related papers (2021-02-22T23:29:12Z)
- Minimax Off-Policy Evaluation for Multi-Armed Bandits [58.7013651350436]
We study the problem of off-policy evaluation in the multi-armed bandit model with bounded rewards.
We develop minimax rate-optimal procedures under three settings.
arXiv Detail & Related papers (2021-01-19T18:55:29Z)
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
- Being Bayesian about Categorical Probability [6.875312133832079]
We consider a random variable of a categorical probability over class labels.
In this framework, the prior distribution explicitly models the presumed noise inherent in the observed label.
Our method can be implemented as a plug-and-play loss function with negligible computational overhead.
arXiv Detail & Related papers (2020-02-19T02:35:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.