Transformer Uncertainty Estimation with Hierarchical Stochastic
Attention
- URL: http://arxiv.org/abs/2112.13776v1
- Date: Mon, 27 Dec 2021 16:43:31 GMT
- Title: Transformer Uncertainty Estimation with Hierarchical Stochastic
Attention
- Authors: Jiahuan Pei, Cheng Wang, György Szarvas
- Abstract summary: We propose a novel way to enable transformers to have the capability of uncertainty estimation.
This is achieved by learning a hierarchical stochastic self-attention that attends to values and to a set of learnable centroids.
We empirically evaluate our model on two text classification tasks with both in-domain (ID) and out-of-domain (OOD) datasets.
- Score: 8.95459272947319
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers are state-of-the-art in a wide range of NLP tasks and have also
been applied to many real-world products. Understanding the reliability and
certainty of transformer model predictions is crucial for building trustworthy
machine learning applications, e.g., medical diagnosis. Although many recent
transformer extensions have been proposed, the study of the uncertainty
estimation of transformer models is under-explored. In this work, we propose a
novel way to enable transformers to estimate uncertainty while retaining their
original predictive performance. This is
achieved by learning a hierarchical stochastic self-attention that attends to
values and a set of learnable centroids, respectively. Then new attention heads
are formed with a mixture of sampled centroids using the Gumbel-Softmax trick.
We theoretically show that the error of approximating self-attention by sampling
from a Gumbel distribution is upper bounded. We empirically evaluate our model on two
text classification tasks with both in-domain (ID) and out-of-domain (OOD)
datasets. The experimental results demonstrate that our approach: (1) achieves
the best trade-off between predictive performance and uncertainty estimation among the compared
methods; (2) exhibits very competitive (in most cases, improved) predictive
performance on ID datasets; (3) is on par with Monte Carlo dropout and ensemble
methods in uncertainty estimation on OOD datasets.
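To make the mechanism concrete, below is a minimal PyTorch sketch of the core idea stated in the abstract: queries attend to a set of learnable centroids, and a head's output is formed as a mixture of centroids whose assignment weights are sampled with the Gumbel-Softmax trick. The class and parameter names (StochasticCentroidAttention, num_centroids, tau) are illustrative assumptions; the paper's full hierarchical formulation, which also attends to values, is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticCentroidAttention(nn.Module):
    """Illustrative sketch (not the paper's exact layer): queries attend to a set
    of learnable centroids, and the output is a mixture of centroids whose
    weights are sampled with the Gumbel-Softmax trick."""

    def __init__(self, d_model: int, num_centroids: int = 16, tau: float = 1.0):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(num_centroids, d_model))
        self.q_proj = nn.Linear(d_model, d_model)
        self.tau = tau

    def forward(self, x: torch.Tensor, hard: bool = False) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q = self.q_proj(x)                                    # (B, T, D)
        logits = q @ self.centroids.t() / q.size(-1) ** 0.5   # (B, T, K) scaled scores
        # Relaxed one-hot centroid assignments sampled with Gumbel-Softmax;
        # this sampling is the source of stochasticity used for uncertainty.
        weights = F.gumbel_softmax(logits, tau=self.tau, hard=hard)
        return weights @ self.centroids                       # (B, T, D)

# Repeated stochastic forward passes give a predictive distribution:
# layer = StochasticCentroidAttention(d_model=64)
# x = torch.randn(2, 10, 64)
# samples = torch.stack([layer(x) for _ in range(8)])
# uncertainty = samples.var(dim=0)  # higher variance -> less certain representation
```

Because the centroid assignment is sampled, keeping the layer stochastic at test time and running several forward passes yields a distribution over outputs whose spread can serve as an uncertainty estimate, analogous to how Monte Carlo dropout is used.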
Related papers
- Boosted Control Functions [10.503777692702952]
This work aims to bridge the gap between causal effect estimation and prediction tasks.
We establish a novel connection between the fields of distribution generalization from machine learning and simultaneous equation models and control functions from econometrics.
Within this framework, we propose a strong notion of invariance for a predictive model and compare it with existing (weaker) versions.
arXiv Detail & Related papers (2023-10-09T15:43:46Z) - Multiclass Alignment of Confidence and Certainty for Network Calibration [10.15706847741555]
Recent studies reveal that deep neural networks (DNNs) are prone to making overconfident predictions.
We propose a new train-time calibration method, which features a simple, plug-and-play auxiliary loss known as multi-class alignment of predictive mean confidence and predictive certainty (MACC).
Our method achieves state-of-the-art calibration performance for both in-domain and out-domain predictions.
arXiv Detail & Related papers (2023-09-06T00:56:24Z) - Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer.
The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer.
arXiv Detail & Related papers (2023-07-26T08:25:46Z) - Birds of a Feather Trust Together: Knowing When to Trust a Classifier
via Adaptive Neighborhood Aggregation [30.34223543030105]
We show how NeighborAgg can leverage these two essential sources of information via adaptive neighborhood aggregation.
We also extend our approach to the closely related task of mislabel detection and provide a theoretical coverage guarantee to bound the false negative rate.
arXiv Detail & Related papers (2022-11-29T18:43:15Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold (see the sketch after this list).
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Regularizing Variational Autoencoder with Diversity and Uncertainty
Awareness [61.827054365139645]
The Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z) - Robust Validation: Confident Predictions Even When Distributions Shift [19.327409270934474]
We describe procedures for robust predictive inference, where a model provides uncertainty estimates on its predictions rather than point predictions.
We present a method that produces prediction sets that give (almost exactly) the right coverage level for any test distribution in an $f$-divergence ball around the training population.
An essential component of our methodology is to estimate the amount of expected future data shift and build robustness to it.
arXiv Detail & Related papers (2020-08-10T17:09:16Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under
Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z) - Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z) - Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples (a sketch of this prototype-update idea appears after this list).
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
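Two of the entries above describe procedures concrete enough to sketch. First, a hedged NumPy illustration of the Average Thresholded Confidence (ATC) idea referenced earlier in this list; this is not the authors' released implementation, and source_probs, source_labels, and target_probs are assumed to be softmax outputs and labels from a held-out source set plus an unlabeled target set.

```python
import numpy as np

def atc_predict_accuracy(source_probs, source_labels, target_probs):
    """Pick a confidence threshold on labeled source data so that the fraction of
    source examples above it matches source accuracy, then predict target accuracy
    as the fraction of unlabeled target examples above that threshold."""
    src_conf = source_probs.max(axis=1)                              # max softmax confidence
    src_acc = (source_probs.argmax(axis=1) == source_labels).mean()  # source accuracy
    t = np.quantile(src_conf, 1.0 - src_acc)                         # P(conf > t) on source ~= src_acc
    return (target_probs.max(axis=1) > t).mean()
```

Second, a sketch of the transductive prototype update described in the last entry, with the paper's meta-learned confidence replaced by a fixed softmax-over-distances weighting purely for illustration.

```python
import torch
import torch.nn.functional as F

def refine_prototypes(prototypes, queries, tau=1.0):
    """Softly assign each unlabeled query to class prototypes and update each
    prototype as a confidence-weighted mean of itself and its assigned queries."""
    # prototypes: (C, D), queries: (Q, D)
    dists = torch.cdist(queries, prototypes)           # (Q, C) Euclidean distances
    conf = F.softmax(-dists / tau, dim=1)               # (Q, C) soft assignment weights
    weighted = conf.t() @ queries                       # (C, D) confidence-weighted query sums
    norm = conf.sum(dim=0, keepdim=True).t() + 1.0      # (C, 1) count includes the prototype itself
    return (prototypes + weighted) / norm
```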
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.