Uncertainty Estimation of Transformers' Predictions via Topological Analysis of the Attention Matrices
- URL: http://arxiv.org/abs/2308.11295v1
- Date: Tue, 22 Aug 2023 09:17:45 GMT
- Title: Uncertainty Estimation of Transformers' Predictions via Topological Analysis of the Attention Matrices
- Authors: Elizaveta Kostenok, Daniil Cherniavskii, Alexey Zaytsev
- Abstract summary: We set the task of obtaining an uncertainty estimate for neural networks based on the Transformer architecture.
In this paper, we propose a method for uncertainty estimation based on the topological properties of the attention mechanism.
- Score: 3.536472734238452
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Determining the degree of confidence of a deep learning model in its
predictions is an open problem in natural language processing. Most classical
uncertainty estimation methods perform poorly for text classification models.
We set the task of obtaining an uncertainty estimate for neural networks based
on the Transformer architecture. A key feature of such models is the attention
mechanism, which supports the information flow between the hidden
representations of tokens in the neural network. We explore the relationships
formed between internal representations using Topological Data Analysis
methods and use them to predict the model's confidence. In this paper, we
propose a method for uncertainty estimation based on the topological
properties of the attention mechanism and compare it with classical methods.
The proposed algorithm surpasses the existing methods in quality and opens up
a new area of application for the attention mechanism, but it requires the
selection of topological features.
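The abstract does not specify which topological features are used. As a rough illustration of the general idea only, the following Python sketch (assuming NumPy) turns one attention matrix into an undirected graph by thresholding its weights and computes a few graph-topological quantities: the edge count, the number of connected components (the 0th Betti number) and the number of independent cycles (the 1st Betti number, E - V + C). The function names, the symmetrization step, the feature set and the threshold values are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def attention_graph_features(attn: np.ndarray, threshold: float) -> dict:
    """Graph-topological features of one attention matrix (a sketch).

    Tokens are vertices; tokens i != j are joined by an undirected edge when
    attn[i, j] or attn[j, i] is at least `threshold`. Symmetrizing the matrix
    and thresholding it this way are illustrative assumptions.
    """
    n = attn.shape[0]
    adj = (attn >= threshold) | (attn.T >= threshold)
    np.fill_diagonal(adj, False)  # drop self-loops (a token attending to itself)

    # Each undirected edge appears twice in the symmetric adjacency matrix.
    num_edges = int(adj.sum()) // 2

    # Count connected components with an iterative depth-first search.
    seen = np.zeros(n, dtype=bool)
    components = 0
    for start in range(n):
        if seen[start]:
            continue
        components += 1
        seen[start] = True
        stack = [start]
        while stack:
            v = stack.pop()
            for u in np.flatnonzero(adj[v] & ~seen):
                seen[u] = True
                stack.append(u)

    return {
        "edges": num_edges,
        "betti0": components,                  # connected components
        "betti1": num_edges - n + components,  # independent cycles (cycle rank)
        "mean_degree": 2.0 * num_edges / n,
    }


def feature_vector(attn_heads, thresholds=(0.025, 0.05, 0.1, 0.25, 0.5)):
    """Concatenate the features of every head at every threshold.

    The default thresholds are hypothetical placeholders.
    """
    feats = []
    for attn in attn_heads:
        for t in thresholds:
            f = attention_graph_features(np.asarray(attn, dtype=float), t)
            feats.extend([f["edges"], f["betti0"], f["betti1"], f["mean_degree"]])
    return np.array(feats, dtype=float)


# Toy 4-token attention matrix (rows sum to 1); token 3 attends only to itself.
attn = np.array([
    [0.7, 0.2, 0.1, 0.0],
    [0.3, 0.6, 0.1, 0.0],
    [0.1, 0.1, 0.8, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
print(attention_graph_features(attn, threshold=0.15))
# {'edges': 1, 'betti0': 3, 'betti1': 0, 'mean_degree': 0.5}
print(attention_graph_features(attn, threshold=0.1))
# {'edges': 3, 'betti0': 2, 'betti1': 1, 'mean_degree': 1.5}
```

Lowering the threshold adds edges, merges components and eventually closes cycles (at 0.1, tokens 0, 1 and 2 form a triangle, so betti1 becomes 1). In a setup like the one the abstract describes, such vectors, computed over the heads and layers of a fine-tuned Transformer, would serve as inputs to a separate model trained to predict the classifier's confidence; which model to use and which features to keep (the abstract notes that feature selection is required) are left open here.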
Related papers
- Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations [80.86128012438834]
We show for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete.
We propose a novel probabilistic approach which is able to provide tight estimates of robustness with strong guarantees.
Date: 2024-07-10T09:13:11Z
- kNN Algorithm for Conditional Mean and Variance Estimation with Automated Uncertainty Quantification and Variable Selection [8.429136647141487]
We introduce a kNN-based regression method that synergizes the scalability and adaptability of traditional non-parametric kNN models.
This method focuses on accurately estimating the conditional mean and variance of random response variables.
It is particularly notable in biomedical applications as demonstrated in two case studies.
Date: 2024-02-02T18:54:18Z
- Tractable Function-Space Variational Inference in Bayesian Neural Networks [72.97620734290139]
A popular approach for estimating the predictive uncertainty of neural networks is to define a prior distribution over the network parameters.
We propose a scalable function-space variational inference method that allows incorporating prior information.
We show that the proposed method leads to state-of-the-art uncertainty estimation and predictive performance on a range of prediction tasks.
Date: 2023-12-28T18:33:26Z
- A new approach to generalisation error of machine learning algorithms: Estimates and convergence [0.0]
We introduce a new approach to the estimation of the (generalisation) error and to convergence.
Our results include estimates of the error without any structural assumption on the neural networks.
Date: 2023-06-23T20:57:31Z
- Confidence estimation of classification based on the distribution of the neural network output layer [4.529188601556233]
One of the most common problems preventing the application of prediction models in the real world is a lack of generalization.
We propose novel methods that estimate the uncertainty of particular predictions generated by a neural network classification model.
The proposed methods infer the confidence of a particular prediction based on the distribution of the logit values corresponding to this prediction.
Date: 2022-10-14T12:32:50Z
- Scalable computation of prediction intervals for neural networks via matrix sketching [79.44177623781043]
Existing algorithms for uncertainty estimation require modifying the model architecture and training procedure.
This work proposes a new algorithm that can be applied to a given trained neural network and produces approximate prediction intervals.
Date: 2022-05-06T13:18:31Z
- NUQ: Nonparametric Uncertainty Quantification for Deterministic Neural Networks [151.03112356092575]
We show a principled way to measure the uncertainty of predictions for a classifier based on the Nadaraya-Watson nonparametric estimate of the conditional label distribution.
We demonstrate the strong performance of the method in uncertainty estimation tasks on a variety of real-world image datasets.
Date: 2022-02-07T12:30:45Z
- Hessian-based toolbox for reliable and interpretable machine learning in physics [58.720142291102135]
We present a toolbox for interpretability and reliability, agnostic of the model architecture.
It provides a notion of the influence of the input data on the prediction at a given test point, an estimation of the uncertainty of the model predictions, and an extrapolation score for the model predictions.
Our work opens the road to the systematic use of interpretability and reliability methods in ML applied to physics and, more generally, science.
Date: 2021-08-04T16:32:59Z
- Bayesian Attention Belief Networks [59.183311769616466]
Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks.
This paper introduces Bayesian attention belief networks, which construct a decoder network by modeling unnormalized attention weights.
We show that our method outperforms deterministic attention and state-of-the-art stochastic attention in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks.
Date: 2021-06-09T17:46:22Z
- Probabilistic Deep Learning for Instance Segmentation [9.62543698736491]
We propose a generic method to obtain model-inherent uncertainty estimates within proposal-free instance segmentation models.
We evaluate our method on the BBBC010 C. elegans dataset, where it yields competitive performance.
Date: 2020-08-24T19:51:48Z
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.