Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks
- URL: http://arxiv.org/abs/2002.10118v2
- Date: Fri, 17 Jul 2020 15:04:19 GMT
- Title: Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks
- Authors: Agustinus Kristiadi, Matthias Hein, Philipp Hennig
- Abstract summary: We show that a sufficient condition for calibrated uncertainty on a ReLU network is "to be a bit Bayesian".
We further validate these findings empirically via various standard experiments using common deep ReLU networks and Laplace approximations.
- Score: 65.24701908364383
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The point estimates of ReLU classification networks (arguably the most
widely used neural network architecture) have been shown to yield arbitrarily
high confidence far away from the training data. This architecture, in
conjunction with a maximum a posteriori estimation scheme, is therefore neither
calibrated nor robust. Approximate Bayesian inference has been empirically
demonstrated to improve predictive uncertainty in neural networks, although the
theoretical analysis of such Bayesian approximations is limited. We
theoretically analyze approximate Gaussian distributions on the weights of ReLU
networks and show that they fix the overconfidence problem. Furthermore, we
show that even a simplistic, and thus cheap, Bayesian approximation fixes
these issues. This indicates that a sufficient condition for calibrated
uncertainty on a ReLU network is "to be a bit Bayesian". These theoretical
results validate the use of last-layer Bayesian approximations and motivate a
range of fidelity-cost trade-offs. We further validate these findings
empirically via various standard experiments using common deep ReLU networks
and Laplace approximations.
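The mechanism the abstract describes can be illustrated with a small sketch. For a binary classifier with logit f(x) = w^T phi(x) and a Gaussian approximate posterior w ~ N(mu, Sigma) over the last-layer weights, the probit (MacKay) approximation gives the predictive p(y=1|x) ≈ sigmoid(mu^T phi / sqrt(1 + pi/8 * phi^T Sigma phi)). A minimal NumPy sketch, where the mean, covariance, and feature vector below are made-up illustrative values rather than anything from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def map_confidence(mu, phi):
    # MAP point estimate: confidence grows without bound as ||phi|| grows.
    return sigmoid(mu @ phi)

def bayes_confidence(mu, Sigma, phi):
    # Probit (MacKay) approximation to the predictive under w ~ N(mu, Sigma):
    # p(y=1|x) ~= sigmoid( mu^T phi / sqrt(1 + pi/8 * phi^T Sigma phi) )
    var = phi @ Sigma @ phi
    return sigmoid(mu @ phi / np.sqrt(1.0 + np.pi / 8.0 * var))

# Illustrative (made-up) posterior mean, covariance, and feature direction.
mu = np.array([2.0, -1.0])
Sigma = 0.5 * np.eye(2)
phi = np.array([1.0, 0.5])

# Scaling phi is a proxy for moving far away from the training data.
for scale in [1, 10, 100, 1000]:
    print(scale, map_confidence(mu, scale * phi),
          bayes_confidence(mu, Sigma, scale * phi))
```

Scaling the features drives the MAP confidence to 1, while the variance term phi^T Sigma phi grows at the same quadratic rate as the squared logit and keeps the Bayesian confidence bounded away from 1 — the overconfidence fix the abstract refers to.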
Related papers
- Revisiting Confidence Estimation: Towards Reliable Failure Prediction [53.79160907725975]
We identify a widespread but largely neglected phenomenon: most confidence estimation methods are harmful for detecting misclassification errors.
We propose to enlarge the confidence gap by finding flat minima, which yields state-of-the-art failure prediction performance.
arXiv Detail & Related papers (2024-03-05T11:44:14Z)
- Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
arXiv Detail & Related papers (2023-10-02T03:25:32Z) - Deep Anti-Regularized Ensembles provide reliable out-of-distribution
uncertainty quantification [4.750521042508541]
Deep ensembles often return overconfident estimates outside the training domain.
We show that an ensemble of networks with large weights that fit the training data is likely to meet these two objectives.
We derive a theoretical framework for this approach and show that the proposed optimization can be seen as a "water-filling" problem.
arXiv Detail & Related papers (2023-04-08T15:25:12Z) - Improved uncertainty quantification for neural networks with Bayesian
last layer [0.0]
Uncertainty quantification is an important task in machine learning.
We present a reformulation of the log-marginal likelihood of a neural network with a Bayesian last layer (BLL) which allows for efficient training using backpropagation.
arXiv Detail & Related papers (2023-02-21T20:23:56Z) - Variational Inference: Posterior Threshold Improves Network Clustering Accuracy in Sparse Regimes [2.5782420501870296]
This paper proposes a simple way to improve the variational inference method by hard thresholding the posterior of the community assignment after each iteration.
We show that the proposed method converges and can accurately recover the true community labels, even when the average node degree of the network is bounded.
arXiv Detail & Related papers (2023-01-12T00:24:54Z) - Mixtures of Laplace Approximations for Improved Post-Hoc Uncertainty in
Deep Learning [24.3370326359959]
We propose to predict with a Gaussian mixture model posterior that consists of a weighted sum of Laplace approximations of independently trained deep neural networks.
We theoretically validate that our approach mitigates overconfidence "far away" from the training data and empirically compare against state-of-the-art baselines on standard uncertainty quantification benchmarks.
arXiv Detail & Related papers (2021-11-05T15:52:48Z) - Improving Uncertainty Calibration via Prior Augmented Data [56.88185136509654]
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators.
They are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions.
We propose a solution by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels.
arXiv Detail & Related papers (2021-02-22T07:02:37Z) - Uncertainty Quantification in Deep Residual Neural Networks [0.0]
Uncertainty quantification is an important and challenging problem in deep learning.
Previous methods rely on dropout layers, which are not present in modern deep architectures, or on batch normalization, which is sensitive to batch size.
We show that training residual networks using stochastic depth can be interpreted as a variational approximation to the posterior over the network weights.
arXiv Detail & Related papers (2020-07-09T16:05:37Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under
Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z) - Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization.
We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
arXiv Detail & Related papers (2020-02-20T15:13:27Z)
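The deep-ensembles-as-approximate-marginalization view in the last entry amounts to averaging the members' predictive distributions, i.e. treating the ensemble as a uniform mixture over posterior modes. A minimal sketch, where the three logit vectors are made-up illustrative values standing in for independently trained networks:

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits from three independently trained ensemble members
# for one test input over three classes (illustrative values only).
member_logits = [
    np.array([2.5, 0.1, -1.0]),
    np.array([0.3, 1.8, -0.5]),
    np.array([1.9, 0.4, -0.2]),
]

# Approximate Bayesian marginalization: average the members' predictive
# distributions with equal weights (a uniform mixture over modes).
p_members = [softmax(logits) for logits in member_logits]
p_ensemble = np.mean(p_members, axis=0)

print(p_ensemble, p_ensemble.max())
```

Because the members disagree, the mixture spreads probability mass across classes: the averaged predictive is a valid distribution whose top-class confidence is lower than that of the sharpest individual member, which is how ensembling tempers overconfidence.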
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.