Flat Seeking Bayesian Neural Networks
- URL: http://arxiv.org/abs/2302.02713v5
- Date: Mon, 6 Nov 2023 06:01:56 GMT
- Title: Flat Seeking Bayesian Neural Networks
- Authors: Van-Anh Nguyen, Tung-Long Vuong, Hoang Phan, Thanh-Toan Do, Dinh
Phung, Trung Le
- Abstract summary: We develop the theory, the Bayesian setting, and the variational inference approach for the sharpness-aware posterior.
Specifically, the models sampled from our sharpness-aware posterior, and the optimal approximate posterior estimating this sharpness-aware posterior, have better flatness.
We conduct experiments by leveraging the sharpness-aware posterior with state-of-the-art Bayesian Neural Networks.
- Score: 32.61417343756841
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bayesian Neural Networks (BNNs) provide a probabilistic interpretation for
deep learning models by imposing a prior distribution over model parameters and
inferring a posterior distribution based on observed data. The model sampled
from the posterior distribution can be used for providing ensemble predictions
and quantifying prediction uncertainty. It is well-known that deep learning
models with lower sharpness have better generalization ability. However,
existing posterior inferences are not aware of sharpness/flatness in terms of
formulation, possibly leading to high sharpness for the models sampled from
them. In this paper, we develop the theory, the Bayesian setting, and the
variational inference approach for the sharpness-aware posterior. Specifically,
the models sampled from our sharpness-aware posterior, and the optimal
approximate posterior estimating this sharpness-aware posterior, have better
flatness, hence possibly possessing higher generalization ability. We conduct
experiments by leveraging the sharpness-aware posterior with state-of-the-art
Bayesian Neural Networks, showing that the flat-seeking counterparts outperform
their baselines in all metrics of interest.
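To make the mechanism concrete, the sketch below shows one way a SAM-style, flat-seeking update could be combined with mean-field variational inference over network weights. It is an illustrative reconstruction, not the authors' released implementation; names such as `MeanFieldLinear`, `flat_seeking_step`, `rho`, and `kl_weight` are assumptions.

```python
# Illustrative sketch (assumption): SAM-style flat-seeking update for a
# mean-field variational BNN in PyTorch. Not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MeanFieldLinear(nn.Module):
    """Linear layer with a factorized Gaussian approximate posterior over weights."""

    def __init__(self, in_features, out_features, prior_std=1.0):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_features, in_features))
        self.w_logstd = nn.Parameter(torch.full((out_features, in_features), -3.0))
        self.prior_std = prior_std

    def forward(self, x):
        # Reparameterization: sample w ~ N(mu, std^2) and apply it.
        std = self.w_logstd.exp()
        w = self.w_mu + std * torch.randn_like(std)
        return F.linear(x, w)

    def kl(self):
        # Closed-form KL(q(w) || N(0, prior_std^2)) for diagonal Gaussians.
        std = self.w_logstd.exp()
        var_ratio = (std / self.prior_std) ** 2
        mu_term = (self.w_mu / self.prior_std) ** 2
        return 0.5 * (var_ratio + mu_term - 1.0 - var_ratio.log()).sum()


def negative_elbo(model, x, y, kl_weight):
    kl = sum(m.kl() for m in model.modules() if isinstance(m, MeanFieldLinear))
    return F.cross_entropy(model(x), y) + kl_weight * kl


def flat_seeking_step(model, x, y, optimizer, rho=0.05, kl_weight=1e-3):
    """One SAM-style update on the variational parameters (sketch only)."""
    # 1) Gradient of the negative ELBO at the current variational parameters.
    optimizer.zero_grad()
    negative_elbo(model, x, y, kl_weight).backward()
    grads = [p.grad.detach().clone() for p in model.parameters()]
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12

    # 2) Ascend to a nearby point: theta <- theta + rho * g / ||g||.
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.add_(rho * g / grad_norm)

    # 3) Gradient at the perturbed point defines the flat-seeking update.
    optimizer.zero_grad()
    loss = negative_elbo(model, x, y, kl_weight)
    loss.backward()

    # 4) Undo the perturbation, then step with the perturbed gradient.
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.sub_(rho * g / grad_norm)
    optimizer.step()
    return loss.item()
```

A model built from `MeanFieldLinear` layers with a standard optimizer over the mean/log-std parameters could call `flat_seeking_step` once per minibatch; `rho` controls how far the perturbation probes the neighbourhood of the variational parameters.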
Related papers
- Unrolled denoising networks provably learn optimal Bayesian inference [54.79172096306631]
We prove the first rigorous learning guarantees for neural networks based on unrolling approximate message passing (AMP).
For compressed sensing, we prove that when trained on data drawn from a product prior, the layers of the network converge to the same denoisers used in Bayes AMP.
arXiv Detail & Related papers (2024-09-19T17:56:16Z)
- Flat Posterior Does Matter For Bayesian Model Averaging [15.371686185626162]
In this work, we empirically demonstrate that BNNs often struggle to capture the flatness.
We propose Sharpness-Aware Bayesian Model Averaging (SA-BMA), a novel optimizer that seeks flat posteriors by calculating divergence in the parameter space.
We validate the efficacy of SA-BMA in enhancing generalization performance in few-shot classification and distribution shift.
arXiv Detail & Related papers (2024-06-21T21:44:27Z)
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
- Do Bayesian Variational Autoencoders Know What They Don't Know? [0.6091702876917279]
The problem of detecting Out-of-Distribution (OoD) inputs is of paramount importance for Deep Neural Networks.
It has been previously shown that even Deep Generative Models that allow estimating the density of the inputs may not be reliable.
This paper investigates three approaches to inference: Markov chain Monte Carlo, Bayes by Backpropagation, and Stochastic Weight Averaging-Gaussian.
arXiv Detail & Related papers (2022-12-29T11:48:01Z)
- Non-Volatile Memory Accelerated Posterior Estimation [3.4256231429537936]
Current machine learning models use only a single learnable parameter combination when making predictions.
We show that, through the use of high-capacity persistent storage, models whose posterior distributions were previously too large to approximate become feasible.
arXiv Detail & Related papers (2022-02-21T20:25:57Z)
- Improving Uncertainty Calibration via Prior Augmented Data [56.88185136509654]
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators.
They are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions.
We propose a solution by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels.
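As a rough illustration of that idea (not the paper's exact procedure), one way to raise the entropy of a prediction is to mix it with the prior label distribution; the mixing weight `alpha` and the uniform prior below are assumptions.

```python
# Illustrative only: interpolate overconfident predictions toward the label prior.
import torch
import torch.nn.functional as F


def temper_toward_label_prior(logits, label_prior, alpha=0.5):
    """Raise predictive entropy by mixing with the prior over labels (sketch).

    logits:      (batch, num_classes) raw model outputs.
    label_prior: (num_classes,) prior class distribution, e.g. uniform.
    alpha:       strength of the pull toward the prior (assumed hyperparameter).
    """
    probs = F.softmax(logits, dim=-1)
    mixed = (1.0 - alpha) * probs + alpha * label_prior
    return mixed / mixed.sum(dim=-1, keepdim=True)  # renormalize for safety


# Example with a uniform prior over 10 classes.
overconfident_logits = torch.randn(4, 10) * 5.0
uniform_prior = torch.full((10,), 0.1)
calibrated = temper_toward_label_prior(overconfident_logits, uniform_prior, alpha=0.3)
```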
arXiv Detail & Related papers (2021-02-22T07:02:37Z)
- A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
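A hedged sketch of how such a training-speed proxy could be accumulated: sum the log-likelihood of each minibatch evaluated before the corresponding parameter update, so models that fit new data quickly score higher. The function name and the plain running sum are illustrative assumptions, not the paper's exact estimator.

```python
# Illustrative proxy only; function name and the running sum are assumptions.
import torch.nn.functional as F


def training_speed_evidence_proxy(model, optimizer, data_loader):
    """Accumulate log p(batch | data seen so far) before each update (sketch).

    By the chain rule, the log marginal likelihood is a sum of such predictive
    terms, so a model that fits each new batch quickly scores higher.
    """
    log_evidence_proxy = 0.0
    for x, y in data_loader:
        nll = F.cross_entropy(model(x), y, reduction="sum")
        log_evidence_proxy += -nll.item()  # evaluated before the parameter update
        optimizer.zero_grad()
        nll.backward()
        optimizer.step()
    return log_evidence_proxy
```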
arXiv Detail & Related papers (2020-10-27T17:56:14Z)
- Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks [65.24701908364383]
We show that a sufficient condition for calibrated uncertainty on a ReLU network is "to be a bit Bayesian."
We further validate these findings empirically via various standard experiments using common deep ReLU networks and Laplace approximations.
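As a minimal sketch of "being a bit Bayesian", the snippet below averages predictions over weight samples drawn from a diagonal Laplace approximation placed on the last layer only; the precomputed `hessian_diag` curvature estimate and the sample count are assumptions.

```python
# Illustrative sketch; assumes a precomputed diagonal curvature estimate.
import torch
import torch.nn.functional as F


def last_layer_laplace_predict(features, w_map, hessian_diag, n_samples=30):
    """Average predictions over last-layer weight samples (diagonal Laplace, sketch).

    features:     (batch, d) penultimate-layer activations.
    w_map:        (num_classes, d) MAP estimate of the last-layer weights.
    hessian_diag: (num_classes, d) diagonal curvature (precision) at the MAP.
    """
    std = hessian_diag.clamp_min(1e-6).rsqrt()  # posterior std ~ precision^(-1/2)
    probs = 0.0
    for _ in range(n_samples):
        w = w_map + std * torch.randn_like(w_map)   # sample last-layer weights
        probs = probs + F.softmax(features @ w.t(), dim=-1)
    return probs / n_samples                        # Bayesian model average
```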
arXiv Detail & Related papers (2020-02-24T08:52:06Z)
- Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization.
We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
arXiv Detail & Related papers (2020-02-20T15:13:27Z)
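A minimal sketch of the underlying approximation: each independently trained ensemble member is treated as a posterior mode and their predictive distributions are averaged (marginalizing within each basin would additionally draw several weight samples per member). The helper below is illustrative, not the paper's code.

```python
# Illustrative sketch of ensemble-based approximate Bayesian model averaging.
import torch
import torch.nn.functional as F


@torch.no_grad()
def deep_ensemble_predict(models, x):
    """Average predictive distributions over independently trained models,
    each treated as a sample from (a mode of) the posterior."""
    probs = torch.stack([F.softmax(m(x), dim=-1) for m in models])
    return probs.mean(dim=0)
```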