Bayesian Neural Networks With Maximum Mean Discrepancy Regularization
- URL: http://arxiv.org/abs/2003.00952v2
- Date: Wed, 30 Sep 2020 09:56:44 GMT
- Title: Bayesian Neural Networks With Maximum Mean Discrepancy Regularization
- Authors: Jary Pomponi, Simone Scardapane, and Aurelio Uncini
- Abstract summary: We show that our BNNs achieve higher accuracy on multiple benchmarks, including several image classification tasks.
We also provide a new formulation for estimating the uncertainty on a given prediction, showing it performs in a more robust fashion against adversarial attacks.
- Score: 13.97417198693205
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bayesian Neural Networks (BNNs) are trained to optimize an entire
distribution over their weights instead of a single set, having significant
advantages in terms of, e.g., interpretability, multi-task learning, and
calibration. Because of the intractability of the resulting optimization
problem, most BNNs are either sampled through Monte Carlo methods, or trained
by minimizing a suitable Evidence Lower BOund (ELBO) on a variational
approximation. In this paper, we propose a variant of the latter, wherein we
replace the Kullback-Leibler divergence in the ELBO term with a Maximum Mean
Discrepancy (MMD) estimator, inspired by recent work in variational inference.
After motivating our proposal based on the properties of the MMD term, we
proceed to show a number of empirical advantages of the proposed formulation
over the state-of-the-art. In particular, our BNNs achieve higher accuracy on
multiple benchmarks, including several image classification tasks. In addition,
they are more robust to the selection of a prior over the weights, and they are
better calibrated. As a second contribution, we provide a new formulation for
estimating the uncertainty on a given prediction, showing it performs in a more
robust fashion against adversarial attacks and the injection of noise over
their inputs, compared to more classical criteria such as the differential
entropy.
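To make the core idea concrete, below is a minimal sketch (not the authors' reference implementation) of a variational Bayesian layer whose ELBO-style objective swaps the usual KL term for an empirical MMD estimate between posterior and prior weight samples. The factorized Gaussian posterior, N(0, I) prior, RBF kernel, bandwidth, Monte Carlo sample count, and regularization weight are all illustrative assumptions, and a classification head is assumed for the likelihood term.

```python
# Sketch of an MMD-regularized variational objective for a Bayesian linear
# layer. Posterior/prior/kernel choices below are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


def rbf_mmd(x, y, bandwidth=1.0):
    """Biased empirical estimate of MMD^2 between sample sets x and y (RBF kernel)."""
    def kernel(a, b):
        d2 = torch.cdist(a, b).pow(2)
        return torch.exp(-d2 / (2.0 * bandwidth ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()


class BayesianLinear(nn.Module):
    """Linear layer with a factorized Gaussian variational posterior over weights."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(out_features, in_features))
        self.rho = nn.Parameter(torch.full((out_features, in_features), -3.0))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def sample_weights(self, n_samples=1):
        sigma = F.softplus(self.rho)
        eps = torch.randn(n_samples, *self.mu.shape)
        return self.mu + sigma * eps          # reparameterization trick

    def forward(self, x):
        w = self.sample_weights(1)[0]
        return F.linear(x, w, self.bias)


def mmd_elbo_loss(model, x, y, n_mc=8, reg_weight=1e-3):
    """Negative log-likelihood plus an MMD term in place of the usual KL term."""
    nll = F.cross_entropy(model(x), y)
    # Compare posterior weight samples against prior samples in weight space.
    q_samples = model.sample_weights(n_mc).flatten(1)
    p_samples = torch.randn_like(q_samples)   # samples from an assumed N(0, I) prior
    return nll + reg_weight * rbf_mmd(q_samples, p_samples)
```

In practice the loss above would simply replace the standard ELBO in a training loop, e.g. `loss = mmd_elbo_loss(layer, x_batch, y_batch); loss.backward()`.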
Related papers
- Flat Posterior Does Matter For Bayesian Model Averaging [15.371686185626162]
In this work, we empirically demonstrate that BNNs often struggle to capture the flatness.
We propose Sharpness-Aware Bayesian Model Averaging (SA-BMA), a novel method that seeks flat posteriors by calculating a neural divergence.
We validate the efficacy of SA-BMA in enhancing generalization performance in few-shot classification and distribution shift.
arXiv Detail & Related papers (2024-06-21T21:44:27Z) - Collapsed Inference for Bayesian Deep Learning [36.1725075097107]
We introduce a novel collapsed inference scheme that performs Bayesian model averaging using collapsed samples.
A collapsed sample represents uncountably many models drawn from the approximate posterior.
Our proposed use of collapsed samples achieves a balance between scalability and accuracy.
arXiv Detail & Related papers (2023-06-16T08:34:42Z) - Variational Linearized Laplace Approximation for Bayesian Deep Learning [11.22428369342346]
We propose a new method for approximating the Linearized Laplace Approximation (LLA) using a variational sparse Gaussian Process (GP).
Our method is based on the dual RKHS formulation of GPs and retains, as the predictive mean, the output of the original DNN.
It allows for efficient optimization, which results in sub-linear training time in the size of the training dataset.
arXiv Detail & Related papers (2023-02-24T10:32:30Z) - Adaptive Dimension Reduction and Variational Inference for Transductive
Few-Shot Classification [2.922007656878633]
We propose a new clustering method based on Variational Bayesian inference, further improved by Adaptive Dimension Reduction.
Our proposed method significantly improves accuracy in the realistic unbalanced transductive setting on various Few-Shot benchmarks.
arXiv Detail & Related papers (2022-09-18T10:29:02Z) - Variational Refinement for Importance Sampling Using the Forward
Kullback-Leibler Divergence [77.06203118175335]
Variational Inference (VI) is a popular alternative to exact sampling in Bayesian inference.
Importance sampling (IS) is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures.
We propose a novel combination of optimization and sampling techniques for approximate Bayesian inference.
arXiv Detail & Related papers (2021-06-30T11:00:24Z) - Sampling-free Variational Inference for Neural Networks with
Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference.
Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
arXiv Detail & Related papers (2021-03-15T16:16:18Z) - Amortized Conditional Normalized Maximum Likelihood: Reliable Out of
Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under
Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z) - Bandit Samplers for Training Graph Neural Networks [63.17765191700203]
Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolution Networks (GCNs).
These sampling algorithms are not applicable to more general graph neural networks (GNNs) where the message aggregator contains learned weights rather than fixed weights, such as Graph Attention Networks (GAT).
arXiv Detail & Related papers (2020-06-10T12:48:37Z) - Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization.
We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
arXiv Detail & Related papers (2020-02-20T15:13:27Z)
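As a companion to the uncertainty discussion in the abstract and the deep-ensemble entry just above, here is a hedged sketch of ensemble-style approximate Bayesian marginalization together with the classical predictive-entropy criterion that the abstract uses as a baseline. The ensemble construction and entropy score are standard devices, not code taken from either paper.

```python
# Sketch of ensemble-style approximate Bayesian marginalization and a classical
# entropy-based uncertainty score. Both are generic constructions for illustration.
import torch
import torch.nn.functional as F


def marginal_predictive(models, x):
    """Average softmax outputs over ensemble members: a Monte Carlo approximation
    of the Bayesian model average p(y|x) = E_{p(w|D)}[p(y|x, w)]."""
    with torch.no_grad():
        probs = torch.stack([F.softmax(m(x), dim=-1) for m in models])
    return probs.mean(dim=0)


def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy of the averaged predictive distribution, a classical
    per-example uncertainty criterion."""
    return -(probs * (probs + eps).log()).sum(dim=-1)
```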
This list is automatically generated from the titles and abstracts of the papers on this site.