Data Subsampling for Bayesian Neural Networks
- URL: http://arxiv.org/abs/2210.09141v1
- Date: Mon, 17 Oct 2022 14:43:35 GMT
- Title: Data Subsampling for Bayesian Neural Networks
- Authors: Eiji Kawasaki, Markus Holzmann
- Abstract summary: Penalty Bayesian Neural Networks - PBNNs - achieve good predictive performance for a given mini-batch size.
Varying the size of the mini-batches enables a natural calibration of the predictive distribution.
We expect PBNN to be particularly suited for cases when data sets are distributed across multiple decentralized devices.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Markov Chain Monte Carlo (MCMC) algorithms do not scale well for large
datasets, leading to difficulties in Neural Network posterior sampling. In this
paper, we apply a generalization of the Metropolis Hastings algorithm that
allows us to restrict the evaluation of the likelihood to small mini-batches in
a Bayesian inference context. Since it requires the computation of a so-called
"noise penalty" determined by the variance of the training loss function over
the mini-batches, we refer to this data subsampling strategy as Penalty
Bayesian Neural Networks - PBNNs. Its implementation on top of MCMC is
straightforward, as the variance of the loss function merely reduces the
acceptance probability. Compared with other samplers, we empirically show that
PBNN achieves good predictive performance for a given mini-batch size. Varying
the size of the mini-batches enables a natural calibration of the predictive
distribution and provides an inbuilt protection against overfitting. We expect
PBNN to be particularly well suited to cases where data sets are distributed across
multiple decentralized devices, as is typical in federated learning.
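To make the mechanism concrete, here is a minimal, self-contained sketch of such a penalized Metropolis step on a toy problem (inferring the mean of a unit-variance Gaussian under a flat prior). The toy model, batch sizes, function names, and the specific choice of subtracting half the estimated variance from the log acceptance ratio are illustrative assumptions in the spirit of the abstract, not the authors' implementation.

```python
import numpy as np

# Toy data: 1,000 points from a unit-variance Gaussian whose mean we want to infer.
rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, scale=1.0, size=1_000)
N, batch_size, n_batches = len(data), 100, 10

def loglike(theta, batch):
    # Per-batch log-likelihood of a unit-variance Gaussian with mean theta (up to a constant).
    return -0.5 * np.sum((batch - theta) ** 2)

def penalized_mh_step(theta, step=0.02):
    # Symmetric random-walk proposal.
    theta_new = theta + step * rng.standard_normal()

    # Estimate the log-likelihood ratio on several independent mini-batches.
    deltas = np.empty(n_batches)
    for i in range(n_batches):
        idx = rng.choice(N, size=batch_size, replace=False)
        deltas[i] = loglike(theta_new, data[idx]) - loglike(theta, data[idx])

    scale = N / batch_size                      # rescale the mini-batch estimate to the full data set
    delta_hat = scale * deltas.mean()           # noisy estimate of the full-data log ratio
    sigma2 = scale**2 * deltas.var(ddof=1) / n_batches  # its estimated variance: the "noise penalty"

    # The variance enters only as a penalty that lowers the acceptance probability.
    log_alpha = delta_hat - 0.5 * sigma2
    return theta_new if np.log(rng.uniform()) < min(0.0, log_alpha) else theta

theta = 0.0
for _ in range(2_000):
    theta = penalized_mh_step(theta)
print(f"final state of the chain: {theta:.3f} (data were generated with mean 1.5)")
```

Because the penalty grows with the variance of the mini-batch estimate of the log-likelihood ratio, noisier (smaller or fewer) mini-batches can only lower the acceptance probability, matching the abstract's remark that the variance of the loss merely reduces the acceptance probability.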
Related papers
- Favour: FAst Variance Operator for Uncertainty Rating [0.034530027457862]
Bayesian Neural Networks (BNN) have emerged as a crucial approach for interpreting ML predictions.
By sampling from the posterior distribution, data scientists may estimate the uncertainty of an inference.
Previous work proposed propagating the first and second moments of the posterior directly through the network.
However, this method is even slower than sampling, so the propagated variance needs to be approximated.
Our contribution is a more principled variance propagation framework.
arXiv Detail & Related papers (2023-11-21T22:53:20Z)
- Collapsed Inference for Bayesian Deep Learning [36.1725075097107]
We introduce a novel collapsed inference scheme that performs Bayesian model averaging using collapsed samples.
A collapsed sample represents uncountably many models drawn from the approximate posterior.
Our proposed use of collapsed samples achieves a balance between scalability and accuracy.
arXiv Detail & Related papers (2023-06-16T08:34:42Z)
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
- GFlowOut: Dropout with Generative Flow Networks [76.59535235717631]
Monte Carlo Dropout has been widely used as a relatively cheap way to perform approximate inference.
Recent works show that the dropout mask can be viewed as a latent variable, which can be inferred with variational inference.
GFlowOut leverages the recently proposed probabilistic framework of Generative Flow Networks (GFlowNets) to learn the posterior distribution over dropout masks.
arXiv Detail & Related papers (2022-10-24T03:00:01Z)
- Approximate blocked Gibbs sampling for Bayesian neural networks [1.7259824817932292]
In this work, it is proposed to sample subgroups of parameters via a blocked Gibbs sampling scheme.
It is also possible to alleviate vanishing acceptance rates for increasing depth by reducing the proposal variance in deeper layers.
An open problem is how to perform minibatch MCMC sampling for feedforward neural networks in the presence of augmented data.
arXiv Detail & Related papers (2022-08-24T09:26:12Z)
- Unrolling Particles: Unsupervised Learning of Sampling Distributions [102.72972137287728]
Particle filtering is used to compute good nonlinear estimates of complex systems.
We show in simulations that the resulting particle filter yields good estimates in a wide range of scenarios.
arXiv Detail & Related papers (2021-10-06T16:58:34Z)
- Simpler Certified Radius Maximization by Propagating Covariances [39.851641822878996]
We show an algorithm for maximizing the certified radius on datasets including CIFAR-10, ImageNet, and Places365.
We show how satisfying certain criteria yields such an algorithm for networks of moderate depth, with a small compromise in overall accuracy.
arXiv Detail & Related papers (2021-04-13T01:38:36Z)
- Rapid Risk Minimization with Bayesian Models Through Deep Learning Approximation [9.93116974480156]
We introduce a novel combination of Bayesian Models (BMs) and Neural Networks (NNs) for making predictions with a minimum expected risk.
Our approach combines the data efficiency and interpretability of a BM with the speed of a NN.
We achieve risk minimized predictions significantly faster than standard methods with a negligible loss on the testing dataset.
arXiv Detail & Related papers (2021-03-29T15:08:25Z)
- Sampling-free Variational Inference for Neural Networks with Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference.
Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
arXiv Detail & Related papers (2021-03-15T16:16:18Z)
- Bandit Samplers for Training Graph Neural Networks [63.17765191700203]
Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolutional Networks (GCNs).
These sampling algorithms are not applicable to more general graph neural networks (GNNs) where the message aggregator contains learned weights rather than fixed weights, such as Graph Attention Networks (GAT).
arXiv Detail & Related papers (2020-06-10T12:48:37Z)
- Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out-of-distribution data points at test time with a single forward pass.
We scale training with a novel loss function and centroid updating scheme and match the accuracy of softmax models.
arXiv Detail & Related papers (2020-03-04T12:27:36Z)