One Simple Trick to Fix Your Bayesian Neural Network
- URL: http://arxiv.org/abs/2207.13167v1
- Date: Tue, 26 Jul 2022 19:45:36 GMT
- Title: One Simple Trick to Fix Your Bayesian Neural Network
- Authors: Piotr Tempczyk, Ksawery Smoczyński, Philip Smolenski-Jensen and Marek Cygan
- Abstract summary: We show that neural networks with the ReLU activation function induce posteriors that are hard to fit with MFVI.
We find that using Leaky ReLU activations leads to more Gaussian-like weight posteriors and achieves a lower expected calibration error (ECE) than its ReLU-based counterpart.
- Score: 0.7955313479061443
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the most popular estimation methods in Bayesian neural networks (BNN)
is mean-field variational inference (MFVI). In this work, we show that neural
networks with the ReLU activation function induce posteriors that are hard to fit
with MFVI. We provide a theoretical justification for this phenomenon, study it
empirically, and report the results of a series of experiments to investigate
the effect of activation function on the calibration of BNNs. We find that
using Leaky ReLU activations leads to more Gaussian-like weight posteriors and
achieves a lower expected calibration error (ECE) than its ReLU-based
counterpart.
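Since the abstract centers on mean-field VI and the ReLU vs. Leaky ReLU swap, a minimal sketch of an MFVI layer in PyTorch may help make the claim concrete. This is not the authors' code: the class name, initializations, and the 0.1 slope are illustrative assumptions. Each weight gets an independent Gaussian posterior sampled with the reparameterization trick, and the paper's "one simple trick" corresponds to replacing `nn.ReLU` with `nn.LeakyReLU` between such layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MeanFieldLinear(nn.Module):
    """Linear layer with an independent Gaussian (mean-field) posterior per weight."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.mu = nn.Parameter(0.05 * torch.randn(d_out, d_in))   # posterior means
        self.rho = nn.Parameter(torch.full((d_out, d_in), -5.0))  # softplus(rho) = std
        self.bias = nn.Parameter(torch.zeros(d_out))

    def forward(self, x):
        sigma = F.softplus(self.rho)
        # Reparameterization trick: w = mu + sigma * eps, eps ~ N(0, I)
        w = self.mu + sigma * torch.randn_like(sigma)
        return F.linear(x, w, self.bias)

# The "one simple trick": a Leaky ReLU instead of ReLU between variational layers.
bnn = nn.Sequential(MeanFieldLinear(16, 64), nn.LeakyReLU(0.1), MeanFieldLinear(64, 1))
y = bnn(torch.randn(8, 16))  # each forward pass draws fresh weights from the posterior
```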
Related papers
- Hamiltonian Monte Carlo on ReLU Neural Networks is Inefficient [3.823356975862005]
We show that due to the non-differentiability of activation functions in the ReLU family, leapfrog HMC for networks with these activation functions has a large local error rate.
We then verify our theoretical findings through empirical simulations as well as experiments on a real-world dataset.
arXiv Detail & Related papers (2024-10-29T14:23:42Z)
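To make the leapfrog claim concrete, here is the standard leapfrog integrator used inside HMC (a generic sketch, not the paper's code). Its usual local-error bounds assume a smooth gradient of the potential; a ReLU-style potential has a gradient kink, illustrated by the toy `grad_U` below.

```python
import numpy as np

def leapfrog(q, p, grad_U, eps, n_steps):
    """One leapfrog trajectory for HMC; U is the potential (negative log-posterior)."""
    p = p - 0.5 * eps * grad_U(q)      # initial half step for momentum
    for _ in range(n_steps - 1):
        q = q + eps * p                # full step for position
        p = p - eps * grad_U(q)        # full step for momentum
    q = q + eps * p                    # last full position step
    p = p - 0.5 * eps * grad_U(q)      # final half step for momentum
    return q, -p                       # negated momentum keeps the proposal reversible

# Toy 1-D "ReLU potential" U(q) = relu(q)^2 / 2: its gradient relu(q) has a kink at 0,
# violating the smoothness assumption behind leapfrog's usual O(eps^3) local error.
grad_U = lambda q: np.maximum(q, 0.0)
q_new, p_new = leapfrog(-0.3, 1.0, grad_U, eps=0.1, n_steps=10)
```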
- ReLUs Are Sufficient for Learning Implicit Neural Representations [17.786058035763254]
We revisit the use of ReLU activation functions for learning implicit neural representations.
Inspired by second order B-spline wavelets, we incorporate a set of simple constraints into the ReLU neurons in each layer of a deep neural network (DNN).
We demonstrate that, contrary to popular belief, one can learn state-of-the-art INRs based on a DNN composed of only ReLU neurons.
arXiv Detail & Related papers (2024-06-04T17:51:08Z)
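The B-spline connection above has a simple concrete instance worth keeping in mind: a degree-1 B-spline (the hat function) is exactly a linear combination of three shifted ReLUs, which is the kind of structure such constraints can exploit. The numerical check below is illustrative, not the paper's construction.

```python
import numpy as np

relu = lambda x: np.maximum(x, 0.0)

# Hat function supported on [0, 2]: hat(x) = relu(x) - 2*relu(x - 1) + relu(x - 2).
x = np.linspace(-1.0, 3.0, 401)
hat = relu(x) - 2 * relu(x - 1) + relu(x - 2)

assert np.isclose(hat[200], 1.0)        # x[200] == 1.0, the peak of the hat
assert np.allclose(hat[x <= 0.0], 0.0)  # zero left of the support
assert np.allclose(hat[x >= 2.0], 0.0)  # zero right of the support
```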
- Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Sparsifying Bayesian neural networks with latent binary variables and normalizing flows [10.865434331546126]
We consider two extensions to the latent binary Bayesian neural network (LBBNN) method.
Firstly, by using the local reparametrization trick (LRT) to sample the hidden units directly, we get a more computationally efficient algorithm.
More importantly, by using normalizing flows on the variational posterior distribution of the LBBNN parameters, the network learns a more flexible variational posterior distribution than the mean field Gaussian.
arXiv Detail & Related papers (2023-05-05T09:40:28Z)
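The local reparametrization trick mentioned in the entry above is a standard device (Kingma et al., 2015) and is easy to sketch: for an independent Gaussian posterior over weights, the pre-activations are themselves Gaussian, so they can be sampled directly with one noise draw per unit. Names below are illustrative.

```python
import torch

def lrt_linear(x, w_mu, w_sigma):
    """Local reparametrization trick: sample pre-activations instead of weights.

    With w_ij ~ N(mu_ij, sigma_ij^2) independent, b = x @ w.T is Gaussian with
    mean x @ mu.T and variance (x**2) @ (sigma**2).T, so sample b directly.
    """
    mean = x @ w_mu.T
    var = (x ** 2) @ (w_sigma ** 2).T
    return mean + var.sqrt() * torch.randn_like(mean)

x = torch.randn(32, 10)                      # a batch of 32 inputs
w_mu, w_sigma = torch.randn(5, 10), 0.1 * torch.rand(5, 10)
b = lrt_linear(x, w_mu, w_sigma)             # sampled pre-activations, shape (32, 5)
```

One noise tensor of shape (batch, units) replaces a full weight sample, which lowers both gradient variance and cost.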
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Modeling the Nonsmoothness of Modern Neural Networks [35.93486244163653]
We quantify the nonsmoothness using a feature named the sum of the magnitude of peaks (SMP).
We envision that the nonsmoothness feature can potentially be used as a forensic tool for regression-based applications of neural networks.
arXiv Detail & Related papers (2021-03-26T20:55:19Z)
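The entry does not define how SMP is computed, so the sketch below is only one plausible reading, an assumption rather than the paper's definition: sum the magnitudes of local extrema of the network output along a 1-D slice of input space.

```python
import numpy as np

def sum_of_peak_magnitudes(y):
    """Hypothetical SMP reading: sum |y| over interior local extrema of a 1-D signal."""
    mid = y[1:-1]
    is_extremum = ((mid > y[:-2]) & (mid > y[2:])) | ((mid < y[:-2]) & (mid < y[2:]))
    return np.abs(mid[is_extremum]).sum()

y = np.sin(np.linspace(0.0, 4.0 * np.pi, 200))  # toy "network output" along a slice
print(sum_of_peak_magnitudes(y))                # ~4.0: four extrema of magnitude ~1
```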
- Stochastic Bayesian Neural Networks [0.0]
We build on variational inference techniques for Bayesian neural networks using the original Evidence Lower Bound.
We present a Bayesian neural network in which we maximize the Evidence Lower Bound using a new objective function, which we name the Stochastic Evidence Lower Bound.
arXiv Detail & Related papers (2020-08-12T19:48:34Z)
- Exact posterior distributions of wide Bayesian neural networks [51.20413322972014]
We show that the exact BNN posterior converges (weakly) to the one induced by the GP limit of the prior.
For empirical validation, we show how to generate exact samples from a finite BNN on a small dataset via rejection sampling.
arXiv Detail & Related papers (2020-06-18T13:57:04Z)
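Rejection sampling from an exact posterior needs only the prior as a proposal and an upper bound on the likelihood; the generic scheme below (with illustrative names, applied to a toy conjugate model rather than the paper's wide-BNN construction) shows the shape of the idea.

```python
import numpy as np

def rejection_sample_posterior(sample_prior, log_lik, log_lik_max, n):
    """Exact posterior draws: propose theta ~ prior, accept w.p. exp(log_lik - log_lik_max)."""
    out = []
    while len(out) < n:
        theta = sample_prior()
        if np.log(np.random.rand()) < log_lik(theta) - log_lik_max:
            out.append(theta)
    return np.array(out)

# Toy check: prior N(0, 1) and one observation y = 1.0 with unit noise gives the
# posterior N(0.5, 0.5); the accepted draws should match its mean and variance.
y = 1.0
log_lik = lambda th: -0.5 * (y - th) ** 2        # log-likelihood up to a constant
draws = rejection_sample_posterior(np.random.randn, log_lik, 0.0, 2000)
print(draws.mean(), draws.var())                 # approximately 0.5 and 0.5
```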
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
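Controlling "the norm of their Hessian matrix" is typically done without ever forming the Hessian, via Hessian-vector products and power iteration; a minimal PyTorch sketch (illustrative, not the authors' procedure):

```python
import torch

def hessian_spectral_norm(loss, params, iters=20):
    """Estimate ||H||_2 of `loss` w.r.t. `params` by power iteration on HVPs."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    v = torch.randn_like(flat)
    v = v / v.norm()
    for _ in range(iters):
        hv = torch.autograd.grad(flat @ v, params, retain_graph=True)  # H @ v
        hv = torch.cat([h.reshape(-1) for h in hv])
        norm = hv.norm()
        v = hv / norm
    return norm.item()  # ||H v|| with unit v converges to the top |eigenvalue|

model = torch.nn.Linear(4, 1)
x, t = torch.randn(16, 4), torch.randn(16, 1)
loss = torch.nn.functional.mse_loss(model(x), t)
print(hessian_spectral_norm(loss, list(model.parameters())))
```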
- Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks [65.24701908364383]
We show that a sufficient condition for calibrated uncertainty on a ReLU network is "to be a bit Bayesian".
We further validate these findings empirically via various standard experiments using common deep ReLU networks and Laplace approximations.
arXiv Detail & Related papers (2020-02-24T08:52:06Z)
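"A bit Bayesian" in this result means applying a Laplace approximation to just the last layer of an otherwise point-estimated network. Below is a compact sketch of that recipe for a binary classifier, using a diagonal Hessian approximation and illustrative names (not the paper's exact setup).

```python
import torch

def laplace_last_layer(features, w_map, prior_prec=1.0):
    """Diagonal Laplace posterior over logistic-regression last-layer weights.

    features: (N, D) outputs of the trained network body; w_map: (D,) MAP weights.
    """
    p = torch.sigmoid(features @ w_map)
    # Diagonal of the Hessian of (binary cross-entropy + Gaussian prior) at the MAP.
    h_diag = (features ** 2 * (p * (1 - p)).unsqueeze(1)).sum(0) + prior_prec
    return w_map, 1.0 / h_diag                    # posterior mean, diagonal variance

def predict(x, w_mean, w_var, n_samples=100):
    """Average over last-layer weight samples; uncertainty grows away from the data."""
    ws = w_mean + w_var.sqrt() * torch.randn(n_samples, w_mean.numel())
    return torch.sigmoid(x @ ws.T).mean(dim=1)    # averaged predictive probabilities

feats, w = torch.randn(100, 8), torch.randn(8)
w_mean, w_var = laplace_last_layer(feats, w)
probs = predict(torch.randn(5, 8), w_mean, w_var)  # shape (5,)
```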