One Simple Trick to Fix Your Bayesian Neural Network
- URL: http://arxiv.org/abs/2207.13167v1
- Date: Tue, 26 Jul 2022 19:45:36 GMT
- Title: One Simple Trick to Fix Your Bayesian Neural Network
- Authors: Piotr Tempczyk, Ksawery Smoczyński, Philip Smolenski-Jensen and Marek Cygan
- Abstract summary: We show that neural networks with the ReLU activation function induce posteriors that are hard to fit with MFVI.
We find that using Leaky ReLU activations leads to more Gaussian-like weight posteriors and achieves a lower expected calibration error (ECE) than its ReLU-based counterpart.
- Score: 0.7955313479061443
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the most popular estimation methods in Bayesian neural networks (BNN)
is mean-field variational inference (MFVI). In this work, we show that neural
networks with the ReLU activation function induce posteriors that are hard to fit
with MFVI. We provide a theoretical justification for this phenomenon, study it
empirically, and report the results of a series of experiments to investigate
the effect of activation function on the calibration of BNNs. We find that
using Leaky ReLU activations leads to more Gaussian-like weight posteriors and
achieves a lower expected calibration error (ECE) than its ReLU-based
counterpart.
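Since the abstract centers on mean-field VI and the ReLU vs. Leaky ReLU swap, a minimal sketch of an MFVI layer in PyTorch may help make the claim concrete. This is not the authors' code: the class name, initializations, and the 0.1 slope are illustrative assumptions. Each weight gets an independent Gaussian posterior sampled with the reparameterization trick, and the paper's "one simple trick" corresponds to replacing `nn.ReLU` with `nn.LeakyReLU` between such layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MeanFieldLinear(nn.Module):
    """Linear layer with an independent Gaussian (mean-field) posterior per weight."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.mu = nn.Parameter(0.05 * torch.randn(d_out, d_in))   # posterior means
        self.rho = nn.Parameter(torch.full((d_out, d_in), -5.0))  # softplus(rho) = std
        self.bias = nn.Parameter(torch.zeros(d_out))

    def forward(self, x):
        sigma = F.softplus(self.rho)
        # Reparameterization trick: w = mu + sigma * eps, eps ~ N(0, I)
        w = self.mu + sigma * torch.randn_like(sigma)
        return F.linear(x, w, self.bias)

# The "one simple trick": a Leaky ReLU instead of ReLU between variational layers.
bnn = nn.Sequential(MeanFieldLinear(16, 64), nn.LeakyReLU(0.1), MeanFieldLinear(64, 1))
y = bnn(torch.randn(8, 16))  # each forward pass draws fresh weights from the posterior
```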
Related papers
- Hamiltonian Monte Carlo on ReLU Neural Networks is Inefficient [3.823356975862005]
We show that due to the non-differentiability of activation functions in the ReLU family, leapfrog HMC for networks with these activation functions has a large local error rate.
We then verify our theoretical findings through empirical simulations as well as experiments on a real-world dataset.
arXiv Detail & Related papers (2024-10-29T14:23:42Z)
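To make the leapfrog claim concrete, here is the standard leapfrog integrator used inside HMC (a generic sketch, not the paper's code). Its usual local-error bounds assume a smooth gradient of the potential; a ReLU-style potential has a gradient kink, illustrated by the toy `grad_U` below.

```python
import numpy as np

def leapfrog(q, p, grad_U, eps, n_steps):
    """One leapfrog trajectory for HMC; U is the potential (negative log-posterior)."""
    p = p - 0.5 * eps * grad_U(q)      # initial half step for momentum
    for _ in range(n_steps - 1):
        q = q + eps * p                # full step for position
        p = p - eps * grad_U(q)        # full step for momentum
    q = q + eps * p                    # last full position step
    p = p - 0.5 * eps * grad_U(q)      # final half step for momentum
    return q, -p                       # negated momentum keeps the proposal reversible

# Toy 1-D "ReLU potential" U(q) = relu(q)^2 / 2: its gradient relu(q) has a kink at 0,
# violating the smoothness assumption behind leapfrog's usual O(eps^3) local error.
grad_U = lambda q: np.maximum(q, 0.0)
q_new, p_new = leapfrog(-0.3, 1.0, grad_U, eps=0.1, n_steps=10)
```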
- ReLUs Are Sufficient for Learning Implicit Neural Representations [17.786058035763254]
We revisit the use of ReLU activation functions for learning implicit neural representations.
Inspired by second order B-spline wavelets, we incorporate a set of simple constraints into the ReLU neurons in each layer of a deep neural network (DNN).
We demonstrate that, contrary to popular belief, one can learn state-of-the-art INRs based on a DNN composed of only ReLU neurons.
arXiv Detail & Related papers (2024-06-04T17:51:08Z)
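The B-spline connection above has a simple concrete instance worth keeping in mind: a degree-1 B-spline (the hat function) is exactly a linear combination of three shifted ReLUs, which is the kind of structure such constraints can exploit. The numerical check below is illustrative, not the paper's construction.

```python
import numpy as np

relu = lambda x: np.maximum(x, 0.0)

# Hat function supported on [0, 2]: hat(x) = relu(x) - 2*relu(x - 1) + relu(x - 2).
x = np.linspace(-1.0, 3.0, 401)
hat = relu(x) - 2 * relu(x - 1) + relu(x - 2)

assert np.isclose(hat[200], 1.0)        # x[200] == 1.0, the peak of the hat
assert np.allclose(hat[x <= 0.0], 0.0)  # zero left of the support
assert np.allclose(hat[x >= 2.0], 0.0)  # zero right of the support
```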
- Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Sparsifying Bayesian neural networks with latent binary variables and normalizing flows [10.865434331546126]
We consider two extensions to the latent binary Bayesian neural network (LBBNN) method.
Firstly, by using the local reparametrization trick (LRT) to sample the hidden units directly, we get a more computationally efficient algorithm.
More importantly, by using normalizing flows on the variational posterior distribution of the LBBNN parameters, the network learns a more flexible variational posterior distribution than the mean field Gaussian.
arXiv Detail & Related papers (2023-05-05T09:40:28Z)
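The local reparametrization trick mentioned in the entry above is a standard device (Kingma et al., 2015) and is easy to sketch: for an independent Gaussian posterior over weights, the pre-activations are themselves Gaussian, so they can be sampled directly with one noise draw per unit. Names below are illustrative.

```python
import torch

def lrt_linear(x, w_mu, w_sigma):
    """Local reparametrization trick: sample pre-activations instead of weights.

    With w_ij ~ N(mu_ij, sigma_ij^2) independent, b = x @ w.T is Gaussian with
    mean x @ mu.T and variance (x**2) @ (sigma**2).T, so sample b directly.
    """
    mean = x @ w_mu.T
    var = (x ** 2) @ (w_sigma ** 2).T
    return mean + var.sqrt() * torch.randn_like(mean)

x = torch.randn(32, 10)                      # a batch of 32 inputs
w_mu, w_sigma = torch.randn(5, 10), 0.1 * torch.rand(5, 10)
b = lrt_linear(x, w_mu, w_sigma)             # sampled pre-activations, shape (32, 5)
```

One noise tensor of shape (batch, units) replaces a full weight sample, which lowers both gradient variance and cost.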
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Modeling the Nonsmoothness of Modern Neural Networks [35.93486244163653]
We quantify the nonsmoothness using a feature named the sum of the magnitude of peaks (SMP).
We envision that the nonsmoothness feature can potentially be used as a forensic tool for regression-based applications of neural networks.
arXiv Detail & Related papers (2021-03-26T20:55:19Z)
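The entry does not define how SMP is computed, so the sketch below is only one plausible reading, an assumption rather than the paper's definition: sum the magnitudes of local extrema of the network output along a 1-D slice of input space.

```python
import numpy as np

def sum_of_peak_magnitudes(y):
    """Hypothetical SMP reading: sum |y| over interior local extrema of a 1-D signal."""
    mid = y[1:-1]
    is_extremum = ((mid > y[:-2]) & (mid > y[2:])) | ((mid < y[:-2]) & (mid < y[2:]))
    return np.abs(mid[is_extremum]).sum()

y = np.sin(np.linspace(0.0, 4.0 * np.pi, 200))  # toy "network output" along a slice
print(sum_of_peak_magnitudes(y))                # ~4.0: four extrema of magnitude ~1
```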
- Stochastic Bayesian Neural Networks [0.0]
We build on variational inference techniques for Bayesian neural networks using the original Evidence Lower Bound.
We present a Bayesian neural network in which we maximize the Evidence Lower Bound using a new objective function, which we name the Stochastic Evidence Lower Bound.
arXiv Detail & Related papers (2020-08-12T19:48:34Z)
- Exact posterior distributions of wide Bayesian neural networks [51.20413322972014]
We show that the exact BNN posterior converges (weakly) to the one induced by the GP limit of the prior.
For empirical validation, we show how to generate exact samples from a finite BNN on a small dataset via rejection sampling.
arXiv Detail & Related papers (2020-06-18T13:57:04Z)
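Rejection sampling from an exact posterior needs only the prior as a proposal and an upper bound on the likelihood; the generic scheme below (with illustrative names, applied to a toy conjugate model rather than the paper's wide-BNN construction) shows the shape of the idea.

```python
import numpy as np

def rejection_sample_posterior(sample_prior, log_lik, log_lik_max, n):
    """Exact posterior draws: propose theta ~ prior, accept w.p. exp(log_lik - log_lik_max)."""
    out = []
    while len(out) < n:
        theta = sample_prior()
        if np.log(np.random.rand()) < log_lik(theta) - log_lik_max:
            out.append(theta)
    return np.array(out)

# Toy check: prior N(0, 1) and one observation y = 1.0 with unit noise gives the
# posterior N(0.5, 0.5); the accepted draws should match its mean and variance.
y = 1.0
log_lik = lambda th: -0.5 * (y - th) ** 2        # log-likelihood up to a constant
draws = rejection_sample_posterior(np.random.randn, log_lik, 0.0, 2000)
print(draws.mean(), draws.var())                 # approximately 0.5 and 0.5
```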
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
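Controlling "the norm of their Hessian matrix" is typically done without ever forming the Hessian, via Hessian-vector products and power iteration; a minimal PyTorch sketch (illustrative, not the authors' procedure):

```python
import torch

def hessian_spectral_norm(loss, params, iters=20):
    """Estimate ||H||_2 of `loss` w.r.t. `params` by power iteration on HVPs."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    v = torch.randn_like(flat)
    v = v / v.norm()
    for _ in range(iters):
        hv = torch.autograd.grad(flat @ v, params, retain_graph=True)  # H @ v
        hv = torch.cat([h.reshape(-1) for h in hv])
        norm = hv.norm()
        v = hv / norm
    return norm.item()  # ||H v|| with unit v converges to the top |eigenvalue|

model = torch.nn.Linear(4, 1)
x, t = torch.randn(16, 4), torch.randn(16, 1)
loss = torch.nn.functional.mse_loss(model(x), t)
print(hessian_spectral_norm(loss, list(model.parameters())))
```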
- Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks [65.24701908364383]
We show that a sufficient condition for calibrated uncertainty on a ReLU network is "to be a bit Bayesian".
We further validate these findings empirically via various standard experiments using common deep ReLU networks and Laplace approximations.
arXiv Detail & Related papers (2020-02-24T08:52:06Z)
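"A bit Bayesian" in this result means applying a Laplace approximation to just the last layer of an otherwise point-estimated network. Below is a compact sketch of that recipe for a binary classifier, using a diagonal Hessian approximation and illustrative names (not the paper's exact setup).

```python
import torch

def laplace_last_layer(features, w_map, prior_prec=1.0):
    """Diagonal Laplace posterior over logistic-regression last-layer weights.

    features: (N, D) outputs of the trained network body; w_map: (D,) MAP weights.
    """
    p = torch.sigmoid(features @ w_map)
    # Diagonal of the Hessian of (binary cross-entropy + Gaussian prior) at the MAP.
    h_diag = (features ** 2 * (p * (1 - p)).unsqueeze(1)).sum(0) + prior_prec
    return w_map, 1.0 / h_diag                    # posterior mean, diagonal variance

def predict(x, w_mean, w_var, n_samples=100):
    """Average over last-layer weight samples; uncertainty grows away from the data."""
    ws = w_mean + w_var.sqrt() * torch.randn(n_samples, w_mean.numel())
    return torch.sigmoid(x @ ws.T).mean(dim=1)    # averaged predictive probabilities

feats, w = torch.randn(100, 8), torch.randn(8)
w_mean, w_var = laplace_last_layer(feats, w)
probs = predict(torch.randn(5, 8), w_mean, w_var)  # shape (5,)
```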