Precise characterization of the prior predictive distribution of deep
ReLU networks
- URL: http://arxiv.org/abs/2106.06615v1
- Date: Fri, 11 Jun 2021 21:21:52 GMT
- Title: Precise characterization of the prior predictive distribution of deep
ReLU networks
- Authors: Lorenzo Noci, Gregor Bachmann, Kevin Roth, Sebastian Nowozin, Thomas
Hofmann
- Abstract summary: We derive a precise characterization of the prior predictive distribution of finite-width ReLU networks with Gaussian weights.
Our results provide valuable guidance on prior design, for instance, controlling the predictive variance with depth- and width-informed priors on the weights of the network.
- Score: 45.46732383818331
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works on Bayesian neural networks (BNNs) have highlighted the need to
better understand the implications of using Gaussian priors in combination with
the compositional structure of the network architecture. Similar in spirit to
the kind of analysis that has been developed to devise better initialization
schemes for neural networks (cf. He- or Xavier initialization), we derive a
precise characterization of the prior predictive distribution of finite-width
ReLU networks with Gaussian weights. While theoretical results have been
obtained for their heavy-tailedness, the full characterization of the prior
predictive distribution (i.e. its density, CDF and moments) remained unknown
prior to this work. Our analysis, based on the Meijer-G function, allows us to
quantify the influence of architectural choices such as the width or depth of
the network on the resulting shape of the prior predictive distribution. We
also formally connect our results to previous work in the infinite width
setting, demonstrating that the moments of the distribution converge to those
of a normal log-normal mixture in the infinite depth limit. Finally, our
results provide valuable guidance on prior design: for instance, controlling
the predictive variance with depth- and width-informed priors on the weights of
the network.
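
The closed-form characterization above relies on the Meijer-G function, but the distribution it describes is easy to probe numerically. Below is a minimal Monte Carlo sketch (an illustration only, not the paper's analytical result; the function names and the He-style sqrt(2/fan_in) scaling are assumptions chosen for the example): it samples the prior predictive of a finite-width ReLU network at a fixed input and shows how a depth- and width-informed prior standard deviation keeps the predictive variance controlled, whereas an unscaled unit-variance Gaussian prior lets it grow rapidly with depth.

```python
# Monte Carlo sketch of the prior predictive of a finite-width ReLU network.
# Not the paper's Meijer-G result; names and scalings are illustrative assumptions.
import numpy as np

def sample_prior_predictive(x, depth, width, weight_std, n_samples=5000, seed=0):
    """Draw f(x) under the prior by sampling fresh Gaussian weights each time."""
    rng = np.random.default_rng(seed)
    outs = np.empty(n_samples)
    for s in range(n_samples):
        h, fan_in = x, x.shape[0]
        for _ in range(depth):
            W = rng.normal(0.0, weight_std(fan_in), size=(width, fan_in))
            h = np.maximum(W @ h, 0.0)          # ReLU hidden layer
            fan_in = width
        w_out = rng.normal(0.0, weight_std(fan_in), size=fan_in)
        outs[s] = w_out @ h                     # scalar network output
    return outs

x = np.ones(10) / np.sqrt(10.0)                 # unit-norm test input
naive  = sample_prior_predictive(x, depth=6, width=50,
                                 weight_std=lambda fan_in: 1.0)
scaled = sample_prior_predictive(x, depth=6, width=50,
                                 weight_std=lambda fan_in: np.sqrt(2.0 / fan_in))
print("predictive variance, unscaled N(0,1) prior     :", naive.var())
print("predictive variance, depth/width-informed prior:", scaled.var())
```

Swapping the variance printout for an excess-kurtosis estimate gives a similarly quick empirical look at the heavy-tailedness the abstract refers to.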
Related papers
- Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Conventional wisdom holds that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
arXiv Detail & Related papers (2023-10-02T03:25:32Z)
- Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z)
- Bayesian neural network priors for edge-preserving inversion [3.2046720177804646]
A class of prior distributions based on the output of neural networks with heavy-tailed weights is introduced.
We show theoretically that samples from such priors have desirable discontinuous-like properties even when the network width is finite.
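
A rough way to see the effect this entry describes: the sketch below assumes Cauchy-distributed weights as the heavy-tailed example (which may differ from the paper's exact prior class), draws single-hidden-layer ReLU functions under Gaussian and heavy-tailed weight priors, and compares the largest jump between neighbouring grid points as a crude proxy for "discontinuous-like" behaviour.

```python
# Compare function draws under Gaussian vs. heavy-tailed (Cauchy) weight priors.
import numpy as np

def sample_function(xs, width, sampler, seed):
    """One random single-hidden-layer ReLU function evaluated on the grid xs."""
    rng = np.random.default_rng(seed)
    W = sampler(rng, (width, 1))
    b = sampler(rng, (width,))
    v = sampler(rng, (width,)) / width
    return np.maximum(xs[:, None] * W.T + b, 0.0) @ v

xs = np.linspace(-3, 3, 2001)
gauss  = lambda rng, shape: rng.normal(0.0, 1.0, shape)
cauchy = lambda rng, shape: rng.standard_cauchy(shape)

for name, sampler in [("Gaussian", gauss), ("Cauchy", cauchy)]:
    jumps = [np.abs(np.diff(sample_function(xs, 200, sampler, s))).max()
             for s in range(50)]
    print(f"{name:8s} prior: median max-jump between grid points = {np.median(jumps):.3f}")
```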
arXiv Detail & Related papers (2021-12-20T16:39:05Z)
- Layer Adaptive Node Selection in Bayesian Neural Networks: Statistical Guarantees and Implementation Details [0.5156484100374059]
Sparse deep neural networks have proven to be efficient for predictive model building in large-scale studies.
We propose a Bayesian sparse solution using spike-and-slab Gaussian priors to allow for node selection during training.
We establish the fundamental result of variational posterior consistency together with the characterization of prior parameters.
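
A minimal sketch of a node-level spike-and-slab Gaussian prior of the kind described above (the inclusion probability and the spike/slab standard deviations are illustrative assumptions, not the paper's settings): each hidden node's outgoing weights are drawn from a wide slab if the node is selected and from a near-zero spike otherwise.

```python
# Sample one layer under a node-wise spike-and-slab Gaussian prior.
import numpy as np

def sample_spike_slab_layer(rng, n_in, n_out, p_include=0.3,
                            slab_std=1.0, spike_std=1e-3):
    z = rng.random(n_out) < p_include               # per-node inclusion indicator
    std = np.where(z, slab_std, spike_std)          # node-wise prior std
    W = rng.normal(0.0, 1.0, size=(n_out, n_in)) * std[:, None]
    return W, z

rng = np.random.default_rng(0)
W, z = sample_spike_slab_layer(rng, n_in=100, n_out=50)
print("selected nodes:", int(z.sum()), "of", len(z))
print("mean |weight| (selected / pruned):",
      np.abs(W[z]).mean(), "/", np.abs(W[~z]).mean())
```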
arXiv Detail & Related papers (2021-08-25T00:48:07Z)
- BNNpriors: A library for Bayesian neural network inference with different prior distributions [32.944046414823916]
BNNpriors enables state-of-the-art Markov Chain Monte Carlo inference on Bayesian neural networks.
It follows a modular approach that eases the design and implementation of new custom priors.
It has facilitated foundational discoveries on the nature of the cold posterior effect in Bayesian neural networks.
arXiv Detail & Related papers (2021-05-14T17:11:04Z)
- The Ridgelet Prior: A Covariance Function Approach to Prior Specification for Bayesian Neural Networks [4.307812758854161]
We construct a prior distribution for the parameters of a network that approximates the posited Gaussian process in the output space of the network.
This establishes the property that a Bayesian neural network can approximate any Gaussian process whose covariance function is sufficiently regular.
arXiv Detail & Related papers (2020-10-16T16:39:45Z)
- Structured Weight Priors for Convolutional Neural Networks [74.1348917619643]
This paper explores the benefits of adding structure to weight priors.
It first considers first-layer filters of a convolutional NN, designing a prior based on random Gabor filters.
Empirical results suggest that these structured weight priors lead to more meaningful functional priors for image data.
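
As a hedged illustration of the structured first-layer prior described above (the filter size, parameter ranges, and the Gaussian-jitter construction are assumptions, not the paper's exact recipe), the sketch below samples random Gabor filters and uses them as the mean of a Gaussian prior over first-layer convolution kernels.

```python
# Draw a first-layer filter bank from a Gabor-centred Gaussian prior.
import numpy as np

def random_gabor(rng, size=7):
    """One size x size Gabor filter with randomly drawn orientation, wavelength, phase."""
    theta = rng.uniform(0, np.pi)          # orientation
    lam   = rng.uniform(3.0, size)         # wavelength
    sigma = rng.uniform(1.0, size / 2)     # Gaussian envelope width
    psi   = rng.uniform(0, 2 * np.pi)      # phase offset
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_t =  x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(x_t**2 + y_t**2) / (2 * sigma**2)) \
           * np.cos(2 * np.pi * x_t / lam + psi)

def sample_first_layer_prior(rng, n_filters=32, size=7, jitter_std=0.1):
    """Prior draw: Gabor mean plus a small Gaussian perturbation per filter."""
    means = np.stack([random_gabor(rng, size) for _ in range(n_filters)])
    return means + rng.normal(0.0, jitter_std, size=means.shape)

rng = np.random.default_rng(0)
filters = sample_first_layer_prior(rng)
print("sampled first-layer filter bank:", filters.shape)
```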
arXiv Detail & Related papers (2020-07-12T13:05:51Z)
- Bayesian Deep Ensembles via the Neural Tangent Kernel [49.569912265882124]
We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK).
We introduce a simple modification to standard deep ensembles training, through addition of a computationally-tractable, randomised and untrainable function to each ensemble member.
We prove that our Bayesian deep ensembles make more conservative predictions than standard deep ensembles in the infinite width limit.
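
A minimal sketch of the construction this entry describes, with assumed architectural details: each ensemble member pairs a trainable network with a fixed, randomly initialised copy whose output is added to the prediction but never updated during training.

```python
# Ensemble member = trainable network + frozen, randomly initialised network.
import numpy as np

def init_params(rng, d_in, width):
    return {"W1": rng.normal(0, np.sqrt(2.0 / d_in), (width, d_in)),
            "w2": rng.normal(0, np.sqrt(1.0 / width), width)}

def forward(params, x):
    return params["w2"] @ np.maximum(params["W1"] @ x, 0.0)

class EnsembleMember:
    def __init__(self, rng, d_in=10, width=64):
        self.trainable = init_params(rng, d_in, width)   # updated by training
        self.frozen    = init_params(rng, d_in, width)   # random, never trained

    def predict(self, x):
        # trained function plus the untrainable random function
        return forward(self.trainable, x) + forward(self.frozen, x)

rng = np.random.default_rng(0)
ensemble = [EnsembleMember(rng) for _ in range(5)]
x = rng.normal(size=10)
preds = np.array([m.predict(x) for m in ensemble])
print("ensemble mean / std at x:", preds.mean(), preds.std())
```

The frozen copy acts as a draw from the prior; removing it recovers a standard deep ensemble member.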
arXiv Detail & Related papers (2020-07-11T22:10:52Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
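
The Hessian-norm diagnostic mentioned above can be mimicked on a toy problem. In the sketch below, a least-squares model is assumed purely so that the Hessian-vector product has a closed form (the paper's estimator for deep networks is more involved); power iteration on Hessian-vector products then estimates the spectral norm one would track during training.

```python
# Estimate the spectral norm of a loss Hessian via power iteration on Hvps.
import numpy as np

def hessian_spectral_norm(hvp, dim, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    v = rng.normal(size=dim)
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        hv = hvp(v)
        v = hv / np.linalg.norm(hv)
    return v @ hvp(v)                     # Rayleigh quotient at convergence

rng = np.random.default_rng(1)
X, y = rng.normal(size=(200, 30)), rng.normal(size=200)
# For the least-squares loss 0.5*||Xw - y||^2 the Hessian is X^T X,
# so the Hessian-vector product needs no second-order autodiff here.
hvp = lambda v: X.T @ (X @ v)
print("estimated Hessian spectral norm:", hessian_spectral_norm(hvp, dim=30))
print("exact largest eigenvalue       :", np.linalg.eigvalsh(X.T @ X).max())
```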
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.