Posterior and variational inference for deep neural networks with heavy-tailed weights
- URL: http://arxiv.org/abs/2406.03369v1
- Date: Wed, 5 Jun 2024 15:24:20 GMT
- Title: Posterior and variational inference for deep neural networks with heavy-tailed weights
- Authors: Ismaël Castillo, Paul Egels
- Abstract summary: We consider deep neural networks in a Bayesian framework with a prior distribution sampling the network weights at random.
We show that the corresponding posterior distribution achieves near-optimal minimax contraction rates.
We also provide variational Bayes counterparts of these results, showing that mean-field variational approximations still benefit from near-optimal theoretical support.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider deep neural networks in a Bayesian framework with a prior distribution sampling the network weights at random. Following a recent idea of Agapiou and Castillo (2023), who show that heavy-tailed prior distributions achieve automatic adaptation to smoothness, we introduce a simple Bayesian deep learning prior based on heavy-tailed weights and ReLU activation. We show that the corresponding posterior distribution achieves near-optimal minimax contraction rates, simultaneously adaptive to both the intrinsic dimension and the smoothness of the underlying function, in a variety of contexts including nonparametric regression, geometric data and Besov spaces. While most works so far need a form of model selection built into the prior distribution, a key aspect of our approach is that it does not require sampling hyperparameters to learn the architecture of the network. We also provide variational Bayes counterparts of these results, which show that mean-field variational approximations still benefit from near-optimal theoretical support.
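To make the kind of prior described above concrete, the sketch below draws the weights of a ReLU network from a heavy-tailed (Student-t) distribution. The depth, widths, and degrees of freedom are arbitrary illustration values, not the paper's exact construction.

```python
import numpy as np

def sample_heavy_tailed_relu_net(widths, df=2.0, rng=None):
    """Draw one random ReLU network whose weights and biases follow a
    heavy-tailed (Student-t) prior. `widths` lists layer sizes, e.g.
    [d_in, 32, 32, 1]. The df value is illustrative, not the paper's prior."""
    rng = np.random.default_rng() if rng is None else rng
    params = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        W = rng.standard_t(df, size=(n_out, n_in))
        b = rng.standard_t(df, size=n_out)
        params.append((W, b))
    return params

def forward(params, x):
    """Evaluate the sampled network at input x (ReLU hidden layers, linear output)."""
    h = x
    for i, (W, b) in enumerate(params):
        h = W @ h + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)  # ReLU activation
    return h

# One prior draw evaluated on a toy input
params = sample_heavy_tailed_relu_net([3, 16, 16, 1])
print(forward(params, np.ones(3)))
```

Heavy-tailed draws occasionally produce very large weights, which is the mechanism behind the automatic adaptation the abstract refers to; no architecture hyperparameters are sampled here.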
Related papers
- Implicit Variational Inference for High-Dimensional Posteriors [7.924706533725115]
In variational inference, the benefits of Bayesian models rely on accurately capturing the true posterior distribution.
We propose using neural samplers that specify implicit distributions, which are well-suited for approximating complex multimodal and correlated posteriors.
Our approach introduces novel bounds for approximate inference using implicit distributions by locally linearising the neural sampler.
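As a rough, generic illustration of a neural sampler defining an implicit distribution (not the authors' architecture; all sizes and names below are made up), a small network can push Gaussian noise forward to a vector of model parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sampler: two affine maps with a tanh nonlinearity in between.
D_NOISE, D_HIDDEN, D_WEIGHTS = 8, 64, 100
A1 = 0.1 * rng.standard_normal((D_HIDDEN, D_NOISE))
b1 = np.zeros(D_HIDDEN)
A2 = 0.1 * rng.standard_normal((D_WEIGHTS, D_HIDDEN))
b2 = np.zeros(D_WEIGHTS)

def implicit_sample(n):
    """Map Gaussian noise through the sampler network to obtain n draws of a
    D_WEIGHTS-dimensional parameter vector; the pushforward distribution has
    no tractable density, which is what makes it 'implicit'."""
    z = rng.standard_normal((n, D_NOISE))
    h = np.tanh(z @ A1.T + b1)
    return h @ A2.T + b2

draws = implicit_sample(1000)
print(draws.mean(axis=0)[:5], draws.std(axis=0)[:5])
```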
arXiv Detail & Related papers (2023-10-10T14:06:56Z) - Joint Bayesian Inference of Graphical Structure and Parameters with a Single Generative Flow Network [59.79008107609297]
We propose to approximate the joint posterior over both the structure and the parameters of a Bayesian Network.
We use a single GFlowNet whose sampling policy follows a two-phase process.
Since the parameters are included in the posterior distribution, this leaves more flexibility for the local probability models.
arXiv Detail & Related papers (2023-05-30T19:16:44Z) - Variational EP with Probabilistic Backpropagation for Bayesian Neural Networks [0.0]
I propose a novel approach for nonlinear logistic regression using a two-layer neural network (NN) model structure with hierarchical priors on the network weights.
I derive a computationally efficient algorithm, whose complexity scales similarly to an ensemble of independent sparse logistic models.
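For context, a generic hierarchical (scale-mixture) prior on network weights can be sampled as below; the Gamma hyperprior and its parameters are illustrative choices, not necessarily the prior used in this paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_hierarchical_weights(n_in, n_hidden, a0=1.0, b0=1.0):
    """Draw two-layer NN weights under a simple hierarchical prior:
    a per-layer precision tau ~ Gamma(a0, b0), then W | tau ~ N(0, 1/tau).
    Generic scale-mixture construction, for illustration only."""
    layers = []
    for shape in [(n_hidden, n_in), (1, n_hidden)]:
        tau = rng.gamma(a0, 1.0 / b0)                      # layer precision
        W = rng.normal(0.0, 1.0 / np.sqrt(tau), size=shape)
        layers.append(W)
    return layers

W1, W2 = sample_hierarchical_weights(n_in=10, n_hidden=50)
print(W1.shape, W2.shape)
```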
arXiv Detail & Related papers (2023-03-02T19:09:47Z) - Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z) - Asymptotic Properties for Bayesian Neural Network in Besov Space [1.90365714903665]
We show that a Bayesian neural network with a spike-and-slab prior achieves posterior consistency at a nearly minimax convergence rate when the true regression function lies in a Besov space.
We propose a practical neural network with guaranteed properties.
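Since the entry hinges on a spike-and-slab prior, here is a minimal sketch of drawing a weight vector from such a prior (a point mass at zero mixed with a Gaussian slab); the inclusion probability and slab scale are arbitrary illustration values.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_spike_and_slab(size, incl_prob=0.1, slab_sd=1.0):
    """Each coordinate is exactly 0 with probability 1 - incl_prob (the spike)
    and N(0, slab_sd^2) otherwise (the slab)."""
    included = rng.random(size) < incl_prob
    slab = rng.normal(0.0, slab_sd, size=size)
    return np.where(included, slab, 0.0)

w = sample_spike_and_slab(size=1000)
print("nonzero fraction:", np.mean(w != 0.0))
```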
arXiv Detail & Related papers (2022-06-01T05:47:06Z) - Kalman Bayesian Neural Networks for Closed-form Online Learning [5.220940151628734]
We propose a novel approach for BNN learning via closed-form Bayesian inference.
The calculation of the predictive distribution of the output and the update of the weight distribution are treated as Bayesian filtering and smoothing problems.
This allows closed-form expressions for training the network's parameters in a sequential/online fashion without gradient descent.
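The filtering view rests on the standard Kalman predict/update step; a generic linear-Gaussian version (not the paper's specific closed-form rules, and with a hypothetical observation model) looks like this:

```python
import numpy as np

def kalman_step(m, P, y, H, R, Q=None):
    """One generic Kalman predict/update for a static state (e.g. weights):
    optionally inflate the covariance with Q (random-walk dynamics), then
    correct with observation y = H @ state + noise of covariance R."""
    if Q is not None:
        P = P + Q                        # predict
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    m = m + K @ (y - H @ m)              # updated mean
    P = P - K @ H @ P                    # updated covariance
    return m, P

m, P = np.zeros(3), np.eye(3)
m, P = kalman_step(m, P, y=np.array([1.0]),
                   H=np.array([[1.0, 0.5, -0.5]]), R=np.array([[0.1]]))
print(m)
```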
arXiv Detail & Related papers (2021-10-03T07:29:57Z) - Sampling-free Variational Inference for Neural Networks with Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference.
Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
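Sampling-free variational inference typically propagates means and variances through the network analytically instead of drawing weight samples. A minimal sketch of such moment propagation through one linear layer, under an independent-weights assumption and not the paper's parameterization, is below.

```python
import numpy as np

def linear_layer_moments(x, W_mean, W_var):
    """Propagate first and second moments of y = W @ x for a fixed input x,
    assuming the entries of W are independent with mean W_mean and variance
    W_var: Var[sum_j W_ij x_j] = sum_j Var[W_ij] x_j^2."""
    mean_y = W_mean @ x
    var_y = W_var @ (x ** 2)
    return mean_y, var_y

x = np.ones(4)
W_mean = np.full((3, 4), 0.5)
W_var = np.full((3, 4), 0.01)
print(linear_layer_moments(x, W_mean, W_var))
```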
arXiv Detail & Related papers (2021-03-15T16:16:18Z) - Probabilistic electric load forecasting through Bayesian Mixture Density Networks [70.50488907591463]
Probabilistic load forecasting (PLF) is a key component in the extended tool-chain required for efficient management of smart energy grids.
We propose a novel PLF approach based on Bayesian Mixture Density Networks.
To achieve reliable and computationally scalable estimators of the posterior distributions, both Mean Field variational inference and deep ensembles are integrated.
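A mixture density network, as referenced here, ends in an output head that parameterizes a mixture distribution over the target. A bare-bones sketch of turning a raw output vector into mixture parameters follows; the component count and splitting convention are illustrative, not the paper's architecture.

```python
import numpy as np

def mdn_head(raw, n_components):
    """Turn a raw network output vector into the parameters of a univariate
    Gaussian mixture: softmax over logits for mixture weights, exp on
    log-scales for positive standard deviations."""
    n = n_components
    logits, means, log_scales = raw[:n], raw[n:2 * n], raw[2 * n:]
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights, means, np.exp(log_scales)

raw = np.random.default_rng(3).standard_normal(3 * 5)  # 5 mixture components
print(mdn_head(raw, n_components=5))
```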
arXiv Detail & Related papers (2020-12-23T16:21:34Z) - A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
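The link between training speed and model evidence rests on the standard chain-rule decomposition of the log marginal likelihood into sequential posterior-predictive terms. Writing the data as d_1, ..., d_n and the model as M (notation ours, for illustration):

```latex
\log p(\mathcal{D} \mid \mathcal{M})
  = \sum_{i=1}^{n} \log p\left(d_i \mid d_1, \dots, d_{i-1}, \mathcal{M}\right)
```

Each term scores how well the model predicts the next observation given those already seen, which is, roughly, why a sequential measure of how quickly a model fits the data can serve as an estimate of its evidence.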
arXiv Detail & Related papers (2020-10-27T17:56:14Z) - Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization.
We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
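Deep-ensemble prediction amounts to averaging the predictive distributions of independently trained networks, which is the sense in which it approximates marginalizing over weights. A schematic sketch is below; the `predict_proba` interface and the dummy members are assumptions for illustration, and training is not shown.

```python
import numpy as np

class DummyMember:
    """Stand-in for one independently trained network; a real member would
    return data-dependent class probabilities from its own weights."""
    def __init__(self, p):
        self.p = np.asarray(p)
    def predict_proba(self, x):
        return self.p

def ensemble_predict(members, x):
    """Average the members' predictive distributions: a simple, equally
    weighted approximation to Bayesian marginalization over weights."""
    return np.stack([m.predict_proba(x) for m in members]).mean(axis=0)

members = [DummyMember([0.7, 0.3]), DummyMember([0.5, 0.5]), DummyMember([0.6, 0.4])]
print(ensemble_predict(members, x=None))
```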
arXiv Detail & Related papers (2020-02-20T15:13:27Z) - Liberty or Depth: Deep Bayesian Neural Nets Do Not Need Complex Weight
Posterior Approximations [40.384018112884874]
We prove that deep mean-field variational weight posteriors can induce similar distributions in function-space to those induced by shallower networks.
Our results suggest that using mean-field variational inference in a deeper model is both a practical and theoretically justified alternative to structured approximations.
arXiv Detail & Related papers (2020-02-10T13:11:45Z)
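Mean-field variational inference, which both the last entry and the main paper's variational results concern, approximates the weight posterior by a fully factorized Gaussian. A minimal reparameterized draw from such an approximation is sketched below; the sizes and values are illustrative and framework-agnostic.

```python
import numpy as np

rng = np.random.default_rng(4)

def mean_field_sample(mu, log_sigma):
    """One reparameterized draw w = mu + sigma * eps from a fully factorized
    Gaussian variational posterior over the weights (mean-field assumption:
    every weight is independent)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(log_sigma) * eps

mu = np.zeros(100)               # variational means for 100 weights
log_sigma = np.full(100, -2.0)   # variational log standard deviations
print(mean_field_sample(mu, log_sigma)[:5])
```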
This list is automatically generated from the titles and abstracts of the papers on this site.