Split personalities in Bayesian Neural Networks: the case for full marginalisation
- URL: http://arxiv.org/abs/2205.11151v1
- Date: Mon, 23 May 2022 09:24:37 GMT
- Title: Split personalities in Bayesian Neural Networks: the case for full marginalisation
- Authors: David Yallup, Will Handley, Mike Hobson, Anthony Lasenby, Pablo Lemos
- Abstract summary: We show that the true posterior distribution of a Bayesian neural network is massively multimodal.
It is only by fully marginalising over all posterior modes, using appropriate Bayesian sampling tools, that we can capture the split personalities of the network.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The true posterior distribution of a Bayesian neural network is massively
multimodal. Whilst most of these modes are functionally equivalent, we
demonstrate that there remains a level of real multimodality that manifests in
even the simplest neural network setups. It is only by fully marginalising over
all posterior modes, using appropriate Bayesian sampling tools, that we can
capture the split personalities of the network. The ability of a network
trained in this manner to reason between multiple candidate solutions
dramatically improves the generalisability of the model, a feature we contend
is not consistently captured by alternative approaches to the training of
Bayesian neural networks. We provide a concise minimal example of this, which
can provide lessons and a future path forward for correctly utilising the
explainability and interpretability of Bayesian neural networks.
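In practice, full marginalisation means that predictions are posterior-predictive averages over every mode, rather than the output of a single network picked by an optimiser. The sketch below illustrates this for a toy one-hidden-layer classifier; the function names are hypothetical and the posterior weight samples are assumed to be equally weighted draws (for instance, re-sampled output from a nested sampler), so this is an illustration of the idea rather than the authors' pipeline.

```python
# Minimal sketch of full marginalisation (posterior averaging) for a toy
# one-hidden-layer classifier. Assumption: `posterior_samples` holds equally
# weighted weight draws from the full posterior; names are illustrative only.
import numpy as np

def predict_single(x, params):
    """Forward pass of the toy classifier for one posterior weight sample."""
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)                         # hidden activations
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)         # class probabilities

def predict_marginalised(x, posterior_samples):
    """Posterior-predictive average over all weight samples.

    Every posterior mode contributes, so the prediction reflects all of the
    network's candidate solutions rather than a single optimised mode.
    """
    probs = np.stack([predict_single(x, p) for p in posterior_samples])
    return probs.mean(axis=0)                        # assumes equally weighted samples

# Illustration with random draws standing in for real sampler output.
rng = np.random.default_rng(0)
samples = [(rng.normal(size=(2, 8)), rng.normal(size=8),
            rng.normal(size=(8, 2)), rng.normal(size=2)) for _ in range(200)]
x = np.array([[0.5, -1.0]])
print(predict_marginalised(x, samples))              # averaged class probabilities
```

If the sampler returns importance-weighted samples, as nested sampling does natively, the plain mean over samples would be replaced by a weighted average using those weights.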
Related papers
- Efficient Model Compression for Bayesian Neural Networks [4.179545514579061]
We demonstrate a novel strategy to emulate principles of Bayesian model selection in a deep learning setup.
We employ the resulting probabilities for pruning and feature selection on a host of simulated and real-world benchmark data.
arXiv Detail & Related papers (2024-11-01T00:07:59Z)
- Bayesian Sheaf Neural Networks [1.0992151305603266]
Equipping graph neural networks with a convolution operation defined in terms of a cellular sheaf offers advantages for learning expressive representations of heterophilic graph data.
We propose a variational approach to learning cellular sheaves within sheaf neural networks, yielding an architecture we refer to as a Bayesian sheaf neural network.
arXiv Detail & Related papers (2024-10-12T16:46:48Z)
- LinSATNet: The Positive Linear Satisfiability Neural Networks [116.65291739666303]
This paper studies how to introduce the popular positive linear satisfiability to neural networks.
We propose the first differentiable satisfiability layer based on an extension of the classic Sinkhorn algorithm for jointly encoding multiple sets of marginal distributions.
arXiv Detail & Related papers (2024-07-18T22:05:21Z)
- Structured Partial Stochasticity in Bayesian Neural Networks [0.0]
I propose a structured way to select the deterministic subset of weights that removes neuron permutation symmetries, and therefore the corresponding redundant posterior modes.
With a drastically simplified posterior distribution, the performance of existing approximate inference schemes is found to be greatly improved.
arXiv Detail & Related papers (2024-05-27T21:40:31Z)
- On the Convergence of Locally Adaptive and Scalable Diffusion-Based Sampling Methods for Deep Bayesian Neural Network Posteriors [2.3265565167163906]
Bayesian neural networks are a promising approach for modeling uncertainties in deep neural networks.
However, generating samples from the posterior distribution of neural networks is a major challenge.
One advance in that direction would be the incorporation of adaptive step sizes into Monte Carlo Markov chain sampling algorithms.
In this paper, we demonstrate that these methods can have a substantial bias in the distribution they sample, even in the limit of vanishing step sizes and at full batch size.
arXiv Detail & Related papers (2024-03-13T15:21:14Z)
- Diffused Redundancy in Pre-trained Representations [98.55546694886819]
We take a closer look at how features are encoded in pre-trained representations.
We find that learned representations in a given layer exhibit a degree of diffuse redundancy.
Our findings shed light on the nature of representations learned by pre-trained deep neural networks.
arXiv Detail & Related papers (2023-05-31T21:00:50Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
They exploit higher-order statistics only later during training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network through the geometric structure of its objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- Bayesian Neural Networks: Essentials [0.6091702876917281]
It is nontrivial to understand, design and train Bayesian neural networks due to their complexities.
The depth of deep neural networks makes it redundant, and costly, to account for uncertainty across a large number of successive layers.
Hybrid Bayesian neural networks, which use a few probabilistic layers judiciously positioned in the network, provide a practical solution.
arXiv Detail & Related papers (2021-06-22T13:54:17Z) - Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z) - Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z)
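The subspaces entry directly above hinges on being able to evaluate networks anywhere in a learned region of weight space, such as a line between two parameter settings. The snippet below is a purely illustrative sketch of that evaluation step with random stand-in weights; the paper's procedure for training the endpoints is not reproduced, and all names are hypothetical.

```python
# Illustrative sketch: evaluate a toy network along the line segment between
# two weight settings theta_a and theta_b (stand-ins for trained endpoints).
import numpy as np

def interpolate_params(theta_a, theta_b, t):
    """Weights at position t in [0, 1] on the line between two parameter sets."""
    return [(1.0 - t) * wa + t * wb for wa, wb in zip(theta_a, theta_b)]

def forward(x, params):
    """Toy two-layer network used only to probe the interpolated weights."""
    W1, b1, W2, b2 = params
    return np.tanh(x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(1)
make = lambda: [rng.normal(size=(4, 16)), rng.normal(size=16),
                rng.normal(size=(16, 3)), rng.normal(size=3)]
theta_a, theta_b = make(), make()                  # stand-ins for trained endpoints
x = rng.normal(size=(5, 4))

# Sweep along the line and watch the outputs move between the two endpoints.
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    y = forward(x, interpolate_params(theta_a, theta_b, t))
    print(t, float(np.abs(y).mean()))
```

The same idea extends to curves and simplexes by interpolating between more than two anchor parameter sets with barycentric coordinates.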
This list is automatically generated from the titles and abstracts of the papers on this site.