Structured Weight Priors for Convolutional Neural Networks
- URL: http://arxiv.org/abs/2007.14235v1
- Date: Sun, 12 Jul 2020 13:05:51 GMT
- Title: Structured Weight Priors for Convolutional Neural Networks
- Authors: Tim Pearce, Andrew Y.K. Foong, Alexandra Brintrup
- Abstract summary: This paper explores the benefits of adding structure to weight priors.
It first designs a prior over the first-layer filters of a convolutional NN based on random Gabor filters, then adds structure to the prior over final-layer weights.
Empirical results suggest that these structured weight priors lead to more meaningful functional priors for image data.
- Score: 74.1348917619643
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Selection of an architectural prior well suited to a task (e.g. convolutions
for image data) is crucial to the success of deep neural networks (NNs).
Conversely, the weight priors within these architectures are typically left
vague, e.g. independent Gaussian distributions, which has led to debate over
the utility of Bayesian deep learning. This paper explores the benefits of
adding structure to weight priors. It initially considers first-layer filters
of a convolutional NN, designing a prior based on random Gabor filters. Second,
it considers adding structure to the prior of final-layer weights by estimating
how each hidden feature relates to each class. Empirical results suggest that
these structured weight priors lead to more meaningful functional priors for
image data. This contributes to the ongoing discussion on the importance of
weight priors.
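The abstract does not spell out the Gabor parameterization, but the idea of a prior built from random Gabor filters can be sketched as follows: sample orientation, wavelength, phase, and envelope parameters at random and render each draw as a small filter. This is a minimal sketch; the parameter ranges, filter size, normalization, and function names below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def sample_gabor_filter(size=5, rng=np.random):
    """Draw one size x size Gabor filter with randomly sampled parameters.

    Parameter ranges are illustrative assumptions, not the paper's settings.
    """
    theta = rng.uniform(0, np.pi)          # orientation
    lam = rng.uniform(2.0, size)           # wavelength of the sinusoid
    sigma = rng.uniform(1.0, size / 2.0)   # width of the Gaussian envelope
    psi = rng.uniform(0, 2 * np.pi)        # phase offset
    gamma = rng.uniform(0.5, 1.0)          # spatial aspect ratio

    half = (size - 1) / 2.0
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(x_t**2 + (gamma * y_t)**2) / (2 * sigma**2)) \
        * np.cos(2 * np.pi * x_t / lam + psi)
    return g / (np.linalg.norm(g) + 1e-8)  # normalize filter energy

def sample_first_layer_prior(n_filters=32, in_channels=3, size=5, seed=0):
    """Sample one first-layer weight tensor of shape (n_filters, in_channels, size, size)."""
    rng = np.random.default_rng(seed)
    return np.stack([
        np.stack([sample_gabor_filter(size, rng) for _ in range(in_channels)])
        for _ in range(n_filters)
    ])

W = sample_first_layer_prior()
print(W.shape)  # (32, 3, 5, 5)
```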
Related papers
- Unrolled denoising networks provably learn optimal Bayesian inference [54.79172096306631]
We prove the first rigorous learning guarantees for neural networks based on unrolling approximate message passing (AMP).
For compressed sensing, we prove that when trained on data drawn from a product prior, the layers of the network converge to the same denoisers used in Bayes AMP.
arXiv Detail & Related papers (2024-09-19T17:56:16Z)
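For reference on the entry above, the iteration being unrolled can be sketched in a few lines: classical AMP for compressed sensing, here with a simple soft-thresholding denoiser standing in for the Bayes-optimal denoiser that the trained layers are shown to recover. The threshold rule, problem sizes, and function names are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def soft_threshold(u, lam):
    """Elementwise soft-thresholding denoiser."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def amp_recover(y, A, alpha=2.0, n_iters=30):
    """Classical AMP for y = A x + noise, with A having iid N(0, 1/m) entries.

    The threshold alpha * tau_t (tau_t estimated from the residual) is a standard
    illustrative choice, not the learned denoiser from the paper.
    """
    m, n = A.shape
    x = np.zeros(n)
    z = y.copy()
    for _ in range(n_iters):
        tau = np.linalg.norm(z) / np.sqrt(m)             # effective noise estimate
        x_new = soft_threshold(x + A.T @ z, alpha * tau)
        # Onsager correction: fraction of active coordinates times previous residual
        z = y - A @ x_new + (np.count_nonzero(x_new) / m) * z
        x = x_new
    return x

# Illustrative compressed-sensing instance
rng = np.random.default_rng(0)
m, n, k = 250, 500, 20
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)
y = A @ x_true + 0.01 * rng.normal(size=m)
x_hat = amp_recover(y, A)
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))  # relative recovery error
```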
- Understanding the Covariance Structure of Convolutional Filters [86.0964031294896]
Recent ViT-inspired convolutional networks such as ConvMixer and ConvNeXt use large-kernel depthwise convolutions with notable structure.
We first observe that such learned filters have highly-structured covariance matrices, and we find that covariances calculated from small networks may be used to effectively initialize a variety of larger networks.
arXiv Detail & Related papers (2022-10-07T15:59:13Z)
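A minimal sketch of the initialization recipe described in the entry above, assuming a stack of k x k filters taken from a small trained network: fit a Gaussian to the flattened filters and sample fresh filters for the larger network from it. The function names, jitter term, and the random stand-in for trained filters are illustrative assumptions.

```python
import numpy as np

def fit_filter_gaussian(filters):
    """filters: array of shape (num_filters, k, k) taken from a trained small network."""
    flat = filters.reshape(len(filters), -1)                          # (num_filters, k*k)
    mean = flat.mean(axis=0)
    cov = np.cov(flat, rowvar=False) + 1e-6 * np.eye(flat.shape[1])   # small jitter for stability
    return mean, cov

def sample_filters(mean, cov, n_new, k, seed=0):
    """Initialize n_new filters of a larger network by sampling the fitted Gaussian."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n_new).reshape(n_new, k, k)

# Random stand-in for filters harvested from a trained small network
small_net_filters = np.random.default_rng(1).normal(size=(64, 7, 7))
mean, cov = fit_filter_gaussian(small_net_filters)
new_filters = sample_filters(mean, cov, n_new=256, k=7)
print(new_filters.shape)  # (256, 7, 7)
```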
- Robust Learning of Parsimonious Deep Neural Networks [0.0]
We propose a simultaneous learning and pruning algorithm capable of identifying and eliminating irrelevant structures in a neural network.
We derive a novel hyper-prior distribution over the prior parameters that is crucial for their optimal selection.
We evaluate the proposed algorithm on the MNIST data set and commonly used fully connected and convolutional LeNet architectures.
arXiv Detail & Related papers (2022-05-10T03:38:55Z)
- Precise characterization of the prior predictive distribution of deep ReLU networks [45.46732383818331]
We derive a precise characterization of the prior predictive distribution of finite-width ReLU networks with Gaussian weights.
Our results provide valuable guidance on prior design, for instance, controlling the predictive variance with depth- and width-informed priors on the weights of the network.
arXiv Detail & Related papers (2021-06-11T21:21:52Z)
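The characterization in the entry above is analytic, but the same object can be probed by simple Monte Carlo: sample Gaussian weights for a finite-width ReLU network and inspect the induced distribution of outputs at a fixed input. The depth, width, and weight scaling below are illustrative choices, not the settings analyzed in the paper.

```python
import numpy as np

def sample_prior_predictive(x, depth=3, width=128, n_samples=2000, seed=0):
    """Monte Carlo draws of f(x) for a ReLU net with iid Gaussian weights.

    Depth, width, and the 2/fan_in weight variance are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    outputs = np.empty(n_samples)
    for s in range(n_samples):
        h = x
        for _ in range(depth):
            fan_in = h.shape[0]
            W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(width, fan_in))
            h = np.maximum(W @ h, 0.0)                 # ReLU hidden layer
        w_out = rng.normal(0.0, 1.0 / np.sqrt(width), size=width)
        outputs[s] = w_out @ h
    return outputs

samples = sample_prior_predictive(np.ones(10))
print(samples.mean(), samples.std())   # prior predictive mean and spread at this input
```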
- Bayesian Neural Network Priors Revisited [29.949163519715952]
We study summary statistics of neural network weights in different networks trained using SGD.
We find that fully connected networks (FCNNs) display heavy-tailed weight distributions, while convolutional neural network (CNN) weights display strong spatial correlations.
arXiv Detail & Related papers (2021-02-12T15:18:06Z)
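The two observations in the entry above (heavy tails in FCNN weights, spatial correlations in CNN filters) correspond to simple summary statistics. The sketch below shows one way to compute them, with random arrays standing in for actually trained weights; the statistics chosen and the stand-in data are illustrative assumptions.

```python
import numpy as np

def excess_kurtosis(w):
    """Heavy tails show up as large positive excess kurtosis (0 for a Gaussian)."""
    w = w.ravel()
    z = (w - w.mean()) / w.std()
    return np.mean(z**4) - 3.0

def spatial_correlation(filters):
    """Average correlation between 4-neighbour pixels within k x k filters.

    filters: array of shape (num_filters, k, k).
    """
    k = filters.shape[-1]
    flat = filters.reshape(len(filters), -1)
    corr = np.corrcoef(flat, rowvar=False)      # (k*k, k*k) pixel-pixel correlations
    idx = np.arange(k * k)
    rows, cols = idx // k, idx % k
    adjacent = (np.abs(rows[:, None] - rows[None, :])
                + np.abs(cols[:, None] - cols[None, :])) == 1
    return corr[adjacent].mean()

# Random stand-ins for trained FCNN weights and CNN filters
fc_w = np.random.default_rng(0).standard_t(df=3, size=10000)   # heavy-tailed example
print(excess_kurtosis(fc_w))
cnn_filters = np.random.default_rng(1).normal(size=(64, 3, 3))  # uncorrelated example
print(spatial_correlation(cnn_filters))
```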
- Dependency Aware Filter Pruning [74.69495455411987]
Pruning a proportion of unimportant filters is an efficient way to reduce the inference cost.
Previous work prunes filters according to their weight norms or the corresponding batch-norm scaling factors.
We propose a novel mechanism to dynamically control the sparsity-inducing regularization so as to achieve the desired sparsity.
arXiv Detail & Related papers (2020-05-06T07:41:22Z)
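As context for the norm-based criterion mentioned in the entry above, here is a minimal sketch of magnitude-based filter pruning (the baseline, not the paper's dynamic sparsity-inducing regularization): rank the output filters of a conv layer by L1 norm and keep the top fraction. The shapes, keep ratio, and function names are illustrative assumptions.

```python
import numpy as np

def prune_filters_by_norm(weight, keep_ratio=0.7):
    """weight: conv weight of shape (out_channels, in_channels, k, k).

    Returns the pruned weight and the indices of the kept filters.
    """
    norms = np.abs(weight).sum(axis=(1, 2, 3))           # L1 norm per output filter
    n_keep = max(1, int(round(keep_ratio * len(norms))))
    keep = np.sort(np.argsort(norms)[::-1][:n_keep])     # indices of the largest-norm filters
    return weight[keep], keep

W = np.random.default_rng(0).normal(size=(64, 32, 3, 3))
W_pruned, kept = prune_filters_by_norm(W, keep_ratio=0.5)
print(W_pruned.shape)   # (32, 32, 3, 3)
```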
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
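A generic way to estimate a Hessian spectral norm without materializing the matrix, related to the tracking described in the entry above, is power iteration on Hessian-vector products. The sketch below uses finite-difference Hessian-vector products on a toy quadratic loss; it is not the paper's estimator, and the loss, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

def hvp_fd(grad_fn, w, v, eps=1e-4):
    """Finite-difference Hessian-vector product: H v ≈ (∇L(w+eps v) - ∇L(w-eps v)) / (2 eps)."""
    return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)

def hessian_spectral_norm(grad_fn, w, n_iters=50, seed=0):
    """Estimate the largest absolute Hessian eigenvalue by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=w.shape)
    v /= np.linalg.norm(v)
    for _ in range(n_iters):
        hv = hvp_fd(grad_fn, w, v)
        v = hv / (np.linalg.norm(hv) + 1e-12)
    return np.abs(v @ hvp_fd(grad_fn, w, v))   # Rayleigh quotient at the converged vector

# Toy quadratic loss L(w) = 0.5 w^T A w with gradient A w and Hessian A
A = np.diag(np.arange(1.0, 11.0))              # eigenvalues 1..10
grad_fn = lambda w: A @ w
print(hessian_spectral_norm(grad_fn, w=np.zeros(10)))   # ≈ 10
```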
- Hierarchical Gaussian Process Priors for Bayesian Neural Network Weights [16.538973310830414]
A desirable class of priors would represent weights compactly, capture correlations between weights, and allow inclusion of prior knowledge.
This paper introduces two innovations: (i) a Gaussian process-based hierarchical model for network weights based on unit embeddings that can flexibly encode correlated weight structures, and (ii) input-dependent versions of these weight priors that can provide convenient ways to regularize the function space.
We show these models provide desirable test-time uncertainty estimates on out-of-distribution data, demonstrate cases of modeling inductive biases for neural networks with kernels, and demonstrate competitive predictive performance on an active learning benchmark.
arXiv Detail & Related papers (2020-02-10T07:19:52Z)
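The unit-embedding idea in (i) can be illustrated schematically: give each unit a latent code and make every weight a function of the codes of the two units it connects, so weights that share a unit become correlated under the prior. The bilinear construction, shapes, and function names below are an illustrative stand-in, not the paper's hierarchical GP model.

```python
import numpy as np

def sample_layer_weights(n_in, n_out, embed_dim=4, weight_scale=1.0, seed=0):
    """Sample an (n_out, n_in) weight matrix from a unit-embedding prior.

    Each unit u has a latent embedding z_u ~ N(0, I); the weight between input i
    and output o is a bilinear function of (z_i, z_o), so weights sharing a unit
    are correlated a priori. The bilinear form is an illustrative choice.
    """
    rng = np.random.default_rng(seed)
    z_in = rng.normal(size=(n_in, embed_dim))
    z_out = rng.normal(size=(n_out, embed_dim))
    B = rng.normal(size=(embed_dim, embed_dim)) / embed_dim   # shared bilinear map
    return weight_scale * z_out @ B @ z_in.T                  # (n_out, n_in)

W = sample_layer_weights(n_in=100, n_out=50)
# Rows (weights into different output units) typically show non-zero correlations,
# unlike an iid Gaussian prior:
print(np.corrcoef(W[:5]).round(2))
```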