Understanding the Covariance Structure of Convolutional Filters
- URL: http://arxiv.org/abs/2210.03651v1
- Date: Fri, 7 Oct 2022 15:59:13 GMT
- Title: Understanding the Covariance Structure of Convolutional Filters
- Authors: Asher Trockman, Devin Willmott, J. Zico Kolter
- Abstract summary: Recent ViT-inspired convolutional networks such as ConvMixer and ConvNeXt use large-kernel depthwise convolutions whose learned filters show notable structure.
We first observe that such learned filters have highly-structured covariance matrices, and we find that covariances calculated from small networks may be used to effectively initialize a variety of larger networks.
- Score: 86.0964031294896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural network weights are typically initialized at random from univariate
distributions, controlling just the variance of individual weights even in
highly-structured operations like convolutions. Recent ViT-inspired
convolutional networks such as ConvMixer and ConvNeXt use large-kernel
depthwise convolutions whose learned filters have notable structure; this
presents an opportunity to study their empirical covariances. In this work, we
first observe that such learned filters have highly-structured covariance
matrices, and moreover, we find that covariances calculated from small networks
may be used to effectively initialize a variety of larger networks of different
depths, widths, patch sizes, and kernel sizes, indicating a degree of
model-independence to the covariance structure. Motivated by these findings, we
then propose a learning-free multivariate initialization scheme for
convolutional filters using a simple, closed-form construction of their
covariance. Models using our initialization outperform those using traditional
univariate initializations, and typically meet or exceed the performance of
those initialized from the covariances of learned filters; in some cases, this
improvement can be achieved without training the depthwise convolutional
filters at all.
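The abstract does not spell out the closed-form covariance construction, but the mechanics of a multivariate filter initialization are easy to sketch: build a (k*k) x (k*k) covariance over the tap coordinates of a k x k filter, factor it, and push i.i.d. Gaussian draws through the factor. The sketch below is a minimal illustration only; the squared-exponential covariance over tap distances, the length scale, and the fan-in-style scaling are assumptions made for the example, not the paper's construction.

```python
import numpy as np

def structured_covariance(k, length_scale=2.0):
    """Illustrative (k*k) x (k*k) covariance over the tap coordinates of a k x k filter.
    NOTE: the squared-exponential form is an assumption for this sketch,
    not the closed-form construction proposed in the paper."""
    grid = np.arange(k)
    coords = np.stack(np.meshgrid(grid, grid, indexing="ij"), axis=-1).reshape(-1, 2)
    sq_dists = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * length_scale**2))

def sample_filters(num_filters, k, length_scale=2.0, seed=0):
    """Draw num_filters depthwise k x k filters from N(0, Sigma)."""
    rng = np.random.default_rng(seed)
    cov = structured_covariance(k, length_scale)
    # A Cholesky factor maps i.i.d. standard normals to correlated filter taps.
    chol = np.linalg.cholesky(cov + 1e-6 * np.eye(k * k))
    z = rng.standard_normal((num_filters, k * k))
    scale = 1.0 / np.sqrt(k * k)  # rough fan-in-style scaling (also an assumption)
    return (scale * (z @ chol.T)).reshape(num_filters, k, k)

filters = sample_filters(num_filters=256, k=9)
print(filters.shape)  # (256, 9, 9)
```

The resulting array could be copied into the weight tensor of a depthwise convolutional layer before training; per the abstract, such filters can in some cases even be left frozen while the rest of the network trains.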
Related papers
- On the Sample Complexity of One Hidden Layer Networks with Equivariance, Locality and Weight Sharing [12.845681770287005]
Weight sharing, equivariance, and local filters, as in convolutional neural networks, are believed to contribute to the sample efficiency of neural networks.
We obtain lower and upper sample complexity bounds for a class of single hidden layer networks.
We show that the bound depends only on the norm of the filters, which is tighter than a bound based on the spectral norm of the corresponding matrix.
arXiv Detail & Related papers (2024-11-21T16:36:01Z) - Deep Neural Networks with Efficient Guaranteed Invariances [77.99182201815763]
We address the problem of improving the performance and in particular the sample complexity of deep neural networks.
Group-equivariant convolutions are a popular approach to obtain equivariant representations.
We propose a multi-stream architecture, where each stream is invariant to a different transformation.
arXiv Detail & Related papers (2023-03-02T20:44:45Z) - The Power of Linear Combinations: Learning with Random Convolutions [2.0305676256390934]
Modern CNNs can achieve high test accuracies without ever updating randomly initialized (spatial) convolution filters.
Learned linear combinations of these random filters can implicitly regularize the resulting operations.
Although we only observe relatively small gains from learning $3\times 3$ convolutions, the learning gains increase proportionally with kernel size.
arXiv Detail & Related papers (2023-01-26T19:17:10Z) - SIReN-VAE: Leveraging Flows and Amortized Inference for Bayesian Networks [2.8597160727750564]
This work explores incorporating arbitrary dependency structures, as specified by Bayesian networks, into VAEs.
This is achieved by extending both the prior and inference network with graphical residual flows.
We compare our model's performance on several synthetic datasets and show its potential in data-sparse settings.
arXiv Detail & Related papers (2022-04-23T10:31:08Z) - Deep Learning for the Benes Filter [91.3755431537592]
We present a new numerical method based on the mesh-free neural network representation of the density of the solution of the Benes model.
We discuss the role of nonlinearity in the filtering model equations for the choice of the domain of the neural network.
arXiv Detail & Related papers (2022-03-09T14:08:38Z) - Improving the Sample-Complexity of Deep Classification Networks with Invariant Integration [77.99182201815763]
Leveraging prior knowledge on intraclass variance due to transformations is a powerful method to improve the sample complexity of deep neural networks.
We propose a novel monomial selection algorithm based on pruning methods to allow an application to more complex problems.
We demonstrate the improved sample complexity on the Rotated-MNIST, SVHN and CIFAR-10 datasets.
arXiv Detail & Related papers (2022-02-08T16:16:11Z) - Deep Shells: Unsupervised Shape Correspondence with Optimal Transport [52.646396621449]
We propose a novel unsupervised learning approach to 3D shape correspondence.
We show that the proposed method significantly improves over the state-of-the-art on multiple datasets.
arXiv Detail & Related papers (2020-10-28T22:24:07Z) - ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that such CNNs maintain performance with a dramatic reduction in parameters and computation.
arXiv Detail & Related papers (2020-09-04T20:41:47Z) - Learning Sparse Filters in Deep Convolutional Neural Networks with a l1/l2 Pseudo-Norm [5.3791844634527495]
Deep neural networks (DNNs) have proven to be efficient for numerous tasks, but come at a high memory and computation cost.
Recent research has shown that their structure can be more compact without compromising their performance.
We present a sparsity-inducing regularization term based on the l1/l2 pseudo-norm of the filter coefficients; a short sketch of this penalty follows the list below.
arXiv Detail & Related papers (2020-07-20T11:56:12Z)
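The last entry above revolves around a single scalar penalty, the l1/l2 ratio of a filter's coefficients, which is scale-invariant and smallest for sparse filters. A minimal sketch of that ratio, assuming it is computed per filter and simply added to the training loss with some weight; the grouping and weighting are assumptions for illustration, not details from the cited paper.

```python
import numpy as np

def l1_over_l2(filter_coeffs, eps=1e-12):
    """l1/l2 pseudo-norm of a filter's coefficients: scale-invariant,
    equal to 1 for a one-hot filter and sqrt(n) when all n taps are equal."""
    w = np.asarray(filter_coeffs).ravel()
    return np.abs(w).sum() / (np.sqrt((w**2).sum()) + eps)

dense = np.ones((3, 3))                       # every tap active
sparse = np.zeros((3, 3)); sparse[1, 1] = 1.0  # single active tap
print(l1_over_l2(dense), l1_over_l2(sparse))   # ~3.0 vs. ~1.0
```

Because the ratio is invariant to rescaling the filter, it penalizes how the coefficient mass is spread across taps rather than its overall magnitude.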