On the Sample Complexity of One Hidden Layer Networks with Equivariance, Locality and Weight Sharing
- URL: http://arxiv.org/abs/2411.14288v2
- Date: Thu, 23 Jan 2025 22:12:32 GMT
- Title: On the Sample Complexity of One Hidden Layer Networks with Equivariance, Locality and Weight Sharing
- Authors: Arash Behboodi, Gabriele Cesa
- Abstract summary: Weight sharing, equivariant, and local filters are believed to contribute to the sample efficiency of neural networks.
We show that locality has generalization benefits; however, the uncertainty principle implies a trade-off between locality and expressivity.
- Score: 12.845681770287005
- Abstract: Weight sharing, equivariance, and local filters, as in convolutional neural networks, are believed to contribute to the sample efficiency of neural networks. However, it is not clear how each of these design choices contributes to the generalization error. Through the lens of statistical learning theory, we aim to provide insight into this question by characterizing the relative impact of each choice on the sample complexity. We obtain lower and upper sample complexity bounds for a class of single hidden layer networks. For a large class of activation functions, the bounds depend only on the norm of the filters and are dimension-independent. We also provide bounds for max-pooling and an extension to multi-layer networks, both with mild dimension dependence. We provide a few takeaways from the theoretical results. It can be shown that, depending on the weight-sharing mechanism, non-equivariant weight sharing can yield a generalization bound similar to the equivariant one. We show that locality has generalization benefits; however, the uncertainty principle implies a trade-off between locality and expressivity. We conduct extensive experiments and highlight some consistent trends for these models.
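To make these design choices concrete, below is a minimal numpy sketch, not the authors' construction; the dimensions, the cyclic shift group, the ReLU activation, and the average-pooling readout are illustrative assumptions. The hidden layer uses filters that are local (only k nonzero taps) and shared across all cyclic shifts of the input (weight sharing, giving equivariance to circular translations), followed by a shift-invariant readout.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, m = 32, 5, 8          # input length, local filter size (k << d), hidden channels

def circular_conv(x, w):
    """Apply one length-k filter at every cyclic shift of x (shared weights)."""
    idx = (np.arange(len(x))[:, None] + np.arange(len(w))[None, :]) % len(x)
    return x[idx] @ w                                     # shape (d,)

x = rng.standard_normal(d)
filters = rng.standard_normal((m, k)) / np.sqrt(k)        # local filters: only k taps each

# Hidden layer: shared local filters applied at all shifts, then ReLU.
hidden = np.maximum(0.0, np.stack([circular_conv(x, w) for w in filters]))  # (m, d)

# Shift-invariant readout: average-pool over positions, then a linear map.
v = rng.standard_normal(m) / np.sqrt(m)
output = v @ hidden.mean(axis=1)
```

Dropping the weight sharing would amount to learning a separate length-k filter for every shift position; the bounds in the abstract are meant to quantify, via filter norms, how such choices affect sample complexity.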
Related papers
- The impact of allocation strategies in subset learning on the expressive power of neural networks [0.0]
We investigate how different allocations of a fixed number of learnable weights influence the capacity of neural networks.
We establish conditions under which allocations have maximal or minimal expressive power in linear recurrent neural networks and linear multilayer feedforward networks.
Our results emphasize the critical role of strategically distributing learnable weights across the network, showing that a more widespread allocation generally enhances the network's expressive power.
arXiv Detail & Related papers (2025-02-10T09:43:43Z) - Deterministic equivalent and error universality of deep random features learning [4.8461049669050915]
This problem can be seen as a natural generalization of the widely studied random features model to deeper architectures.
First, we prove universality of the test error in a ridge regression setting where the learner and target networks share the same intermediate layers, and provide a sharp formula for it.
Second, we conjecture the universality of the test error in the more general setting of arbitrary convex losses and generic learner/target architectures.
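As a rough illustration of this setting (a sketch under assumed layer sizes and a tanh nonlinearity, not the paper's exact model), the intermediate layers below are random and frozen, shared between learner and target, and only the readout is fit by ridge regression:

```python
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, d, widths = 2000, 1000, 50, [100, 100]

def random_feature_map(X, weights):
    """Fixed random intermediate layers; only the readout gets trained."""
    H = X
    for W in weights:
        H = np.tanh(H @ W / np.sqrt(W.shape[0]))
    return H

weights, fan_in = [], d
for w_out in widths:
    weights.append(rng.standard_normal((fan_in, w_out)))
    fan_in = w_out

# Target: same frozen intermediate layers, random readout (the shared-layers setting).
teacher_readout = rng.standard_normal(widths[-1]) / np.sqrt(widths[-1])
X_train = rng.standard_normal((n_train, d))
X_test = rng.standard_normal((n_test, d))
y_train = random_feature_map(X_train, weights) @ teacher_readout
y_test = random_feature_map(X_test, weights) @ teacher_readout

# Ridge regression on the last-layer features.
lam = 1e-2
H = random_feature_map(X_train, weights)
a = np.linalg.solve(H.T @ H + lam * np.eye(widths[-1]), H.T @ y_train)
test_err = np.mean((random_feature_map(X_test, weights) @ a - y_test) ** 2)
print(f"ridge test error: {test_err:.4f}")
```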
arXiv Detail & Related papers (2023-02-01T12:37:10Z) - Understanding the Covariance Structure of Convolutional Filters [86.0964031294896]
Recent ViT-inspired convolutional networks such as ConvMixer and ConvNeXt use large-kernel depthwise convolutions with notable structure.
We first observe that such learned filters have highly-structured covariance matrices, and we find that covariances calculated from small networks may be used to effectively initialize a variety of larger networks.
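A sketch of that covariance-transfer idea (the learned kernels are replaced here by random placeholders, and the paper's actual procedure may differ in details): estimate the mean and covariance of flattened depthwise kernels from a small trained network, then sample new kernels from that Gaussian to initialize a larger one.

```python
import numpy as np

rng = np.random.default_rng(2)
k, n_small, n_large = 9, 256, 1024

# Stand-in for learned k x k depthwise kernels from a small trained network,
# flattened to vectors of length k*k.
small_filters = rng.standard_normal((n_small, k * k))

mean = small_filters.mean(axis=0)
cov = np.cov(small_filters, rowvar=False)

# Draw fresh kernels from the fitted Gaussian to initialize a larger network's
# depthwise convolutions, instead of using i.i.d. initialization.
new_filters = rng.multivariate_normal(mean, cov, size=n_large).reshape(n_large, k, k)
```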
arXiv Detail & Related papers (2022-10-07T15:59:13Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Deep neural networks with dependent weights: Gaussian Process mixture limit, heavy tails, sparsity and compressibility [18.531464406721412]
This article studies the infinite-width limit of deep feedforward neural networks whose weights are dependent.
Each hidden node of the network is assigned a nonnegative random variable that controls the variance of the outgoing weights of that node.
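A tiny numpy sketch of that construction (the inverse-gamma choice for the per-node variance is just one illustrative option, not necessarily the paper's):

```python
import numpy as np

rng = np.random.default_rng(3)
n_hidden, n_out = 500, 100

# Each hidden node j gets a nonnegative variable lam[j] controlling the variance of
# its outgoing weights, so the weights leaving the same node are dependent.
lam = 1.0 / rng.gamma(shape=2.0, scale=1.0, size=n_hidden)   # inverse-gamma scales
W = rng.standard_normal((n_out, n_hidden)) * np.sqrt(lam / n_hidden)[None, :]
# Heavier-tailed lam produces heavier-tailed, more compressible weight columns.
```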
arXiv Detail & Related papers (2022-05-17T09:14:32Z) - Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how combining recent results on equivariant representation learning on structured spaces with a simple use of classical results from causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z) - The Sample Complexity of One-Hidden-Layer Neural Networks [57.6421258363243]
We study a class of scalar-valued one-hidden-layer networks with inputs bounded in Euclidean norm.
We prove that controlling the spectral norm of the hidden layer weight matrix is insufficient to get uniform convergence guarantees.
We analyze two important settings where a mere spectral norm control turns out to be sufficient.
arXiv Detail & Related papers (2022-02-13T07:12:02Z) - Unrolling Particles: Unsupervised Learning of Sampling Distributions [102.72972137287728]
Particle filtering is used to compute good nonlinear estimates of complex systems.
We show in simulations that the resulting particle filter yields good estimates in a wide range of scenarios.
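For reference, the sketch below is a generic bootstrap particle filter with a hand-specified random-walk transition model; the paper instead learns the sampling (proposal) distribution in an unsupervised way, which this sketch does not implement.

```python
import numpy as np

rng = np.random.default_rng(4)

def bootstrap_particle_filter(observations, n_particles=500,
                              transition_std=0.3, obs_std=1.0):
    """Generic bootstrap particle filter for a 1-D state-space model."""
    particles = np.zeros(n_particles)
    estimates = []
    for y in observations:
        # Propagate particles through the transition model (a random walk here).
        particles = particles + transition_std * rng.standard_normal(n_particles)
        # Reweight by the observation likelihood, estimate the state, then resample.
        log_w = -0.5 * ((y - particles) / obs_std) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        estimates.append(np.sum(w * particles))
        particles = rng.choice(particles, size=n_particles, replace=True, p=w)
    return np.array(estimates)

# Toy data: a Gaussian random walk observed with noise.
true_x = np.cumsum(0.3 * rng.standard_normal(100))
observations = true_x + rng.standard_normal(100)
estimates = bootstrap_particle_filter(observations)
```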
arXiv Detail & Related papers (2021-10-06T16:58:34Z) - Understanding the Distributions of Aggregation Layers in Deep Neural Networks [8.784438985280092]
Aggregation functions are an important mechanism for consolidating deep features into a more compact representation.
In particular, the proximity of global aggregation layers to the output layers of DNNs means that aggregated features have a direct influence on the performance of a deep net.
We propose a novel mathematical formulation for analytically modelling the probability distributions of output values of layers involved with deep feature aggregation.
arXiv Detail & Related papers (2021-07-09T14:23:57Z) - Decentralized Local Stochastic Extra-Gradient for Variational Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) on domains with problem data that is heterogeneous (non-IID) and distributed across many devices.
We make a very general assumption on the computational network that covers the settings of fully decentralized calculations.
We theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone settings.
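The basic extragradient update being extended can be sketched as follows (single-node, bilinear toy problem; the paper's decentralized, local-update variant is not implemented here):

```python
import numpy as np

# Extragradient for the bilinear saddle point min_x max_y x^T A y, whose operator
# F(x, y) = (A y, -A^T x) is monotone.
rng = np.random.default_rng(5)
A = rng.standard_normal((10, 10))
x, y = rng.standard_normal(10), rng.standard_normal(10)
eta = 0.5 / np.linalg.norm(A, 2)                  # step size below 1 / Lipschitz constant

def F(x, y):
    return A @ y, -A.T @ x

for _ in range(2000):
    gx, gy = F(x, y)
    x_half, y_half = x - eta * gx, y - eta * gy   # extrapolation (look-ahead) step
    gx, gy = F(x_half, y_half)
    x, y = x - eta * gx, y - eta * gy             # update using the look-ahead operator value
# The iterates contract toward the unique solution (0, 0).
```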
arXiv Detail & Related papers (2021-06-15T17:45:51Z) - Gaussian Mixture Graphical Lasso with Application to Edge Detection in Brain Networks [21.49394455839253]
This work is inspired by Latent Dirichlet Allocation (LDA).
We propose a novel model called Gaussian Mixture Graphical Lasso (MGL).
MGL learns the proportions of signals generated by each mixture component and their parameters iteratively via an EM framework.
arXiv Detail & Related papers (2021-01-13T21:15:30Z)