On the Sample Complexity of One Hidden Layer Networks with Equivariance, Locality and Weight Sharing
- URL: http://arxiv.org/abs/2411.14288v2
- Date: Thu, 23 Jan 2025 22:12:32 GMT
- Title: On the Sample Complexity of One Hidden Layer Networks with Equivariance, Locality and Weight Sharing
- Authors: Arash Behboodi, Gabriele Cesa
- Abstract summary: Weight sharing, equivariant, and local filters are believed to contribute to the sample efficiency of neural networks.
We show that locality has generalization benefits; however, the uncertainty principle implies a trade-off between locality and expressivity.
- Score: 12.845681770287005
- Abstract: Weight sharing, equivariance, and local filters, as in convolutional neural networks, are believed to contribute to the sample efficiency of neural networks. However, it is not clear how each of these design choices contributes to the generalization error. Through the lens of statistical learning theory, we aim to provide insight into this question by characterizing the relative impact of each choice on the sample complexity. We obtain lower and upper sample complexity bounds for a class of single hidden layer networks. For a large class of activation functions, the bounds depend only on the norm of the filters and are dimension-independent. We also provide bounds for max-pooling and an extension to multi-layer networks, both with mild dimension dependence. We provide a few takeaways from the theoretical results. It can be shown that, depending on the weight-sharing mechanism, non-equivariant weight sharing can yield a generalization bound similar to the equivariant one. We show that locality has generalization benefits; however, the uncertainty principle implies a trade-off between locality and expressivity. We conduct extensive experiments and highlight some consistent trends for these models.
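To make these design choices concrete, below is a minimal numpy sketch, not the authors' construction; the dimensions, the cyclic shift group, the ReLU activation, and the average-pooling readout are illustrative assumptions. The hidden layer uses filters that are local (only k nonzero taps) and shared across all cyclic shifts of the input (weight sharing, giving equivariance to circular translations), followed by a shift-invariant readout.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, m = 32, 5, 8          # input length, local filter size (k << d), hidden channels

def circular_conv(x, w):
    """Apply one length-k filter at every cyclic shift of x (shared weights)."""
    idx = (np.arange(len(x))[:, None] + np.arange(len(w))[None, :]) % len(x)
    return x[idx] @ w                                     # shape (d,)

x = rng.standard_normal(d)
filters = rng.standard_normal((m, k)) / np.sqrt(k)        # local filters: only k taps each

# Hidden layer: shared local filters applied at all shifts, then ReLU.
hidden = np.maximum(0.0, np.stack([circular_conv(x, w) for w in filters]))  # (m, d)

# Shift-invariant readout: average-pool over positions, then a linear map.
v = rng.standard_normal(m) / np.sqrt(m)
output = v @ hidden.mean(axis=1)
```

Dropping the weight sharing would amount to learning a separate length-k filter for every shift position; the bounds in the abstract are meant to quantify, via filter norms, how such choices affect sample complexity.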
Related papers
- The impact of allocation strategies in subset learning on the expressive power of neural networks [0.0]
We investigate how different allocations of a fixed number of learnable weights influence the capacity of neural networks.
We establish conditions under which allocations have maximal or minimal expressive power in linear recurrent neural networks and linear multilayer feedforward networks.
Our results emphasize the critical role of strategically distributing learnable weights across the network, showing that a more widespread allocation generally enhances the network's expressive power.
arXiv Detail & Related papers (2025-02-10T09:43:43Z) - Deterministic equivalent and error universality of deep random features learning [4.8461049669050915]
This problem can be seen as a natural generalization of the widely studied random features model to deeper architectures.
First, we prove universality of the test error in a ridge regression setting where the learner and target networks share the same intermediate layers, and provide a sharp formula for it.
Second, we conjecture the universality of the test error in the more general setting of arbitrary convex losses and generic learner/target architectures.
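As a rough illustration of this setting (a sketch under assumed layer sizes and a tanh nonlinearity, not the paper's exact model), the intermediate layers below are random and frozen, shared between learner and target, and only the readout is fit by ridge regression:

```python
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, d, widths = 2000, 1000, 50, [100, 100]

def random_feature_map(X, weights):
    """Fixed random intermediate layers; only the readout gets trained."""
    H = X
    for W in weights:
        H = np.tanh(H @ W / np.sqrt(W.shape[0]))
    return H

weights, fan_in = [], d
for w_out in widths:
    weights.append(rng.standard_normal((fan_in, w_out)))
    fan_in = w_out

# Target: same frozen intermediate layers, random readout (the shared-layers setting).
teacher_readout = rng.standard_normal(widths[-1]) / np.sqrt(widths[-1])
X_train = rng.standard_normal((n_train, d))
X_test = rng.standard_normal((n_test, d))
y_train = random_feature_map(X_train, weights) @ teacher_readout
y_test = random_feature_map(X_test, weights) @ teacher_readout

# Ridge regression on the last-layer features.
lam = 1e-2
H = random_feature_map(X_train, weights)
a = np.linalg.solve(H.T @ H + lam * np.eye(widths[-1]), H.T @ y_train)
test_err = np.mean((random_feature_map(X_test, weights) @ a - y_test) ** 2)
print(f"ridge test error: {test_err:.4f}")
```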
arXiv Detail & Related papers (2023-02-01T12:37:10Z) - Understanding the Covariance Structure of Convolutional Filters [86.0964031294896]
Recent ViT-inspired convolutional networks such as ConvMixer and ConvNeXt use large-kernel depthwise convolutions with notable structure.
We first observe that such learned filters have highly-structured covariance matrices, and we find that covariances calculated from small networks may be used to effectively initialize a variety of larger networks.
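A sketch of that covariance-transfer idea (the learned kernels are replaced here by random placeholders, and the paper's actual procedure may differ in details): estimate the mean and covariance of flattened depthwise kernels from a small trained network, then sample new kernels from that Gaussian to initialize a larger one.

```python
import numpy as np

rng = np.random.default_rng(2)
k, n_small, n_large = 9, 256, 1024

# Stand-in for learned k x k depthwise kernels from a small trained network,
# flattened to vectors of length k*k.
small_filters = rng.standard_normal((n_small, k * k))

mean = small_filters.mean(axis=0)
cov = np.cov(small_filters, rowvar=False)

# Draw fresh kernels from the fitted Gaussian to initialize a larger network's
# depthwise convolutions, instead of using i.i.d. initialization.
new_filters = rng.multivariate_normal(mean, cov, size=n_large).reshape(n_large, k, k)
```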
arXiv Detail & Related papers (2022-10-07T15:59:13Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Deep neural networks with dependent weights: Gaussian Process mixture limit, heavy tails, sparsity and compressibility [18.531464406721412]
This article studies the infinite-width limit of deep feedforward neural networks whose weights are dependent.
Each hidden node of the network is assigned a nonnegative random variable that controls the variance of the outgoing weights of that node.
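A tiny numpy sketch of that construction (the inverse-gamma choice for the per-node variance is just one illustrative option, not necessarily the paper's):

```python
import numpy as np

rng = np.random.default_rng(3)
n_hidden, n_out = 500, 100

# Each hidden node j gets a nonnegative variable lam[j] controlling the variance of
# its outgoing weights, so the weights leaving the same node are dependent.
lam = 1.0 / rng.gamma(shape=2.0, scale=1.0, size=n_hidden)   # inverse-gamma scales
W = rng.standard_normal((n_out, n_hidden)) * np.sqrt(lam / n_hidden)[None, :]
# Heavier-tailed lam produces heavier-tailed, more compressible weight columns.
```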
arXiv Detail & Related papers (2022-05-17T09:14:32Z) - Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how combining recent results on equivariant representation learning on structured spaces with a simple use of classical results from causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z) - The Sample Complexity of One-Hidden-Layer Neural Networks [57.6421258363243]
We study a class of scalar-valued one-hidden-layer networks with inputs bounded in Euclidean norm.
We prove that controlling the spectral norm of the hidden layer weight matrix is insufficient to get uniform convergence guarantees.
We analyze two important settings where a mere spectral norm control turns out to be sufficient.
arXiv Detail & Related papers (2022-02-13T07:12:02Z) - Unrolling Particles: Unsupervised Learning of Sampling Distributions [102.72972137287728]
Particle filtering is used to compute good nonlinear estimates of complex systems.
We show in simulations that the resulting particle filter yields good estimates in a wide range of scenarios.
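For reference, the sketch below is a generic bootstrap particle filter with a hand-specified random-walk transition model; the paper instead learns the sampling (proposal) distribution in an unsupervised way, which this sketch does not implement.

```python
import numpy as np

rng = np.random.default_rng(4)

def bootstrap_particle_filter(observations, n_particles=500,
                              transition_std=0.3, obs_std=1.0):
    """Generic bootstrap particle filter for a 1-D state-space model."""
    particles = np.zeros(n_particles)
    estimates = []
    for y in observations:
        # Propagate particles through the transition model (a random walk here).
        particles = particles + transition_std * rng.standard_normal(n_particles)
        # Reweight by the observation likelihood, estimate the state, then resample.
        log_w = -0.5 * ((y - particles) / obs_std) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        estimates.append(np.sum(w * particles))
        particles = rng.choice(particles, size=n_particles, replace=True, p=w)
    return np.array(estimates)

# Toy data: a Gaussian random walk observed with noise.
true_x = np.cumsum(0.3 * rng.standard_normal(100))
observations = true_x + rng.standard_normal(100)
estimates = bootstrap_particle_filter(observations)
```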
arXiv Detail & Related papers (2021-10-06T16:58:34Z) - Understanding the Distributions of Aggregation Layers in Deep Neural Networks [8.784438985280092]
Aggregation functions are an important mechanism for consolidating deep features into a more compact representation.
In particular, the proximity of global aggregation layers to the output layers of DNNs means that aggregated features have a direct influence on the performance of a deep net.
We propose a novel mathematical formulation for analytically modelling the probability distributions of output values of layers involved with deep feature aggregation.
arXiv Detail & Related papers (2021-07-09T14:23:57Z) - Decentralized Local Stochastic Extra-Gradient for Variational Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) on domains with problem data that is heterogeneous (non-IID) and distributed across many devices.
We make a very general assumption on the computational network that covers the settings of fully decentralized calculations.
We theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone settings.
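The basic extragradient update being extended can be sketched as follows (single-node, bilinear toy problem; the paper's decentralized, local-update variant is not implemented here):

```python
import numpy as np

# Extragradient for the bilinear saddle point min_x max_y x^T A y, whose operator
# F(x, y) = (A y, -A^T x) is monotone.
rng = np.random.default_rng(5)
A = rng.standard_normal((10, 10))
x, y = rng.standard_normal(10), rng.standard_normal(10)
eta = 0.5 / np.linalg.norm(A, 2)                  # step size below 1 / Lipschitz constant

def F(x, y):
    return A @ y, -A.T @ x

for _ in range(2000):
    gx, gy = F(x, y)
    x_half, y_half = x - eta * gx, y - eta * gy   # extrapolation (look-ahead) step
    gx, gy = F(x_half, y_half)
    x, y = x - eta * gx, y - eta * gy             # update using the look-ahead operator value
# The iterates contract toward the unique solution (0, 0).
```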
arXiv Detail & Related papers (2021-06-15T17:45:51Z) - Gaussian Mixture Graphical Lasso with Application to Edge Detection in Brain Networks [21.49394455839253]
This work is inspired by Latent Dirichlet Allocation (LDA).
We propose a novel model called Gaussian Mixture Graphical Lasso (MGL).
MGL learns the proportions of signals generated by each mixture component and their parameters iteratively via an EM framework.
arXiv Detail & Related papers (2021-01-13T21:15:30Z)