Norm-based Generalization Bounds for Compositionally Sparse Neural
Networks
- URL: http://arxiv.org/abs/2301.12033v1
- Date: Sat, 28 Jan 2023 00:06:22 GMT
- Title: Norm-based Generalization Bounds for Compositionally Sparse Neural
Networks
- Authors: Tomer Galanti, Mengjia Xu, Liane Galanti, Tomaso Poggio
- Abstract summary: We prove generalization bounds for multilayered sparse ReLU neural networks, including convolutional neural networks.
Taken together, these results suggest that compositional sparsity of the underlying target function is critical to the success of deep neural networks.
- Score: 11.987589603961622
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we investigate the Rademacher complexity of deep sparse neural
networks, where each neuron receives a small number of inputs. We prove
generalization bounds for multilayered sparse ReLU neural networks, including
convolutional neural networks. These bounds differ from previous ones, as they
consider the norms of the convolutional filters instead of the norms of the
associated Toeplitz matrices, independently of weight sharing between neurons.
As we show theoretically, these bounds may be orders of magnitude better than
standard norm-based generalization bounds; empirically, they are almost
non-vacuous in estimating generalization in various simple classification
problems. Taken together, these results suggest that compositional sparsity of
the underlying target function is critical to the success of deep neural
networks.
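To make the gap between the two kinds of bounds concrete, here is a minimal numerical sketch. It is not taken from the paper: the 1-D setting, the filter length of 5, the input length of 1024, and the use of the Frobenius norm are illustrative assumptions. With weight sharing, every row of the Toeplitz matrix associated with a convolution is a shifted copy of the same filter, so its Frobenius norm exceeds the filter norm by a factor of roughly the square root of the number of output positions; a bound stated in filter norms avoids paying this factor at every convolutional layer.

```python
import numpy as np

def toeplitz_of_filter(k, n_in):
    """Matrix of a 1-D 'valid' convolution with filter k on inputs of length n_in.

    Each row is a shifted copy of the same filter (weight sharing), so the
    filter's entries are repeated once per output position.
    """
    s = len(k)
    n_out = n_in - s + 1
    T = np.zeros((n_out, n_in))
    for i in range(n_out):
        T[i, i:i + s] = k
    return T

rng = np.random.default_rng(0)
k = rng.standard_normal(5)       # small filter: each output neuron sees only 5 inputs
T = toeplitz_of_filter(k, n_in=1024)

filter_norm = np.linalg.norm(k)  # the quantity a filter-norm bound depends on
matrix_norm = np.linalg.norm(T)  # Frobenius norm of the associated Toeplitz matrix

print(f"||filter||_2          = {filter_norm:.3f}")
print(f"||Toeplitz matrix||_F = {matrix_norm:.3f}")
# ratio equals sqrt(n_out) = sqrt(1020), roughly 31.9 in this illustrative setting
print(f"ratio                 = {matrix_norm / filter_norm:.1f}")
```

Since standard norm-based bounds typically multiply such per-layer norms across depth, this per-layer factor compounds, which is consistent with the "orders of magnitude" gap claimed in the abstract.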
Related papers
- Wide Neural Networks as Gaussian Processes: Lessons from Deep
Equilibrium Models [16.07760622196666]
We study the deep equilibrium model (DEQ), an infinite-depth neural network with shared weight matrices across layers.
Our analysis reveals that as the width of DEQ layers approaches infinity, it converges to a Gaussian process.
Remarkably, this convergence holds even when the limits of depth and width are interchanged.
arXiv Detail & Related papers (2023-10-16T19:00:43Z)
- A Scalable Walsh-Hadamard Regularizer to Overcome the Low-degree Spectral Bias of Neural Networks [79.28094304325116]
Despite the capacity of neural nets to learn arbitrary functions, models trained through gradient descent often exhibit a bias towards "simpler" functions.
We show how this spectral bias towards low-degree frequencies can in fact hurt the neural network's generalization on real-world datasets.
We propose a new scalable functional regularization scheme that helps the neural network learn higher-degree frequencies.
arXiv Detail & Related papers (2023-05-16T20:06:01Z)
- Generalization bounds for neural ordinary differential equations and deep residual networks [1.2328446298523066]
We consider a family of parameterized neural ordinary differential equations (neural ODEs) with continuous-in-time parameters.
By leveraging the analogy between neural ODEs and deep residual networks, our approach yields a generalization bound for a class of deep residual networks.
arXiv Detail & Related papers (2023-05-11T08:29:34Z)
- Generalization and Estimation Error Bounds for Model-based Neural Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery exceed those of regular ReLU networks.
We derive practical design rules that allow one to construct model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z)
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a Polynomial Net Study [55.12108376616355]
Prior work on the NTK has focused on typical neural network architectures and remains incomplete for neural networks with Hadamard products (NNs-Hp).
In this work, we derive the finite-width NTK formulation for a special class of NNs-Hp, i.e., polynomial neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z)
- Non-Vacuous Generalisation Bounds for Shallow Neural Networks [5.799808780731661]
We focus on a specific class of shallow neural networks with a single hidden layer.
We derive new generalisation bounds through the PAC-Bayesian theory.
Our bounds are empirically non-vacuous when the network is trained with vanilla gradient descent on MNIST and Fashion-MNIST.
arXiv Detail & Related papers (2022-02-03T14:59:51Z)
- Generalization Error Bounds for Iterative Recovery Algorithms Unfolded as Neural Networks [6.173968909465726]
We introduce a general class of neural networks suitable for sparse reconstruction from few linear measurements.
By allowing a wide range of degrees of weight-sharing between the layers, we enable a unified analysis for very different neural network types.
arXiv Detail & Related papers (2021-12-08T16:17:33Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve this separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- Compressive Sensing and Neural Networks from a Statistical Learning Perspective [4.561032960211816]
We present a generalization error analysis for a class of neural networks suitable for sparse reconstruction from few linear measurements.
Under realistic conditions, the generalization error scales only logarithmically in the number of layers, and at most linearly in the number of measurements.
arXiv Detail & Related papers (2020-10-29T15:05:43Z)
- Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.