The Sample Complexity of One-Hidden-Layer Neural Networks
- URL: http://arxiv.org/abs/2202.06233v1
- Date: Sun, 13 Feb 2022 07:12:02 GMT
- Title: The Sample Complexity of One-Hidden-Layer Neural Networks
- Authors: Gal Vardi, Ohad Shamir and Nathan Srebro
- Abstract summary: We study a class of scalar-valued one-hidden-layer networks with inputs bounded in Euclidean norm.
We prove that controlling the spectral norm of the hidden layer weight matrix is insufficient to get uniform convergence guarantees.
We analyze two important settings where a mere spectral norm control turns out to be sufficient.
- Score: 57.6421258363243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study norm-based uniform convergence bounds for neural networks, aiming at
a tight understanding of how these are affected by the architecture and type of
norm constraint, for the simple class of scalar-valued one-hidden-layer
networks, and inputs bounded in Euclidean norm. We begin by proving that in
general, controlling the spectral norm of the hidden layer weight matrix is
insufficient to get uniform convergence guarantees (independent of the network
width), while a stronger Frobenius norm control is sufficient, extending and
improving on previous work. Motivated by the proof constructions, we identify
and analyze two important settings where a mere spectral norm control turns out
to be sufficient: First, when the network's activation functions are
sufficiently smooth (with the result extending to deeper networks); and second,
for certain types of convolutional networks. In the latter setting, we study
how the sample complexity is additionally affected by parameters such as the
amount of overlap between patches and the overall number of patches.
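As a concrete illustration of the gap between the two norm constraints (a minimal sketch, not the paper's construction): for a width-n hidden layer, the identity weight matrix has spectral norm 1 but Frobenius norm sqrt(n), so a spectral-norm budget alone does not bound the total weight mass independently of the width, which is consistent with the paper's finding that spectral-norm control by itself cannot give width-independent uniform convergence.

```python
import numpy as np

# Illustration only (not from the paper): for a width-n hidden layer,
# W = I_n has spectral norm 1 but Frobenius norm sqrt(n), so a bound on
# the spectral norm alone does not pin down the total weight "mass".
for n in (16, 64, 256, 1024):
    W = np.eye(n)
    spectral = np.linalg.norm(W, ord=2)       # largest singular value
    frobenius = np.linalg.norm(W, ord="fro")  # sqrt of sum of squared entries
    print(f"width={n:4d}  spectral={spectral:.1f}  "
          f"frobenius={frobenius:.1f}  ratio={frobenius / spectral:.1f}")
```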
Related papers
- Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss as the number of learning epochs increases.
We show that the threshold on the number of training samples increases with the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z)
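For readers unfamiliar with the architecture named in this entry, here is a minimal NumPy sketch of an unfolded ISTA network with the standard (non-smooth) soft-thresholding; the paper analyzes a smooth variant, and the weight shapes, parameter sharing, and toy example below are assumptions made purely for illustration, not the authors' exact model.

```python
import numpy as np

def soft_threshold(x, theta):
    # Standard (non-smooth) soft-thresholding; the paper above studies a
    # smooth variant. Used here only to show the unrolled structure.
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def unfolded_ista(y, W1, W2, theta, num_layers=10):
    # Each "layer" applies one ISTA-style update with learnable matrices
    # W1 (measurement mixing) and W2 (state mixing); sharing W1, W2, theta
    # across layers is a simplifying assumption made here.
    x = np.zeros(W1.shape[0])
    for _ in range(num_layers):
        x = soft_threshold(W1 @ y + W2 @ x, theta)
    return x

# Tiny usage example with random, untrained weights (illustration only).
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 50))                         # measurement matrix
y = A @ (rng.normal(size=50) * (rng.random(50) < 0.1))  # sparse signal
W1, W2 = 0.1 * A.T, np.eye(50) - 0.1 * (A.T @ A)
print(unfolded_ista(y, W1, W2, theta=0.05).shape)
```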
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- Deep Networks Provably Classify Data on Curves [12.309532551321334]
We study a model problem that uses a deep fully-connected neural network to classify data drawn from two disjoint smooth curves on the unit sphere.
We prove that when (i) the network depth is large relative to certain geometric properties that set the difficulty of the problem and (ii) the network width and number of samples are polynomial in the depth, randomly initialized gradient descent quickly learns to correctly classify all points on the two curves with high probability.
arXiv Detail & Related papers (2021-07-29T20:40:04Z)
- Dual-constrained Deep Semi-Supervised Coupled Factorization Network with Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z)
- Improve Generalization and Robustness of Neural Networks via Weight Scale Shifting Invariant Regularizations [52.493315075385325]
We show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with homogeneous activation functions.
We propose an improved regularizer that is invariant to weight scale shifting and thus effectively constrains the intrinsic norm of a neural network.
arXiv Detail & Related papers (2020-08-07T02:55:28Z)
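The scale-shifting issue behind this entry can be demonstrated directly: for homogeneous activations such as ReLU, scaling the hidden layer by c and the output layer by 1/c leaves the computed function unchanged, yet the weight-decay penalty changes by orders of magnitude, so the penalty does not track a scale-invariant (intrinsic) norm. The sketch below illustrates this invariance; it is not the paper's proposed regularizer.

```python
import numpy as np

def relu_net(x, W, v):
    # One-hidden-layer ReLU network f(x) = v^T relu(W x).
    return v @ np.maximum(W @ x, 0.0)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 5))
v = rng.normal(size=8)
x = rng.normal(size=5)

for c in (1.0, 10.0, 100.0):
    # Weight scale shift: relu is positively homogeneous, so the function
    # value is identical for every c, while plain weight decay is not.
    Wc, vc = c * W, v / c
    penalty = np.sum(Wc**2) + np.sum(vc**2)
    print(f"c={c:6.1f}  f(x)={relu_net(x, Wc, vc):+.6f}  "
          f"weight-decay penalty={penalty:.2f}")
```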
- Convexifying Sparse Interpolation with Infinitely Wide Neural Networks: An Atomic Norm Approach [4.380224449592902]
This work examines the problem of exact data interpolation via sparse (neuron count), infinitely wide, single hidden layer neural networks with leaky rectified linear unit activations.
We derive simple characterizations of the convex hulls of the corresponding atomic sets for this problem under several different constraints on the weights and biases of the network.
A modest extension of our proposed framework to a binary classification problem is also presented.
arXiv Detail & Related papers (2020-07-15T21:40:51Z)
- Universal Approximation Power of Deep Residual Neural Networks via Nonlinear Control Theory [9.210074587720172]
We explain the universal approximation capabilities of deep residual neural networks through geometric nonlinear control.
Inspired by recent work establishing links between residual networks and control systems, we provide a general sufficient condition for a residual network to have the power of universal approximation.
arXiv Detail & Related papers (2020-07-12T14:53:30Z)
- Revealing the Structure of Deep Neural Networks via Convex Duality [70.15611146583068]
We study regularized deep neural networks (DNNs) and introduce a convex analytic framework to characterize the structure of hidden layers.
We show that a set of optimal hidden layer weights for a norm regularized training problem can be explicitly found as the extreme points of a convex set.
We apply the same characterization to deep ReLU networks with whitened data and prove the same weight alignment holds.
arXiv Detail & Related papers (2020-02-22T21:13:44Z)