Non-Vacuous Generalisation Bounds for Shallow Neural Networks
- URL: http://arxiv.org/abs/2202.01627v2
- Date: Fri, 4 Feb 2022 15:41:51 GMT
- Title: Non-Vacuous Generalisation Bounds for Shallow Neural Networks
- Authors: Felix Biggs, Benjamin Guedj
- Abstract summary: We focus on a specific class of shallow neural networks with a single hidden layer.
We derive new generalisation bounds through the PAC-Bayesian theory.
Our bounds are empirically non-vacuous when the network is trained with vanilla stochastic gradient descent on MNIST and Fashion-MNIST.
- Score: 5.799808780731661
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We focus on a specific class of shallow neural networks with a single hidden
layer, namely those with $L_2$-normalised data and either a sigmoid-shaped
Gaussian error function ("erf") activation or a Gaussian Error Linear Unit
(GELU) activation. For these networks, we derive new generalisation bounds
through the PAC-Bayesian theory; unlike most existing such bounds they apply to
neural networks with deterministic rather than randomised parameters. Our
bounds are empirically non-vacuous when the network is trained with vanilla
stochastic gradient descent on MNIST and Fashion-MNIST.
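The network class described in the abstract can be illustrated with a minimal sketch: the input is $L_2$-normalised, passed through one hidden layer with an erf or GELU activation, then through a linear output layer. The function names, the weight names `U` and `V`, and the absence of biases or scaling constants are assumptions made for illustration only; they are not taken from the paper, and the exact parameterisation used by the authors may differ.

```python
import math


def l2_normalise(x):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        raise ValueError("cannot normalise the zero vector")
    return [v / norm for v in x]


def gelu(z):
    """Gaussian Error Linear Unit: z * Phi(z), with Phi the standard normal CDF."""
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))


def shallow_forward(x, U, V, activation=math.erf):
    """Forward pass of a single-hidden-layer network (illustrative form).

    U: hidden-layer weights, one row per hidden unit (hypothetical name).
    V: output-layer weights, one row per output (hypothetical name).
    """
    x_hat = l2_normalise(x)
    # Hidden layer: activation applied to each pre-activation <u_j, x_hat>.
    hidden = [
        activation(sum(u * xi for u, xi in zip(row, x_hat)))
        for row in U
    ]
    # Output layer: plain linear map of the hidden activations.
    return [sum(v * h for v, h in zip(row, hidden)) for row in V]


# Toy weights: 2 inputs, 2 hidden units, 1 output.
U = [[1.0, -0.5], [0.25, 2.0]]
V = [[1.0, 1.0]]

out_erf = shallow_forward([1.0, 2.0], U, V)
out_gelu = shallow_forward([1.0, 2.0], U, V, activation=gelu)
```

Because the input is normalised first, rescaling `x` by any positive constant leaves the output unchanged, and with the erf activation every hidden unit lies in (-1, 1).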
Related papers
- Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime [52.00917519626559]
This paper presents two models of neural-networks and their training applicable to neural networks of arbitrary width, depth and topology.
We also present an exact novel representor theory for layer-wise neural network training with unregularized gradient descent in terms of a local-extrinsic neural kernel (LeNK).
This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models.
arXiv Detail & Related papers (2024-05-24T06:30:36Z)
- Differentially Private Non-convex Learning for Multi-layer Neural Networks [35.24835396398768]
This paper focuses on the problem of Differentially Private Stochastic Optimization for (multi-layer) fully connected neural networks with a single output node.
By utilizing recent advances in Neural Tangent Kernel theory, we provide the first excess population risk bound when both the sample size and the width of the network are sufficiently large.
arXiv Detail & Related papers (2023-10-12T15:48:14Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Sparsity-depth Tradeoff in Infinitely Wide Deep Neural Networks [22.083873334272027]
We observe that sparser networks outperform the non-sparse networks at shallow depths on a variety of datasets.
We extend the existing theory on the generalization error of kernel-ridge regression.
arXiv Detail & Related papers (2023-05-17T20:09:35Z)
- Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Norm-based Generalization Bounds for Compositionally Sparse Neural Networks [11.987589603961622]
We prove generalization bounds for multilayered sparse ReLU neural networks, including convolutional neural networks.
Taken together, these results suggest that compositional sparsity of the underlying target function is critical to the success of deep neural networks.
arXiv Detail & Related papers (2023-01-28T00:06:22Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- Critical Initialization of Wide and Deep Neural Networks through Partial Jacobians: General Theory and Applications [6.579523168465526]
We introduce partial Jacobians of a network, defined as derivatives of preactivations in layer $l$ with respect to preactivations in layer $l_0 \leq l$.
We derive recurrence relations for the norms of partial Jacobians and utilize these relations to analyze criticality of deep fully connected neural networks with LayerNorm and/or residual connections.
arXiv Detail & Related papers (2021-11-23T20:31:42Z)
- How Powerful are Shallow Neural Networks with Bandlimited Random Weights? [25.102870584507244]
We investigate the expressive power of depth-2 bandlimited random neural networks.
A random net is a neural network where the hidden layer parameters are frozen with random assignment.
arXiv Detail & Related papers (2020-08-19T13:26:12Z)