Redundant representations help generalization in wide neural networks
- URL: http://arxiv.org/abs/2106.03485v4
- Date: Sat, 29 Apr 2023 09:39:14 GMT
- Title: Redundant representations help generalization in wide neural networks
- Authors: Diego Doimo, Aldo Glielmo, Sebastian Goldt, Alessandro Laio
- Abstract summary: We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
- Score: 71.38860635025907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) defy the classical bias-variance trade-off:
adding parameters to a DNN that interpolates its training data will typically
improve its generalization performance. Explaining the mechanism behind this
"benign overfitting" in deep networks remains an outstanding challenge. Here,
we study the last hidden layer representations of various state-of-the-art
convolutional neural networks and find that if the last hidden representation
is wide enough, its neurons tend to split into groups that carry identical
information, and differ from each other only by statistically independent
noise. The number of such groups increases linearly with the width of the
layer, but only if the width is above a critical value. We show that redundant
neurons appear only when the training process reaches interpolation and the
training error is zero.
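The abstract's picture of neurons that carry identical information and differ only by statistically independent noise can be illustrated with a toy simulation. The sketch below is not the paper's experiment: the signal, the noise level, the group sizes, and the helper names (`simulate_group`, `group_mean`, `rms_error`) are all assumptions made for illustration. It only shows the statistical fact that averaging n noisy copies of the same signal shrinks the noise roughly as 1/sqrt(n), which is one way redundancy of this kind could help a downstream readout.

```python
import random
import statistics


def simulate_group(signal, n_copies, noise_std, rng):
    """Return n_copies hypothetical 'neurons': the same signal plus independent Gaussian noise."""
    return [[s + rng.gauss(0.0, noise_std) for s in signal] for _ in range(n_copies)]


def group_mean(neurons):
    """Average the activations across neurons, one value per input sample."""
    return [statistics.fmean(column) for column in zip(*neurons)]


def rms_error(estimate, signal):
    """Root-mean-square deviation of the averaged activation from the clean signal."""
    return statistics.fmean((e - s) ** 2 for e, s in zip(estimate, signal)) ** 0.5


rng = random.Random(0)  # fixed seed so the run is reproducible
signal = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # assumed shared "information"

errors = {}
for n in (1, 4, 16, 64):
    neurons = simulate_group(signal, n, noise_std=1.0, rng=rng)
    errors[n] = rms_error(group_mean(neurons), signal)
    print(f"group size {n:3d}: RMS error of the group average = {errors[n]:.3f}")
```

With unit noise, the printed error should fall from about 1 for a single neuron towards about 1/8 for a group of 64, following the 1/sqrt(n) trend.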
Related papers
- Learning a Neuron by a Shallow ReLU Network: Dynamics and Implicit Bias for Correlated Inputs [5.7166378791349315]
We prove that, for the fundamental regression task of learning a single neuron, training a one-hidden-layer ReLU network converges to zero loss.
We also show and characterise a surprising distinction in this setting between interpolator networks of minimal rank and those of minimal Euclidean norm.
arXiv Detail & Related papers (2023-06-10T16:36:22Z)
- Sparsity-depth Tradeoff in Infinitely Wide Deep Neural Networks [22.083873334272027]
We observe that sparser networks outperform the non-sparse networks at shallow depths on a variety of datasets.
We extend the existing theory on the generalization error of kernel-ridge regression.
arXiv Detail & Related papers (2023-05-17T20:09:35Z)
- A Scalable Walsh-Hadamard Regularizer to Overcome the Low-degree Spectral Bias of Neural Networks [79.28094304325116]
Despite the capacity of neural nets to learn arbitrary functions, models trained through gradient descent often exhibit a bias towards "simpler" functions.
We show how this spectral bias towards low-degree frequencies can in fact hurt the neural network's generalization on real-world datasets.
We propose a new scalable functional regularization scheme that aids the neural network to learn higher degree frequencies.
arXiv Detail & Related papers (2023-05-16T20:06:01Z)
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
They exploit higher-order statistics only later during training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- On the optimization and generalization of overparameterized implicit neural networks [25.237054775800164]
Implicit neural networks have become increasingly attractive in the machine learning community.
We show that global convergence is guaranteed, even if only the implicit layer is trained.
This paper investigates the generalization error for implicit neural networks.
arXiv Detail & Related papers (2022-09-30T16:19:46Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can, with high probability, make two classes separated by a positive distance linearly separable.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- Towards Understanding Hierarchical Learning: Benefits of Neural Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks.
We show that neural representation can achieve improved sample complexities compared with the raw input.
Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z)
- A Deep Conditioning Treatment of Neural Networks [37.192369308257504]
We show that depth improves trainability of neural networks by improving the conditioning of certain kernel matrices of the input data.
We provide versions of the result that hold for training just the top layer of the neural network, as well as for training all layers via the neural tangent kernel.
arXiv Detail & Related papers (2020-02-04T20:21:36Z)