On the Asymptotic Learning Curves of Kernel Ridge Regression under
Power-law Decay
- URL: http://arxiv.org/abs/2309.13337v1
- Date: Sat, 23 Sep 2023 11:18:13 GMT
- Title: On the Asymptotic Learning Curves of Kernel Ridge Regression under
Power-law Decay
- Authors: Yicheng Li, Haobo Zhang, Qian Lin
- Abstract summary: We show that the 'benign overfitting phenomenon' exists in very wide neural networks only when the noise level is small.
- Score: 17.306230523610864
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The 'benign overfitting phenomenon' widely observed in the neural
network literature challenges the 'bias-variance trade-off' doctrine of
statistical learning theory. Since the generalization ability of a 'lazily
trained' over-parametrized neural network is well approximated by that of the
corresponding neural tangent kernel regression, the curve of the excess risk
(namely, the learning curve) of kernel ridge regression has recently attracted
increasing attention. However, most recent arguments about the learning curve
are heuristic and rely on the 'Gaussian design' assumption. In this paper,
under mild and more realistic assumptions, we rigorously provide a full
characterization of the learning curve, elaborating the effect and interplay
of the choice of the regularization parameter, the source condition, and the
noise. In particular, our results suggest that the 'benign overfitting
phenomenon' exists in very wide neural networks only when the noise level is
small.
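For readers new to this setting, the following is a standard formulation of kernel ridge regression under power-law spectral decay. The notation (beta for the eigenvalue decay exponent, s for the source condition) follows common usage in this literature, and the stated rate is the classical minimax-optimal one under these assumptions, not a restatement of this paper's theorems.

```latex
% Kernel ridge regression over an RKHS \mathcal{H} with kernel k:
\hat{f}_\lambda
  = \arg\min_{f \in \mathcal{H}}
    \frac{1}{n} \sum_{i=1}^{n} \bigl( f(x_i) - y_i \bigr)^2
    + \lambda \, \| f \|_{\mathcal{H}}^2 .

% Power-law decay of the kernel eigenvalues and a source condition on the
% regression function f^* are commonly written as
\lambda_i \asymp i^{-\beta} \ (\beta > 1),
\qquad
f^* \in [\mathcal{H}]^{s} \ (s > 0).

% Under such assumptions, the classical minimax-optimal excess-risk rate is
\mathbb{E} \, \bigl\| \hat{f}_\lambda - f^* \bigr\|_{L^2}^2
  \;\asymp\; n^{-\frac{s\beta}{s\beta + 1}}
\quad \text{for a well-chosen } \lambda \asymp n^{-\frac{\beta}{s\beta + 1}} .
```

As a complementary illustration (not the paper's experiments), the short numpy sketch below estimates an empirical learning curve for kernel ridge regression at a small and a large noise level; the RBF kernel, the target function, and the regularization schedule are illustrative choices.

```python
import numpy as np

# Minimal empirical sketch: estimate the learning curve of kernel ridge
# regression, i.e. the test error as a function of the sample size n,
# for two noise levels.
rng = np.random.default_rng(0)

def rbf_kernel(A, B, gamma=10.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def krr_fit_predict(X, y, X_test, lam):
    """Closed-form KRR: alpha = (K + n*lam*I)^(-1) y, f(x) = k(x, X) @ alpha."""
    n = len(X)
    K = rbf_kernel(X, X)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return rbf_kernel(X_test, X) @ alpha

f_star = lambda x: np.sin(2 * np.pi * x[:, 0])   # smooth target function
X_test = rng.uniform(size=(2000, 1))
f_test = f_star(X_test)

for sigma in (0.05, 1.0):                        # small vs. large noise level
    errors = []
    for n in (50, 100, 200, 400, 800):
        X = rng.uniform(size=(n, 1))
        y = f_star(X) + sigma * rng.normal(size=n)
        lam = 1.0 / n                            # illustrative decay of lambda
        pred = krr_fit_predict(X, y, X_test, lam)
        errors.append(np.mean((pred - f_test) ** 2))  # excess-risk estimate
    print(f"sigma={sigma}: test MSE vs n = {np.round(errors, 4)}")
```

Plotting the printed test errors against n gives a crude empirical learning curve for each noise level.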
Related papers
- Feedback Favors the Generalization of Neural ODEs [24.342023073252395]
We present feedback neural networks, showing that a feedback loop can flexibly correct the learned latent dynamics of neural ordinary differential equations (neural ODEs).
The feedback neural network is a novel two-DOF neural network that performs robustly in unseen scenarios with no loss of accuracy on previous tasks.
arXiv Detail & Related papers (2024-10-14T08:09:45Z) - Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z) - Spiking neural network for nonlinear regression [68.8204255655161]
Spiking neural networks carry the potential for a massive reduction in memory and energy consumption.
They introduce temporal and neuronal sparsity, which can be exploited by next-generation neuromorphic hardware.
A framework for regression using spiking neural networks is proposed.
arXiv Detail & Related papers (2022-10-06T13:04:45Z) - Benign, Tempered, or Catastrophic: A Taxonomy of Overfitting [19.08269066145619]
Some interpolating methods, including neural networks, can fit noisy training data without catastrophically bad test performance.
We argue that real interpolating methods like neural networks do not fit benignly.
arXiv Detail & Related papers (2022-07-14T00:23:01Z) - The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer
Linear Networks [51.1848572349154]
Neural network models that perfectly fit noisy data can generalize well to unseen test data.
We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z) - Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z) - The Neural Tangent Kernel in High Dimensions: Triple Descent and a
Multi-Scale Theory of Generalization [34.235007566913396]
Modern deep learning models employ considerably more parameters than required to fit the training data. Whereas conventional statistical wisdom suggests such models should drastically overfit, in practice these models generalize remarkably well.
An emerging paradigm for describing this unexpected behavior is in terms of a 'double descent' curve.
We provide a precise high-dimensional analysis of generalization with the Neural Tangent Kernel, which characterizes the behavior of wide neural networks with gradient descent.
arXiv Detail & Related papers (2020-08-15T20:55:40Z) - Generalization bound of globally optimal non-convex neural network
training: Transportation map estimation by infinite dimensional Langevin
dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error.
Existing frameworks for analyzing neural network optimization, such as mean field theory and neural tangent kernel theory, typically require taking the infinite-width limit of the network to show global convergence.
arXiv Detail & Related papers (2020-07-11T18:19:50Z) - Spectral Bias and Task-Model Alignment Explain Generalization in Kernel
Regression and Infinitely Wide Neural Networks [17.188280334580195]
Generalization beyond a training dataset is a main goal of machine learning.
Recent observations in deep neural networks contradict conventional wisdom from classical statistics.
We show that more data may impair generalization when the data are noisy or the target function is not expressible by the kernel.
arXiv Detail & Related papers (2020-06-23T17:53:11Z) - A Generalized Neural Tangent Kernel Analysis for Two-layer Neural
Networks [87.23360438947114]
We show that noisy gradient descent with weight decay can still exhibit a 'kernel-like' behavior.
This implies that the training loss converges linearly up to a certain accuracy.
We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay.
arXiv Detail & Related papers (2020-02-10T18:56:15Z)