Generalization Error Rates in Kernel Regression: The Crossover from the
Noiseless to Noisy Regime
- URL: http://arxiv.org/abs/2105.15004v1
- Date: Mon, 31 May 2021 14:39:08 GMT
- Title: Generalization Error Rates in Kernel Regression: The Crossover from the
Noiseless to Noisy Regime
- Authors: Hugo Cui, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová
- Abstract summary: We consider Kernel Ridge Regression (KRR) under the Gaussian design.
We show the existence of a transition in the noisy setting from the noiseless exponents to their noisy values as the sample complexity is increased.
- Score: 29.731516232010343
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this manuscript we consider Kernel Ridge Regression (KRR) under the
Gaussian design. Exponents for the decay of the excess generalization error of
KRR have been reported in various works under the assumption of power-law decay
of the eigenvalues of the feature covariance. These decays were, however,
provided for sizeably different setups, namely in the noiseless case with
constant regularization and in the noisy optimally regularized case.
Intermediary settings have been left substantially uncharted. In this work, we
unify and extend this line of work, providing a characterization of all regimes
and excess error decay rates that can be observed in terms of the interplay of
noise and regularization. In particular, we show the existence of a transition
in the noisy setting from the noiseless exponents to their noisy values as the
sample complexity is increased. Finally, we illustrate how this crossover can
also be observed on real data sets.
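As a rough numerical illustration (a minimal sketch, not the authors' code), one can mimic KRR under the Gaussian design by ridge regression on finitely many Gaussian features whose covariance spectrum decays as a power law, and track the excess error as the sample size grows for a noiseless and a noisy target. The spectrum exponent, teacher coefficients, noise level and ridge penalty below are illustrative assumptions.

```python
# Minimal sketch: ridge regression on Gaussian features with a power-law
# covariance spectrum, as a finite-dimensional stand-in for KRR under the
# Gaussian design. All constants are illustrative assumptions, not the
# values analyzed in the paper.
import numpy as np

rng = np.random.default_rng(0)

p = 2000                                       # truncated number of features
alpha = 1.5                                    # eigenvalue decay: lambda_k ~ k^(-alpha)
eigvals = np.arange(1.0, p + 1.0) ** (-alpha)  # power-law spectrum
theta_star = np.arange(1.0, p + 1.0) ** (-1.0) # illustrative teacher coefficients

def excess_error(n, noise_std, ridge):
    """Single-draw estimate of the excess risk of ridge regression on
    x ~ N(0, diag(eigvals)), y = <theta_star, x> + noise."""
    X = rng.standard_normal((n, p)) * np.sqrt(eigvals)    # Gaussian design
    y = X @ theta_star + noise_std * rng.standard_normal(n)
    theta_hat = np.linalg.solve(X.T @ X + n * ridge * np.eye(p), X.T @ y)
    diff = theta_hat - theta_star
    return float(diff @ (eigvals * diff))                 # E_x[(f_hat(x) - f_star(x))^2]

for n in (100, 200, 400, 800, 1600):
    noiseless = excess_error(n, noise_std=0.0, ridge=1e-3)
    noisy = excess_error(n, noise_std=0.5, ridge=1e-3)
    print(f"n={n:5d}  noiseless={noiseless:.3e}  noisy={noisy:.3e}")
```

On a log-log plot of these errors against n one would expect, under these assumptions, the noisy curve to follow the noiseless one at small n and then bend towards a slower decay once the noise contribution dominates; this is the crossover the abstract refers to.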
Related papers
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z) - Revisiting Convergence of AdaGrad with Relaxed Assumptions [4.189643331553922]
We revisit the convergence of AdaGrad with momentum (which covers AdaGrad as a special case) under a general noise model.
This noise model encompasses a broad range of noises arising in many practical applications, including sub-Gaussian noise.
arXiv Detail & Related papers (2024-02-21T13:24:14Z) - Characterizing Overfitting in Kernel Ridgeless Regression Through the Eigenspectrum [6.749750044497731]
We prove the phenomena of tempered overfitting and catastrophic overfitting under the sub-Gaussian design assumption.
We also identify that the independence of the features plays an important role in guaranteeing tempered overfitting.
arXiv Detail & Related papers (2024-02-02T10:36:53Z) - Generalization in Kernel Regression Under Realistic Assumptions [41.345620270267446]
We provide rigorous bounds for common kernels, for any amount of regularization or noise, any input dimension, and any number of samples.
Our results imply benign overfitting in high input dimensions, nearly tempered overfitting in fixed dimensions, and explicit convergence rates for regularized regression.
As a by-product, we obtain time-dependent bounds for neural networks trained in the kernel regime.
arXiv Detail & Related papers (2023-12-26T10:55:20Z) - The noise level in linear regression with dependent data [36.252641692809924]
We derive upper bounds for random-design linear regression with dependent ($\beta$-mixing) data.
Up to constant factors, our analysis correctly recovers the variance term predicted by the Central Limit Theorem.
arXiv Detail & Related papers (2023-05-18T17:55:52Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically (a toy illustration is sketched after this list).
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Global Convergence and Stability of Stochastic Gradient Descent [0.0]
We show that SGD converges to a stationary point under nearly arbitrary non-convexity and noise models.
We show that SGD can be applied to a range of problems with guarantees of global convergence.
arXiv Detail & Related papers (2021-10-04T19:00:50Z) - Interpolation can hurt robust generalization even when there is no noise [76.3492338989419]
We show that avoiding interpolation through ridge regularization can significantly improve robust generalization even in the absence of noise.
We prove this phenomenon for the robust risk of both linear regression and classification and hence provide the first theoretical result on robust overfitting.
arXiv Detail & Related papers (2021-08-05T23:04:15Z) - Deconfounded Score Method: Scoring DAGs with Dense Unobserved
Confounding [101.35070661471124]
We show that unobserved confounding leaves a characteristic footprint in the observed data distribution that allows for disentangling spurious and causal effects.
We propose an adjusted score-based causal discovery algorithm that may be implemented with general-purpose solvers and scales to high-dimensional problems.
arXiv Detail & Related papers (2021-03-28T11:07:59Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting in practice.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD and that of ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Shape Matters: Understanding the Implicit Bias of the Noise Covariance [76.54300276636982]
Noise in gradient descent provides a crucial implicit regularization effect for training overparameterized models.
We show that parameter-dependent noise -- induced by mini-batches or label perturbation -- is far more effective than Gaussian noise.
Our analysis reveals that parameter-dependent noise introduces a bias towards local minima with smaller noise variance, whereas spherical Gaussian noise does not.
arXiv Detail & Related papers (2020-06-15T18:31:02Z)
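As a toy companion to the random features entry above (a sketch under assumed settings, not the cited paper's SGD setup: it uses a small-ridge closed-form fit, and the dimensions, noise level and ReLU activation are our own choices), the double descent shape of the test error can be reproduced as the number of random features crosses the interpolation threshold:

```python
# Toy sketch of double descent for random-features regression. Uses a
# small-ridge least-squares fit instead of the SGD training analyzed in the
# cited paper; all constants are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
d, n, n_test, noise = 30, 200, 2000, 0.1

w_star = rng.standard_normal(d) / np.sqrt(d)            # linear teacher
X = rng.standard_normal((n, d))
X_test = rng.standard_normal((n_test, d))
y = X @ w_star + noise * rng.standard_normal(n)
y_test = X_test @ w_star

def rf_test_error(m, ridge=1e-6):
    """Fit ridge regression on m ReLU random features; return test MSE."""
    W = rng.standard_normal((d, m)) / np.sqrt(d)         # random projection
    Phi = np.maximum(X @ W, 0.0)                         # train features
    Phi_test = np.maximum(X_test @ W, 0.0)               # test features
    a = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(m), Phi.T @ y)
    return float(np.mean((Phi_test @ a - y_test) ** 2))

for m in (20, 50, 100, 200, 400, 800):                   # error peak expected near m = n
    print(f"features={m:4d}  test MSE={rf_test_error(m):.4f}")
```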