Overfitting Behaviour of Gaussian Kernel Ridgeless Regression: Varying Bandwidth or Dimensionality
- URL: http://arxiv.org/abs/2409.03891v1
- Date: Thu, 5 Sep 2024 19:58:58 GMT
- Title: Overfitting Behaviour of Gaussian Kernel Ridgeless Regression: Varying Bandwidth or Dimensionality
- Authors: Marko Medvedev, Gal Vardi, Nathan Srebro,
- Abstract summary: We consider the overfitting behavior of minimum norm interpolating solutions of kernel ridgeless regression.
For fixed dimensions, we show that even with varying or tuned bandwidth, the ridgeless solution is never consistent and, at least with large enough noise, always worse than the null predictor.
- Score: 33.900009202637285
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We consider the overfitting behavior of minimum norm interpolating solutions of Gaussian kernel ridge regression (i.e. kernel ridgeless regression), when the bandwidth or input dimension varies with the sample size. For fixed dimensions, we show that even with varying or tuned bandwidth, the ridgeless solution is never consistent and, at least with large enough noise, always worse than the null predictor. For increasing dimension, we give a generic characterization of the overfitting behavior for any scaling of the dimension with sample size. We use this to provide the first example of benign overfitting using the Gaussian kernel with sub-polynomial scaling dimension. All our results are under the Gaussian universality ansatz and the (non-rigorous) risk predictions in terms of the kernel eigenstructure.
Related papers
- Refined Risk Bounds for Unbounded Losses via Transductive Priors [58.967816314671296]
We revisit the sequential variants of linear regression with the squared loss, classification problems with hinge loss, and logistic regression.
Our key tools are based on the exponential weights algorithm with carefully chosen transductive priors.
arXiv Detail & Related papers (2024-10-29T00:01:04Z) - Highly Adaptive Ridge [84.38107748875144]
We propose a regression method that achieves a $n-2/3$ dimension-free L2 convergence rate in the class of right-continuous functions with square-integrable sectional derivatives.
Har is exactly kernel ridge regression with a specific data-adaptive kernel based on a saturated zero-order tensor-product spline basis expansion.
We demonstrate empirical performance better than state-of-the-art algorithms for small datasets in particular.
arXiv Detail & Related papers (2024-10-03T17:06:06Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over- parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Generalization in Kernel Regression Under Realistic Assumptions [41.345620270267446]
We provide rigorous bounds for common kernels and for any amount of regularization, noise, any input dimension, and any number of samples.
Our results imply benign overfitting in high input dimensions, nearly tempered overfitting in fixed dimensions, and explicit convergence rates for regularized regression.
As a by-product, we obtain time-dependent bounds for neural networks trained in the kernel regime.
arXiv Detail & Related papers (2023-12-26T10:55:20Z) - Wiener Chaos in Kernel Regression: Towards Untangling Aleatoric and Epistemic Uncertainty [0.0]
We generalize the setting and consider kernel ridge regression with additive i.i.d. nonGaussian measurement noise.
We show that our approach allows us to distinguish the uncertainty that stems from the noise in the data samples from the total uncertainty encoded in the GP posterior distribution.
arXiv Detail & Related papers (2023-12-12T16:02:35Z) - On the Inconsistency of Kernel Ridgeless Regression in Fixed Dimensions [16.704246627541103]
We show that an important class of predictors, kernel machines with translation-invariant kernels, does not exhibit benign overfitting in fixed dimensions.
Our results apply to commonly used translation-invariant kernels such as Gaussian, Laplace, and Cauchy.
arXiv Detail & Related papers (2022-05-26T17:43:20Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by gradient descent (SGD)
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - How rotational invariance of common kernels prevents generalization in
high dimensions [8.508198765617196]
Kernel ridge regression is well-known to achieve minimax optimal rates in low-dimensional settings.
Recent work establishes consistency for kernel regression under certain assumptions on the ground truth function and the distribution of the input data.
arXiv Detail & Related papers (2021-04-09T08:27:37Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - The Generalized Lasso with Nonlinear Observations and Generative Priors [63.541900026673055]
We make the assumption of sub-Gaussian measurements, which is satisfied by a wide range of measurement models.
We show that our result can be extended to the uniform recovery guarantee under the assumption of a so-called local embedding property.
arXiv Detail & Related papers (2020-06-22T16:43:35Z) - Consistent Online Gaussian Process Regression Without the Sample
Complexity Bottleneck [14.309243378538012]
We propose an online compression scheme that fixes an error neighborhood with respect to the Hellinger metric centered at the current posterior.
For constant error radius, POG converges to a neighborhood of the population posterior (Theorem 1(ii))but with finite memory at-worst determined by the metric entropy of the feature space.
arXiv Detail & Related papers (2020-04-23T11:52:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.