How rotational invariance of common kernels prevents generalization in high dimensions
- URL: http://arxiv.org/abs/2104.04244v1
- Date: Fri, 9 Apr 2021 08:27:37 GMT
- Title: How rotational invariance of common kernels prevents generalization in high dimensions
- Authors: Konstantin Donhauser, Mingqi Wu and Fanny Yang
- Abstract summary: Kernel ridge regression is well-known to achieve minimax optimal rates in low-dimensional settings.
Recent work establishes consistency for kernel regression under certain assumptions on the ground truth function and the distribution of the input data.
- Score: 8.508198765617196
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Kernel ridge regression is well-known to achieve minimax optimal rates in
low-dimensional settings. However, its behavior in high dimensions is much less
understood. Recent work establishes consistency for kernel regression under
certain assumptions on the ground truth function and the distribution of the
input data. In this paper, we show that the rotational invariance property of
commonly studied kernels (such as RBF, inner product kernels and
fully-connected NTK of any depth) induces a bias towards low-degree polynomials
in high dimensions. Our result implies a lower bound on the generalization
error for a wide range of distributions and various choices of the scaling for
kernels with different eigenvalue decays. This lower bound suggests that
general consistency results for kernel ridge regression in high dimensions
require a more refined analysis that depends on the structure of the kernel
beyond its eigenvalue decay.
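The low-degree bias described in the abstract can be illustrated with a minimal kernel ridge regression sketch: with a rotationally invariant RBF kernel in moderately high dimension, a linear (degree-1) target is recovered more accurately than a cubic (degree-3) target of the same input direction. The data scaling, sample sizes, and ridge parameter below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Rotationally invariant RBF kernel: k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def krr_fit_predict(X_train, y_train, X_test, lam=1e-3, gamma=1.0):
    """Kernel ridge regression: solve (K + lam*n*I) alpha = y, predict k(x, X) @ alpha."""
    n = X_train.shape[0]
    K = rbf_kernel(X_train, X_train, gamma)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y_train)
    return rbf_kernel(X_test, X_train, gamma) @ alpha

rng = np.random.default_rng(0)
d, n = 50, 400  # high-dimensional inputs, moderate sample size
X = rng.standard_normal((n, d)) / np.sqrt(d)
X_test = rng.standard_normal((200, d)) / np.sqrt(d)

# Degree-1 vs. degree-3 components of a target along a random direction w.
w = rng.standard_normal(d)
y_lin, y_lin_test = X @ w, X_test @ w
y_cub, y_cub_test = (X @ w) ** 3, (X_test @ w) ** 3

# Relative test errors (normalized by target variance).
err_lin = np.mean((krr_fit_predict(X, y_lin, X_test) - y_lin_test) ** 2) / np.var(y_lin_test)
err_cub = np.mean((krr_fit_predict(X, y_cub, X_test) - y_cub_test) ** 2) / np.var(y_cub_test)
```

Under this setup the relative error on the degree-1 target is markedly smaller than on the degree-3 target, consistent with the bias towards low-degree polynomials; the exact gap depends on the kernel bandwidth and the regularization scale.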
Related papers
- High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization [83.06112052443233]
This paper studies kernel ridge regression in high dimensions under covariate shifts.
By a bias-variance decomposition, we theoretically demonstrate that the re-weighting strategy allows for decreasing the variance.
For the bias, we analyze regularization at an arbitrary or well-chosen scale, showing that the bias can behave very differently under different regularization scales.
arXiv Detail & Related papers (2024-06-05T12:03:27Z)
- Entrywise error bounds for low-rank approximations of kernel matrices [55.524284152242096]
We derive entrywise error bounds for low-rank approximations of kernel matrices obtained using the truncated eigen-decomposition.
A key technical innovation is a delocalisation result for the eigenvectors of the kernel matrix corresponding to small eigenvalues.
We validate our theory with an empirical study of a collection of synthetic and real-world datasets.
arXiv Detail & Related papers (2024-05-23T12:26:25Z)
- Generalization Error Curves for Analytic Spectral Algorithms under Power-law Decay [13.803850290216257]
We rigorously provide a full characterization of the generalization error curves of the kernel gradient descent method.
Thanks to neural tangent kernel theory, these results greatly improve our understanding of the generalization behavior of training wide neural networks.
arXiv Detail & Related papers (2024-01-03T08:00:50Z)
- Generalization in Kernel Regression Under Realistic Assumptions [41.345620270267446]
We provide rigorous bounds for common kernels and for any amount of regularization, noise, any input dimension, and any number of samples.
Our results imply benign overfitting in high input dimensions, nearly tempered overfitting in fixed dimensions, and explicit convergence rates for regularized regression.
As a by-product, we obtain time-dependent bounds for neural networks trained in the kernel regime.
arXiv Detail & Related papers (2023-12-26T10:55:20Z)
- Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- Kernel Mean Estimation by Marginalized Corrupted Distributions [96.9272743070371]
Estimating the kernel mean in a reproducing kernel Hilbert space is a critical component in many kernel learning algorithms.
We present a new kernel mean estimator, called the marginalized kernel mean estimator, which estimates kernel mean under the corrupted distribution.
arXiv Detail & Related papers (2021-07-10T15:11:28Z)
- Kernel approximation on algebraic varieties [0.7614628596146599]
We show that significantly better approximations are obtainable in problems involving sparse or low-rank data.
Our results are presented for smooth isotropic kernels, the predominant class of kernels used in applications.
arXiv Detail & Related papers (2021-06-04T23:42:19Z)
- Towards Unbiased Random Features with Lower Variance For Stationary Indefinite Kernels [26.57122949130266]
Our algorithm achieves lower variance and approximation error compared with the existing kernel approximation methods.
With better approximation to the originally selected kernels, improved classification accuracy and regression ability are obtained.
arXiv Detail & Related papers (2021-04-13T13:56:50Z)
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
- Optimal Rates of Distributed Regression with Imperfect Kernels [0.0]
We study distributed kernel regression via the divide-and-conquer approach.
We show that kernel ridge regression can achieve rates faster than $N^{-1}$ in the noise-free setting.
arXiv Detail & Related papers (2020-06-30T13:00:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.