Characterizing Overfitting in Kernel Ridgeless Regression Through the Eigenspectrum
- URL: http://arxiv.org/abs/2402.01297v3
- Date: Wed, 29 May 2024 19:23:41 GMT
- Title: Characterizing Overfitting in Kernel Ridgeless Regression Through the Eigenspectrum
- Authors: Tin Sum Cheng, Aurelien Lucchi, Anastasis Kratsios, David Belius
- Abstract summary: We prove the phenomena of tempered overfitting and catastrophic overfitting under the sub-Gaussian design assumption.
We also identify that the independence of the features plays an important role in guaranteeing tempered overfitting.
- Score: 6.749750044497731
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We derive new bounds for the condition number of kernel matrices, which we then use to enhance existing non-asymptotic test error bounds for kernel ridgeless regression (KRR) in the over-parameterized regime for a fixed input dimension. For kernels with polynomial spectral decay, we recover the bound from previous work; for exponential decay, our bound is non-trivial and novel. Our contribution is two-fold: (i) we rigorously prove the phenomena of tempered overfitting and catastrophic overfitting under the sub-Gaussian design assumption, closing an existing gap in the literature; (ii) we identify that the independence of the features plays an important role in guaranteeing tempered overfitting, raising concerns about approximating KRR generalization using the Gaussian design assumption in previous literature.
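As a quick numerical companion to the abstract (not the paper's construction), the sketch below fits ridgeless KRR, i.e. the minimum-norm interpolant computed with a pseudo-inverse, using two off-the-shelf kernels: a Laplacian kernel as a stand-in for polynomial spectral decay and a Gaussian kernel as a stand-in for exponential decay. The data, bandwidths, and sample sizes are illustrative assumptions; the script simply reports the kernel matrix condition number, the central quantity bounded in the paper, alongside the resulting test error.

```python
# Illustrative sketch only: ridgeless kernel regression with two kernels
# whose spectra decay at different rates, plus the condition number of K.
import numpy as np

def laplacian_kernel(X, Z, gamma=1.0):
    # K(x, z) = exp(-gamma * ||x - z||_1); eigenvalues decay polynomially
    d = np.abs(X[:, None, :] - Z[None, :, :]).sum(-1)
    return np.exp(-gamma * d)

def gaussian_kernel(X, Z, gamma=1.0):
    # K(x, z) = exp(-gamma * ||x - z||^2); eigenvalues decay exponentially
    d = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def krr_ridgeless_predict(kernel, X_train, y_train, X_test):
    # Ridgeless KRR: f(x) = k(x, X) K(X, X)^+ y; the pseudo-inverse
    # handles the ill-conditioned / rank-deficient case.
    K = kernel(X_train, X_train)
    alpha = np.linalg.pinv(K) @ y_train
    return kernel(X_test, X_train) @ alpha, np.linalg.cond(K)

rng = np.random.default_rng(0)
n, d = 200, 1                                    # fixed input dimension, as in the paper's regime
X_tr = rng.uniform(-1, 1, (n, d))
y_tr = np.sin(3 * X_tr[:, 0]) + 0.3 * rng.standard_normal(n)   # noisy target
X_te = rng.uniform(-1, 1, (1000, d))
y_te = np.sin(3 * X_te[:, 0])

for name, k in [("laplacian (poly decay)", laplacian_kernel),
                ("gaussian (exp decay)", gaussian_kernel)]:
    pred, cond = krr_ridgeless_predict(k, X_tr, y_tr, X_te)
    print(f"{name}: cond(K) = {cond:.2e}, test MSE = {np.mean((pred - y_te) ** 2):.3f}")
```

In this ridgeless limit the pseudo-inverse returns the minimum-RKHS-norm interpolant, so the conditioning of K governs how strongly the label noise is amplified into the fitted function.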
Related papers
- A Comprehensive Analysis on the Learning Curve in Kernel Ridge Regression [6.749750044497731]
This paper conducts a comprehensive study of the learning curves of kernel ridge regression (KRR) under minimal assumptions.
We analyze the role of key properties of the kernel, such as its spectral eigen-decay, the characteristics of the eigenfunctions, and the smoothness of the kernel.
arXiv Detail & Related papers (2024-10-23T11:52:52Z)
- A Theoretical Analysis of the Test Error of Finite-Rank Kernel Ridge Regression [23.156642467474995]
Finite-rank kernels naturally appear in several machine learning problems, e.g. when fine-tuning a pre-trained deep neural network's last layer to adapt it to a novel task (a minimal feature-map sketch of this setting appears after this list).
We address this gap by deriving sharp non-asymptotic upper and lower bounds on the test error of any finite-rank KRR.
Our bounds are tighter than previously derived bounds on finite-rank KRR, and unlike comparable results, they remain valid for any choice of regularization parameter.
arXiv Detail & Related papers (2023-10-02T08:52:29Z)
- Curvature-Independent Last-Iterate Convergence for Games on Riemannian Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z)
- Overparameterized random feature regression with nearly orthogonal data [21.97381518762387]
We study the non-asymptotic behavior of random feature ridge regression (RFRR) given by a two-layer neural network.
Our results hold for a wide variety of activation functions and input data sets that exhibit nearly deterministic properties.
arXiv Detail & Related papers (2022-11-11T09:16:25Z)
- Robust and Provable Guarantees for Sparse Random Embeddings [72.24615341588846]
We improve upon the guarantees for sparse random embeddings provided by Freksen et al. (NIPS'18) and Jagadeesan (NIPS'18).
We show that (a) our bounds are explicit as opposed to the guarantees provided previously, and (b) our bounds are guaranteed to be sharper by practically significant constants.
We empirically demonstrate that our bounds significantly outperform prior works on a wide range of real-world datasets.
arXiv Detail & Related papers (2022-02-22T11:15:59Z)
- On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds for RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process (MRP).
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are empirically central to preventing overfitting.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD and that afforded by ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
- Kernel Methods for Causal Functions: Dose, Heterogeneous, and Incremental Response Curves [26.880628841819004]
We prove uniform consistency with improved finite-sample rates via an original analysis of generalized kernel ridge regression.
We extend our main results to counterfactual distributions and to causal functions identified by the front-door and back-door criteria.
arXiv Detail & Related papers (2020-10-10T00:53:11Z)
- Non-asymptotic Optimal Prediction Error for Growing-dimensional Partially Functional Linear Models [0.951828574518325]
We establish rate-optimal upper and lower bounds on the prediction error.
An exact upper bound for the excess prediction risk is shown in a non-asymptotic form.
We derive the non-asymptotic minimax lower bound under the regularity assumption of the Kullback-Leibler divergence of the models.
arXiv Detail & Related papers (2020-09-10T08:49:32Z)
- Approximation Schemes for ReLU Regression [80.33702497406632]
We consider the fundamental problem of ReLU regression.
The goal is to output the best-fitting ReLU with respect to square loss, given draws from some unknown distribution.
arXiv Detail & Related papers (2020-05-26T16:26:17Z)
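Several entries above, most directly the finite-rank KRR paper, concern kernels of finite rank induced by a fixed feature map, e.g. the frozen last-layer activations of a pre-trained network. The sketch below (referenced in that entry) is purely illustrative: random ReLU features stand in for the frozen activations, and all names, dimensions, and the ridge parameter are assumptions rather than details taken from the paper.

```python
# Illustrative sketch: a finite-rank kernel K(x, z) = phi(x) . phi(z) from a
# fixed p-dimensional feature map, as when only the last layer of a
# pre-trained network is re-fit. Random ReLU features play the role of the
# frozen activations.
import numpy as np

rng = np.random.default_rng(1)
n, d, p = 300, 10, 20                    # samples, input dim, feature rank p << n

W = rng.standard_normal((d, p))          # frozen "pre-trained" weights (placeholder)
phi = lambda X: np.maximum(X @ W, 0.0)   # fixed feature map -> rank-p kernel

beta = rng.standard_normal(p)            # "true" last-layer weights (placeholder)
X_tr = rng.standard_normal((n, d))
X_te = rng.standard_normal((500, d))
y_tr = phi(X_tr) @ beta + 0.1 * rng.standard_normal(n)
y_te = phi(X_te) @ beta

lam = 1e-3                               # ridge parameter; the cited bounds hold for any choice
K = phi(X_tr) @ phi(X_tr).T              # n x n kernel matrix, rank at most p
alpha = np.linalg.solve(K + lam * np.eye(n), y_tr)
pred = phi(X_te) @ phi(X_tr).T @ alpha

print("rank(K) =", np.linalg.matrix_rank(K), "of n =", n,
      "| test MSE =", round(float(np.mean((pred - y_te) ** 2)), 4))
```

Because K has rank at most p, the small ridge term keeps the linear solve well posed even though n - p of its eigenvalues are exactly zero.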