A Theoretical Analysis of the Test Error of Finite-Rank Kernel Ridge
Regression
- URL: http://arxiv.org/abs/2310.00987v2
- Date: Tue, 3 Oct 2023 16:00:34 GMT
- Title: A Theoretical Analysis of the Test Error of Finite-Rank Kernel Ridge
Regression
- Authors: Tin Sum Cheng, Aurelien Lucchi, Ivan Dokmanić, Anastasis Kratsios
and David Belius
- Abstract summary: Finite-rank kernels naturally appear in several machine learning problems, e.g. when fine-tuning a pre-trained deep neural network's last layer to adapt it to a novel task.
We address this gap by deriving sharp non-asymptotic upper and lower bounds for the test error of any finite-rank KRR estimator.
Our bounds are tighter than previously derived bounds on finite-rank KRR, and unlike comparable results, they also remain valid for any regularization parameters.
- Score: 23.156642467474995
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing statistical learning guarantees for general kernel regressors often
yield loose bounds when used with finite-rank kernels. Yet, finite-rank kernels
naturally appear in several machine learning problems, e.g. when fine-tuning a
pre-trained deep neural network's last layer to adapt it to a novel task when
performing transfer learning. We address this gap for finite-rank kernel ridge
regression (KRR) by deriving sharp non-asymptotic upper and lower bounds for
the KRR test error of any finite-rank KRR. Our bounds are tighter than
previously derived bounds on finite-rank KRR, and unlike comparable results,
they also remain valid for any regularization parameters.
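For intuition, here is a minimal sketch of finite-rank KRR in the transfer-learning setting the abstract mentions: a fixed feature map stands in for a pretrained network's frozen last-layer embedding, so the induced kernel has rank M, and the held-out mean squared error plays the role of the test error being bounded. The feature map, data model, and regularization value are illustrative assumptions, not taken from the paper.

```python
# Minimal finite-rank kernel ridge regression sketch (illustrative, not the paper's setup).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed feature map phi: R^d -> R^M, standing in for a pretrained
# network's frozen last-layer embedding; the induced kernel has rank M.
d, M, n_train, n_test = 20, 32, 200, 1000
W = rng.standard_normal((d, M)) / np.sqrt(d)

def phi(X):
    return np.maximum(X @ W, 0.0)

# Synthetic regression data: targets are linear in the features plus noise.
beta_star = rng.standard_normal(M)

def sample(n):
    X = rng.standard_normal((n, d))
    y = phi(X) @ beta_star + 0.1 * rng.standard_normal(n)
    return X, y

X_tr, y_tr = sample(n_train)
X_te, y_te = sample(n_test)

# KRR with the rank-M kernel k(x, x') = phi(x)^T phi(x'), solved in feature
# space so that only an M x M linear system is inverted.
lam = 1e-2
Phi = phi(X_tr)
beta_hat = np.linalg.solve(Phi.T @ Phi + lam * n_train * np.eye(M), Phi.T @ y_tr)

test_error = np.mean((phi(X_te) @ beta_hat - y_te) ** 2)
print(f"finite-rank KRR test error (MSE): {test_error:.4f}")
```

Because the kernel has rank M, the solve involves only an M x M matrix regardless of the sample size, which is what makes the finite-rank setting analytically and computationally tractable.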
Related papers
- Generalization and Risk Bounds for Recurrent Neural Networks [3.0638061480679912]
We establish a new generalization error bound for vanilla RNNs.
We provide a unified framework to calculate the Rademacher complexity that can be applied to a variety of loss functions.
arXiv Detail & Related papers (2024-11-05T03:49:06Z)
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z)
- Approximation Bounds for Recurrent Neural Networks with Application to Regression [7.723218675113336]
We study the approximation capacity of deep ReLU recurrent neural networks (RNNs) and explore the convergence properties of nonparametric least squares regression using RNNs.
We derive upper bounds on the approximation error of RNNs for Hölder smooth functions.
Our results provide statistical guarantees on the performance of RNNs.
arXiv Detail & Related papers (2024-09-09T13:02:50Z)
- Characterizing Overfitting in Kernel Ridgeless Regression Through the Eigenspectrum [6.749750044497731]
We prove the phenomena of tempered overfitting and catastrophic overfitting under the sub-Gaussian design assumption.
We also identify that the independence of the features plays an important role in guaranteeing tempered overfitting.
arXiv Detail & Related papers (2024-02-02T10:36:53Z)
- On the Sublinear Regret of GP-UCB [58.25014663727544]
We show that the Gaussian Process Upper Confidence Bound (GP-UCB) algorithm enjoys nearly optimal regret rates.
Our improvements rely on a key technical contribution -- regularizing kernel ridge estimators in proportion to the smoothness of the underlying kernel (a toy GP-UCB loop is sketched after this list).
arXiv Detail & Related papers (2023-07-14T13:56:11Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Overparameterized random feature regression with nearly orthogonal data [21.97381518762387]
We study the non-asymptotic behaviors of the random feature ridge regression (RFRR) given by a two-layer neural network.
Our results hold for a wide variety of activation functions and input data sets that exhibit nearly deterministic properties.
arXiv Detail & Related papers (2022-11-11T09:16:25Z)
- Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- Early stopping and polynomial smoothing in regression with reproducing kernels [2.0411082897313984]
We study the problem of early stopping for iterative learning algorithms in a reproducing kernel Hilbert space (RKHS).
We present a data-driven rule to perform early stopping without a validation set that is based on the so-called minimum discrepancy principle.
The proposed rule is proved to be minimax-optimal over different types of kernel spaces.
arXiv Detail & Related papers (2020-07-14T05:27:18Z)
- Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime [50.510421854168065]
We show that the averaged gradient descent can achieve the minimax optimal convergence rate.
We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z)
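As a companion to the GP-UCB entry above, the following sketch runs a plain GP-UCB loop on a one-dimensional toy objective: a Gaussian process posterior with an RBF kernel is refit after each query, and the next point maximizes an upper confidence bound over a candidate grid. The objective, kernel length-scale, noise level, and exploration weight beta are illustrative assumptions; the paper's refined regularization scheme is not reproduced here.

```python
# Toy GP-UCB loop with an RBF kernel (illustrative assumptions throughout).
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # Unknown objective, hidden from the learner; only noisy values are observed.
    return np.sin(3.0 * x) + 0.5 * np.cos(5.0 * x)

def rbf(a, b, ell=0.2):
    # RBF kernel matrix between two 1-D point sets.
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * ell ** 2))

grid = np.linspace(0.0, 2.0, 200)   # candidate set
noise, beta = 0.05, 4.0             # observation noise std, UCB exploration weight
X, y = [], []

for t in range(30):
    if not X:
        x_next = rng.uniform(0.0, 2.0)          # first query is random
    else:
        Xa, ya = np.array(X), np.array(y)
        K = rbf(Xa, Xa) + noise ** 2 * np.eye(len(Xa))
        Ks = rbf(grid, Xa)
        mu = Ks @ np.linalg.solve(K, ya)        # posterior mean on the grid
        var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))  # posterior variance
        ucb = mu + np.sqrt(beta * np.maximum(var, 0.0))                  # UCB acquisition
        x_next = grid[np.argmax(ucb)]
    X.append(float(x_next))
    y.append(f(x_next) + noise * rng.standard_normal())

regret = f(grid).max() - f(np.array(X))
print(f"average regret over {len(X)} queries: {regret.mean():.3f}")
```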