A Duality Analysis of Kernel Ridge Regression in the Noiseless Regime
- URL: http://arxiv.org/abs/2402.15718v1
- Date: Sat, 24 Feb 2024 04:57:59 GMT
- Title: A Duality Analysis of Kernel Ridge Regression in the Noiseless Regime
- Authors: Jihao Long, Xiaojun Peng and Lei Wu
- Abstract summary: We prove that KRR can attain the minimax optimal rate, which depends on both the eigenvalue decay of the associated kernel and the relative smoothness of target functions.
Our proof leverages a novel extension of the duality framework introduced by Chen et al. (2023), which could be useful in analyzing kernel-based methods beyond the scope of this work.
- Score: 5.153104177051464
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we conduct a comprehensive analysis of generalization
properties of Kernel Ridge Regression (KRR) in the noiseless regime, a scenario
crucial to scientific computing, where data are often generated via computer
simulations. We prove that KRR can attain the minimax optimal rate, which
depends on both the eigenvalue decay of the associated kernel and the relative
smoothness of target functions. Particularly, when the eigenvalue decays
exponentially fast, KRR achieves the spectral accuracy, i.e., a convergence
rate faster than any polynomial. Moreover, the numerical experiments well
corroborate our theoretical findings. Our proof leverages a novel extension of
the duality framework introduced by Chen et al. (2023), which could be useful
in analyzing kernel-based methods beyond the scope of this work.
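To make the setting concrete, the following is a minimal sketch of noiseless-regime KRR with a Gaussian kernel. The target function sin(2*pi*x), the kernel bandwidth, and the ridge schedule lambda = 1/n are illustrative assumptions rather than the paper's exact choices; the point is only the closed-form estimator and the rapid decay of the test error as the sample size grows.

```python
# A minimal sketch of kernel ridge regression (KRR) on noiseless data.
# The target function, the Gaussian kernel bandwidth, and the ridge schedule
# lambda = 1/n below are illustrative assumptions, not the paper's exact setup.
import numpy as np

def gaussian_kernel(x, y, bandwidth=0.5):
    """k(x, y) = exp(-(x - y)^2 / (2 * bandwidth^2)) for 1-D inputs."""
    return np.exp(-(x[:, None] - y[None, :]) ** 2 / (2 * bandwidth ** 2))

def krr_fit_predict(x_train, y_train, x_test, lam):
    """Closed-form KRR estimator: f_hat(x) = k(x, X) (K + n * lam * I)^{-1} y."""
    n = len(x_train)
    K = gaussian_kernel(x_train, x_train)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y_train)
    return gaussian_kernel(x_test, x_train) @ alpha

f_star = lambda x: np.sin(2 * np.pi * x)      # smooth target function
x_test = np.linspace(0.0, 1.0, 512)

for n in (32, 128, 512):
    x_train = np.linspace(0.0, 1.0, n)        # "simulation" inputs on a grid
    y_train = f_star(x_train)                 # noiseless labels
    y_hat = krr_fit_predict(x_train, y_train, x_test, lam=1.0 / n)
    err = np.sqrt(np.mean((y_hat - f_star(x_test)) ** 2))
    print(f"n = {n:4d}   test L2 error = {err:.2e}")  # decays quickly with n
```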
Related papers
- A Comprehensive Analysis on the Learning Curve in Kernel Ridge Regression [6.749750044497731]
This paper conducts a comprehensive study of the learning curves of kernel ridge regression (KRR) under minimal assumptions.
We analyze the role of key properties of the kernel, such as its spectral eigen-decay, the characteristics of the eigenfunctions, and the smoothness of the kernel.
arXiv Detail & Related papers (2024-10-23T11:52:52Z) - Learning Analysis of Kernel Ridgeless Regression with Asymmetric Kernel Learning [33.34053480377887]
This paper enhances kernel ridgeless regression with Locally-Adaptive-Bandwidths (LAB) RBF kernels.
For the first time, we demonstrate that functions learned from LAB RBF kernels belong to an integral space of Reproducing Kernel Hilbert Spaces (RKHSs).
arXiv Detail & Related papers (2024-06-03T15:28:12Z) - Understanding the Generalization Ability of Deep Learning Algorithms: A
Kernelized Renyi's Entropy Perspective [11.255943520955764]
We propose a novel information-theoretic measure: kernelized Renyi's entropy.
We establish generalization error bounds for stochastic gradient descent / stochastic gradient Langevin dynamics (SGD/SGLD) learning algorithms under kernelized Renyi's entropy.
We show that our information-theoretical bounds depend on the statistics of the gradients, and are rigorously tighter than the current state-of-the-art (SOTA) results.
arXiv Detail & Related papers (2023-05-02T01:17:15Z) - Statistical Optimality of Divide and Conquer Kernel-based Functional
Linear Regression [1.7227952883644062]
This paper studies the convergence performance of divide-and-conquer estimators in the scenario that the target function does not reside in the underlying kernel space.
As a decomposition-based scalable approach, the divide-and-conquer estimators of functional linear regression can substantially reduce the algorithmic complexities in time and memory.
arXiv Detail & Related papers (2022-11-20T12:29:06Z) - Convex Analysis of the Mean Field Langevin Dynamics [49.66486092259375]
A convergence rate analysis of the mean field Langevin dynamics is presented.
The proximal Gibbs distribution $p_q$ associated with the dynamics allows us to develop a convergence theory parallel to classical results in convex optimization.
arXiv Detail & Related papers (2022-01-25T17:13:56Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds for RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Fast Statistical Leverage Score Approximation in Kernel Ridge Regression [12.258887270632869]
Nystr"om approximation is a fast randomized method that rapidly solves kernel ridge regression (KRR) problems.
We propose a linear time (modulo poly-log terms) algorithm to accurately approximate the statistical leverage scores in the stationary- Kernel-based KRR.
arXiv Detail & Related papers (2021-03-09T05:57:08Z) - Optimal Rates for Averaged Stochastic Gradient Descent under Neural
Tangent Kernel Regime [50.510421854168065]
We show that averaged stochastic gradient descent can achieve the minimax optimal convergence rate.
We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the role of noise in its success is still unclear.
We show that heavy-tailed behavior commonly arises in the parameters due to multiplicative noise.
A detailed analysis is conducted of key factors, including step size and data, with similar results observed on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z) - On Learning Rates and Schr\"odinger Operators [105.32118775014015]
We present a general theoretical analysis of the effect of the learning rate.
We characterize the limit in which the learning rate tends to zero for a broad class of non-convex functions.
arXiv Detail & Related papers (2020-04-15T09:52:37Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for
Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)