A Duality Analysis of Kernel Ridge Regression in the Noiseless Regime
- URL: http://arxiv.org/abs/2402.15718v1
- Date: Sat, 24 Feb 2024 04:57:59 GMT
- Title: A Duality Analysis of Kernel Ridge Regression in the Noiseless Regime
- Authors: Jihao Long, Xiaojun Peng and Lei Wu
- Abstract summary: We prove that KRR can attain the minimax optimal rate, which depends on both the eigenvalue decay of the associated kernel and the relative smoothness of target functions.
Our proof leverages a novel extension of the duality framework introduced by Chen et al. (2023), which could be useful in analyzing kernel-based methods beyond the scope of this work.
- Score: 5.153104177051464
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we conduct a comprehensive analysis of generalization
properties of Kernel Ridge Regression (KRR) in the noiseless regime, a scenario
crucial to scientific computing, where data are often generated via computer
simulations. We prove that KRR can attain the minimax optimal rate, which
depends on both the eigenvalue decay of the associated kernel and the relative
smoothness of target functions. Particularly, when the eigenvalue decays
exponentially fast, KRR achieves the spectral accuracy, i.e., a convergence
rate faster than any polynomial. Moreover, numerical experiments
corroborate our theoretical findings. Our proof leverages a novel extension of
the duality framework introduced by Chen et al. (2023), which could be useful
in analyzing kernel-based methods beyond the scope of this work.
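To make the noiseless setting in the abstract concrete, here is a minimal sketch of KRR fitted to exact (noise-free) samples. It is not the authors' experiment: the Gaussian kernel, the `bandwidth` and `ridge` values, the hypothetical `target` function, and the equispaced sample points are all assumptions chosen only for illustration.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=0.2):
    """Gaussian (RBF) kernel matrix between 1-D point sets a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))

def krr_fit_predict(x_train, y_train, x_test, ridge=1e-8, bandwidth=0.2):
    """Solve (K + n * ridge * I) alpha = y, then evaluate the fit at x_test."""
    n = len(x_train)
    K = gaussian_kernel(x_train, x_train, bandwidth)
    alpha = np.linalg.solve(K + n * ridge * np.eye(n), y_train)
    return gaussian_kernel(x_test, x_train, bandwidth) @ alpha

def target(x):
    # Hypothetical smooth target function (an assumption, not from the paper).
    return np.sin(2.0 * np.pi * x)

x_test = np.linspace(0.0, 1.0, 200)
errors = {}
for n in (5, 10, 20):
    x_train = np.linspace(0.0, 1.0, n)
    y_train = target(x_train)  # noiseless labels: no random perturbation added
    pred = krr_fit_predict(x_train, y_train, x_test)
    # Root-mean-square error on a dense test grid.
    errors[n] = float(np.sqrt(np.mean((pred - target(x_test)) ** 2)))
    print(n, errors[n])
```

The small ridge term mainly stabilizes the linear solve here, since with noiseless labels there is no noise to average out. Plotting `errors` against `n` on a log scale is one informal way to look at how fast the error decays for a given kernel and target.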
Related papers
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z) - Semi-Implicit Functional Gradient Flow for Efficient Sampling [30.32233517392456]
We propose a functional gradient ParVI method that uses perturbed particles with Gaussian noise as the approximation family.
We show that the corresponding functional gradient flow, which can be estimated via denoising score matching with neural networks, exhibits strong theoretical convergence guarantees.
In addition, we present an adaptive version of our method that automatically selects the appropriate noise magnitude during sampling.
arXiv Detail & Related papers (2024-10-23T15:00:30Z) - A Comprehensive Analysis on the Learning Curve in Kernel Ridge Regression [6.749750044497731]
This paper conducts a comprehensive study of the learning curves of kernel ridge regression (KRR) under minimal assumptions.
We analyze the role of key properties of the kernel, such as its spectral eigen-decay, the characteristics of the eigenfunctions, and the smoothness of the kernel.
arXiv Detail & Related papers (2024-10-23T11:52:52Z) - On the Wasserstein Convergence and Straightness of Rectified Flow [54.580605276017096]
Rectified Flow (RF) is a generative model that aims to learn straight flow trajectories from noise to data.
We provide a theoretical analysis of the Wasserstein distance between the sampling distribution of RF and the target distribution.
We present general conditions guaranteeing uniqueness and straightness of 1-RF, which is in line with previous empirical findings.
arXiv Detail & Related papers (2024-10-19T02:36:11Z) - Optimal Kernel Quantile Learning with Random Features [0.9208007322096533]
This paper presents a generalization study of kernel quantile regression with random features (KQR-RF).
Our study establishes the capacity-dependent learning rates for KQR-RF under mild conditions on the number of RFs.
By slightly modifying our assumptions, the capacity-dependent error analysis can also be applied to cases with Lipschitz continuous losses.
arXiv Detail & Related papers (2024-08-24T14:26:09Z) - Learning Analysis of Kernel Ridgeless Regression with Asymmetric Kernel Learning [33.34053480377887]
This paper enhances kernel ridgeless regression with Locally-Adaptive-Bandwidths (LAB) RBF kernels.
For the first time, we demonstrate that functions learned from LAB RBF kernels belong to an integral space of Reproducing Kernel Hilbert Spaces (RKHSs).
arXiv Detail & Related papers (2024-06-03T15:28:12Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Understanding the Generalization Ability of Deep Learning Algorithms: A Kernelized Renyi's Entropy Perspective [11.255943520955764]
We propose a novel information theoretical measure: kernelized Renyi's entropy.
We establish the generalization error bounds for stochastic gradient/Langevin descent (SGD/SGLD) learning algorithms under kernelized Renyi's entropy.
We show that our information-theoretical bounds depend on the statistics of the gradients, and are rigorously tighter than the current state-of-the-art (SOTA) results.
arXiv Detail & Related papers (2023-05-02T01:17:15Z) - Statistical Optimality of Divide and Conquer Kernel-based Functional Linear Regression [1.7227952883644062]
This paper studies the convergence performance of divide-and-conquer estimators in the scenario that the target function does not reside in the underlying kernel space.
As a decomposition-based scalable approach, the divide-and-conquer estimators of functional linear regression can substantially reduce the algorithmic complexities in time and memory.
arXiv Detail & Related papers (2022-11-20T12:29:06Z) - Convex Analysis of the Mean Field Langevin Dynamics [49.66486092259375]
A convergence rate analysis of the mean field Langevin dynamics is presented.
The proximal Gibbs distribution $p_q$ associated with the dynamics allows us to develop a convergence theory parallel to classical results in convex optimization.
arXiv Detail & Related papers (2022-01-25T17:13:56Z) - Optimizing Information-theoretical Generalization Bounds via Anisotropic Noise in SGLD [73.55632827932101]
We optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD.
We prove that with constraint to guarantee low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance.
arXiv Detail & Related papers (2021-10-26T15:02:27Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - On the Role of Entropy-based Loss for Learning Causal Structures with Continuous Optimization [27.613220411996025]
A method with non-combinatorial directed acyclic constraint, called NOTEARS, formulates the causal structure learning problem as a continuous optimization problem using least-square loss.
We show that the violation of the Gaussian noise assumption will hinder the causal direction identification.
We propose a more general entropy-based loss that is theoretically consistent with the likelihood score under any noise distribution.
arXiv Detail & Related papers (2021-06-05T08:29:51Z) - Fast Statistical Leverage Score Approximation in Kernel Ridge Regression [12.258887270632869]
Nyström approximation is a fast randomized method that rapidly solves kernel ridge regression (KRR) problems.
We propose a linear time (modulo poly-log terms) algorithm to accurately approximate the statistical leverage scores in stationary-kernel-based KRR.
arXiv Detail & Related papers (2021-03-09T05:57:08Z) - Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime [50.510421854168065]
We show that averaged stochastic gradient descent can achieve the minimax optimal convergence rate.
We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z) - Shape Matters: Understanding the Implicit Bias of the Noise Covariance [76.54300276636982]
Noise in gradient descent provides a crucial implicit regularization effect for training overparameterized models.
We show that parameter-dependent noise -- induced by mini-batches or label perturbation -- is far more effective than Gaussian noise.
Our analysis reveals that parameter-dependent noise introduces a bias towards local minima with smaller noise variance, whereas spherical Gaussian noise does not.
arXiv Detail & Related papers (2020-06-15T18:31:02Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the role of stochasticity in its success is still unclear.
We show that heavy-tailed behavior commonly arises in the parameters from discrete multiplicative noise due to variance.
A detailed analysis is conducted in which we describe key factors, including step size and data, that all exhibit similar results on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z) - On Learning Rates and Schrödinger Operators [105.32118775014015]
We present a general theoretical analysis of the effect of the learning rate.
We find that the learning rate tends to zero for a broad class of non-neural functions.
arXiv Detail & Related papers (2020-04-15T09:52:37Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.