On the Optimality of Misspecified Kernel Ridge Regression
- URL: http://arxiv.org/abs/2305.07241v1
- Date: Fri, 12 May 2023 04:12:12 GMT
- Title: On the Optimality of Misspecified Kernel Ridge Regression
- Authors: Haobo Zhang, Yicheng Li, Weihao Lu, Qian Lin
- Abstract summary: We show that KRR is minimax optimal for any $s \in (0,1)$ when $\mathcal{H}$ is a Sobolev RKHS.
- Score: 13.995944403996566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the misspecified kernel ridge regression problem, researchers usually
assume the underlying true function $f_{\rho}^{*} \in [\mathcal{H}]^{s}$, a
less-smooth interpolation space of a reproducing kernel Hilbert space (RKHS)
$\mathcal{H}$, for some $s\in (0,1)$. The existing minimax optimality results
require $\|f_{\rho}^{*}\|_{L^{\infty}}<\infty$, which implicitly requires $s >
\alpha_{0}$, where $\alpha_{0}\in (0,1)$ is the embedding index, a constant
depending on $\mathcal{H}$. Whether KRR is optimal for all $s\in (0,1)$ has been a
long-standing open problem. In this paper, we show that KRR is
minimax optimal for any $s\in (0,1)$ when $\mathcal{H}$ is a Sobolev RKHS.
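As a reading aid, the sketch below shows the KRR estimator itself (not the paper's analysis) in plain numpy, using a Matérn-3/2 kernel, whose RKHS is norm-equivalent to a Sobolev space. The toy target is deliberately non-smooth, loosely mimicking a misspecified $f_{\rho}^{*}$ outside $\mathcal{H}$; the length scale and the regularization $\lambda$ are arbitrary illustrative choices, not the rate-optimal schedule studied in the paper.

```python
# Minimal kernel ridge regression (KRR) sketch with a Matern-3/2 kernel,
# whose RKHS is norm-equivalent to a Sobolev space. Illustrative only:
# the length scale and lambda below are heuristics, not the paper's
# rate-optimal choices.
import numpy as np

def matern32_kernel(X, Y, length_scale=0.2):
    # Matern kernel with smoothness nu = 3/2 (1-D inputs).
    d = np.abs(X[:, None] - Y[None, :]) / length_scale
    return (1.0 + np.sqrt(3.0) * d) * np.exp(-np.sqrt(3.0) * d)

def krr_fit_predict(x_train, y_train, x_test, lam=1e-2):
    # Solve (K + n * lam * I) alpha = y, then predict with k(x_test, x_train) @ alpha.
    n = len(x_train)
    K = matern32_kernel(x_train, x_train)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y_train)
    return matern32_kernel(x_test, x_train) @ alpha

# Toy problem: noisy observations of a rough (non-smooth) target function.
rng = np.random.default_rng(0)
n = 200
x_train = rng.uniform(0.0, 1.0, n)
y_train = np.sign(np.sin(8 * np.pi * x_train)) + 0.1 * rng.standard_normal(n)
x_test = np.linspace(0.0, 1.0, 500)
y_hat = krr_fit_predict(x_train, y_train, x_test)
```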
Related papers
- Nearly Optimal Regret for Decentralized Online Convex Optimization [53.433398074919]
Decentralized online convex optimization (D-OCO) aims to minimize a sequence of global loss functions using only local computations and communications.
We develop novel D-OCO algorithms that can respectively reduce the regret bounds for convex and strongly convex functions.
Our algorithms are nearly optimal in terms of $T$, $n$, and $\rho$.
arXiv Detail & Related papers (2024-02-14T13:44:16Z)
- Optimal Rates of Kernel Ridge Regression under Source Condition in Large Dimensions [15.988264513040903]
We study the large-dimensional behavior of kernel ridge regression (KRR) where the sample size $n \asymp d^{\gamma}$ for some $\gamma > 0$.
Our results illustrate that the convergence-rate curves, viewed as functions of $\gamma$, exhibit periodic plateau behavior and multiple descent behavior.
arXiv Detail & Related papers (2024-01-02T16:14:35Z) - On the Optimality of Misspecified Spectral Algorithms [15.398375151050768]
In the misspecified spectral algorithms problem, researchers usually assume the underlying true function $f_{\rho}^{*} \in [\mathcal{H}]^{s}$.
We show that spectral algorithms are minimax optimal for any $\alpha_{0} - \frac{1}{\beta} < s < 1$, where $\beta$ is the eigenvalue decay rate of $\mathcal{H}$.
arXiv Detail & Related papers (2023-03-27T06:50:31Z) - On the Self-Penalization Phenomenon in Feature Selection [69.16452769334367]
We describe an implicit sparsity-inducing mechanism based on minimization over a family of kernels.
As an application, we use this sparsity-inducing mechanism to build algorithms that are consistent for feature selection.
arXiv Detail & Related papers (2021-10-12T09:36:41Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Features (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
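The entry above only states the idea at a high level; below is a rough, hypothetical illustration of a ternary random feature map (i.i.d. weights in $\{-1, 0, +1\}$ and ternary outputs), not the paper's exact TRF construction. The sparsity level `p` and the sign-based nonlinearity are assumptions made for this example.

```python
# Rough sketch of a ternary random feature map: project with i.i.d. weights in
# {-1, 0, +1} (so X @ W needs only additions/subtractions) and quantize the
# output back to {-1, 0, +1}. The sparsity p and the sign nonlinearity are
# illustrative choices, not necessarily the paper's exact TRF construction.
import numpy as np

def ternary_random_features(X, num_features=512, p=1/3, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Ternary weights: -1, 0, +1 with probabilities (p/2, 1-p, p/2).
    W = rng.choice([-1, 0, 1], size=(d, num_features), p=[p / 2, 1 - p, p / 2])
    Z = X @ W                      # only additions/subtractions in exact arithmetic
    return np.sign(Z)              # ternary features in {-1, 0, +1}

X = np.random.default_rng(1).standard_normal((100, 20))
Phi = ternary_random_features(X)   # (100, 512) ternary feature matrix
```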
arXiv Detail & Related papers (2021-10-05T09:33:49Z)
- Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
We study the problem of PAC learning halfspaces on $\mathbb{R}^{d}$ with Massart noise under Gaussian marginals.
Our results qualitatively characterize the complexity of learning halfspaces in the Massart model.
arXiv Detail & Related papers (2021-08-19T16:16:48Z)
- Minimax Optimal Regression over Sobolev Spaces via Laplacian Regularization on Neighborhood Graphs [25.597646488273558]
We study the statistical properties of Laplacian smoothing, a graph-based approach to nonparametric regression.
We prove that Laplacian smoothing is manifold-adaptive.
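As a reading aid, here is a minimal sketch of graph-based Laplacian smoothing: build a neighborhood graph on the inputs, form its unnormalized Laplacian $L$, and return the minimizer of $\|f - y\|_2^2 + \lambda f^{\top} L f$, i.e. the solution of $(I + \lambda L) f = y$. The neighborhood radius and $\lambda$ are illustrative choices, not the theoretically tuned values from the paper.

```python
# Minimal sketch of Laplacian smoothing for nonparametric regression:
# epsilon-neighborhood graph -> unnormalized Laplacian L -> solve
# min_f ||f - y||^2 + lam * f' L f, i.e. (I + lam * L) f = y.
# The radius eps and lam are illustrative choices.
import numpy as np

def laplacian_smoothing(X, y, eps=0.15, lam=1.0):
    # Pairwise distances and epsilon-neighborhood adjacency (no self-loops).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    A = (D <= eps).astype(float)
    np.fill_diagonal(A, 0.0)
    L = np.diag(A.sum(axis=1)) - A          # unnormalized graph Laplacian
    n = len(y)
    return np.linalg.solve(np.eye(n) + lam * L, y)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.3 * rng.standard_normal(300)
f_hat = laplacian_smoothing(X, y)           # denoised fitted values at the inputs
```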
arXiv Detail & Related papers (2021-06-03T01:20:41Z)
- Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e., the learner has access only to a zeroth-order oracle) where cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z)
- Generalization error of random features and kernel methods: hypercontractivity and kernel matrix concentration [19.78800773518545]
We study the use of random features methods in conjunction with ridge regression in the feature space $\mathbb{R}^{N}$.
This can be viewed as a finite-dimensional approximation of kernel ridge regression (KRR), or as a stylized model for neural networks in the so-called lazy training regime.
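A minimal sketch of this setup follows: random Fourier features for a Gaussian kernel, followed by ridge regression in $\mathbb{R}^{N}$, as a finite-dimensional surrogate for KRR. The feature count $N$, the bandwidth, and the ridge parameter are illustrative assumptions, not the scalings analyzed in the paper.

```python
# Sketch of random Fourier features + ridge regression in R^N as a
# finite-dimensional surrogate for Gaussian-kernel KRR. N, the bandwidth,
# and the ridge parameter are illustrative choices only.
import numpy as np

def random_fourier_features(X, N=500, bandwidth=1.0, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, N)) / bandwidth
    b = rng.uniform(0.0, 2.0 * np.pi, N)
    return np.sqrt(2.0 / N) * np.cos(X @ W + b)

def ridge_fit(Phi, y, lam=1e-3):
    # Closed-form ridge solution in feature space: (Phi'Phi + lam I)^{-1} Phi'y.
    N = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), Phi.T @ y)

rng = np.random.default_rng(1)
X = rng.standard_normal((400, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(400)
Phi = random_fourier_features(X)
theta = ridge_fit(Phi, y)
y_hat = Phi @ theta                # in-sample predictions
```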
arXiv Detail & Related papers (2021-01-26T06:46:41Z) - Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and
ReLUs under Gaussian Marginals [49.60752558064027]
We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals.
Our lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.
arXiv Detail & Related papers (2020-06-29T17:10:10Z) - Some convergent results for Backtracking Gradient Descent method on
Banach spaces [0.0]
Theorem. Let $X$ be a Banach space and $f: X \rightarrow \mathbb{R}$ be a $C^{2}$ function.
Let $\mathcal{C}$ be the set of critical points of $f$.
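The theorem above is stated for general Banach spaces; purely as an illustration, here is a finite-dimensional ($\mathbb{R}^{d}$) sketch of gradient descent with Armijo backtracking. The constants (initial step, shrinkage factor, sufficient-decrease parameter) are conventional assumptions for this example, not values taken from the paper.

```python
# Finite-dimensional sketch of backtracking gradient descent (Armijo rule).
# The paper works in general Banach spaces; this R^d special case with
# conventional constants (delta0, alpha, beta) is purely illustrative.
import numpy as np

def backtracking_gd(f, grad, x0, delta0=1.0, alpha=0.5, beta=0.5,
                    max_iter=1000, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        delta = delta0
        # Shrink the step until the Armijo sufficient-decrease condition holds.
        while f(x - delta * g) > f(x) - alpha * delta * np.dot(g, g):
            delta *= beta
        x = x - delta * g
    return x

# Example: a smooth nonconvex function with several critical points.
f = lambda x: 0.25 * x[0] ** 4 - x[0] ** 2 + x[1] ** 2
grad = lambda x: np.array([x[0] ** 3 - 2 * x[0], 2 * x[1]])
x_star = backtracking_gd(f, grad, x0=[1.0, 1.0])   # converges to a critical point
```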
arXiv Detail & Related papers (2020-01-16T12:49:42Z)