Related papers: Early stopping and polynomial smoothing in regression with reproducing kernels

Early stopping and polynomial smoothing in regression with reproducing kernels

URL: http://arxiv.org/abs/2007.06827v3
Date: Fri, 22 Nov 2024 23:53:38 GMT
Title: Early stopping and polynomial smoothing in regression with reproducing kernels
Authors: Yaroslav Averyanov, Alain Celisse,
Abstract summary: We study the problem of early stopping for iterative learning algorithms in a reproducing kernel Hilbert space (RKHS) We present a data-driven rule to perform early stopping without a validation set that is based on the so-called minimum discrepancy principle. The proposed rule is proved to be minimax-optimal over different types of kernel spaces.
Score: 2.0411082897313984
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we study the problem of early stopping for iterative learning algorithms in a reproducing kernel Hilbert space (RKHS) in the nonparametric regression framework. In particular, we work with the gradient descent and (iterative) kernel ridge regression algorithms. We present a data-driven rule to perform early stopping without a validation set that is based on the so-called minimum discrepancy principle. This method enjoys only one assumption on the regression function: it belongs to a reproducing kernel Hilbert space (RKHS). The proposed rule is proved to be minimax-optimal over different types of kernel spaces, including finite-rank and Sobolev smoothness classes. The proof is derived from the fixed-point analysis of the localized Rademacher complexities, which is a standard technique for obtaining optimal rates in the nonparametric regression literature. In addition to that, we present simulation results on artificial datasets that show the comparable performance of the designed rule with respect to other stopping rules such as the one determined by V-fold cross-validation.

Related papers

Learning Curves of Stochastic Gradient Descent in Kernel Regression [7.063108005500741]
We analyze the single-pass Gradient Descent (SGD) in kernel regression under source condition.<n>Surprisingly, we show that SGD achieves min-max optimal rates up to constants among all the scales.<n>The main reason for SGD to overcome the curse of saturation is the exponentially decaying step size schedule.
arXiv Detail & Related papers (2025-05-28T07:16:11Z)
Gradient-Based Non-Linear Inverse Learning [2.6149030745627644]
We study statistical inverse learning in the context of nonlinear inverse problems under random design. We employ gradient descent (GD) and descent gradient (SGD) with mini-batching, both using constant step sizes. Our analysis derives convergence rates for both algorithms under classical a priori assumptions on the smoothness of the target function.
arXiv Detail & Related papers (2024-12-21T22:38:17Z)
Highly Adaptive Ridge [84.38107748875144]
We propose a regression method that achieves a $n-2/3$ dimension-free L2 convergence rate in the class of right-continuous functions with square-integrable sectional derivatives. Har is exactly kernel ridge regression with a specific data-adaptive kernel based on a saturated zero-order tensor-product spline basis expansion. We demonstrate empirical performance better than state-of-the-art algorithms for small datasets in particular.
arXiv Detail & Related papers (2024-10-03T17:06:06Z)
A Structure-Preserving Kernel Method for Learning Hamiltonian Systems [3.594638299627404]
A structure-preserving kernel ridge regression method is presented that allows the recovery of potentially high-dimensional and nonlinear Hamiltonian functions. The paper extends kernel regression methods to problems in which loss functions involving linear functions of gradients are required. A full error analysis is conducted that provides convergence rates using fixed and adaptive regularization parameters.
arXiv Detail & Related papers (2024-03-15T07:20:21Z)
Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data. For any convex and symmetric function class $mathcalF$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
Coefficient-based Regularized Distribution Regression [4.21768682940933]
We consider the coefficient-based regularized distribution regression which aims to regress from probability measures to real-valued responses over a kernel reproducing Hilbert space (RKHS) Asymptotic behaviors of the algorithm in different regularity ranges of the regression function are comprehensively studied. We get the optimal rates under some mild conditions, which matches the one-stage sampled minimax optimal rate.
arXiv Detail & Related papers (2022-08-26T03:46:14Z)
Experimental Design for Linear Functionals in Reproducing Kernel Hilbert Spaces [102.08678737900541]
We provide algorithms for constructing bias-aware designs for linear functionals. We derive non-asymptotic confidence sets for fixed and adaptive designs under sub-Gaussian noise.
arXiv Detail & Related papers (2022-05-26T20:56:25Z)
Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process. We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator. We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
Statistical inference using Regularized M-estimation in the reproducing kernel Hilbert space for handling missing data [0.76146285961466]
We first use the kernel ridge regression to develop imputation for handling item nonresponse. A nonparametric propensity score estimator using the kernel Hilbert space is also developed. The proposed method is applied to analyze the air pollution data measured in Beijing, China.
arXiv Detail & Related papers (2021-07-15T14:51:39Z)
Online nonparametric regression with Sobolev kernels [99.12817345416846]
We derive the regret upper bounds on the classes of Sobolev spaces $W_pbeta(mathcalX)$, $pgeq 2, beta>fracdp$. The upper bounds are supported by the minimax regret analysis, which reveals that in the cases $beta> fracd2$ or $p=infty$ these rates are (essentially) optimal.
arXiv Detail & Related papers (2021-02-06T15:05:14Z)
A spectral algorithm for robust regression with subgaussian rates [0.0]
We study a new linear up to quadratic time algorithm for linear regression in the absence of strong assumptions on the underlying distributions of samples. The goal is to design a procedure which attains the optimal sub-gaussian error bound even though the data have only finite moments.
arXiv Detail & Related papers (2020-07-12T19:33:50Z)
SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features. We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.