Fast Robust Kernel Regression through Sign Gradient Descent with Early Stopping
- URL: http://arxiv.org/abs/2306.16838v6
- Date: Mon, 2 Sep 2024 09:54:53 GMT
- Title: Fast Robust Kernel Regression through Sign Gradient Descent with Early Stopping
- Authors: Oskar Allerbo,
- Abstract summary: Kernel ridge regression, KRR, is a generalization of linear ridge regression that is non-linear in the data, but linear in the model parameters.
We introduce an equivalent formulation of the objective function of KRR, which opens up both for replacing the ridge penalty with the $ell_infty$ and $ell_1$ penalties.
- Score: 1.5229257192293204
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Kernel ridge regression, KRR, is a generalization of linear ridge regression that is non-linear in the data, but linear in the model parameters. Here, we introduce an equivalent formulation of the objective function of KRR, which opens up both for replacing the ridge penalty with the $\ell_\infty$ and $\ell_1$ penalties and for studying kernel ridge regression from the perspective of gradient descent. Using the $\ell_\infty$ and $\ell_1$ penalties, we obtain robust and sparse kernel regression, respectively. We further study the similarities between explicitly regularized kernel regression and the solutions obtained by early stopping of iterative gradient-based methods, where we connect $\ell_\infty$ regularization to sign gradient descent, $\ell_1$ regularization to forward stagewise regression (also known as coordinate descent), and $\ell_2$ regularization to gradient descent, and, in the last case, theoretically bound for the differences. We exploit the close relations between $\ell_\infty$ regularization and sign gradient descent, and between $\ell_1$ regularization and coordinate descent to propose computationally efficient methods for robust and sparse kernel regression. We finally compare robust kernel regression through sign gradient descent to existing methods for robust kernel regression on five real data sets, demonstrating that our method is one to two orders of magnitude faster, without compromising accuracy.
Related papers
- Highly Adaptive Ridge [84.38107748875144]
We propose a regression method that achieves a $n-2/3$ dimension-free L2 convergence rate in the class of right-continuous functions with square-integrable sectional derivatives.
Har is exactly kernel ridge regression with a specific data-adaptive kernel based on a saturated zero-order tensor-product spline basis expansion.
We demonstrate empirical performance better than state-of-the-art algorithms for small datasets in particular.
arXiv Detail & Related papers (2024-10-03T17:06:06Z) - Stochastic gradient descent for streaming linear and rectified linear
systems with Massart noise [9.841406613646813]
We show novel nearly linear convergence guarantees of SGD-exp to the true parameter with up to $50%$ Massart corruption rate.
This is the first convergence guarantee result for robust ReLU regression in the streaming setting.
arXiv Detail & Related papers (2024-03-02T12:45:01Z) - Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when emphdone right -- by which we mean using specific insights from optimisation and kernel communities -- gradient descent is highly effective.
We introduce a emphstochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - Gradient Descent Converges Linearly for Logistic Regression on Separable
Data [17.60502131429094]
We show that running gradient descent with variable learning rate guarantees loss $f(x) leq 1.1 cdot f(x*) + epsilon$ the logistic regression objective.
We also apply our ideas to sparse logistic regression, where they lead to an exponential improvement of the sparsity-error tradeoff.
arXiv Detail & Related papers (2023-06-26T02:15:26Z) - Near Optimal Private and Robust Linear Regression [47.2888113094367]
We propose a variant of the popular differentially private gradient descent (DP-SGD) algorithm with two innovations.
Under label-corruption, this is the first efficient linear regression algorithm to guarantee both $(varepsilon,delta)$-DP and robustness.
arXiv Detail & Related papers (2023-01-30T20:33:26Z) - High-probability Bounds for Non-Convex Stochastic Optimization with
Heavy Tails [55.561406656549686]
We consider non- Hilbert optimization using first-order algorithms for which the gradient estimates may have tails.
We show that a combination of gradient, momentum, and normalized gradient descent convergence to critical points in high-probability with best-known iteration for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z) - Online nonparametric regression with Sobolev kernels [99.12817345416846]
We derive the regret upper bounds on the classes of Sobolev spaces $W_pbeta(mathcalX)$, $pgeq 2, beta>fracdp$.
The upper bounds are supported by the minimax regret analysis, which reveals that in the cases $beta> fracd2$ or $p=infty$ these rates are (essentially) optimal.
arXiv Detail & Related papers (2021-02-06T15:05:14Z) - A Bregman Method for Structure Learning on Sparse Directed Acyclic
Graphs [84.7328507118758]
We develop a Bregman proximal gradient method for structure learning.
We measure the impact of curvature against a highly nonlinear iteration.
We test our method on various synthetic and real sets.
arXiv Detail & Related papers (2020-11-05T11:37:44Z) - Early stopping and polynomial smoothing in regression with reproducing kernels [2.0411082897313984]
We study the problem of early stopping for iterative learning algorithms in a reproducing kernel Hilbert space (RKHS)
We present a data-driven rule to perform early stopping without a validation set that is based on the so-called minimum discrepancy principle.
The proposed rule is proved to be minimax-optimal over different types of kernel spaces.
arXiv Detail & Related papers (2020-07-14T05:27:18Z) - Optimal Rates of Distributed Regression with Imperfect Kernels [0.0]
We study the distributed kernel regression via the divide conquer and conquer approach.
We show that the kernel ridge regression can achieve rates faster than $N-1$ in the noise free setting.
arXiv Detail & Related papers (2020-06-30T13:00:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.