Regularized least squares learning with heavy-tailed noise is minimax optimal
- URL: http://arxiv.org/abs/2505.14214v2
- Date: Thu, 22 May 2025 14:25:20 GMT
- Title: Regularized least squares learning with heavy-tailed noise is minimax optimal
- Authors: Mattes Mollenhauer, Nicole Mücke, Dimitri Meunier, Arthur Gretton
- Abstract summary: This paper examines the performance of ridge regression in reproducing kernel Hilbert spaces in the presence of noise that exhibits a finite number of higher moments. We establish excess risk bounds consisting of subgaussian and polynomial terms based on the well-known integral operator framework.
- Score: 23.16162334363017
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper examines the performance of ridge regression in reproducing kernel Hilbert spaces in the presence of noise that exhibits a finite number of higher moments. We establish excess risk bounds consisting of subgaussian and polynomial terms based on the well-known integral operator framework. The dominant subgaussian component allows us to achieve convergence rates that have previously only been derived under subexponential noise - a prevalent assumption in related work from the last two decades. These rates are optimal under standard eigenvalue decay conditions, demonstrating the asymptotic robustness of regularized least squares against heavy-tailed noise. Our derivations are based on a Fuk-Nagaev inequality for Hilbert-space valued random variables.
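The setting of the abstract can be illustrated with a minimal kernel ridge regression sketch. This is not the paper's code: the Gaussian kernel, the regularization level, and the Student-t noise (3 degrees of freedom, so finite variance but heavy tails) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel(A, B, gamma=10.0):
    # k(x, x') = exp(-gamma * |x - x'|^2), a standard RKHS kernel
    d2 = (A[:, None, :] - B[None, :, :]) ** 2
    return np.exp(-gamma * d2.sum(axis=-1))

# Target function plus heavy-tailed noise: Student-t with 3 degrees of
# freedom has finite variance but only finitely many higher moments.
def f_star(x):
    return np.sin(2 * np.pi * x[:, 0])

n = 400
X = rng.uniform(0.0, 1.0, size=(n, 1))
y = f_star(X) + 0.3 * rng.standard_t(df=3, size=n)

# Kernel ridge regression in closed form: alpha = (K + n*lam*I)^{-1} y
lam = 1e-3
K = gaussian_kernel(X, X)
alpha = np.linalg.solve(K + n * lam * np.eye(n), y)

# Monte Carlo estimate of the excess risk on fresh test points
Xt = rng.uniform(0.0, 1.0, size=(1000, 1))
pred = gaussian_kernel(Xt, X) @ alpha
excess_risk = np.mean((pred - f_star(Xt)) ** 2)
print(f"estimated excess risk: {excess_risk:.4f}")
```

Despite the heavy-tailed noise, the estimated excess risk stays small at this sample size, which is the qualitative behavior the paper's bounds make precise.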
Related papers
- Revisiting Convergence of AdaGrad with Relaxed Assumptions [4.189643331553922]
We revisit the convergence of AdaGrad with momentum (covering AdaGrad as a special case) on problems.
This model encompasses a broad range of noises arising in many practical applications.
arXiv Detail & Related papers (2024-02-21T13:24:14Z)
- High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z)
- Large deviations rates for stochastic gradient descent with strongly convex functions [11.247580943940916]
We provide a formal framework for the study of general high-probability bounds for stochastic gradient descent.
We find an upper large deviations bound for SGD with strongly convex functions.
arXiv Detail & Related papers (2022-11-02T09:15:26Z)
- On the Importance of Gradient Norm in PAC-Bayesian Bounds [92.82627080794491]
We propose a new generalization bound that exploits the contractivity of the log-Sobolev inequalities.
We empirically analyze the effect of this new loss-gradient norm term on different neural architectures.
arXiv Detail & Related papers (2022-10-12T12:49:20Z)
- Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for methods for solving monotone and structured non-monotone VIPs.
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
arXiv Detail & Related papers (2022-06-02T15:21:55Z)
- Optimal Learning Rates for Regularized Least-Squares with a Fourier Capacity Condition [14.910167993978487]
We derive minimax adaptive rates for a new, broad class of Tikhonov-regularized learning problems in Hilbert scales.
We demonstrate that the spectrum of the Mercer operator can be inferred in the presence of "tight" embeddings of suitable Hilbert scales.
arXiv Detail & Related papers (2022-04-16T18:32:33Z)
- The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from this assumption can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z)
- Polynomial convergence of iterations of certain random operators in Hilbert space [2.732936573198251]
We study the convergence of a random iterative sequence of a family of operators on infinite-dimensional spaces, inspired by the SGD algorithm.
We demonstrate that its convergence rate depends on the initial state, while randomness plays a role only in the choice of the best constant factor.
arXiv Detail & Related papers (2022-02-04T17:48:29Z) - Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
arXiv Detail & Related papers (2022-01-21T02:46:57Z) - Optimizing Information-theoretical Generalization Bounds via Anisotropic
Noise in SGLD [73.55632827932101]
We optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD.
We prove that with constraint to guarantee low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance.
arXiv Detail & Related papers (2021-10-26T15:02:27Z)
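The SGLD claim above can be written compactly. This is a paraphrase of the one-line summary, not the paper's exact theorem; here \(g\) denotes the stochastic gradient of the loss.

```latex
% Paraphrase: subject to a constraint guaranteeing low empirical risk,
% the information-theoretical bound is minimized by a noise covariance
% proportional to the square root of the expected gradient covariance:
\Sigma^{\star} \;\propto\; \bigl(\mathbb{E}\bigl[\,g\,g^{\top}\bigr]\bigr)^{1/2}
```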
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.