High-Probability Minimax Adaptive Estimation in Besov Spaces via Online-to-Batch
- URL: http://arxiv.org/abs/2602.11747v1
- Date: Thu, 12 Feb 2026 09:24:08 GMT
- Title: High-Probability Minimax Adaptive Estimation in Besov Spaces via Online-to-Batch
- Authors: Paul Liautaud, Pierre Gaillard, Olivier Wintenberger
- Abstract summary: We study nonparametric regression over Besov spaces from noisy observations under sub-exponential noise. We propose a wavelet-based online learning algorithm that dynamically adjusts to the observed gradient noise by adaptively clipping it at an appropriate level. We obtain adaptive and minimax-optimal estimation rates for Besov spaces via a refined online-to-batch conversion.
- Score: 8.691764858459623
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study nonparametric regression over Besov spaces from noisy observations under sub-exponential noise, aiming to achieve minimax-optimal guarantees on the integrated squared error that hold with high probability and adapt to the unknown noise level. To this end, we propose a wavelet-based online learning algorithm that dynamically adjusts to the observed gradient noise by adaptively clipping it at an appropriate level, eliminating the need to tune parameters such as the noise variance or gradient bounds. As a by-product of our analysis, we derive high-probability adaptive regret bounds that scale with the $\ell_1$-norm of the competitor. Finally, in the batch statistical setting, we obtain adaptive and minimax-optimal estimation rates for Besov spaces via a refined online-to-batch conversion. This approach carefully exploits the structure of the squared loss in combination with self-normalized concentration inequalities.
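The clipping mechanism at the heart of the algorithm can be illustrated with a minimal sketch (hypothetical code, not the authors' implementation): gradients are clipped at a data-driven level estimated from past gradient magnitudes, and the batch estimator is recovered by averaging the iterates, a plain online-to-batch conversion. The quantile-based threshold below is an illustrative stand-in for the paper's noise-adaptive clipping level.

```python
import numpy as np

def clip(g, tau):
    """Clip vector g to Euclidean norm at most tau."""
    norm = np.linalg.norm(g)
    return g if norm <= tau else g * (tau / norm)

def online_clipped_sgd(grad_fn, theta0, T, lr=0.1, quantile=0.9):
    """Online gradient descent with an adaptively estimated clipping level.

    grad_fn(theta, t) returns a noisy gradient at round t. The threshold
    is a running quantile of past gradient norms (an illustrative proxy
    for a noise-adaptive level); iterate averaging yields the batch
    estimator.
    """
    theta = np.asarray(theta0, dtype=float)
    past_norms = []
    avg = np.zeros_like(theta)
    for t in range(1, T + 1):
        g = np.asarray(grad_fn(theta, t), dtype=float)
        past_norms.append(np.linalg.norm(g))
        tau = np.quantile(past_norms, quantile)   # data-driven clipping level
        theta = theta - lr / np.sqrt(t) * clip(g, tau)
        avg += (theta - avg) / t                  # running average of iterates
    return avg
```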
Related papers
- Minimax Adaptive Online Nonparametric Regression over Besov Spaces [8.691764858459623]
We study online adversarial regression with convex losses against a rich class of continuous yet highly irregular prediction rules. We introduce an adaptive wavelet-based algorithm that performs sequential prediction without prior knowledge of $(s,p,q)$. We also design a locally adaptive extension capable of dynamically tracking spatially inhomogeneous smoothness.
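As a rough illustration of sequential prediction with wavelet expansions, here is a fixed-resolution sketch using Haar wavelets and online gradient descent on the coefficients. This is an assumption-laden toy, not the paper's algorithm, which is adaptive in $(s,p,q)$ and needs no resolution tuning.

```python
import numpy as np

def haar_features(x, levels=4):
    """Evaluate a truncated Haar wavelet basis at a scalar x in [0, 1)."""
    feats = [1.0]                                   # father wavelet (constant)
    for j in range(levels):
        for k in range(2 ** j):
            lo, mid, hi = k / 2**j, (k + 0.5) / 2**j, (k + 1) / 2**j
            feats.append(2 ** (j / 2) * ((lo <= x < mid) - (mid <= x < hi)))
    return np.array(feats, dtype=float)             # 2**levels features total

def online_wavelet_regression(stream, levels=4, lr=0.05):
    """Sequentially predict y_t from x_t with a linear wavelet expansion,
    updating coefficients by online gradient descent on the squared loss."""
    w = np.zeros(2 ** levels)
    preds = []
    for t, (x, y) in enumerate(stream, start=1):
        phi = haar_features(x, levels)
        y_hat = w @ phi
        preds.append(y_hat)
        w -= lr / np.sqrt(t) * 2 * (y_hat - y) * phi   # squared-loss gradient step
    return preds
```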
arXiv Detail & Related papers (2025-05-26T09:23:11Z)
- Optimized Gradient Clipping for Noisy Label Learning [26.463965846251938]
We propose a simple yet effective approach called Optimized Gradient Clipping (OGC). OGC dynamically adjusts the clipping threshold based on the ratio of noise gradients to clean gradients after clipping. Our experiments across various types of label noise, including symmetric, asymmetric, instance-dependent, and real-world noise, demonstrate the effectiveness of OGC.
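One way to read this mechanism (an illustrative sketch only; the clean/noisy gradient-norm split and the threshold grid below are assumptions, not the paper's procedure) is to pick the largest clipping threshold for which the clipped noise-to-clean gradient mass stays below a target ratio:

```python
import numpy as np

def ogc_style_threshold(clean_norms, noisy_norms, tau_grid, rho=0.2):
    """Choose the largest threshold tau from tau_grid such that the total
    clipped gradient mass attributed to noisy samples stays below rho times
    that of clean samples. Hypothetical reading of OGC, not its actual code;
    in practice the clean/noisy attribution would itself be estimated.
    """
    best = min(tau_grid)
    for tau in sorted(tau_grid):
        noise = np.minimum(noisy_norms, tau).sum()   # clipped noisy-gradient mass
        clean = np.minimum(clean_norms, tau).sum()   # clipped clean-gradient mass
        if clean > 0 and noise / clean <= rho:
            best = tau
    return best
```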
arXiv Detail & Related papers (2024-12-12T05:08:05Z)
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
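A generic two-point zeroth-order gradient estimator conveys the basic idea (a sketch under standard assumptions; the paper's accelerated method additionally exploits high-order smoothness, which is omitted here):

```python
import numpy as np

def zo_gradient(f, x, h=1e-4, rng=None):
    """Two-point zeroth-order gradient estimate from (noisy) evaluations of f,
    using a random direction u drawn uniformly on the unit sphere."""
    rng = rng or np.random.default_rng()
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)
    d = x.size
    return d * (f(x + h * u) - f(x - h * u)) / (2 * h) * u

def zo_sgd(f, x0, steps=1000, lr=0.05):
    """Plain zeroth-order SGD driven by the estimator above."""
    x = np.asarray(x0, dtype=float)
    for t in range(1, steps + 1):
        x = x - lr / np.sqrt(t) * zo_gradient(f, x)
    return x
```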
arXiv Detail & Related papers (2024-11-21T10:26:17Z)
- Gradient Normalization Provably Benefits Nonconvex SGD under Heavy-Tailed Noise [60.92029979853314]
We investigate the roles of gradient normalization and clipping in ensuring the convergence of Stochastic Gradient Descent (SGD) under heavy-tailed noise.
Our work provides the first theoretical evidence demonstrating the benefits of gradient normalization in SGD under heavy-tailed noise.
We introduce an accelerated SGD variant incorporating gradient normalization and clipping, further enhancing convergence rates under heavy-tailed noise.
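A single update step combining the two mechanisms might look like the following (a minimal sketch; the momentum, step size, and clipping level are illustrative constants, not the paper's tuned values):

```python
import numpy as np

def clip(g, tau):
    """Clip g to Euclidean norm at most tau, bounding the heavy tail."""
    n = np.linalg.norm(g)
    return g if n <= tau else g * (tau / n)

def normalized_clipped_step(x, g, m, lr=0.01, beta=0.9, tau=1.0):
    """One SGD step with momentum, clipping, and normalization: the update
    direction has unit norm, so a single heavy-tailed gradient cannot blow
    up the iterate."""
    m = beta * m + (1.0 - beta) * clip(g, tau)
    x = x - lr * m / (np.linalg.norm(m) + 1e-12)
    return x, m
```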
arXiv Detail & Related papers (2024-10-21T22:40:42Z)
- Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z)
- Distributionally Robust Optimization with Bias and Variance Reduction [9.341215359733601]
We show that Prospect, a gradient-based algorithm, enjoys linear convergence for smooth regularized losses.
We also show that Prospect can converge 2-3× faster than stochastic gradient-based baselines.
arXiv Detail & Related papers (2023-10-21T00:03:54Z)
- Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression [12.443289202402761]
We show the benefits of batch-partitioning through the lens of a minimum-norm overparameterized linear regression model.
We characterize the optimal batch size and show it is inversely proportional to the noise level.
We also show that shrinking the batch minimum-norm estimator by a factor equal to the Wiener coefficient further stabilizes it and results in lower quadratic risk in all settings.
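A compact sketch of the batch-partitioned estimator follows (hypothetical code; the `shrink` factor is a free parameter standing in for the Wiener-type coefficient derived in the paper):

```python
import numpy as np

def batched_min_norm(X, y, n_batches, shrink=1.0, rng=None):
    """Fit the minimum-norm interpolator on each of n_batches disjoint
    batches, average the resulting coefficient vectors, and optionally
    shrink the average to reduce quadratic risk."""
    rng = rng or np.random.default_rng()
    idx = np.array_split(rng.permutation(X.shape[0]), n_batches)
    # pinv gives the minimum-norm least-squares solution per batch
    betas = [np.linalg.pinv(X[i]) @ y[i] for i in idx]
    return shrink * np.mean(betas, axis=0)
```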
arXiv Detail & Related papers (2023-06-14T11:02:08Z)
- Latent Class-Conditional Noise Model [54.56899309997246]
We introduce a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework.
We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us to efficiently infer the latent true labels.
Our approach safeguards the stable update of the noise transition, which avoids previous arbitrarily tuning from a mini-batch of samples.
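A toy Gibbs sweep for a class-conditional noise model shows the flavor of the approach (a generic sketch, not the paper's exact model; `probs` is assumed to come from the current classifier, and `noisy_y` is an integer label array):

```python
import numpy as np

def gibbs_noise_model(probs, noisy_y, n_classes, sweeps=20, alpha=1.0, rng=None):
    """Alternate between sampling latent true labels z_i and the noise
    transition matrix T, whose rows receive Dirichlet posteriors.

    probs[i, c] : classifier probability that example i has true class c
    noisy_y[i]  : observed (possibly corrupted) label of example i
    """
    rng = rng or np.random.default_rng()
    T = np.full((n_classes, n_classes), 1.0 / n_classes)  # init: uniform noise
    z = noisy_y.copy()
    for _ in range(sweeps):
        # sample latent true labels: p(z_i = c) ∝ probs[i, c] * T[c, y_i]
        post = probs * T[:, noisy_y].T
        post /= post.sum(axis=1, keepdims=True)
        z = np.array([rng.choice(n_classes, p=p) for p in post])
        # sample each transition row from its Dirichlet posterior
        counts = np.zeros((n_classes, n_classes))
        np.add.at(counts, (z, noisy_y), 1.0)
        T = np.array([rng.dirichlet(alpha + row) for row in counts])
    return z, T
```

Sampling full Dirichlet rows, rather than point-estimating the transition matrix from a mini-batch, is what the abstract refers to as safeguarding the stable update of the noise transition.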
arXiv Detail & Related papers (2023-02-19T15:24:37Z)
- Benign overfitting and adaptive nonparametric regression [71.70323672531606]
We construct an estimator which is a continuous function interpolating the data points with high probability.
We attain minimax optimal rates under mean squared risk on the scale of Hölder classes, adapting to the unknown smoothness.
arXiv Detail & Related papers (2022-06-27T14:50:14Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
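Structurally, the estimator looks like AIS with unadjusted Langevin moves and no accept/reject step. Below is a numpy sketch of that structure (assumptions: a one-dimensional target, numerical score estimates; writing the same loop in an autodiff framework is what would make the log-weights differentiable):

```python
import numpy as np

def ais_no_mh(log_p0, log_p1, sample0, n_steps=100, n_particles=64, eps=1e-2, rng=None):
    """AIS along a linear annealing path with unadjusted Langevin transitions;
    dropping the Metropolis-Hastings accept/reject step removes the
    non-differentiable component of standard AIS."""
    rng = rng or np.random.default_rng()
    x = sample0(n_particles)                       # particles from the base p0
    logw = np.zeros(n_particles)
    betas = np.linspace(0.0, 1.0, n_steps + 1)

    def log_pb(x, b):                              # annealed (unnormalized) log density
        return (1.0 - b) * log_p0(x) + b * log_p1(x)

    def score(x, b, h=1e-5):                       # numerical score; 1-D target assumed
        return (log_pb(x + h, b) - log_pb(x - h, b)) / (2.0 * h)

    for b_prev, b in zip(betas[:-1], betas[1:]):
        logw += log_pb(x, b) - log_pb(x, b_prev)   # incremental importance weight
        # unadjusted Langevin move: no accept/reject step
        x = x + eps * score(x, b) + np.sqrt(2.0 * eps) * rng.standard_normal(x.shape)
    return logw   # logsumexp(logw) - log(n_particles) estimates log(Z1/Z0)
```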
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Optimal Rates for Random Order Online Optimization [60.011653053877126]
We study the random-order online optimization setting of Garber et al. (2020), where the loss functions may be chosen by an adversary, but are then presented online in a uniformly random order.
We show that the algorithms of Garber et al. (2020) achieve the optimal bounds and significantly improve their stability.
arXiv Detail & Related papers (2021-06-29T09:48:46Z)
- Square Root Principal Component Pursuit: Tuning-Free Noisy Robust Matrix Recovery [8.581512812219737]
We propose a new framework for low-rank matrix recovery from observations corrupted with noise and outliers.
Inspired by the square root Lasso, this new formulation does not require prior knowledge of the noise level.
We show that a single, universal choice of the regularization parameter suffices to achieve reconstruction error proportional to the (a priori unknown) noise level.
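By analogy with the square root Lasso, the data-fitting term plausibly enters unsquared, which is what decouples the optimal regularization from the noise level. One common form of such an objective (notation assumed here, not necessarily the paper's exact formulation, with observation matrix $M$, low-rank part $L$, and sparse outliers $S$):

```latex
\min_{L,\,S}\; \|L\|_{*} \;+\; \lambda\,\|S\|_{1} \;+\; \mu\,\|M - L - S\|_{F}
```

The unsquared Frobenius term $\|M - L - S\|_{F}$ plays the role of the square-root residual in the square root Lasso, allowing the regularization weight to be chosen without knowing the noise level.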
arXiv Detail & Related papers (2021-06-17T02:28:11Z)