Concentration of Contractive Stochastic Approximation: Additive and Multiplicative Noise
- URL: http://arxiv.org/abs/2303.15740v2
- Date: Tue, 17 Sep 2024 01:23:33 GMT
- Title: Concentration of Contractive Stochastic Approximation: Additive and Multiplicative Noise
- Authors: Zaiwei Chen, Siva Theja Maguluri, Martin Zubeldia
- Abstract summary: We establish maximal concentration bounds for the iterates generated by a stochastic approximation (SA) algorithm under a contractive operator.
We consider two settings where the iterates are potentially unbounded: SA with bounded multiplicative noise and SA with sub-Gaussian additive noise.
- Score: 9.76321513479048
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we establish maximal concentration bounds for the iterates generated by a stochastic approximation (SA) algorithm under a contractive operator with respect to some arbitrary norm (for example, the $\ell_\infty$-norm). We consider two settings where the iterates are potentially unbounded: SA with bounded multiplicative noise and SA with sub-Gaussian additive noise. Our maximal concentration inequalities state that the convergence error has a sub-Gaussian tail in the additive noise setting and a Weibull tail (which is faster than polynomial decay but could be slower than exponential decay) in the multiplicative noise setting. In addition, we provide an impossibility result showing that it is generally impossible to have sub-exponential tails under multiplicative noise. To establish the maximal concentration bounds, we develop a novel bootstrapping argument that involves bounding the moment-generating function of a modified version of the generalized Moreau envelope of the convergence error and constructing an exponential supermartingale to enable using Ville's maximal inequality. We demonstrate the applicability of our theoretical results in the context of linear SA and reinforcement learning.
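As a minimal illustration of the setting described in the abstract, the sketch below simulates a contractive SA iteration of the form $x_{k+1} = x_k + \alpha_k (H(x_k) - x_k + w_k)$ under the two noise models; the contraction map $H$, step-size schedule, and noise parameters are illustrative assumptions, not the paper's construction.
```python
import numpy as np

# Minimal sketch (illustrative assumptions, not the authors' code):
# contractive stochastic approximation
#     x_{k+1} = x_k + alpha_k * (H(x_k) - x_k + w_k),
# where H is a contraction with respect to the ell_infinity norm and w_k is
# either sub-Gaussian additive noise or bounded multiplicative noise.

rng = np.random.default_rng(0)
d, gamma = 5, 0.9                        # dimension and contraction factor (assumed)
H = lambda x: gamma * x                  # H(x) = gamma * x is a gamma-contraction; x* = 0

def run_sa(noise, num_iters=5000):
    x = np.ones(d)
    for k in range(1, num_iters + 1):
        alpha = 1.0 / (k + 10)           # diminishing step sizes (assumed schedule)
        if noise == "additive":          # sub-Gaussian (here Gaussian) additive noise
            w = 0.1 * rng.standard_normal(d)
        else:                            # bounded multiplicative noise: scales with ||x||_inf
            w = 0.1 * rng.uniform(-1.0, 1.0, size=d) * np.linalg.norm(x, ord=np.inf)
        x = x + alpha * (H(x) - x + w)
    return np.linalg.norm(x, ord=np.inf)  # convergence error ||x_K - x*||_inf

for setting in ("additive", "multiplicative"):
    print(setting, run_sa(setting))
```
Repeating such runs and inspecting the tail of the resulting error distribution is one empirical way to see the sub-Gaussian versus Weibull behaviour discussed above.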
Related papers
- Stochastic Approximation with Unbounded Markovian Noise: A General-Purpose Theorem [7.443139252028032]
We consider average-reward Reinforcement Learning with unbounded state space and reward function.
Recent works studied this problem in the actor-critic framework.
We study Temporal Difference (TD) learning with linear function approximation.
arXiv Detail & Related papers (2024-10-29T03:40:53Z)
- Non-asymptotic bounds for forward processes in denoising diffusions: Ornstein-Uhlenbeck is hard to beat [49.1574468325115]
This paper presents explicit non-asymptotic bounds on the forward diffusion error in total variation (TV).
We parametrise multi-modal data distributions in terms of the distance $R$ to their furthest modes and consider forward diffusions with additive and multiplicative noise.
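For reference, a standard additive-noise forward process in this setting is the Ornstein-Uhlenbeck SDE, and a multiplicative-noise variant replaces the constant diffusion coefficient with a state-dependent one; the forms below are generic textbook versions, not necessarily the paper's exact parametrisation.
```latex
\[
  \mathrm{d}X_t = -X_t\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}B_t
  \qquad \text{(additive noise: Ornstein--Uhlenbeck)},
\]
\[
  \mathrm{d}X_t = -X_t\,\mathrm{d}t + \sigma(X_t)\,\mathrm{d}B_t
  \qquad \text{(multiplicative noise: state-dependent diffusion coefficient).}
\]
```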
arXiv Detail & Related papers (2024-08-25T10:28:31Z)
- Revisiting Convergence of AdaGrad with Relaxed Assumptions [4.189643331553922]
We revisit the convergence of AdaGrad with momentum (covering AdaGrad as a special case) on smooth optimization problems.
The relaxed noise model encompasses a broad range of noises, including sub-Gaussian noise, that arise in many practical applications.
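As a reminder of the algorithm being analysed, below is a sketch of one common formulation of AdaGrad with heavy-ball momentum; the exact variant, hyperparameters, and toy objective are assumptions for illustration and may differ from the paper's formulation.
```python
import numpy as np

# Sketch of one common AdaGrad-with-momentum update (illustrative only):
#   v_t = v_{t-1} + g_t**2            (coordinate-wise accumulated squared gradients)
#   m_t = beta * m_{t-1} + g_t        (heavy-ball momentum)
#   x_t = x_{t-1} - eta * m_t / (sqrt(v_t) + eps)
def adagrad_momentum(grad_fn, x0, eta=0.1, beta=0.9, eps=1e-8, steps=200):
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x)                 # stochastic gradient at x
        v += g * g
        m = beta * m + g
        x = x - eta * m / (np.sqrt(v) + eps)
    return x

# Toy usage on a smooth quadratic with Gaussian gradient noise (assumed example).
rng = np.random.default_rng(1)
grad = lambda x: 2.0 * x + 0.1 * rng.standard_normal(x.shape)
print(adagrad_momentum(grad, np.ones(3)))
```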
arXiv Detail & Related papers (2024-02-21T13:24:14Z)
- Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped optimization problems with heavy-tailed noise with structured density.
We show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-(\alpha-1)/\alpha})$ when the gradients have finite moments of order $\alpha$.
We prove that the resulting estimates have negligible bias and controllable variance.
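The clipping idea referenced above can be sketched as follows; the clipping level, step size, and toy objective are illustrative assumptions rather than the paper's tuned choices.
```python
import numpy as np

# Sketch of clipped SGD (illustrative assumptions, not the paper's exact method):
# the stochastic gradient is rescaled to have norm at most lam before the step,
# which tames heavy-tailed gradient noise.
def clipped_sgd(grad_fn, x0, eta=0.05, lam=1.0, steps=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad_fn(x)
        g_norm = np.linalg.norm(g)
        g_clip = g * min(1.0, lam / (g_norm + 1e-12))   # clip to norm <= lam
        x = x - eta * g_clip
    return x

# Toy usage: quadratic objective with heavy-tailed (Student-t) gradient noise (assumed).
rng = np.random.default_rng(2)
grad = lambda x: 2.0 * x + rng.standard_t(df=1.5, size=x.shape)
print(clipped_sgd(grad, np.ones(3)))
```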
arXiv Detail & Related papers (2023-11-07T17:39:17Z)
- Closing the Gap Between the Upper Bound and the Lower Bound of Adam's Iteration Complexity [51.96093077151991]
We derive a new convergence guarantee of Adam, with only an $L$-smooth condition and a bounded noise variance assumption.
Our proof utilizes novel techniques to handle the entanglement between momentum and adaptive learning rate.
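For reference, the standard Adam update is shown below; its momentum term $m_t$ and adaptive step built from $v_t$ are exactly the coupled quantities the proof must disentangle (the analysed variant may differ, e.g. by omitting the bias corrections).
```latex
\begin{align*}
  m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
  v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{2}, \\
  \hat m_t &= m_t / (1-\beta_1^{\,t}), \qquad
  \hat v_t = v_t / (1-\beta_2^{\,t}), \\
  x_{t+1} &= x_t - \eta\, \hat m_t / \bigl(\sqrt{\hat v_t} + \epsilon\bigr).
\end{align*}
```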
arXiv Detail & Related papers (2023-10-27T09:16:58Z)
- Generalization Error Bounds for Noisy, Iterative Algorithms via Maximal Leakage [24.40306100502023]
We adopt an information-theoretic framework to analyze the generalization behavior of a class of noisy learning algorithms.
We show how the assumptions on the update function affect the optimal choice of the noise.
arXiv Detail & Related papers (2023-02-28T12:13:57Z)
- High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z)
- A lower confidence sequence for the changing mean of non-negative right heavy-tailed observations with bounded mean [9.289846887298854]
A confidence sequence produces an adapted sequence of sets for a predictable parameter sequence with a time-uniform coverage guarantee.
This work constructs a non-asymptotic lower CS for the running average conditional expectation whose slack converges to zero.
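A standard way to formalise the coverage guarantee mentioned above: for a target level $\alpha$ and a (possibly changing) parameter sequence $(\theta_t)$, a confidence sequence is an adapted sequence of sets $(C_t)$ such that
```latex
\[
  \Pr\bigl(\theta_t \in C_t \ \text{for all } t \ge 1\bigr) \ \ge\ 1 - \alpha ,
\]
% a lower confidence sequence takes C_t = [L_t, \infty) for an adapted lower bound L_t.
```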
arXiv Detail & Related papers (2022-10-20T09:50:05Z)
- Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for stochastic methods solving monotone and structured non-monotone variational inequality problems (VIPs).
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
arXiv Detail & Related papers (2022-06-02T15:21:55Z)
- Heavy-tailed denoising score matching [5.371337604556311]
We develop an iterative noise scaling algorithm to consistently initialise the multiple levels of noise in Langevin dynamics.
On the practical side, our use of heavy-tailed DSM leads to improved score estimation, controllable sampling convergence, and more balanced unconditional generative performance for imbalanced datasets.
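The role of the multiple noise levels can be seen in a standard annealed Langevin sampler; the score function, noise schedule, and step sizes below are toy assumptions (the paper's contribution is how to choose and initialise such noise levels, which this sketch does not implement).
```python
import numpy as np

# Sketch of annealed Langevin dynamics with multiple noise levels (illustrative only).
# For data ~ N(0, I), the noise-perturbed score is known in closed form:
#   score_sigma(x) = -x / (1 + sigma**2),
# so the sampler below is self-contained; a learned score network would replace it.
rng = np.random.default_rng(3)

def annealed_langevin(sigmas, d=2, steps_per_level=100, eps=0.01):
    x = rng.standard_normal(d) * sigmas[0]          # initialise at the largest noise level
    for sigma in sigmas:                            # sigmas assumed sorted high -> low
        alpha = eps * (sigma / sigmas[-1]) ** 2     # common step-size scaling
        for _ in range(steps_per_level):
            score = -x / (1.0 + sigma ** 2)
            x = x + 0.5 * alpha * score + np.sqrt(alpha) * rng.standard_normal(d)
    return x

sigmas = np.geomspace(10.0, 0.01, num=10)           # assumed geometric noise schedule
print(annealed_langevin(sigmas))
```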
arXiv Detail & Related papers (2021-12-17T22:04:55Z)
- Optimizing Information-theoretical Generalization Bounds via Anisotropic Noise in SGLD [73.55632827932101]
We optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD.
We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance.
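Concretely, with an anisotropic noise covariance $\Sigma$ an SGLD-style update takes the form below, and the result summarised above says the bound-optimal choice (subject to the empirical-risk constraint) is $\Sigma \propto \bigl(\mathbb{E}[g_t g_t^\top]\bigr)^{1/2}$; scaling constants are omitted here.
```latex
\[
  \theta_{t+1} = \theta_t - \eta\, g_t + \xi_t,
  \qquad \xi_t \sim \mathcal{N}(0, \Sigma),
  \qquad \Sigma \;\propto\; \bigl(\mathbb{E}[\,g_t g_t^\top\,]\bigr)^{1/2},
\]
% where g_t is the stochastic gradient of the empirical risk at theta_t.
```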
arXiv Detail & Related papers (2021-10-26T15:02:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.