Concentration of Contractive Stochastic Approximation: Additive and Multiplicative Noise
- URL: http://arxiv.org/abs/2303.15740v2
- Date: Tue, 17 Sep 2024 01:23:33 GMT
- Title: Concentration of Contractive Stochastic Approximation: Additive and Multiplicative Noise
- Authors: Zaiwei Chen, Siva Theja Maguluri, Martin Zubeldia
- Abstract summary: We establish maximal concentration bounds for the iterates generated by a stochastic approximation (SA) algorithm under a contractive operator.
We consider two settings where the iterates are potentially unbounded: SA with bounded multiplicative noise and SA with sub-Gaussian additive noise.
- Score: 9.76321513479048
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we establish maximal concentration bounds for the iterates generated by a stochastic approximation (SA) algorithm under a contractive operator with respect to some arbitrary norm (for example, the $\ell_\infty$-norm). We consider two settings where the iterates are potentially unbounded: SA with bounded multiplicative noise and SA with sub-Gaussian additive noise. Our maximal concentration inequalities state that the convergence error has a sub-Gaussian tail in the additive noise setting and a Weibull tail (which is faster than polynomial decay but could be slower than exponential decay) in the multiplicative noise setting. In addition, we provide an impossibility result showing that it is generally impossible to have sub-exponential tails under multiplicative noise. To establish the maximal concentration bounds, we develop a novel bootstrapping argument that involves bounding the moment-generating function of a modified version of the generalized Moreau envelope of the convergence error and constructing an exponential supermartingale to enable using Ville's maximal inequality. We demonstrate the applicability of our theoretical results in the context of linear SA and reinforcement learning.
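As a minimal illustration of the setting described in the abstract, the sketch below simulates a contractive SA iteration of the form $x_{k+1} = x_k + \alpha_k (H(x_k) - x_k + w_k)$ under the two noise models; the contraction map $H$, step-size schedule, and noise parameters are illustrative assumptions, not the paper's construction.
```python
import numpy as np

# Minimal sketch (illustrative assumptions, not the authors' code):
# contractive stochastic approximation
#     x_{k+1} = x_k + alpha_k * (H(x_k) - x_k + w_k),
# where H is a contraction with respect to the ell_infinity norm and w_k is
# either sub-Gaussian additive noise or bounded multiplicative noise.

rng = np.random.default_rng(0)
d, gamma = 5, 0.9                        # dimension and contraction factor (assumed)
H = lambda x: gamma * x                  # H(x) = gamma * x is a gamma-contraction; x* = 0

def run_sa(noise, num_iters=5000):
    x = np.ones(d)
    for k in range(1, num_iters + 1):
        alpha = 1.0 / (k + 10)           # diminishing step sizes (assumed schedule)
        if noise == "additive":          # sub-Gaussian (here Gaussian) additive noise
            w = 0.1 * rng.standard_normal(d)
        else:                            # bounded multiplicative noise: scales with ||x||_inf
            w = 0.1 * rng.uniform(-1.0, 1.0, size=d) * np.linalg.norm(x, ord=np.inf)
        x = x + alpha * (H(x) - x + w)
    return np.linalg.norm(x, ord=np.inf)  # convergence error ||x_K - x*||_inf

for setting in ("additive", "multiplicative"):
    print(setting, run_sa(setting))
```
Repeating such runs and inspecting the tail of the resulting error distribution is one empirical way to see the sub-Gaussian versus Weibull behaviour discussed above.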
Related papers
- Stochastic Approximation with Unbounded Markovian Noise: A General-Purpose Theorem [7.443139252028032]
We consider average-reward Reinforcement Learning with unbounded state space and reward function.
Recent works studied this problem in the actor-critic framework.
We study Temporal Difference (TD) learning with linear function approximation.
arXiv Detail & Related papers (2024-10-29T03:40:53Z)
- Non-asymptotic bounds for forward processes in denoising diffusions: Ornstein-Uhlenbeck is hard to beat [49.1574468325115]
This paper presents explicit non-asymptotic bounds on the forward diffusion error in total variation (TV).
We parametrise multi-modal data distributions in terms of the distance $R$ to their furthest modes and consider forward diffusions with additive and multiplicative noise.
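For reference, a standard additive-noise forward process in this setting is the Ornstein-Uhlenbeck SDE, and a multiplicative-noise variant replaces the constant diffusion coefficient with a state-dependent one; the forms below are generic textbook versions, not necessarily the paper's exact parametrisation.
```latex
\[
  \mathrm{d}X_t = -X_t\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}B_t
  \qquad \text{(additive noise: Ornstein--Uhlenbeck)},
\]
\[
  \mathrm{d}X_t = -X_t\,\mathrm{d}t + \sigma(X_t)\,\mathrm{d}B_t
  \qquad \text{(multiplicative noise: state-dependent diffusion coefficient).}
\]
```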
arXiv Detail & Related papers (2024-08-25T10:28:31Z)
- Revisiting Convergence of AdaGrad with Relaxed Assumptions [4.189643331553922]
We revisit the convergence of AdaGrad with momentum (covering AdaGrad as a special case) on smooth optimization problems.
The relaxed noise model encompasses a broad range of noises, including sub-Gaussian noise, that arise in many practical applications.
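As a reminder of the algorithm being analysed, below is a sketch of one common formulation of AdaGrad with heavy-ball momentum; the exact variant, hyperparameters, and toy objective are assumptions for illustration and may differ from the paper's formulation.
```python
import numpy as np

# Sketch of one common AdaGrad-with-momentum update (illustrative only):
#   v_t = v_{t-1} + g_t**2            (coordinate-wise accumulated squared gradients)
#   m_t = beta * m_{t-1} + g_t        (heavy-ball momentum)
#   x_t = x_{t-1} - eta * m_t / (sqrt(v_t) + eps)
def adagrad_momentum(grad_fn, x0, eta=0.1, beta=0.9, eps=1e-8, steps=200):
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x)                 # stochastic gradient at x
        v += g * g
        m = beta * m + g
        x = x - eta * m / (np.sqrt(v) + eps)
    return x

# Toy usage on a smooth quadratic with Gaussian gradient noise (assumed example).
rng = np.random.default_rng(1)
grad = lambda x: 2.0 * x + 0.1 * rng.standard_normal(x.shape)
print(adagrad_momentum(grad, np.ones(3)))
```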
arXiv Detail & Related papers (2024-02-21T13:24:14Z)
- Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped optimization problems with heavy-tailed noise with structured density.
We show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-(\alpha-1)/\alpha})$ when the gradients have finite moments of order $\alpha$.
We prove that the resulting estimates have negligible bias and controllable variance.
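The clipping idea referenced above can be sketched as follows; the clipping level, step size, and toy objective are illustrative assumptions rather than the paper's tuned choices.
```python
import numpy as np

# Sketch of clipped SGD (illustrative assumptions, not the paper's exact method):
# the stochastic gradient is rescaled to have norm at most lam before the step,
# which tames heavy-tailed gradient noise.
def clipped_sgd(grad_fn, x0, eta=0.05, lam=1.0, steps=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad_fn(x)
        g_norm = np.linalg.norm(g)
        g_clip = g * min(1.0, lam / (g_norm + 1e-12))   # clip to norm <= lam
        x = x - eta * g_clip
    return x

# Toy usage: quadratic objective with heavy-tailed (Student-t) gradient noise (assumed).
rng = np.random.default_rng(2)
grad = lambda x: 2.0 * x + rng.standard_t(df=1.5, size=x.shape)
print(clipped_sgd(grad, np.ones(3)))
```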
arXiv Detail & Related papers (2023-11-07T17:39:17Z)
- Closing the Gap Between the Upper Bound and the Lower Bound of Adam's Iteration Complexity [51.96093077151991]
We derive a new convergence guarantee of Adam, with only an $L$-smooth condition and a bounded noise variance assumption.
Our proof utilizes novel techniques to handle the entanglement between momentum and adaptive learning rate.
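For reference, the standard Adam update is shown below; its momentum term $m_t$ and adaptive step built from $v_t$ are exactly the coupled quantities the proof must disentangle (the analysed variant may differ, e.g. by omitting the bias corrections).
```latex
\begin{align*}
  m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
  v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{2}, \\
  \hat m_t &= m_t / (1-\beta_1^{\,t}), \qquad
  \hat v_t = v_t / (1-\beta_2^{\,t}), \\
  x_{t+1} &= x_t - \eta\, \hat m_t / \bigl(\sqrt{\hat v_t} + \epsilon\bigr).
\end{align*}
```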
arXiv Detail & Related papers (2023-10-27T09:16:58Z)
- Generalization Error Bounds for Noisy, Iterative Algorithms via Maximal Leakage [24.40306100502023]
We adopt an information-theoretic framework to analyze the generalization behavior of a class of noisy learning algorithms.
We show how the assumptions on the update function affect the optimal choice of the noise.
arXiv Detail & Related papers (2023-02-28T12:13:57Z)
- High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z)
- A lower confidence sequence for the changing mean of non-negative right heavy-tailed observations with bounded mean [9.289846887298854]
A confidence sequence produces an adapted sequence of sets for a predictable parameter sequence with a time-uniform coverage guarantee.
This work constructs a non-asymptotic lower CS for the running average conditional expectation whose slack converges to zero.
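A standard way to formalise the coverage guarantee mentioned above: for a target level $\alpha$ and a (possibly changing) parameter sequence $(\theta_t)$, a confidence sequence is an adapted sequence of sets $(C_t)$ such that
```latex
\[
  \Pr\bigl(\theta_t \in C_t \ \text{for all } t \ge 1\bigr) \ \ge\ 1 - \alpha ,
\]
% a lower confidence sequence takes C_t = [L_t, \infty) for an adapted lower bound L_t.
```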
arXiv Detail & Related papers (2022-10-20T09:50:05Z)
- Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for stochastic methods solving monotone and structured non-monotone variational inequality problems (VIPs).
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
arXiv Detail & Related papers (2022-06-02T15:21:55Z)
- Heavy-tailed denoising score matching [5.371337604556311]
We develop an iterative noise scaling algorithm to consistently initialise the multiple levels of noise in Langevin dynamics.
On the practical side, our use of heavy-tailed DSM leads to improved score estimation, controllable sampling convergence, and more balanced unconditional generative performance for imbalanced datasets.
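The role of the multiple noise levels can be seen in a standard annealed Langevin sampler; the score function, noise schedule, and step sizes below are toy assumptions (the paper's contribution is how to choose and initialise such noise levels, which this sketch does not implement).
```python
import numpy as np

# Sketch of annealed Langevin dynamics with multiple noise levels (illustrative only).
# For data ~ N(0, I), the noise-perturbed score is known in closed form:
#   score_sigma(x) = -x / (1 + sigma**2),
# so the sampler below is self-contained; a learned score network would replace it.
rng = np.random.default_rng(3)

def annealed_langevin(sigmas, d=2, steps_per_level=100, eps=0.01):
    x = rng.standard_normal(d) * sigmas[0]          # initialise at the largest noise level
    for sigma in sigmas:                            # sigmas assumed sorted high -> low
        alpha = eps * (sigma / sigmas[-1]) ** 2     # common step-size scaling
        for _ in range(steps_per_level):
            score = -x / (1.0 + sigma ** 2)
            x = x + 0.5 * alpha * score + np.sqrt(alpha) * rng.standard_normal(d)
    return x

sigmas = np.geomspace(10.0, 0.01, num=10)           # assumed geometric noise schedule
print(annealed_langevin(sigmas))
```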
arXiv Detail & Related papers (2021-12-17T22:04:55Z)
- Optimizing Information-theoretical Generalization Bounds via Anisotropic Noise in SGLD [73.55632827932101]
We optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD.
We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance.
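Concretely, with an anisotropic noise covariance $\Sigma$ an SGLD-style update takes the form below, and the result summarised above says the bound-optimal choice (subject to the empirical-risk constraint) is $\Sigma \propto \bigl(\mathbb{E}[g_t g_t^\top]\bigr)^{1/2}$; scaling constants are omitted here.
```latex
\[
  \theta_{t+1} = \theta_t - \eta\, g_t + \xi_t,
  \qquad \xi_t \sim \mathcal{N}(0, \Sigma),
  \qquad \Sigma \;\propto\; \bigl(\mathbb{E}[\,g_t g_t^\top\,]\bigr)^{1/2},
\]
% where g_t is the stochastic gradient of the empirical risk at theta_t.
```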
arXiv Detail & Related papers (2021-10-26T15:02:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.