Concentration of Contractive Stochastic Approximation: Additive and
Multiplicative Noise
- URL: http://arxiv.org/abs/2303.15740v1
- Date: Tue, 28 Mar 2023 05:32:30 GMT
- Title: Concentration of Contractive Stochastic Approximation: Additive and
Multiplicative Noise
- Authors: Zaiwei Chen, Siva Theja Maguluri, and Martin Zubeldia
- Abstract summary: We study the concentration behavior of a stochastic approximation (SA) algorithm under a contractive operator with respect to an arbitrary norm.
We obtain maximal concentration inequalities on the convergence errors, and show that these errors have sub-Gaussian tails in the additive noise setting.
We provide an impossibility result showing that it is in general not possible to achieve sub-exponential tails for SA with multiplicative noise.
- Score: 7.532013242448151
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we study the concentration behavior of a stochastic
approximation (SA) algorithm under a contractive operator with respect to an
arbitrary norm. We consider two settings where the iterates are potentially
unbounded: (1) bounded multiplicative noise, and (2) additive sub-Gaussian
noise. We obtain maximal concentration inequalities on the convergence errors,
and show that these errors have sub-Gaussian tails in the additive noise
setting, and super-polynomial tails (faster than polynomial decay) in the
multiplicative noise setting. In addition, we provide an impossibility result
showing that it is in general not possible to achieve sub-exponential tails for
SA with multiplicative noise. To establish these results, we develop a novel
bootstrapping argument that involves bounding the moment generating function of
the generalized Moreau envelope of the error and the construction of an
exponential supermartingale to enable using Ville's maximal inequality.
To demonstrate the applicability of our theoretical results, we use them to
provide maximal concentration bounds for a large class of reinforcement
learning algorithms, including but not limited to on-policy TD-learning with
linear function approximation, off-policy TD-learning with generalized
importance sampling factors, and $Q$-learning. To the best of our knowledge,
super-polynomial concentration bounds for off-policy TD-learning have not been
established in the literature due to the challenge of handling the combination
of unbounded iterates and multiplicative noise.
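To make the two noise settings concrete, the following is a minimal simulation sketch of the contractive SA recursion $x_{k+1} = x_k + \alpha_k (F(x_k) - x_k + w_k)$. It is not code from the paper: the contraction $F(x) = \gamma x + b$, the step sizes, and the noise scales are illustrative assumptions. The additive case draws noise independently of the iterate, while the multiplicative case scales the noise with $\|x_k\|$, mimicking the unbounded-iterate regime the abstract associates with algorithms such as $Q$-learning and off-policy TD-learning.

```python
# Minimal sketch (illustrative, not from the paper) of contractive stochastic
# approximation x_{k+1} = x_k + alpha_k * (F(x_k) - x_k + w_k) under the two
# noise settings discussed above: additive Gaussian noise and bounded
# multiplicative noise whose magnitude grows with ||x_k||.
import numpy as np

rng = np.random.default_rng(0)
gamma, d = 0.9, 5                 # contraction factor and iterate dimension (assumed)
b = rng.normal(size=d)            # fixed offset of the toy operator
x_star = b / (1.0 - gamma)        # fixed point of F(x) = gamma * x + b

def F(x):
    """A gamma-contraction (w.r.t. the Euclidean norm) with fixed point x_star."""
    return gamma * x + b

def run_sa(noise, iters=5_000):
    """Run the SA recursion with diminishing step sizes alpha_k = 1 / (k + 10)."""
    x = np.zeros(d)
    for k in range(iters):
        alpha = 1.0 / (k + 10)
        if noise == "additive":
            # additive Gaussian noise, independent of the current iterate
            w = rng.normal(scale=1.0, size=d)
        else:
            # bounded multiplicative noise: magnitude proportional to 1 + ||x_k||
            w = rng.uniform(-1.0, 1.0, size=d) * (1.0 + np.linalg.norm(x))
        x = x + alpha * (F(x) - x + w)
    return np.linalg.norm(x - x_star)

# Crude tail comparison over independent runs.
errs_add = np.array([run_sa("additive") for _ in range(200)])
errs_mul = np.array([run_sa("multiplicative") for _ in range(200)])
for q in (0.5, 0.9, 0.99):
    print(f"quantile {q}: additive {np.quantile(errs_add, q):.3f}, "
          f"multiplicative {np.quantile(errs_mul, q):.3f}")
```

Per the abstract's results, the additive-noise errors should exhibit sub-Gaussian tails, whereas the multiplicative-noise errors are only guaranteed super-polynomial tails (and, by the impossibility result, cannot in general be sub-exponential); the quantile printout is only a rough empirical illustration of that gap.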
Related papers
- Stochastic Approximation with Unbounded Markovian Noise: A General-Purpose Theorem [7.443139252028032]
We consider average-reward Reinforcement Learning with unbounded state space and reward function.
Recent works studied this problem in the actor-critic framework.
We study Temporal Difference (TD) learning with linear function approximation.
arXiv Detail & Related papers (2024-10-29T03:40:53Z)
- Non-asymptotic bounds for forward processes in denoising diffusions: Ornstein-Uhlenbeck is hard to beat [49.1574468325115]
This paper presents explicit non-asymptotic bounds on the forward diffusion error in total variation (TV) distance.
We parametrise multi-modal data distributions in terms of the distance $R$ to their furthest modes and consider forward diffusions with additive and multiplicative noise.
arXiv Detail & Related papers (2024-08-25T10:28:31Z)
- Revisiting Convergence of AdaGrad with Relaxed Assumptions [4.189643331553922]
We revisit the convergence of AdaGrad with momentum (covering AdaGrad as a special case) under relaxed assumptions.
The noise model considered encompasses a broad range of noises, including the sub-Gaussian noise arising in many practical applications.
arXiv Detail & Related papers (2024-02-21T13:24:14Z)
- Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider stochastic optimization problems with heavy-tailed noise that has a structured density, addressed via clipped gradient methods.
We show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-(\alpha - 1)/\alpha})$ when the gradients have finite moments of order $\alpha$.
We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z)
- Closing the Gap Between the Upper Bound and the Lower Bound of Adam's
Iteration Complexity [51.96093077151991]
We derive a new convergence guarantee of Adam, with only an $L$-smooth condition and a bounded noise variance assumption.
Our proof utilizes novel techniques to handle the entanglement between momentum and adaptive learning rate.
arXiv Detail & Related papers (2023-10-27T09:16:58Z)
- Generalization Error Bounds for Noisy, Iterative Algorithms via Maximal
Leakage [24.40306100502023]
We adopt an information-theoretic framework to analyze the generalization behavior of a class of noisy learning algorithms.
We show how the assumptions on the update function affect the optimal choice of the noise.
arXiv Detail & Related papers (2023-02-28T12:13:57Z)
- High-Probability Bounds for Stochastic Optimization and Variational
Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z)
- A lower confidence sequence for the changing mean of non-negative right
heavy-tailed observations with bounded mean [9.289846887298854]
A confidence sequence produces an adapted sequence of sets for a predictable parameter sequence with a time-uniform coverage guarantee.
This work constructs a non-asymptotic lower CS for the running average conditional expectation whose slack converges to zero.
arXiv Detail & Related papers (2022-10-20T09:50:05Z)
- Clipped Stochastic Methods for Variational Inequalities with
Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for clipped stochastic methods for solving monotone and structured non-monotone variational inequality problems (VIPs).
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
arXiv Detail & Related papers (2022-06-02T15:21:55Z)
- Heavy-tailed denoising score matching [5.371337604556311]
We develop an iterative noise scaling algorithm to consistently initialise the multiple levels of noise in Langevin dynamics.
On the practical side, our use of heavy-tailed DSM leads to improved score estimation, controllable sampling convergence, and more balanced unconditional generative performance for imbalanced datasets.
arXiv Detail & Related papers (2021-12-17T22:04:55Z)
- Optimizing Information-theoretical Generalization Bounds via Anisotropic
Noise in SGLD [73.55632827932101]
We optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD.
We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance.
arXiv Detail & Related papers (2021-10-26T15:02:27Z)
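As a concrete illustration of the noise structure identified in the last entry, here is a short, self-contained sketch (a toy example under assumed settings, not the cited paper's code) of SGLD on a linear-regression objective where the injected Gaussian noise is given covariance equal to the matrix square root of a minibatch estimate of the gradient covariance; the objective, covariance estimator, step size, and noise scaling are all illustrative choices.

```python
# Toy sketch (assumptions, not the cited paper's code): SGLD whose injected noise
# has covariance equal to the square root of an estimated gradient covariance.
import numpy as np

rng = np.random.default_rng(1)

# Toy linear-regression data; the per-sample gradient of the squared loss is
# (x^T theta - y) * x.
n, d = 256, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

def grad_per_sample(theta, idx):
    residual = X[idx] @ theta - y[idx]      # shape (batch,)
    return residual[:, None] * X[idx]       # shape (batch, d)

def sqrtm_psd(A):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

theta = np.zeros(d)
eta, batch = 0.01, 32
for step in range(2_000):
    idx = rng.choice(n, size=batch, replace=False)
    g = grad_per_sample(theta, idx)
    # Minibatch estimate of the gradient covariance (regularized to stay PSD).
    cov = np.cov(g, rowvar=False) + 1e-8 * np.eye(d)
    # Injected noise A @ z has covariance A A^T; choosing A = cov^{1/4} gives the
    # target noise covariance cov^{1/2}, the square root of the gradient covariance.
    A = sqrtm_psd(sqrtm_psd(cov))
    theta = theta - eta * g.mean(axis=0) + np.sqrt(eta) * A @ rng.normal(size=d)

print("estimated theta:", theta)  # should land near [1.0, -2.0, 0.5]
```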
This list is automatically generated from the titles and abstracts of the papers in this site.