Related papers: Uniform-in-Time Wasserstein Stability Bounds for (Noisy) Stochastic Gradient Descent

Uniform-in-Time Wasserstein Stability Bounds for (Noisy) Stochastic Gradient Descent

URL: http://arxiv.org/abs/2305.12056v2
Date: Sat, 28 Oct 2023 16:34:39 GMT
Title: Uniform-in-Time Wasserstein Stability Bounds for (Noisy) Stochastic Gradient Descent
Authors: Lingjiong Zhu, Mert Gurbuzbalaban, Anant Raj, Umut Simsekli
Abstract summary: The last decade has witnessed an increasing number of stability for different algorithms applied on different loss functions. We make a novel connection between theory applied and introduce a unified guideline for proving stability for optimization algorithms. Our approach is flexible and can be readilyizable to other popular learning classes.
Score: 30.84181129503133
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Algorithmic stability is an important notion that has proven powerful for deriving generalization bounds for practical algorithms. The last decade has witnessed an increasing number of stability bounds for different algorithms applied on different classes of loss functions. While these bounds have illuminated various properties of optimization algorithms, the analysis of each case typically required a different proof technique with significantly different mathematical tools. In this study, we make a novel connection between learning theory and applied probability and introduce a unified guideline for proving Wasserstein stability bounds for stochastic optimization algorithms. We illustrate our approach on stochastic gradient descent (SGD) and we obtain time-uniform stability bounds (i.e., the bound does not increase with the number of iterations) for strongly convex losses and non-convex losses with additive noise, where we recover similar results to the prior art or extend them to more general cases by using a single proof technique. Our approach is flexible and can be generalizable to other popular optimizers, as it mainly requires developing Lyapunov functions, which are often readily available in the literature. It also illustrates that ergodicity is an important component for obtaining time-uniform bounds -- which might not be achieved for convex or non-convex losses unless additional noise is injected to the iterates. Finally, we slightly stretch our analysis technique and prove time-uniform bounds for SGD under convex and non-convex losses (without additional additive noise), which, to our knowledge, is novel.

Related papers

Towards Weaker Variance Assumptions for Stochastic Optimization [19.339358874690568]
We revisit a classical assumption for analyzing gradient algorithms where the squared norm of the subgradient is allowed to grow as fast as the squared norm of the optimization variable. We analyze convex problems with functional constraints or regularized convex-concave min-max problems. We obtain rates for optimality measures that do not require boundedness of the feasible set.
arXiv Detail & Related papers (2025-04-14T07:26:34Z)
High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise [96.80184504268593]
gradient, clipping is one of the key algorithmic ingredients to derive good high-probability guarantees. Clipping can spoil the convergence of the popular methods for composite and distributed optimization.
arXiv Detail & Related papers (2023-10-03T07:49:17Z)
High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions. These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z)
Mirror Descent Strikes Again: Optimal Stochastic Convex Optimization under Infinite Noise Variance [17.199063087458907]
We quantify the convergence rate of the Mirror Descent algorithm with a class of uniformly convex mirror maps. This algorithm does not require any explicit gradient clipping or normalization. We complement our results with information-theoretic lower bounds showing that no other algorithm using only first-order oracles can achieve improved rates.
arXiv Detail & Related papers (2022-02-23T17:08:40Z)
On the Convergence of mSGD and AdaGrad for Stochastic Optimization [0.696125353550498]
convex descent (SGD) has been intensively developed and extensively applied in machine learning in the past decade. Some modified SGD-type algorithms, which outperform the SGD in many competitions and applications in terms of convergence rate and accuracy, such as momentum-based SGD (mSGD) and adaptive gradient optimization (AdaGrad) We focus on convergence analysis of mSGD and AdaGrad for any smooth (possibly non-possibly non-possibly non-possibly) loss functions in machine learning.
arXiv Detail & Related papers (2022-01-26T22:02:21Z)
Optimal Rates for Random Order Online Optimization [60.011653053877126]
We study the citetgarber 2020online, where the loss functions may be chosen by an adversary, but are then presented online in a uniformly random order. We show that citetgarber 2020online algorithms achieve the optimal bounds and significantly improve their stability.
arXiv Detail & Related papers (2021-06-29T09:48:46Z)
Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance [0.0]
In this paper, a general optimization procedure is studied, unifying several variants of the gradient descent such as, among others, the heavy ball method, the Nesterov Accelerated Gradient (S-NAG), and the widely used Adam method. The avoidance is studied as a noisy discretization of a non-autonomous ordinary differential equation.
arXiv Detail & Related papers (2020-12-07T19:14:49Z)
Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step. Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses [52.039438701530905]
We provide sharp upper and lower bounds for several forms of gradient descent (SGD) on arbitrary Lipschitz nonsmooth convex losses. Our bounds allow us to derive a new algorithm for differentially private nonsmooth convex optimization with optimal excess population risk.
arXiv Detail & Related papers (2020-06-12T02:45:21Z)
Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on methods for solving smooth non-concave min-max problems, which have received increasing attention due to deep learning (e.g., deep AUC)
arXiv Detail & Related papers (2020-06-12T00:32:21Z)
Convergence rates and approximation results for SGD and its continuous-time counterpart [16.70533901524849]
This paper proposes a thorough theoretical analysis of convex Gradient Descent (SGD) with non-increasing step sizes. First, we show that the SGD can be provably approximated by solutions of inhomogeneous Differential Equation (SDE) using coupling. Recent analyses of deterministic and optimization methods by their continuous counterpart, we study the long-time behavior of the continuous processes at hand and non-asymptotic bounds.
arXiv Detail & Related papers (2020-04-08T18:31:34Z)
Stochastic Approximation with Markov Noise: Analysis and applications in reinforcement learning [0.0]
We present for the first time an convergence analysis of two time-scale approximation driven by "controlled" Markov noise. We analyze the behavior of our framework by relating it to limiting differential inclusions in both time scales. We obtain the first informative error bounds on function approximation for the policy evaluation algorithm.
arXiv Detail & Related papers (2020-04-08T03:59:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.