Related papers: A Short and Unified Convergence Analysis of the SAG, SAGA, and IAG Algorithms

URL: http://arxiv.org/abs/2602.05304v1
Date: Thu, 05 Feb 2026 04:57:20 GMT
Title: A Short and Unified Convergence Analysis of the SAG, SAGA, and IAG Algorithms
Authors: Feng Zhu, Robert W. Heath, Aritra Mitra
Abstract summary: We develop a single unified analysis that applies to all three algorithms: SAG, SAGA, and IAG. We provide the first high-probability bounds for SAG and SAGA, which extend to non-convex objectives and Markov sampling. As a byproduct, we obtain the best known rates for IAG.
Score: 4.862625283098196
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Stochastic variance-reduced algorithms such as Stochastic Average Gradient (SAG) and SAGA, and their deterministic counterparts like the Incremental Aggregated Gradient (IAG) method, have been extensively studied in large-scale machine learning. Despite their popularity, existing analyses for these algorithms are disparate, relying on different proof techniques tailored to each method. Furthermore, the original proof of SAG is known to be notoriously involved, requiring computer-aided analysis. Focusing on finite-sum optimization with smooth and strongly convex objective functions, our main contribution is to develop a single unified convergence analysis that applies to all three algorithms: SAG, SAGA, and IAG. Our analysis features two key steps: (i) establishing a bound on delays due to stochastic sub-sampling using simple concentration tools, and (ii) carefully designing a novel Lyapunov function that accounts for such delays. The resulting proof is short and modular, providing the first high-probability bounds for SAG and SAGA that can be seamlessly extended to non-convex objectives and Markov sampling. As an immediate byproduct of our new analysis technique, we obtain the best known rates for the IAG algorithm, significantly improving upon prior bounds.
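
For orientation, the three update rules named in the abstract can be sketched on a toy finite-sum problem. This is an illustrative sketch only, not the authors' code or analysis: the function name `run`, the scalar quadratic objective, the step size, the iteration count, and the choice to initialise the gradient table at the starting point are all assumptions made here.

```python
import random

def run(method, a, b, step=0.02, iters=5000, seed=0):
    """Minimise (1/n) * sum_i 0.5 * a[i] * (x - b[i])**2 over a scalar x.

    method: "sag" or "saga" (uniform random index) or "iag" (cyclic index).
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(a)

    def grad(i, x):
        # Gradient of the i-th component function at x.
        return a[i] * (x - b[i])

    x = 0.0
    # Table of the most recently evaluated gradient of each component
    # (assumed initialisation: gradients at the starting point).
    table = [grad(i, x) for i in range(n)]
    avg = sum(table) / n

    for k in range(iters):
        # IAG sweeps the components cyclically; SAG and SAGA sample one.
        i = k % n if method == "iag" else rng.randrange(n)
        new = grad(i, x)
        if method == "saga":
            # SAGA: fresh gradient minus stale one, plus the old table average.
            direction = new - table[i] + avg
        # Refresh the table entry and its running average.
        avg += (new - table[i]) / n
        table[i] = new
        if method != "saga":
            # SAG and IAG: step along the average of the refreshed table.
            direction = avg
        x -= step * direction
    return x

if __name__ == "__main__":
    a = [1.0, 1.2, 1.5, 1.8, 2.0]    # curvatures, so L = 2 and mu = 1
    b = [-1.0, 0.5, 2.0, 3.0, -2.0]  # per-component minimisers
    x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)
    for m in ("sag", "saga", "iag"):
        print(m, abs(run(m, a, b) - x_star))
```

In this sketch SAG and IAG share the same step and differ only in how the index is chosen, while SAGA differs in the direction it steps along. The stale table entries are the "delays" that the abstract's analysis bounds.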

Related papers

Stochastic Average Gradient : A Simple Empirical Investigation [0.0]
Stochastic Average Gradient (SAG) is a method for optimizing the sum of a finite number of smooth functions. SAG converges faster than other iterations on simple toy problems and performs better than many other iterations on simple machine learning problems. We also propose a combination of SAG with the momentum algorithm and Adam.
arXiv Detail & Related papers (2023-07-27T17:34:26Z)
Last-Iterate Convergence of Adaptive Riemannian Gradient Descent for Equilibrium Computation [52.73824786627612]
This paper establishes new convergence results for geodesic strongly monotone games. Our key result shows that RGD attains last-iterate linear convergence in a geometry-agnostic fashion. Overall, this paper presents the first geometry-agnostic last-iterate convergence analysis for games beyond the Euclidean settings.
arXiv Detail & Related papers (2023-06-29T01:20:44Z)
Stochastic Approximation Beyond Gradient for Signal Processing and Machine Learning [40.636276521022474]
Stochastic Approximation (SA) is a classical algorithm that has had a huge impact on signal processing since its early days. This article introduces the non-stochastic-gradient perspectives of SA to the signal processing and machine learning audiences. We build our analysis framework based on classes of Lyapunov functions that satisfy a variety of mild conditions.
arXiv Detail & Related papers (2023-02-22T05:00:51Z)
NAG-GS: Semi-Implicit, Accelerated and Robust Stochastic Optimizer [45.47667026025716]
We propose a novel, robust and accelerated iteration that relies on two key elements. The convergence and stability of the obtained method, referred to as NAG-GS, are first studied extensively. We show that NAG-GS is competitive with state-of-the-art methods such as momentum SGD with weight decay and AdamW for the training of machine learning models.
arXiv Detail & Related papers (2022-09-29T16:54:53Z)
Formal guarantees for heuristic optimization algorithms used in machine learning [6.978625807687497]
Stochastic Gradient Descent (SGD) and its variants have become the dominant methods for large-scale optimization problems in machine learning (ML). We provide formal guarantees for a few convex optimization methods and propose improved algorithms.
arXiv Detail & Related papers (2022-07-31T19:41:22Z)
A general sample complexity analysis of vanilla policy gradient [101.16957584135767]
Policy gradient (PG) is one of the most popular methods for solving reinforcement learning (RL) problems. This paper concerns the theoretical understanding of "vanilla" PG.
arXiv Detail & Related papers (2021-07-23T19:38:17Z)
Stochastic Reweighted Gradient Descent [4.355567556995855]
We propose an importance-sampling-based algorithm we call SRG (stochastic reweighted gradient). We pay particular attention to the time and memory overhead of our proposed method. We present empirical results to support our findings.
arXiv Detail & Related papers (2021-03-23T04:09:43Z)
Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems. We propose a novel sample-efficient stochastic gradient estimator named stoc-BiO.
arXiv Detail & Related papers (2020-10-15T18:09:48Z)
Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model. The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
A Dynamical Systems Approach for Convergence of the Bayesian EM Algorithm [59.99439951055238]
We show how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimization algorithms that are not necessarily gradient-based. The particular ML problem that this paper focuses on is that of parameter estimation in an incomplete-data Bayesian framework via the popular optimization algorithm known as maximum a posteriori expectation-maximization (MAP-EM). We show that fast convergence (linear or quadratic) is achieved, which could have been difficult to unveil without our adopted S&C approach.
arXiv Detail & Related papers (2020-06-23T01:34:18Z)
Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on methods for solving smooth non-convex strongly-concave min-max problems, which have received increasing attention due to deep learning (e.g., deep AUC).
arXiv Detail & Related papers (2020-06-12T00:32:21Z)
Finite-sample Analysis of Greedy-GQ with Linear Function Approximation under Markovian Noise [23.62008807533706]
This paper develops the first finite-sample analysis for the Greedy-GQ algorithm. Our paper extends the finite-sample analyses of two-timescale reinforcement learning algorithms.
arXiv Detail & Related papers (2020-05-20T16:35:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.