Stochastic Approximation Beyond Gradient for Signal Processing and
Machine Learning
- URL: http://arxiv.org/abs/2302.11147v2
- Date: Sun, 16 Jul 2023 04:31:02 GMT
- Title: Stochastic Approximation Beyond Gradient for Signal Processing and
Machine Learning
- Authors: Aymeric Dieuleveut, Gersende Fort, Eric Moulines, Hoi-To Wai
- Abstract summary: Stochastic Approximation (SA) is a classical algorithm that has had a huge impact on signal processing since its early days.
This article introduces the non-stochastic-gradient perspectives of SA to the signal processing and machine learning audiences.
We build our analysis framework based on classes of Lyapunov functions that satisfy a variety of mild conditions.
- Score: 40.636276521022474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic Approximation (SA) is a classical algorithm that, since its early days, has had a huge impact on signal processing and, nowadays, on machine learning, due to the necessity of dealing with large amounts of data observed with uncertainties. An exemplar special case of SA is the popular stochastic (sub)gradient algorithm, which is the workhorse behind many important applications. A lesser-known fact is that the SA scheme also extends to non-stochastic-gradient algorithms such as compressed stochastic gradient, stochastic expectation-maximization, and a number of reinforcement learning algorithms. The aim of this article is to overview and introduce the
non-stochastic-gradient perspectives of SA to the signal processing and machine
learning audiences through presenting a design guideline of SA algorithms
backed by theories. Our central theme is to propose a general framework that
unifies existing theories of SA, including its non-asymptotic and asymptotic
convergence results, and demonstrate their applications on popular
non-stochastic-gradient algorithms. We build our analysis framework based on
classes of Lyapunov functions that satisfy a variety of mild conditions. We
draw connections between non-stochastic-gradient algorithms and scenarios where the Lyapunov function is smooth, convex, or strongly convex. Using this framework, we illustrate the convergence properties of the non-stochastic-gradient algorithms through concrete examples. Extensions to the
emerging variance reduction techniques for improved sample complexity will also
be discussed.
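The generic SA recursion underlying the abstract is x_{k+1} = x_k + gamma_{k+1} H(x_k, X_{k+1}), where the drift field H need not be a stochastic gradient. As a minimal sketch (the toy drift field, step-size schedule, and target value below are illustrative assumptions, not taken from the paper), the following Python snippet runs the Robbins-Monro recursion to find the root of a mean field h(x) = E[H(x, X)]:

```python
import random

def stochastic_approximation(field, step, x0, n_iter, rng):
    """Generic SA recursion: x_{k+1} = x_k + gamma_{k+1} * H(x_k, X_{k+1})."""
    x = x0
    for k in range(1, n_iter + 1):
        x = x + step(k) * field(x, rng)
    return x

# Toy drift field: H(x, X) = X - x with X ~ N(2, 1). Its mean field
# h(x) = E[H(x, X)] = 2 - x has root x* = 2. Note that SA only needs
# the mean field to point toward the root; H is not required to be
# the gradient of any loss function.
rng = random.Random(0)
x_star = stochastic_approximation(
    field=lambda x, r: r.gauss(2.0, 1.0) - x,
    step=lambda k: 1.0 / k,  # diminishing steps: sum gamma_k = inf, sum gamma_k^2 < inf
    x0=0.0,
    n_iter=20000,
    rng=rng,
)
print(x_star)
```

With the 1/k step sizes the iterate here reduces to the running sample mean, so it converges to the root x* = 2 almost surely; the classical Robbins-Monro conditions on the steps (divergent sum, summable squares) are what the Lyapunov-based analyses in the article generalize.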
Related papers
- A Short and Unified Convergence Analysis of the SAG, SAGA, and IAG Algorithms [4.862625283098196]
We develop a single unified analysis that applies to all three algorithms: SAG, SAGA, and IAG. We provide high-probability bounds for each of these algorithms, and obtain the best known rates for IAG as a byproduct.
arXiv Detail & Related papers (2026-02-05T04:57:20Z) - Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and MAML. This paper proposes conditional stochastic optimization algorithms for distributed federated learning.
arXiv Detail & Related papers (2023-10-04T01:47:37Z) - Stability and Generalization of Stochastic Compositional Gradient
Descent Algorithms [61.59448949684493]
We provide a stability and generalization analysis of stochastic compositional gradient descent algorithms built from training examples.
We establish the uniform stability results for two popular compositional gradient descent algorithms, namely SCGD and SCSC.
We derive dimension-independent excess risk bounds for SCGD and SCSC by trading off their stability results and optimization errors.
arXiv Detail & Related papers (2023-07-07T02:40:09Z) - Non-asymptotic estimates for TUSLA algorithm for non-convex learning
with applications to neural networks with ReLU activation function [3.5044892799305956]
We provide a non-asymptotic analysis for the tamed un-adjusted Langevin algorithm (TUSLA) introduced in Lovas et al.
In particular, we establish non-asymptotic error bounds for the TUSLA algorithm in Wasserstein-1 and Wasserstein-2 distances.
We show that the TUSLA algorithm converges rapidly to the optimal solution.
arXiv Detail & Related papers (2021-07-19T07:13:02Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of a stochastic optimization algorithm can be bounded in terms of the 'complexity' of the fractal structure that underlies its invariant measure. We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layered neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - Practical Precoding via Asynchronous Stochastic Successive Convex
Approximation [8.808993671472349]
We consider optimization of a smooth non-convex loss function with a convex non-smooth regularizer.
In this work, we take a closer look at the SCA algorithm and develop its asynchronous variant for resource allocation in wireless networks.
arXiv Detail & Related papers (2020-10-03T13:53:30Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Expectation propagation on the diluted Bayesian classifier [0.0]
We introduce a statistical mechanics inspired strategy that addresses the problem of sparse feature selection in the context of binary classification.
A computational scheme known as expectation propagation (EP) is used to train a continuous-weights perceptron learning a classification rule.
EP is a robust and competitive algorithm in terms of variable selection properties, estimation accuracy and computational complexity.
arXiv Detail & Related papers (2020-09-20T23:59:44Z) - A Dynamical Systems Approach for Convergence of the Bayesian EM
Algorithm [59.99439951055238]
We show how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimization algorithms that are not necessarily gradient-based.
The particular ML problem that this paper focuses on is that of parameter estimation in an incomplete-data Bayesian framework via the popular optimization algorithm known as maximum a posteriori expectation-maximization (MAP-EM).
We show that fast convergence (linear or quadratic) is achieved, which could have been difficult to unveil without our adopted S&C approach.
arXiv Detail & Related papers (2020-06-23T01:34:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.