Non-Asymptotic Analysis of Stochastic Approximation Algorithms for
Streaming Data
- URL: http://arxiv.org/abs/2109.07117v7
- Date: Mon, 24 Apr 2023 07:16:38 GMT
- Title: Non-Asymptotic Analysis of Stochastic Approximation Algorithms for
Streaming Data
- Authors: Antoine Godichon-Baggioni (LPSM (UMR_8001)), Nicklas Werge (LPSM
(UMR_8001)), Olivier Wintenberger (LPSM (UMR_8001))
- Abstract summary: The streaming framework is analogous to solving optimization problems using time-varying mini-batches that arrive sequentially.
We provide non-asymptotic convergence rates of various gradient-based algorithms.
We show how to accelerate convergence by choosing the learning rate according to the time-varying mini-batches.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a streaming framework for analyzing stochastic
approximation/optimization problems. This streaming framework is analogous to
solving optimization problems using time-varying mini-batches that arrive
sequentially. We provide non-asymptotic convergence rates of various
gradient-based algorithms; this includes the famous Stochastic Gradient (SG)
descent (a.k.a. Robbins-Monro algorithm), mini-batch SG and time-varying
mini-batch SG algorithms, as well as their iterated averages (a.k.a.
Polyak-Ruppert averaging). We show i) how to accelerate convergence by choosing
the learning rate according to the time-varying mini-batches, ii) that
Polyak-Ruppert averaging achieves optimal convergence in terms of attaining the
Cramer-Rao lower bound, and iii) how time-varying mini-batches together with
Polyak-Ruppert averaging can provide variance reduction and accelerate
convergence simultaneously, which is advantageous for many learning problems,
such as online, sequential, and large-scale learning. We further demonstrate
these favorable effects for various time-varying mini-batches.
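To make the streaming setup concrete, the following minimal NumPy sketch (not the authors' code) runs SG descent on time-varying mini-batches of a simulated linear regression stream and maintains a Polyak-Ruppert average; the growing mini-batch schedule and the decaying learning rate below are illustrative choices, not the paper's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
theta_star = rng.normal(size=d)           # ground-truth parameter of the stream

def draw_block(n):
    """Simulate the next time-varying mini-batch of n (x, y) pairs."""
    X = rng.normal(size=(n, d))
    y = X @ theta_star + 0.1 * rng.normal(size=n)
    return X, y

theta = np.zeros(d)                        # SG iterate
theta_bar = np.zeros(d)                    # Polyak-Ruppert average

for t in range(1, 2001):
    n_t = 1 + t // 10                      # illustrative growing mini-batch size
    X, y = draw_block(n_t)
    grad = X.T @ (X @ theta - y) / n_t     # mini-batch gradient of 0.5*||X theta - y||^2 / n_t
    gamma_t = 0.5 / t**0.66                # illustrative decaying learning rate
    theta -= gamma_t * grad                # SG step on the streaming block
    theta_bar += (theta - theta_bar) / t   # Polyak-Ruppert (uniform) average of the iterates

print("error of last iterate    :", np.linalg.norm(theta - theta_star))
print("error of averaged iterate:", np.linalg.norm(theta_bar - theta_star))
```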
Related papers
- Smoothing ADMM for Sparse-Penalized Quantile Regression with Non-Convex
Penalties [8.294148737585543]
This paper investigates quantile regression in the presence of non-convex and non-smooth sparse penalties, such as the minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD).
We introduce a novel single-loop ADMM algorithm with an increasing penalty parameter, named SIAD, specifically for sparse-penalized quantile regression.
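As background on the quantile-regression ingredient only (the SIAD ADMM loop itself is not reproduced here), the sketch below shows the pinball loss and one softplus-type smoothing of it; the smoothing form and bandwidth are assumptions for illustration, not necessarily the paper's.

```python
import numpy as np

def pinball(r, tau):
    """Quantile (pinball) loss for residuals r at quantile level tau in (0, 1)."""
    return np.maximum(tau * r, (tau - 1.0) * r)

def smoothed_pinball(r, tau, h=0.05):
    """Softplus-smoothed pinball loss: tau*r + h*log(1 + exp(-r/h)).
    A convex, differentiable surrogate that tends to the pinball loss as the
    bandwidth h goes to 0 (an illustrative smoothing device)."""
    return tau * r + h * np.logaddexp(0.0, -r / h)

r = np.linspace(-1, 1, 5)
print(pinball(r, tau=0.9))
print(smoothed_pinball(r, tau=0.9))
```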
arXiv Detail & Related papers (2023-09-04T21:48:51Z)
- Ordering for Non-Replacement SGD [7.11967773739707]
We seek to find an ordering that can improve the convergence rates for the non-replacement form of the algorithm.
We develop optimal orderings for constant and decreasing step sizes for strongly convex and convex functions.
In addition, we are able to combine the ordering with mini-batching and further apply it to more complex neural networks.
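As a toy illustration of SGD without replacement under a fixed data ordering (the optimal orderings derived in that paper are not reproduced), the sketch below makes one pass per epoch over a single ordering; sorting by initial per-sample gradient norm is a hypothetical heuristic used only to show where an ordering plugs in.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 5
X = rng.normal(size=(n, d))
theta_star = rng.normal(size=d)
y = X @ theta_star + 0.1 * rng.normal(size=n)

def sample_grad(theta, i):
    """Gradient of the i-th squared-error term 0.5*(x_i^T theta - y_i)^2."""
    return (X[i] @ theta - y[i]) * X[i]

theta = np.zeros(d)
# One illustrative (hypothetical) ordering: largest initial gradient norm first.
order = np.argsort([-np.linalg.norm(sample_grad(theta, i)) for i in range(n)])

step = 0.01
for epoch in range(20):
    for i in order:                       # each sample used exactly once per epoch
        theta -= step * sample_grad(theta, i)

print("parameter error:", np.linalg.norm(theta - theta_star))
```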
arXiv Detail & Related papers (2023-06-28T00:46:58Z)
- Learning from time-dependent streaming data with online stochastic
algorithms [7.283533791778357]
This paper addresses stochastic optimization in a streaming setting with time-dependent and biased gradient estimates.
We analyze several first-order methods, including Stochastic Gradient Descent (SGD), mini-batch SGD, and time-varying mini-batch SGD, along with their Polyak-Ruppert averages.
arXiv Detail & Related papers (2022-05-25T07:53:51Z)
- Hessian Averaging in Stochastic Newton Methods Achieves Superlinear
Convergence [69.65563161962245]
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method.
We show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage.
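For intuition, here is a small NumPy sketch of a stochastic Newton iteration that averages subsampled Hessians with uniform (running-mean) weights on a regularized least-squares problem; the uniform weighting, the use of full gradients, and the problem instance are assumptions of this sketch, not the paper's universal weighted averaging scheme.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, lam = 500, 5, 0.1
X = rng.normal(size=(n, d))
theta_star = rng.normal(size=d)
y = X @ theta_star + 0.1 * rng.normal(size=n)

def full_grad(theta):
    """Gradient of 0.5/n*||X theta - y||^2 + 0.5*lam*||theta||^2 (full gradient for simplicity)."""
    return X.T @ (X @ theta - y) / n + lam * theta

theta = np.zeros(d)
H_bar = np.zeros((d, d))                   # running average of subsampled Hessians
b = 50                                     # Hessian subsample size

for k in range(1, 51):
    idx = rng.choice(n, size=b, replace=False)
    H_k = X[idx].T @ X[idx] / b + lam * np.eye(d)       # subsampled Hessian estimate
    H_bar += (H_k - H_bar) / k                          # uniform (running-mean) averaging
    theta -= np.linalg.solve(H_bar, full_grad(theta))   # Newton step with the averaged Hessian

print("gradient norm at final iterate:", np.linalg.norm(full_grad(theta)))
```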
arXiv Detail & Related papers (2022-04-20T07:14:21Z)
- Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a stochastic conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
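The one-sample idea can be sketched roughly as a Frank-Wolfe update driven by a SAG-style table of per-sample gradients, refreshed with a single sample per iteration; the smooth least-squares objective over an l1-ball and the classical 2/(k+2) step size below are illustrative assumptions, not the paper's composite setting.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, radius = 100, 5, 2.0
X = rng.normal(size=(n, d))
theta_star = rng.normal(size=d)
y = X @ theta_star + 0.1 * rng.normal(size=n)

def sample_grad(theta, i):
    """Gradient of the i-th loss 0.5*(x_i^T theta - y_i)^2."""
    return (X[i] @ theta - y[i]) * X[i]

theta = np.zeros(d)                                            # feasible start in the l1-ball
table = np.array([sample_grad(theta, i) for i in range(n)])    # stored per-sample gradients
g = table.mean(axis=0)                                         # SAG-style gradient estimate

for k in range(1, 2001):
    i = rng.integers(n)                    # refresh the table with one sample per iteration
    new_gi = sample_grad(theta, i)
    g += (new_gi - table[i]) / n
    table[i] = new_gi
    # Linear minimization oracle over the l1-ball of the given radius
    j = np.argmax(np.abs(g))
    s = np.zeros(d)
    s[j] = -radius * np.sign(g[j])
    theta += (2.0 / (k + 2)) * (s - theta)  # classical Frank-Wolfe step size

print("objective value:", 0.5 * np.mean((X @ theta - y) ** 2))
```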
arXiv Detail & Related papers (2022-02-26T19:10:48Z)
- Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and
Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw stochastic gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
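The without-replacement sampling underlying these variants can be illustrated with a minimal minibatch Random Reshuffling loop (the local/distributed variants and the synchronized shuffling modification are not shown):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, batch = 256, 5, 32
X = rng.normal(size=(n, d))
theta_star = rng.normal(size=d)
y = X @ theta_star + 0.1 * rng.normal(size=n)

theta = np.zeros(d)
step = 0.05

for epoch in range(30):
    perm = rng.permutation(n)               # reshuffle the data once per epoch
    for start in range(0, n, batch):
        idx = perm[start:start + batch]     # minibatch drawn without replacement
        grad = X[idx].T @ (X[idx] @ theta - y[idx]) / len(idx)
        theta -= step * grad

print("parameter error:", np.linalg.norm(theta - theta_star))
```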
arXiv Detail & Related papers (2021-10-20T02:25:25Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement learning based zeroth-order algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization, instead of using random sampling.
Our results show that our ZO-RL algorithm can effectively reduce the variance of the ZO gradient estimates by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
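For context, the random-sampling baseline that the learned policy replaces is the usual two-point zeroth-order gradient estimator with random Gaussian directions, as in the sketch below; the toy objective, number of directions, and step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 10
A = rng.normal(size=(d, d))
A = A @ A.T / d + np.eye(d)                   # make the toy objective strongly convex

def f(x):
    """Black-box objective we can only evaluate, not differentiate."""
    return 0.5 * x @ A @ x

def zo_grad(x, q=10, mu=1e-3):
    """Two-point ZO gradient estimate averaged over q random Gaussian directions."""
    g = np.zeros(d)
    for _ in range(q):
        u = rng.normal(size=d)                # random perturbation direction
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / q

x = rng.normal(size=d)
for t in range(300):
    x -= 0.05 * zo_grad(x)

print("final objective value:", f(x))
```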
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in the form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
- Stochastic Proximal Gradient Algorithm with Minibatches. Application to
Large Scale Learning Models [2.384873896423002]
We develop and analyze minibatch variants of the stochastic proximal gradient algorithm for general composite objective functions with nonsmooth components.
We provide complexity bounds for constant and variable stepsize iteration policies, showing that, for minibatch size $N$, after $\mathcal{O}(\frac{1}{N\epsilon})$ iterations, $\epsilon$-suboptimality is attained in expected quadratic distance to the optimal solution.
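A minimal sketch of a minibatch stochastic proximal gradient step on an l1-regularized least-squares problem, where the prox of the l1 term is soft-thresholding; the problem instance, minibatch size, and constant stepsize below are illustrative choices, not the paper's exact policies.

```python
import numpy as np

rng = np.random.default_rng(6)
n, d, batch, lam = 500, 20, 50, 0.05
X = rng.normal(size=(n, d))
theta_star = np.zeros(d)
theta_star[:3] = [2.0, -1.5, 1.0]             # sparse ground truth
y = X @ theta_star + 0.1 * rng.normal(size=n)

def soft_threshold(v, tau):
    """Proximal operator of tau*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

theta = np.zeros(d)
step = 0.02

for t in range(2000):
    idx = rng.choice(n, size=batch, replace=False)            # minibatch of size N = batch
    grad = X[idx].T @ (X[idx] @ theta - y[idx]) / batch       # gradient of the smooth part
    theta = soft_threshold(theta - step * grad, step * lam)   # prox step for the l1 term

print("recovered support:", np.nonzero(np.abs(theta) > 0.1)[0])
print("parameter error  :", np.linalg.norm(theta - theta_star))
```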
arXiv Detail & Related papers (2020-03-30T10:43:56Z)