Non asymptotic analysis of Adaptive stochastic gradient algorithms and
applications
- URL: http://arxiv.org/abs/2303.01370v1
- Date: Wed, 1 Mar 2023 07:36:03 GMT
- Title: Non asymptotic analysis of Adaptive stochastic gradient algorithms and
applications
- Authors: Antoine Godichon-Baggioni (LPSM (UMR_8001)), Pierre Tarrago (LPSM
(UMR_8001))
- Abstract summary: This paper is devoted to the non-asymptotic analysis of these adaptive gradient algorithms for strongly convex objectives.
All the theoretical results will be adapted to linear regression and regularized generalized linear models for both the Adagrad and Stochastic Newton algorithms.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In stochastic optimization, a common tool to deal sequentially with large
samples is to consider the well-known stochastic gradient algorithm.
Nevertheless, since the step sequence is the same for each direction, this can
lead to bad results in practice in case of ill-conditioned problems. To
overcome this, adaptive gradient algorithms such as Adagrad or Stochastic
Newton algorithms should be preferred. This paper is devoted to the
non-asymptotic analysis of these adaptive gradient algorithms for strongly convex
objectives. All the theoretical results will be adapted to linear regression and
regularized generalized linear models for both the Adagrad and Stochastic Newton
algorithms.
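As a toy illustration of the abstract's point (not the paper's algorithm), the sketch below compares plain SGD, whose single step size must be shrunk to survive the stiff direction, with Adagrad's per-coordinate steps on an ill-conditioned quadratic. All constants and names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ill-conditioned quadratic f(x) = 0.5 * x^T A x with A = diag(100, 1):
# one global step size cannot suit both directions, the failure mode
# the abstract describes for plain SGD.
A = np.diag([100.0, 1.0])

def noisy_grad(x):
    # Unbiased stochastic gradient: true gradient plus small Gaussian noise.
    return A @ x + 0.01 * rng.standard_normal(2)

def run(adaptive, steps=2000, lr=0.1):
    x = np.array([1.0, 1.0])
    accum = np.zeros(2)            # Adagrad's running sum of squared gradients
    for _ in range(steps):
        g = noisy_grad(x)
        if adaptive:
            accum += g * g
            x = x - lr * g / (np.sqrt(accum) + 1e-8)  # per-coordinate step
        else:
            # SGD step shrunk by the largest curvature to stay stable,
            # which makes progress on the flat direction very slow.
            x = x - (lr / A[0, 0]) * g
    return x

x_sgd = run(adaptive=False)
x_ada = run(adaptive=True)
print("SGD:    ", x_sgd)
print("Adagrad:", x_ada)
```

After the same number of steps, the flat coordinate of the SGD iterate is still far from zero, while Adagrad's rescaling handles both directions.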
Related papers
- Backward error analysis and the qualitative behaviour of stochastic
optimization algorithms: Application to stochastic coordinate descent [1.534667887016089]
We propose a class of differential equations that approximate the dynamics of general optimization methods more closely than the original gradient flow.
We study the stability of the modified equation in the case of coordinate descent.
arXiv Detail & Related papers (2023-09-05T09:39:56Z) - One-step corrected projected stochastic gradient descent for statistical estimation [49.1574468325115]
It is based on the projected gradient descent on the log-likelihood function corrected by a single step of the Fisher scoring algorithm.
We show theoretically and by simulations that it is an interesting alternative to the usual stochastic gradient descent with averaging or the adaptive gradient descent.
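A minimal 1-D sketch of the one-step-correction idea summarized above, using a closed-form exponential model instead of the paper's projected setting; the pilot estimator and all constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Model: X ~ Exponential(rate lam).
# Score: s(lam) = n/lam - sum(x); Fisher information: I(lam) = n / lam**2.
true_lam = 2.0
x = rng.exponential(scale=1.0 / true_lam, size=5000)
n = x.size

# Crude but consistent pilot estimate (median of Exp(lam) is log(2)/lam).
lam0 = np.log(2) / np.median(x)

# One Fisher-scoring step on the log-likelihood from the pilot estimate.
score = n / lam0 - x.sum()
info = n / lam0**2
lam1 = lam0 + score / info

mle = 1.0 / x.mean()   # closed-form MLE, for comparison
print(lam0, lam1, mle)
```

A single scoring step moves the crude pilot estimate onto the MLE up to a second-order error, which is the mechanism behind one-step-corrected estimators.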
arXiv Detail & Related papers (2023-06-09T13:43:07Z) - Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both deterministic and stochastic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
By using this framework, our analysis shows these algorithms to match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z) - Random-reshuffled SARAH does not need a full gradient computations [61.85897464405715]
The StochAstic Recursive grAdient algoRitHm (SARAH) is a variance-reduced variant of the Stochastic Gradient Descent (SGD) algorithm.
In this paper, we remove the necessity of a full gradient computation.
The aggregated gradients serve as an estimate of a full gradient in the SARAH algorithm.
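The recursive estimator described above can be sketched on a least-squares toy problem as follows. This shows the standard SARAH recursion with full-gradient restarts; the paper's variant replaces that full gradient with aggregated stochastic gradients. The problem sizes and step size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy least-squares objective f(w) = (1/n) * sum_i 0.5 * (a_i^T w - b_i)^2.
n, d = 200, 5
A = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
b = A @ w_star                       # noiseless targets, so w_star minimizes f

def grad_i(w, i):
    # Stochastic gradient of the i-th component function.
    return (A[i] @ w - b[i]) * A[i]

def full_grad(w):
    return A.T @ (A @ w - b) / n

eta = 0.02
w = np.zeros(d)
for _ in range(10):                  # outer loop: restart the recursion
    v = full_grad(w)                 # anchor gradient (replaced by aggregated
                                     # stochastic gradients in the paper's variant)
    w_prev, w = w, w - eta * v
    for _ in range(n):               # inner loop: SARAH recursion
        i = rng.integers(n)
        v = grad_i(w, i) - grad_i(w_prev, i) + v
        w_prev, w = w, w - eta * v

print(np.linalg.norm(w - w_star))
```

The recursion reuses the previous estimate `v` and only two per-sample gradients per step, instead of recomputing a full gradient.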
arXiv Detail & Related papers (2021-11-26T06:00:44Z) - SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients [99.13839450032408]
It is desired to design a universal framework for adaptive algorithms to solve general problems.
In particular, our novel framework provides adaptive methods with convergence support for the nonconvex setting.
arXiv Detail & Related papers (2021-06-15T15:16:28Z) - Reparametrizing gradient descent [0.0]
We propose an optimization algorithm which we call norm-adapted gradient descent.
Our algorithm can also be compared to quasi-Newton methods, but we seek roots rather than stationary points.
arXiv Detail & Related papers (2020-10-09T20:22:29Z) - Stochastic Approximate Gradient Descent via the Langevin Algorithm [11.36635610546803]
We introduce Stochastic Approximate Gradient Descent (SAGD) as an alternative to stochastic gradient descent for cases where unbiased stochastic gradients cannot be trivially obtained.
We show that SAGD performs well experimentally in popular statistical and machine learning problems such as the expectation-maximization algorithm and variational autoencoders.
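A hypothetical sketch of the SAGD idea on a toy problem: the gradient is an expectation over a target distribution, approximated with draws from an unadjusted Langevin chain rather than an unbiased sample. The target, step sizes, and chain lengths are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(3)

# Target pi = N(mu, 1); gradient of its log-density is -(z - mu).
mu = 3.0

def grad_log_pi(z):
    return -(z - mu)

def langevin_sample(z, n_steps=50, h=0.1):
    # Unadjusted Langevin algorithm: z <- z + h*grad log pi + sqrt(2h)*noise.
    for _ in range(n_steps):
        z = z + h * grad_log_pi(z) + np.sqrt(2 * h) * rng.standard_normal()
    return z

# Minimize f(theta) = E_{z~pi}[0.5 * (theta - z)^2], whose gradient
# theta - E[z] is intractable if pi can only be sampled approximately.
theta, z = 0.0, 0.0
lr = 0.05
for _ in range(500):
    z = langevin_sample(z)           # approximate draw from pi
    theta -= lr * (theta - z)        # approximate stochastic gradient step
print(theta)
```

The Langevin draws are biased (no Metropolis correction), yet the averaged descent still drives `theta` toward the mean of the target, which is the trade-off SAGD analyzes.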
arXiv Detail & Related papers (2020-02-13T14:29:21Z) - Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [71.03797261151605]
Adaptivity is an important yet under-studied property in modern optimization theory.
Our algorithm is proved to simultaneously achieve the best-available convergence rates for non-PL objectives while outperforming existing algorithms for PL objectives.
arXiv Detail & Related papers (2020-02-13T05:42:27Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad algorithm for nonconvex min-max problems.
Our experiments compare the empirical behavior of adaptive and non-adaptive gradient algorithms for training GANs.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.