AI-SARAH: Adaptive and Implicit Stochastic Recursive Gradient Methods
- URL: http://arxiv.org/abs/2102.09700v1
- Date: Fri, 19 Feb 2021 01:17:15 GMT
- Title: AI-SARAH: Adaptive and Implicit Stochastic Recursive Gradient Methods
- Authors: Zheng Shi and Nicolas Loizou and Peter Richt\'arik and Martin
Tak\'a\v{c}
- Abstract summary: We present an adaptive variance reduced method with an implicit approach for adaptivity.
We provide convergence guarantees for finite-sum minimization problems and show a faster convergence than SARAH can be achieved if local geometry permits.
This algorithm implicitly computes step-size and efficiently estimates local Lipschitz smoothness of functions.
- Score: 7.486132958737807
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an adaptive stochastic variance reduced method with an implicit
approach for adaptivity. As a variant of SARAH, our method employs the
stochastic recursive gradient yet adjusts step-size based on local geometry. We
provide convergence guarantees for finite-sum minimization problems and show a
faster convergence than SARAH can be achieved if local geometry permits.
Furthermore, we propose a practical, fully adaptive variant, which does not
require any knowledge of local geometry and any effort of tuning the
hyper-parameters. This algorithm implicitly computes step-size and efficiently
estimates local Lipschitz smoothness of stochastic functions. The numerical
experiments demonstrate the algorithm's strong performance compared to its
classical counterparts and other state-of-the-art first-order methods.
Related papers
- Adaptive Zeroth-Order Optimisation of Nonconvex Composite Objectives [1.7640556247739623]
We analyze algorithms for zeroth-order entropy composite objectives, focusing on dependence on dimensionality.
This is achieved by exploiting low dimensional structure of the decision set using the mirror descent method with an estimation alike function.
To improve the gradient, we replace the classic sampling method based on Rademacher and show that the mini-batch method copes with non-Eucli geometry.
arXiv Detail & Related papers (2022-08-09T07:36:25Z) - Local Quadratic Convergence of Stochastic Gradient Descent with Adaptive
Step Size [29.15132344744801]
We establish local convergence for gradient descent with adaptive step size for problems such as matrix inversion.
We show that these first order optimization methods can achieve sub-linear or linear convergence.
arXiv Detail & Related papers (2021-12-30T00:50:30Z) - Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate
in Gradient Descent [20.47598828422897]
We propose textit-Meta-Regularization, a novel approach for the adaptive choice of the learning rate in first-order descent methods.
Our approach modifies the objective function by adding a regularization term, and casts the joint process parameters.
arXiv Detail & Related papers (2021-04-12T13:13:34Z) - Asymptotic study of stochastic adaptive algorithm in non-convex
landscape [2.1320960069210484]
This paper studies some assumption properties of adaptive algorithms widely used in optimization and machine learning.
Among them Adagrad and Rmsprop, which are involved in most of the blackbox deep learning algorithms.
arXiv Detail & Related papers (2020-12-10T12:54:45Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that attenuating step-size is required for exact convergence with the fact that constant step-size learns faster in time up to an error.
Rather than fixing the minibatch the step-size at the outset, we propose to allow parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - Bayesian Sparse learning with preconditioned stochastic gradient MCMC
and its applications [5.660384137948734]
The proposed algorithm converges to the correct distribution with a controllable bias under mild conditions.
We show that the proposed algorithm canally converge to the correct distribution with a controllable bias under mild conditions.
arXiv Detail & Related papers (2020-06-29T20:57:20Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $mathcaltilde O(t-1/4)$ rate of convergence for the norm of the gradient of Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Effective Dimension Adaptive Sketching Methods for Faster Regularized
Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT)
arXiv Detail & Related papers (2020-06-10T15:00:09Z) - Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [71.03797261151605]
Adaptivity is an important yet under-studied property in modern optimization theory.
Our algorithm is proved to achieve the best-available convergence for non-PL objectives simultaneously while outperforming existing algorithms for PL objectives.
arXiv Detail & Related papers (2020-02-13T05:42:27Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of OptimisticOA algorithm for nonconcave minmax problems.
Our experiments show that adaptive GAN non-adaptive gradient algorithms can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z) - On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization [80.03647903934723]
We prove adaptive gradient methods in expectation of gradient convergence methods.
Our analyses shed light on better adaptive gradient methods in optimizing non understanding gradient bounds.
arXiv Detail & Related papers (2018-08-16T20:25:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.