Variance reduction for Riemannian non-convex optimization with batch
size adaptation
- URL: http://arxiv.org/abs/2007.01494v1
- Date: Fri, 3 Jul 2020 04:34:39 GMT
- Title: Variance reduction for Riemannian non-convex optimization with batch
size adaptation
- Authors: Andi Han, Junbin Gao
- Abstract summary: Variance reduction techniques are popular for accelerating gradient descent and stochastic gradient descent.
We show that this strategy can achieve lower total complexities for optimizing both general non-convex and gradient dominated functions under both finite-sum and online settings.
Specifically, we prove that R-SRG achieves the same near-optimal complexity as R-SPIDER without requiring a small step size.
- Score: 36.79189106909088
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Variance reduction techniques are popular in accelerating gradient descent
and stochastic gradient descent for optimization problems defined on both
Euclidean space and Riemannian manifold. In this paper, we further improve on
existing variance reduction methods for non-convex Riemannian optimization,
including R-SVRG and R-SRG/R-SPIDER with batch size adaptation. We show that
this strategy can achieve lower total complexities for optimizing both general
non-convex and gradient dominated functions under both finite-sum and online
settings. As a result, we also provide simpler convergence analysis for R-SVRG
and improve complexity bounds for R-SRG under finite-sum setting. Specifically,
we prove that R-SRG achieves the same near-optimal complexity as R-SPIDER
without requiring a small step size. Empirical experiments on a variety of
tasks demonstrate effectiveness of proposed adaptive batch size scheme.
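The variance-reduction idea behind R-SVRG can be illustrated with a minimal Euclidean sketch (a toy proxy only: the paper's actual method works on Riemannian manifolds with retractions and vector transport, which are omitted here, and all names below are illustrative, not the authors' implementation):

```python
import numpy as np

def svrg(grad_fi, n, x0, step=0.1, epochs=3, inner=20, batch=4, seed=0):
    """Euclidean SVRG sketch: g = grad_B(x) - grad_B(x_snap) + full_grad(x_snap)."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    for _ in range(epochs):
        x_snap = x.copy()
        # Full gradient at the snapshot: the "anchor" that cancels noise.
        full_grad = np.mean([grad_fi(i, x_snap) for i in range(n)], axis=0)
        for _ in range(inner):
            idx = rng.choice(n, size=batch, replace=False)
            g_x = np.mean([grad_fi(i, x) for i in idx], axis=0)
            g_snap = np.mean([grad_fi(i, x_snap) for i in idx], axis=0)
            # Variance-reduced estimator: unbiased, and its variance shrinks
            # as the iterate x approaches the snapshot point.
            x = x - step * (g_x - g_snap + full_grad)
    return x

# Toy finite-sum problem: f(x) = (1/n) * sum_i (x - a_i)^2, minimizer mean(a) = 2.
a = np.arange(5.0)
x_star = svrg(lambda i, x: 2.0 * (x - a[i]), n=5, x0=np.zeros(1))
```

The batch-size adaptation studied in the paper would additionally let `batch` grow across epochs rather than stay fixed; the fixed-batch version above is just the baseline estimator.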
Related papers
- A Near-Optimal Single-Loop Stochastic Algorithm for Convex Finite-Sum Coupled Compositional Optimization [53.14532968909759]
We introduce an efficient single-loop primal-dual block-coordinate algorithm called ALEXR.
We establish the convergence rates of ALEXR in both convex and strongly convex cases under smoothness and non-smoothness conditions.
Experimental results on GDRO and partial Area Under the ROC Curve for cFCCO demonstrate the promising performance of our algorithm.
arXiv Detail & Related papers (2023-12-04T19:00:07Z) - Optimal Guarantees for Algorithmic Reproducibility and Gradient
Complexity in Convex Optimization [55.115992622028685]
Previous work suggests that first-order methods would need to trade off convergence rate for better reproducibility.
We demonstrate that both optimal complexity and near-optimal convergence guarantees can be achieved for smooth convex minimization and smooth convex-concave minimax problems.
arXiv Detail & Related papers (2023-10-26T19:56:52Z) - Faster Stochastic Variance Reduction Methods for Compositional MiniMax
Optimization [50.10952609321302]
Compositional minimax optimization is a pivotal challenge across various machine learning domains.
Current methods of compositional minimax optimization are plagued by sub-optimal complexities or heavy reliance on sizable batch sizes.
This paper introduces a novel method, called Nested STOchastic Recursive Momentum (NSTORM), which can achieve the optimal sample complexity of $O(\kappa^3/\epsilon^3)$.
arXiv Detail & Related papers (2023-08-18T14:57:21Z) - Iterative Reweighted Least Squares Networks With Convergence Guarantees
for Solving Inverse Imaging Problems [12.487990897680422]
We present a novel optimization strategy for image reconstruction tasks under analysis-based image regularization.
We parameterize such regularizers using potential functions that correspond to weighted extensions of the $\ell_p^p$-vector and $\mathcal{S}_p^p$ Schatten-matrix quasi-norms.
We show that thanks to the convergence guarantees of our proposed minimization strategy, such optimization can be successfully performed with a memory-efficient implicit back-propagation scheme.
arXiv Detail & Related papers (2023-08-10T17:59:46Z) - Adaptive Zeroth-Order Optimisation of Nonconvex Composite Objectives [1.7640556247739623]
We analyze zeroth-order algorithms for nonconvex composite objectives, focusing on the dependence on dimensionality.
This is achieved by exploiting the low-dimensional structure of the decision set using the mirror descent method with an entropy-like distance-generating function.
To improve the gradient estimate, we replace the classic sampling method based on Rademacher vectors and show that the mini-batch method copes with non-Euclidean geometry.
arXiv Detail & Related papers (2022-08-09T07:36:25Z) - AI-SARAH: Adaptive and Implicit Stochastic Recursive Gradient Methods [7.486132958737807]
We present an adaptive variance reduced method with an implicit approach for adaptivity.
We provide convergence guarantees for finite-sum minimization problems and show that faster convergence than SARAH can be achieved if the local geometry permits.
This algorithm implicitly computes step-size and efficiently estimates local Lipschitz smoothness of functions.
arXiv Detail & Related papers (2021-02-19T01:17:15Z) - On Stochastic Variance Reduced Gradient Method for Semidefinite
Optimization [14.519696724619074]
The SVRG method has been regarded as one of the most effective methods.
There is, however, a significant gap between theory and practice when SVRG is adapted to semidefinite programming (SDP).
In this paper, we fill this gap by exploiting a new variant of the original SVRG with Option I adapted to semidefinite optimization.
arXiv Detail & Related papers (2021-01-01T13:55:32Z) - Variance Reduction on Adaptive Stochastic Mirror Descent [23.451652399157002]
We prove that variance reduction reduces SFO complexity of most adaptive mirror descent algorithms and accelerates their convergence.
We check the validity of our claims using experiments in deep learning.
arXiv Detail & Related papers (2020-12-26T15:15:51Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence against the fact that a constant step-size learns faster, up to an error neighborhood.
Rather than fixing the minibatch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
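The trade-off described in this entry can be sketched as follows (a hypothetical toy in the Euclidean setting, not the paper's algorithm; the schedule and all names are illustrative): keep the step-size constant and instead grow the minibatch geometrically, so the gradient-noise floor shrinks over time.

```python
import numpy as np

def growing_batch_sgd(grad_fi, n, x0, step=0.1, b0=2, growth=2, stages=6, seed=0):
    """Constant step-size SGD whose minibatch doubles every stage."""
    rng = np.random.default_rng(seed)
    x, b = x0.astype(float).copy(), b0
    for _ in range(stages):
        for _ in range(10):
            idx = rng.choice(n, size=min(b, n), replace=False)
            g = np.mean([grad_fi(i, x) for i in idx], axis=0)
            x = x - step * g
        b *= growth  # doubling the batch roughly halves the estimator variance
    return x

# Toy finite-sum problem: f(x) = (1/n) * sum_i (x - a_i)^2, minimizer mean(a) = 2.
a = np.arange(5.0)
x_star = growing_batch_sgd(lambda i, x: 2.0 * (x - a[i]), n=5, x0=np.zeros(1))
```

Once the batch reaches the full sample size the updates become exact gradient descent, so the constant step-size no longer leaves a residual error neighborhood.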
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart
for Nonconvex Optimization [73.38702974136102]
Various types of parameter restart schemes have been proposed for accelerated algorithms to facilitate fast convergence in practice.
In this paper, we propose an algorithm for solving nonsmooth problems.
arXiv Detail & Related papers (2020-02-26T16:06:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences of their use.