An Optimal Hybrid Variance-Reduced Algorithm for Stochastic Composite
Nonconvex Optimization
- URL: http://arxiv.org/abs/2008.09055v1
- Date: Thu, 20 Aug 2020 16:15:12 GMT
- Title: An Optimal Hybrid Variance-Reduced Algorithm for Stochastic Composite
Nonconvex Optimization
- Authors: Deyi Liu, Lam M. Nguyen, and Quoc Tran-Dinh
- Abstract summary: We propose a new variant of the hybrid variance-reduced method in [7] to solve a common stochastic composite nonconvex problem under standard assumptions.
We simply replace the independent unbiased estimator in the hybrid-SARAH estimator introduced in [7] by the stochastic gradient evaluated at the same sample.
Our analysis is essentially inspired by [7], but we do not use two different step-sizes.
- Score: 23.355249183979907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this note we propose a new variant of the hybrid variance-reduced proximal
gradient method in [7] to solve a common stochastic composite nonconvex
optimization problem under standard assumptions. We simply replace the
independent unbiased estimator in our hybrid-SARAH estimator introduced in [7]
by the stochastic gradient evaluated at the same sample, leading to the
identical momentum-SARAH estimator introduced in [2]. This allows us to save
one stochastic gradient per iteration compared to [7], and only requires two
samples per iteration. Our algorithm is very simple and achieves optimal
stochastic oracle complexity bound in terms of stochastic gradient evaluations
(up to a constant factor). Our analysis is essentially inspired by [7], but we
do not use two different step-sizes.
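To make the estimator change concrete, the following Python sketch implements a proximal gradient loop driven by a momentum-SARAH-style estimator in which both gradient evaluations use the same sample, so each iteration costs two stochastic gradients. The names `grad_sample`, `sample`, and `prox_r`, and all constants, are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def prox_r(x, eta, lam=0.01):
    """Proximal operator of the composite term r. Soft-thresholding for
    r(x) = lam * ||x||_1 is used here purely as an illustrative choice."""
    return np.sign(x) * np.maximum(np.abs(x) - eta * lam, 0.0)

def momentum_sarah_prox_grad(grad_sample, sample, x0, eta=0.1, beta=0.9, T=1000):
    """Proximal gradient sketch driven by a momentum-SARAH-style estimator.

    grad_sample(x, xi) -- stochastic gradient of f at x for sample xi
    sample()           -- draws one sample xi
    Both gradient evaluations in the estimator use the SAME sample, so
    each iteration costs two stochastic gradients.
    """
    x_prev = x0.copy()
    v = grad_sample(x0, sample())            # initialize the estimator
    x = prox_r(x0 - eta * v, eta)
    for _ in range(T):
        xi = sample()
        g_new = grad_sample(x, xi)           # gradient at the current iterate
        g_old = grad_sample(x_prev, xi)      # gradient at the previous iterate
        # hybrid update: SARAH recursion mixed with the plain stochastic
        # gradient, both evaluated at the same sample xi
        v = beta * (v + g_new - g_old) + (1.0 - beta) * g_new
        x_prev, x = x, prox_r(x - eta * v, eta)
    return x
```

Here beta = 1 recovers the pure SARAH recursion and beta = 0 reduces to proximal SGD; a single step size eta is used throughout, consistent with the abstract's remark that two different step sizes are not needed.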
Related papers
- On the Stochastic (Variance-Reduced) Proximal Gradient Method for
Regularized Expected Reward Optimization [12.244251361123396]
We consider a regularized expected reward optimization problem in the non-oblivious setting that covers many existing problems in reinforcement learning (RL).
In particular, the method has been shown to admit an $O(\epsilon^{-4})$ sample complexity for reaching an $\epsilon$-stationary point under standard conditions.
We show that the sample complexity can be improved from $O(\epsilon^{-4})$ to $O(\epsilon^{-3})$ under additional conditions.
arXiv Detail & Related papers (2024-01-23T06:01:29Z) - A Homogenization Approach for Gradient-Dominated Stochastic Optimization [6.1144486886258065]
We propose a stochastic homogeneous second-order descent method (SHSODM) for functions enjoying gradient dominance.
Our findings show that SHSODM matches the best-known sample complexity achieved by other second-order methods for gradient-dominated optimization.
arXiv Detail & Related papers (2023-08-21T11:03:04Z) - Adaptive Zeroth-Order Optimisation of Nonconvex Composite Objectives [1.7640556247739623]
We analyze algorithms for zeroth-order optimization of nonconvex composite objectives, focusing on the dependence on dimensionality.
This is achieved by exploiting the low-dimensional structure of the decision set using the mirror descent method with an entropy-like distance-generating function.
To improve the gradient estimator, we replace the classic sampling method based on Rademacher vectors and show that the mini-batch method copes with non-Euclidean geometry; a sketch of the classic estimator follows.
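For reference, here is a minimal sketch of the classic two-point zeroth-order gradient estimator with Rademacher directions that the entry above mentions replacing; the paper's improved mini-batch scheme is not reproduced, and all names and constants are illustrative.

```python
import numpy as np

def zo_grad_rademacher(f, x, mu=1e-3, batch=8, rng=None):
    """Classic two-point zeroth-order gradient estimate with Rademacher
    directions. f is the black-box objective; mu is the smoothing radius."""
    rng = rng or np.random.default_rng()
    g = np.zeros_like(x, dtype=float)
    for _ in range(batch):
        u = rng.integers(0, 2, size=x.size) * 2 - 1       # +/-1 entries
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / batch
```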
arXiv Detail & Related papers (2022-08-09T07:36:25Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the Stochastic ExtraGradient (SEG) method with constant step size, along with variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
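A minimal sketch of stochastic extragradient with iterate averaging and scheduled restarts on the bilinear game min_x max_y x^T A y follows; the noise model, step size, and restart schedule are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def seg_bilinear(A, T=2000, eta=0.1, noise=0.1, restart_every=500, seed=0):
    """Stochastic extragradient (SEG) with iterate averaging and scheduled
    restarts for min_x max_y x^T A y. Constants are illustrative."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    x, y = np.ones(n), np.ones(m)
    x_avg, y_avg, k = np.zeros(n), np.zeros(m), 0
    for t in range(1, T + 1):
        gx = A @ y + noise * rng.standard_normal(n)    # stochastic grad in x
        gy = A.T @ x + noise * rng.standard_normal(m)  # stochastic grad in y
        xe, ye = x - eta * gx, y + eta * gy            # extrapolation step
        gxe = A @ ye + noise * rng.standard_normal(n)
        gye = A.T @ xe + noise * rng.standard_normal(m)
        x, y = x - eta * gxe, y + eta * gye            # update step
        k += 1
        x_avg += (x - x_avg) / k                       # running averages
        y_avg += (y - y_avg) / k
        if t % restart_every == 0:                     # scheduled restart:
            x, y = x_avg.copy(), y_avg.copy()          # resume from average
            x_avg, y_avg, k = np.zeros(n), np.zeros(m), 0
    return (x_avg if k else x), (y_avg if k else y)
```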
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Near-Optimal High Probability Complexity Bounds for Non-Smooth
Stochastic Optimization with Heavy-Tailed Noise [63.304196997102494]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth stochastic convex optimization have complexity bounds whose dependence on the confidence level is either negative-power or logarithmic.
We propose novel stepsize rules for two stochastic gradient methods with clipping.
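For concreteness, one clipped gradient step looks as follows; the paper's specific stepsize rules are not reproduced, and `lam` is an illustrative clipping threshold.

```python
import numpy as np

def clipped_sgd_step(x, g, eta, lam):
    """One step of SGD with gradient clipping: clip(g) = min(1, lam/||g||) * g
    rescales overly large (heavy-tailed) gradients before the update."""
    norm = np.linalg.norm(g)
    if norm > lam:
        g = (lam / norm) * g
    return x - eta * g
```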
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - On Stochastic Moving-Average Estimators for Non-Convex Optimization [105.22760323075008]
In this paper, we demonstrate the power of a widely used stochastic estimator based on moving average (SEMA) for solving non-convex optimization problems.
For all these problems, we also present state-of-the-art results.
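A minimal sketch of SGD driven by a moving-average (SEMA-style) gradient estimator follows; `grad_sample`, `sample`, and the constants are illustrative assumptions.

```python
def sgd_with_sema(grad_sample, sample, x0, eta=0.05, beta=0.9, T=1000):
    """SGD driven by a moving-average (SEMA-style) gradient estimator:
    d_t = beta * d_{t-1} + (1 - beta) * g_t, with g_t a fresh stochastic
    gradient. grad_sample and sample are assumed user-supplied."""
    x = x0.copy()
    d = grad_sample(x, sample())          # initialize the estimator
    for _ in range(T):
        g = grad_sample(x, sample())      # fresh stochastic gradient
        d = beta * d + (1.0 - beta) * g   # moving-average update
        x = x - eta * d                   # descend along the estimator
    return x
```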
arXiv Detail & Related papers (2021-04-30T08:50:24Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work focuses on zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of iteration complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - A Unified Convergence Analysis for Shuffling-Type Gradient Methods [32.8097849940763]
We propose a unified convergence analysis for generic shuffling-type gradient methods for solving finite-sum problems.
Our results suggest appropriate choices of learning rates for certain shuffling variants when training neural networks; a sketch of the generic scheme follows.
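A minimal sketch of a shuffling-type (random reshuffling) gradient method for a finite-sum objective; `grad_i` and the constants are illustrative assumptions.

```python
import numpy as np

def shuffling_sgd(grad_i, n, x0, eta=0.01, epochs=10, seed=0):
    """Shuffling-type gradient method for the finite sum (1/n) * sum_i f_i(x):
    each epoch visits every component exactly once in a fresh random order
    (random reshuffling). grad_i(x, i) returns the gradient of f_i at x."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(epochs):
        for i in rng.permutation(n):      # fresh permutation each epoch
            x = x - eta * grad_i(x, i)    # single-component step
    return x
```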
arXiv Detail & Related papers (2020-02-19T15:45:41Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave min-max problems.
Our experiments on GAN training show that the advantage of adaptive gradient algorithms over their non-adaptive counterparts can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)