Convergence of First-Order Methods for Constrained Nonconvex
Optimization with Dependent Data
- URL: http://arxiv.org/abs/2203.15797v2
- Date: Fri, 23 Jun 2023 04:11:18 GMT
- Title: Convergence of First-Order Methods for Constrained Nonconvex
Optimization with Dependent Data
- Authors: Ahmet Alacaoglu, Hanbaek Lyu
- Abstract summary: We show the worst-case rate of convergence $\tilde{O}(t^{-1/4})$ and complexity $\tilde{O}(\varepsilon^{-4})$, measured via the Moreau envelope and gradient mapping, for smooth nonconvex optimization.
We obtain the first online nonnegative matrix factorization algorithms for dependent data based on stochastic projected gradient methods with adaptive step sizes and optimal rate of convergence.
- Score: 7.513100214864646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We focus on analyzing the classical stochastic projected gradient methods
under a general dependent data sampling scheme for constrained smooth nonconvex
optimization. We show the worst-case rate of convergence $\tilde{O}(t^{-1/4})$
and complexity $\tilde{O}(\varepsilon^{-4})$ for achieving an
$\varepsilon$-near stationary point in terms of the norm of the gradient of
Moreau envelope and gradient mapping. While classical convergence guarantees
require i.i.d. data sampling from the target distribution, we only require a
mild mixing condition of the conditional distribution, which holds for a wide
class of Markov chain sampling algorithms. This improves the existing
complexity for constrained smooth nonconvex optimization with dependent
data from $\tilde{O}(\varepsilon^{-8})$ to $\tilde{O}(\varepsilon^{-4})$ with a
significantly simpler analysis. We illustrate the generality of our approach by
deriving convergence results with dependent data for stochastic proximal
gradient methods, adaptive stochastic gradient algorithm AdaGrad and stochastic
gradient algorithm with heavy ball momentum. As an application, we obtain the first
online nonnegative matrix factorization algorithms for dependent data based on
stochastic projected gradient methods with adaptive step sizes and optimal rate
of convergence.
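To make the setting concrete, here is a minimal sketch (not the authors' exact algorithm) of projected stochastic gradient descent in which the data are drawn from a slowly mixing two-state Markov chain rather than i.i.d., with stationarity reported through the gradient mapping norm; the toy objective, constraint set, and step-size schedule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem: minimize E_pi[ 0.5 * ||x - a||^2 ] over the box [0, 1]^d,
# where the samples a_t follow a two-state Markov chain whose stationary
# distribution pi is uniform over two anchor vectors (dependent data).
d = 5
anchors = np.stack([rng.normal(size=d), rng.normal(size=d)])
P = np.array([[0.9, 0.1], [0.1, 0.9]])   # slowly mixing transition matrix

def project(x):
    """Euclidean projection onto the box [0, 1]^d."""
    return np.clip(x, 0.0, 1.0)

x = project(rng.normal(size=d))
state = 0
T = 5000
for t in range(1, T + 1):
    state = rng.choice(2, p=P[state])    # Markovian (non-i.i.d.) sample
    eta = 1.0 / np.sqrt(t)               # illustrative diminishing step size
    x = project(x - eta * (x - anchors[state]))

# Stationarity measure: gradient mapping G_eta(x) = (x - proj(x - eta*grad F(x))) / eta,
# evaluated with the population gradient for reporting only.
eta = 1.0 / np.sqrt(T)
full_grad = x - anchors.mean(axis=0)
grad_map = (x - project(x - eta * full_grad)) / eta
print("||G_eta(x)|| =", np.linalg.norm(grad_map))
```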
Related papers
- Gradient Methods with Online Scaling [19.218484733179356]
We introduce a framework to accelerate the convergence of gradient-based methods with online learning.
We show for the first time that the widely-used hypergradient descent heuristic improves on the convergence of gradient descent (a sketch of the basic hypergradient update appears after this list).
arXiv Detail & Related papers (2024-11-04T05:04:18Z) - An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes [17.804065824245402]
In machine learning applications, each loss function is non-negative and can be expressed as the composition of a square and its real-valued square root.
We show how to apply the Gauss-Newton method or the Levenberg-Marquardt method to minimize the average of smooth but possibly non-convex functions (a short derivation of the underlying Gauss-Newton step appears after this list).
arXiv Detail & Related papers (2024-07-05T08:53:06Z) - On the Stochastic (Variance-Reduced) Proximal Gradient Method for Regularized Expected Reward Optimization [10.36447258513813]
We consider a regularized expected reward optimization problem in the non-oblivious setting that covers many existing problems in reinforcement learning (RL)
In particular, the method has been shown to admit an $O(\epsilon^{-4})$ sample complexity to an $\epsilon$-stationary point, under standard conditions.
Our analysis shows that the sample complexity can be improved from $O(\epsilon^{-4})$ to $O(\epsilon^{-3})$ under additional conditions.
arXiv Detail & Related papers (2024-01-23T06:01:29Z) - Convex and Non-convex Optimization Under Generalized Smoothness [69.69521650503431]
Classical analysis of convex and non-convex optimization methods often requires Lipschitz continuity of the gradient, which limits the analysis to functions bounded by quadratics.
Recent work generalizes this setting via a non-uniform smoothness condition (the standard form of this condition is sketched after this list).
arXiv Detail & Related papers (2023-06-02T04:21:59Z) - Min-Max Optimization Made Simple: Approximating the Proximal Point
Method via Contraction Maps [77.8999425439444]
We present a first-order method that admits near-optimal convergence rates for convex/concave min-max problems.
Our work is based on the fact that the update rule of the Proximal Point method can be approximated to arbitrary accuracy by iterating a contraction map (a fixed-point view is sketched after this list).
arXiv Detail & Related papers (2023-01-10T12:18:47Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}}\, F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an accelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragradient steps with acceleration (a basic extragradient step is sketched after this list).
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work focuses on zeroth-order (ZO) optimization, which does not require first-order gradient information (a minimal ZO gradient estimator is sketched after this list).
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity as well as function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Improved Complexities for Stochastic Conditional Gradient Methods under
Interpolation-like Conditions [12.096252285460814]
We show that the conditional gradient method requires $\mathcal{O}(\epsilon^{-2})$ calls to the gradient oracle to find an $\epsilon$-optimal solution (a basic conditional gradient iteration is sketched after this list).
By including a gradient sliding step, we show that the number of calls reduces to $\mathcal{O}(\epsilon^{-1.5})$.
arXiv Detail & Related papers (2020-06-15T06:51:39Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z) - On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization [80.03647903934723]
We prove that adaptive gradient methods converge in expectation to first-order stationary points for nonconvex optimization.
Our analyses shed light on a better understanding of adaptive gradient methods for optimizing nonconvex objectives.
arXiv Detail & Related papers (2018-08-16T20:25:28Z)
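Referring to the Gradient Methods with Online Scaling entry above, the hypergradient descent heuristic it analyzes can be sketched as follows: the step size itself is updated with the inner product of successive gradients; the toy objective and the meta step size beta are illustrative assumptions, not values from that paper.

```python
import numpy as np

# Hypergradient descent (sketch): adapt the step size alpha online using the
# hypergradient d f(x_t)/d alpha = -<grad f(x_t), grad f(x_{t-1})>, so alpha
# grows when consecutive gradients align and shrinks when they oppose.
grad = lambda x: x - 1.0                # gradient of the toy objective 0.5*||x - 1||^2

x = np.zeros(8)
alpha, beta = 0.01, 1e-3                # initial step size and meta step size (assumed)
g_prev = grad(x)
for _ in range(300):
    g = grad(x)
    alpha += beta * float(g @ g_prev)   # hypergradient update of the step size
    x = x - alpha * g
    g_prev = g
print(alpha, np.linalg.norm(x - 1.0))   # step size adapts; iterate approaches the minimizer
```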
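For the Non-negative Gauss-Newton Stepsizes entry, the square-root reformulation it builds on can be made concrete with the following short derivation of the undamped Gauss-Newton step for a non-negative loss; this is standard background (a Polyak-type step), not that paper's adaptive stepsize rule.

```latex
% Write a non-negative loss as f(x) = (1/2) r(x)^2 with residual r(x) = sqrt(2 f(x)) >= 0.
% Then grad f = r * grad r, and the Gauss-Newton step for the residual r is
%   d = - r(x) * grad r(x) / ||grad r(x)||^2.
% Using ||grad r||^2 = ||grad f||^2 / (2 f), this becomes a Polyak-type update:
\[
  x_{+} \;=\; x \;-\; \frac{2 f(x)}{\|\nabla f(x)\|^{2}}\, \nabla f(x).
\]
```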
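For the Convex and Non-convex Optimization Under Generalized Smoothness entry, the non-uniform smoothness condition referred to is usually stated, for twice-differentiable $f$, as a bound of the Hessian norm by an affine function of the gradient norm; the $(L_0, L_1)$ notation below is the common convention, given here as background rather than as a quote from that paper.

```latex
% (L_0, L_1)-smoothness: the Hessian norm grows at most affinely with the gradient norm,
% which recovers standard L-smoothness when L_1 = 0.
\[
  \|\nabla^2 f(x)\| \;\le\; L_0 + L_1 \,\|\nabla f(x)\| \qquad \text{for all } x.
\]
```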
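For the Min-Max Optimization Made Simple entry, one way to read "approximating the Proximal Point method via contraction maps" is the fixed-point view sketched below; the operator notation and the contraction condition are standard background, stated here as an assumption about the general technique rather than that paper's exact scheme.

```latex
% Proximal point update for an L-Lipschitz monotone operator F
% (e.g., F(z) = (grad_x Phi(z), -grad_y Phi(z)) for a min-max objective Phi):
%   z_{+} solves  z_{+} = z_0 - eta * F(z_{+}).
% When eta * L < 1, the map T(z) = z_0 - eta * F(z) is a contraction, so the
% Picard iterates below converge geometrically to z_{+}:
\[
  z_{k+1} \;=\; z_0 - \eta\, F(z_k), \qquad k = 0, 1, 2, \dots
\]
```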
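For the Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization entry, the basic (non-accelerated) extragradient step that such descent-ascent methods build on can be written as below; $\Phi$ denotes the saddle objective and $\eta$ a step size, notation chosen here for illustration rather than taken from that paper.

```latex
% One extragradient iteration for min_x max_y Phi(x, y) with step size eta:
\[
\begin{aligned}
\bar{x}_t &= x_t - \eta\, \nabla_x \Phi(x_t, y_t), &\quad
\bar{y}_t &= y_t + \eta\, \nabla_y \Phi(x_t, y_t), \\
x_{t+1} &= x_t - \eta\, \nabla_x \Phi(\bar{x}_t, \bar{y}_t), &\quad
y_{t+1} &= y_t + \eta\, \nabla_y \Phi(\bar{x}_t, \bar{y}_t).
\end{aligned}
\]
```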
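For the Zeroth-Order Hybrid Gradient Descent entry, a standard building block of ZO methods is the two-point random-direction gradient estimator sketched below; the smoothing radius, number of directions, and test function are illustrative assumptions rather than that paper's design.

```python
import numpy as np

rng = np.random.default_rng(1)

def zo_gradient(f, x, mu=1e-4, num_dirs=10):
    """Two-point zeroth-order gradient estimate: average of
    d * (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u over random unit directions u,
    so only function queries (no gradients) are needed."""
    d = x.size
    g = np.zeros(d)
    for _ in range(num_dirs):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)
        g += d * (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / num_dirs

# Illustrative use: minimize a smooth quadratic using only function queries.
f = lambda x: 0.5 * np.sum((x - 1.0) ** 2)
x = np.zeros(4)
for _ in range(200):
    x -= 0.1 * zo_gradient(f, x)
print(x)   # approaches the all-ones minimizer
```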
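For the Improved Complexities for Stochastic Conditional Gradient Methods entry, the basic (deterministic) conditional gradient / Frank-Wolfe iteration underlying such methods is sketched below on the probability simplex; the least-squares objective and the classical $2/(t+2)$ step size are illustrative assumptions.

```python
import numpy as np

def lmo_simplex(grad):
    """Linear minimization oracle over the probability simplex:
    argmin_{s in simplex} <grad, s> is the vertex at the smallest coordinate."""
    s = np.zeros_like(grad)
    s[np.argmin(grad)] = 1.0
    return s

# Illustrative objective: f(x) = 0.5 * ||A x - b||^2 over the simplex.
rng = np.random.default_rng(2)
A = rng.normal(size=(20, 6))
b = rng.normal(size=20)
x = np.full(6, 1.0 / 6.0)

for t in range(1, 201):
    g = A.T @ (A @ x - b)
    s = lmo_simplex(g)             # one call to the linear minimization oracle
    gamma = 2.0 / (t + 2.0)        # classical Frank-Wolfe step size
    x = (1 - gamma) * x + gamma * s
print(x)
```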
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.