Escaping Saddle Points in Nonconvex Minimax Optimization via
Cubic-Regularized Gradient Descent-Ascent
- URL: http://arxiv.org/abs/2110.07098v2
- Date: Fri, 15 Oct 2021 00:55:49 GMT
- Title: Escaping Saddle Points in Nonconvex Minimax Optimization via
Cubic-Regularized Gradient Descent-Ascent
- Authors: Ziyi Chen, Qunwei Li, Yi Zhou
- Abstract summary: The gradient descent-ascent (GDA) algorithm has been widely applied to solve nonconvex minimax optimization problems.
We develop Cubic-GDA, the first GDA-type algorithm for escaping strict saddle points in nonconvex-strongly-concave minimax optimization.
- Score: 13.565010494569243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The gradient descent-ascent (GDA) algorithm has been widely applied to solve
nonconvex minimax optimization problems. However, the existing GDA-type
algorithms can only find first-order stationary points of the envelope function
of nonconvex minimax optimization problems, which does not rule out the
possibility to get stuck at suboptimal saddle points. In this paper, we develop
Cubic-GDA -- the first GDA-type algorithm for escaping strict saddle points in
nonconvex-strongly-concave minimax optimization. Specifically, the algorithm
uses gradient ascent to estimate the second-order information of the minimax
objective function, and it leverages the cubic regularization technique to
efficiently escape the strict saddle points. Under standard smoothness
assumptions on the objective function, we show that Cubic-GDA admits an
intrinsic potential function whose value monotonically decreases in the minimax
optimization process. Such a property leads to a desired global convergence of
Cubic-GDA to a second-order stationary point at a sublinear rate. Moreover, we
analyze the convergence rate of Cubic-GDA in the full spectrum of a gradient
dominant-type nonconvex geometry. Our result shows that Cubic-GDA achieves an
orderwise faster convergence rate than the standard GDA for a wide spectrum of
gradient dominant geometry. Our study bridges minimax optimization with
second-order optimization and may inspire new developments along this
direction.
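As a rough illustration of the loop structure described above (an inner gradient-ascent phase that tracks the maximizer over y and supplies approximate second-order information, followed by a cubic-regularized update of x), here is a minimal sketch. The helper functions `grad_x`, `grad_y`, and `hess_approx`, the regularization parameter `M`, and the gradient-descent solver for the cubic subproblem are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def cubic_gda_sketch(grad_x, grad_y, hess_approx, x0, y0,
                     ascent_lr=0.1, ascent_steps=50,
                     M=10.0, cubic_lr=0.01, cubic_steps=100,
                     outer_iters=100):
    """Minimal sketch of a cubic-regularized GDA loop (not the paper's exact method).

    grad_x(x, y), grad_y(x, y): partial gradients of the minimax objective f(x, y).
    hess_approx(x, y): user-supplied approximation of the Hessian of the
        envelope Phi(x) = max_y f(x, y), evaluated using the current y.
    M: cubic regularization parameter (assumed known here).
    """
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    for _ in range(outer_iters):
        # Inner phase: gradient ascent on y approximates y*(x) and thereby
        # the second-order information of the envelope function.
        for _ in range(ascent_steps):
            y = y + ascent_lr * grad_y(x, y)

        g = grad_x(x, y)        # approximate gradient of Phi at x
        H = hess_approx(x, y)   # approximate Hessian of Phi at x

        # Cubic-regularized step: approximately minimize the model
        #   m(s) = g^T s + 0.5 s^T H s + (M/6) ||s||^3
        # by gradient descent on m, then update x <- x + s.
        s = np.zeros_like(x)
        for _ in range(cubic_steps):
            model_grad = g + H @ s + 0.5 * M * np.linalg.norm(s) * s
            s = s - cubic_lr * model_grad
        x = x + s
    return x, y
```

The sketch only mirrors the overall two-phase structure; the paper's analysis additionally specifies how the second-order information and the cubic subproblem are handled so that the potential-function and convergence guarantees hold.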
Related papers
- Near-Optimal Algorithms for Making the Gradient Small in Stochastic
Minimax Optimization [14.719077076351377]
We study the problem of finding a near-stationary point for smooth minimax optimization.
We show that a novel algorithm called Recursive IteratioN (RAIN) achieves near-optimal guarantees in both the convex-concave and strongly-convex-strongly-concave cases.
arXiv Detail & Related papers (2022-08-11T16:55:26Z) - Efficiently Escaping Saddle Points in Bilevel Optimization [48.925688192913]
Bilevel optimization is one of the fundamental problems in machine learning.
Recent developments provide the first nonasymptotic analysis of multi-step methods for escaping saddle points in bilevel optimization.
arXiv Detail & Related papers (2022-02-08T07:10:06Z) - An Accelerated Variance-Reduced Conditional Gradient Sliding Algorithm
for First-order and Zeroth-order Optimization [111.24899593052851]
The conditional gradient algorithm (also known as the Frank-Wolfe algorithm) has recently regained popularity in the machine learning community.
ARCS is the first zeroth-order conditional gradient sliding-type algorithm for solving convex problems in zeroth-order optimization.
In first-order optimization, the convergence results of ARCS substantially outperform those of previous algorithms in terms of the number of gradient oracle queries.
arXiv Detail & Related papers (2021-09-18T07:08:11Z) - Proximal Gradient Descent-Ascent: Variable Convergence under K{\L}
Geometry [49.65455534654459]
The gradient descent-ascent (GDA) algorithm has been widely applied to solve minimax optimization problems.
This paper fills such a gap by studying the variable convergence of proximal gradient descent-ascent under the Kurdyka-Łojasiewicz (KŁ) geometry.
arXiv Detail & Related papers (2021-02-09T05:35:53Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we consider the nonconvex-strongly-concave setting of gradient-free (zeroth-order) minimax optimization, where only function-value queries are available (a generic zeroth-order gradient estimator is sketched after this list).
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and
Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Zeroth-Order Regularized Optimization (ZORO): Approximately Sparse
Gradients and Adaptive Sampling [29.600975900977343]
We propose a new Zeroth-Order Regularized Optimization method, dubbed ZORO.
When the underlying gradient is approximately sparse at an iterate, ZORO needs very few objective function evaluations to obtain a new iterate that decreases the objective function.
arXiv Detail & Related papers (2020-03-29T11:01:17Z)
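Several entries above (the gradient-free minimax paper and ZORO) estimate gradients purely from objective-function evaluations. The sketch below shows a generic randomized finite-difference estimator of that flavor; the function name, the number of sampled directions, and the smoothing radius are illustrative assumptions, and it omits the sparsity-exploiting recovery step that is specific to ZORO.

```python
import numpy as np

def zeroth_order_gradient(f, x, num_dirs=20, mu=1e-3, rng=None):
    """Generic randomized finite-difference gradient estimator (sketch only).

    f: objective accessible only through function evaluations.
    num_dirs: number of random Gaussian directions sampled per estimate.
    mu: smoothing radius used in the finite differences.
    """
    rng = np.random.default_rng() if rng is None else rng
    g = np.zeros_like(x, dtype=float)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)
        # Forward difference along u approximates the directional derivative.
        g += (f(x + mu * u) - f(x)) / mu * u
    return g / num_dirs

# Illustrative usage on a simple quadratic: the estimate is roughly x itself.
quadratic = lambda z: 0.5 * float(np.dot(z, z))
print(zeroth_order_gradient(quadratic, np.ones(5)))
```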