Finding Second-Order Stationary Point for Nonconvex-Strongly-Concave Minimax Problem
- URL:
- Date: Sun, 10 Oct 2021 14:54:23 GMT
- Title: Finding Second-Order Stationary Point for Nonconvex-Strongly-Concave Minimax Problem
- Authors: Luo Luo, Cheng Chen
- Abstract summary: In this paper, we consider the non-asymptotic behavior of finding a second-order stationary point for minimax problems.
For high-dimensional problems, we propose a variant that avoids the expensive cost of the second-order oracle by solving the cubic sub-problem inexactly via gradient descent and Chebyshev expansion.
- Score: 16.689304539024036
- License:
- Abstract: We study the smooth minimax optimization problem of the form
$\min_{\bf x}\max_{\bf y} f({\bf x},{\bf y})$, where the objective function is
strongly concave in ${\bf y}$ but possibly nonconvex in ${\bf x}$. This problem
covers many applications in machine learning, such as regularized GANs,
reinforcement learning, and adversarial training. Most existing theory for
gradient descent ascent focuses on establishing convergence to a first-order
stationary point of $f({\bf x},{\bf y})$ or of the primal function
$P({\bf x})\triangleq \max_{\bf y} f({\bf x},{\bf y})$. In this
paper, we design a new optimization method based on cubic Newton iterations, which
could find an ${\mathcal O}\left(\varepsilon,\kappa^{1.5}\sqrt{\rho\varepsilon}\right)$-second-order
stationary point of $P({\bf x})$ with
${\mathcal O}\left(\kappa^{1.5}\sqrt{\rho}\varepsilon^{-1.5}\right)$ second-order oracle
calls and $\tilde{\mathcal O}\left(\kappa^{2}\sqrt{\rho}\varepsilon^{-1.5}\right)$ first-order oracle
calls, where $\kappa$ is the condition number and $\rho$ is the Hessian
smoothness coefficient of $f({\bf x},{\bf y})$. For high-dimensional problems,
we propose a variant algorithm that avoids the expensive cost of the second-order
oracle by solving the cubic sub-problem inexactly via gradient descent and
matrix Chebyshev expansion. This strategy still obtains the desired approximate
second-order stationary point with high probability, but requires only
$\tilde{\mathcal O}\left(\kappa^{1.5}\ell\varepsilon^{-2}\right)$
Hessian-vector oracle calls and
$\tilde{\mathcal O}\left(\kappa^{2}\sqrt{\rho}\varepsilon^{-1.5}\right)$ first-order oracle
calls. To the best of our knowledge, this is the first work that considers the
non-asymptotic convergence behavior of finding second-order stationary points
for minimax problems without the convex-concave assumption.
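The first method in the abstract applies cubic-regularized Newton steps to the primal function $P({\bf x})$. Below is a minimal sketch of that idea under assumptions that are ours, not the paper's: a hypothetical toy objective (`A_MAT`, `A_COEF`, `MU`), exact Hessian blocks, a fixed hand-picked regularization constant `M`, and plain gradient descent on the cubic sub-problem. The gradient of $P$ comes from Danskin's theorem, $\nabla P({\bf x}) = \nabla_{\bf x} f({\bf x},{\bf y}^*({\bf x}))$, and its Hessian from the Schur-complement identity $\nabla^2 P = \nabla^2_{\bf xx} f - \nabla^2_{\bf xy} f\,(\nabla^2_{\bf yy} f)^{-1}\nabla^2_{\bf yx} f$, which holds when $f$ is strongly concave in ${\bf y}$. All function names are hypothetical, and the sketch carries none of the paper's step-size rules, stopping criteria, or complexity guarantees.

```python
import numpy as np

# Hypothetical toy objective (not from the paper), nonconvex in x and MU-strongly concave in y:
#   f(x, y) = sum(x**4) / 4 - (A_COEF / 2) * ||x||^2 + y^T A_MAT x - (MU / 2) * ||y||^2
A_MAT = np.diag([0.5, 0.2])
A_COEF, MU = 1.0, 1.0


def grad_x(x, y):
    return x**3 - A_COEF * x + A_MAT.T @ y


def grad_y(x, y):
    return A_MAT @ x - MU * y


def hessians(x):
    """Blocks f_xx, f_xy, f_yy of the Hessian of f (for this toy f they do not depend on y)."""
    f_xx = np.diag(3.0 * x**2) - A_COEF * np.eye(x.size)
    f_xy = A_MAT.T
    f_yy = -MU * np.eye(A_MAT.shape[0])
    return f_xx, f_xy, f_yy


def best_response(x, y, steps=60):
    """Approximate y*(x) = argmax_y f(x, y) by gradient ascent with step 0.5 / MU."""
    for _ in range(steps):
        y = y + (0.5 / MU) * grad_y(x, y)
    return y


def primal_grad_hess(x, y):
    """Gradient and Hessian of P(x) = max_y f(x, y), assuming y is close to y*(x)."""
    f_xx, f_xy, f_yy = hessians(x)
    g = grad_x(x, y)  # Danskin: grad P(x) = grad_x f(x, y*(x))
    H = f_xx - f_xy @ np.linalg.solve(f_yy, f_xy.T)  # Schur complement of the f_yy block
    return g, H


def cubic_step(g, H, M, rng, lr=0.05, iters=2000):
    """Approximately minimize g^T s + s^T H s / 2 + (M / 6) ||s||^3 by gradient descent."""
    s = 1e-3 * rng.standard_normal(g.size)  # small random start so s can leave a saddle
    for _ in range(iters):
        s = s - lr * (g + H @ s + 0.5 * M * np.linalg.norm(s) * s)
    return s


def cubic_newton_minimax(x, y, M=10.0, outer=50, seed=0):
    """Cubic-regularized Newton on P(x) with a fixed, hand-picked M (an assumption)."""
    rng = np.random.default_rng(seed)
    for _ in range(outer):
        y = best_response(x, y)
        g, H = primal_grad_hess(x, y)
        x = x + cubic_step(g, H, M, rng)
    return x, best_response(x, y)


# Start at x = 0, a stationary point of P that is a local maximum (grad P = 0 there).
x, y = cubic_newton_minimax(np.zeros(2), np.zeros(2))
g, H = primal_grad_hess(x, y)
print("x =", x, "|grad P| =", np.linalg.norm(g), "lambda_min =", np.linalg.eigvalsh(H).min())
```

At ${\bf x}=0$ the gradient of this toy $P$ vanishes while its Hessian is negative definite, so a pure gradient method would not move. The randomly initialized sub-problem solver lets each cubic step follow negative curvature, and the iterates settle at a local minimum of the toy $P$, where $|x_i| = \sqrt{1 - A_{ii}^2}$.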
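For the high-dimensional variant, the abstract says only that the cubic sub-problem is solved inexactly "via gradient descent and matrix Chebyshev expansion". One place a Chebyshev expansion can enter, which we assume here rather than take from the paper, is the Hessian-vector product of $P$: $\nabla^2 P\,{\bf v} = \nabla^2_{\bf xx} f\,{\bf v} + \nabla^2_{\bf xy} f\,(-\nabla^2_{\bf yy} f)^{-1}\nabla^2_{\bf yx} f\,{\bf v}$. If the eigenvalues of $-\nabla^2_{\bf yy} f$ lie in $[\mu,\ell]$ (with $\ell$ assumed here to be an upper bound on that spectrum), the classical Chebyshev semi-iterative method approximates the inverse using only Hessian-vector products, with error decaying roughly like $\left((\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)\right)^k$ for $\kappa = \ell/\mu$. The names `chebyshev_solve` and `primal_hvp` are hypothetical, and the demo matrices are random.

```python
import numpy as np


def chebyshev_solve(matvec, b, lam_min, lam_max, iters=60):
    """Approximate S^{-1} b for symmetric S with eigenvalues in [lam_min, lam_max], 0 < lam_min,
    using only products v -> S v (classical Chebyshev semi-iterative method, x0 = 0)."""
    d = 0.5 * (lam_max + lam_min)  # center of the spectral interval
    c = 0.5 * (lam_max - lam_min)  # half-width of the spectral interval
    x = np.zeros_like(b)
    r = b.copy()
    p = np.zeros_like(b)
    alpha = 1.0 / d
    for i in range(iters):
        if i == 0:
            p = r.copy()
        else:
            beta = 0.5 * (c * alpha) ** 2 if i == 1 else (0.5 * c * alpha) ** 2
            alpha = 1.0 / (d - beta / alpha)
            p = r + beta * p
        x = x + alpha * p
        r = b - matvec(x)  # recompute the residual to limit rounding drift
    return x


def primal_hvp(v, hvp_xx, hvp_xy, hvp_yx, hvp_yy, mu, ell, iters=60):
    """H_P v = f_xx v - f_xy f_yy^{-1} f_yx v, written as f_xx v + f_xy (-f_yy)^{-1} f_yx v,
    assuming -f_yy has eigenvalues in [mu, ell]."""
    u = hvp_yx(v)
    w = chebyshev_solve(lambda z: -hvp_yy(z), u, mu, ell, iters)
    return hvp_xx(v) + hvp_xy(w)


# Random test blocks (hypothetical data): -F_yy has eigenvalues spread over [mu, ell].
rng = np.random.default_rng(0)
n, m, mu, ell = 5, 4, 1.0, 10.0
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))
F_yy = -(Q @ np.diag(np.linspace(mu, ell, m)) @ Q.T)
F_xx = rng.standard_normal((n, n))
F_xx = 0.5 * (F_xx + F_xx.T)
F_xy = rng.standard_normal((n, m))
v = rng.standard_normal(n)

hv = primal_hvp(v, lambda z: F_xx @ z, lambda z: F_xy @ z, lambda z: F_xy.T @ z,
                lambda z: F_yy @ z, mu, ell)
exact = (F_xx - F_xy @ np.linalg.solve(F_yy, F_xy.T)) @ v
print("max abs error:", np.max(np.abs(hv - exact)))
```

Each Chebyshev iteration costs one Hessian-vector product with $\nabla^2_{\bf yy} f$. Unlike conjugate gradients, the method needs no inner products, at the price of requiring the spectral bounds $\mu$ and $\ell$ in advance.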