Gradient Norm Regularization Second-Order Algorithms for Solving Nonconvex-Strongly Concave Minimax Problems
- URL: http://arxiv.org/abs/2411.15769v1
- Date: Sun, 24 Nov 2024 09:46:36 GMT
- Title: Gradient Norm Regularization Second-Order Algorithms for Solving Nonconvex-Strongly Concave Minimax Problems
- Authors: Jun-Lin Wang, Zi Xu
- Abstract summary: We propose a gradient norm regularized trust region (GRTR) algorithm for solving nonconvex-strongly concave minimax problems.
We show that the proposed algorithm achieves an $\mathcal{O}(\epsilon,\sqrt{\epsilon})$-second-order stationary point.
Its iteration complexity is proved to be upper bounded by $\tilde{\mathcal{O}}(\rho^{0.5}\kappa^{1.5}\epsilon^{-3/2})$, where $\rho$ is the Lipschitz constant of the Jacobian matrix and $\kappa$ is the condition number of the objective function.
A Levenberg-Marquardt variant (LMNegCur) that corrects the iteration direction with negative curvature achieves the same complexity without solving a trust region subproblem at each iteration.
- Score: 2.3721580767877257
- Abstract: In this paper, we study second-order algorithms for solving nonconvex-strongly concave minimax problems, which have attracted much attention in recent years in many fields, especially in machine learning. We propose a gradient norm regularized trust region (GRTR) algorithm to solve nonconvex-strongly concave minimax problems, where the objective function of the trust region subproblem in each iteration uses a regularized version of the Hessian matrix, and the regularization coefficient and the radius of the ball constraint are proportional to the square root of the gradient norm. The iteration complexity of the proposed GRTR algorithm to obtain an $\mathcal{O}(\epsilon,\sqrt{\epsilon})$-second-order stationary point is proved to be upper bounded by $\tilde{\mathcal{O}}(\rho^{0.5}\kappa^{1.5}\epsilon^{-3/2})$, where $\rho$ and $\kappa$ are the Lipschitz constant of the Jacobian matrix and the condition number of the objective function respectively, which matches the best known iteration complexity of second-order methods for solving nonconvex-strongly concave minimax problems. We further propose a Levenberg-Marquardt algorithm with a gradient norm regularization coefficient and use the negative curvature direction to correct the iteration direction (LMNegCur), which does not need to solve the trust region subproblem at each iteration. We also prove that the LMNegCur algorithm achieves an $\mathcal{O}(\epsilon,\sqrt{\epsilon})$-second-order stationary point within $\tilde{\mathcal{O}}(\rho^{0.5}\kappa^{1.5}\epsilon^{-3/2})$ number of iterations. Numerical results show the efficiency of both proposed algorithms.
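To make the two update rules concrete: an $\mathcal{O}(\epsilon,\sqrt{\epsilon})$-second-order stationary point of the primal function $\Phi(x)=\max_y f(x,y)$ is, up to constants, a point with $\|\nabla\Phi(x)\|\le\epsilon$ and $\lambda_{\min}(\nabla^2\Phi(x))\ge-\mathcal{O}(\sqrt{\epsilon})$. The NumPy sketch below implements one step of each algorithm as the abstract describes them: a trust-region step whose Hessian regularization coefficient and ball radius both scale with the square root of the gradient norm, and a Levenberg-Marquardt step that falls back to a negative curvature direction. This is only an illustration under stated assumptions: the function names, the constants `sigma`, `r`, and `eps`, the approximate subproblem solver (a regularized Newton step clipped to the ball), and the threshold for "sufficiently negative" curvature are ours, not the paper's.

```python
import numpy as np

def phi_grad_hess(gx, fxx, fxy, fyy):
    """Gradient and Hessian of the primal function Phi(x) = max_y f(x, y),
    evaluated at an (approximate) inner maximizer y*(x).  Since f is strongly
    concave in y, fyy is negative definite and the Schur complement below is
    the standard expression for the Hessian of Phi."""
    H = fxx - fxy @ np.linalg.solve(fyy, fxy.T)
    return gx, H

def grtr_step(g, H, sigma=1.0, r=1.0):
    """One GRTR-style step: approximately minimize
        g^T s + 0.5 * s^T (H + sigma * sqrt(||g||) * I) s
        subject to ||s|| <= r * sqrt(||g||),
    with both the regularization coefficient and the trust-region radius
    proportional to sqrt(||g||).  The subproblem is solved only crudely here
    (regularized Newton step clipped to the ball); the paper's exact
    trust-region subproblem solver is not reproduced."""
    lam = sigma * np.sqrt(np.linalg.norm(g))
    radius = r * np.sqrt(np.linalg.norm(g))
    s = -np.linalg.solve(H + lam * np.eye(g.size), g)
    ns = np.linalg.norm(s)
    return s if ns <= radius else s * (radius / ns)

def lmnegcur_step(g, H, sigma=1.0, eps=1e-6):
    """One LMNegCur-style step: a Levenberg-Marquardt step with a
    gradient-norm regularization coefficient, replaced by a (unit-length)
    negative curvature direction whenever H has a sufficiently negative
    eigenvalue.  No trust-region subproblem is solved."""
    w, V = np.linalg.eigh(H)
    if w[0] < -np.sqrt(eps):          # negative curvature detected
        v = V[:, 0]
        return -v if g @ v > 0 else v  # sign chosen to be a descent direction
    lam = sigma * np.sqrt(np.linalg.norm(g))
    return -np.linalg.solve(H + lam * np.eye(g.size), g)
```

In a full minimax loop one would first run an inner maximization over $y$ (fast, since $f$ is strongly concave in $y$), form $g$ and $H$ of $\Phi$ via `phi_grad_hess`, and then update $x$ with `grtr_step` or `lmnegcur_step`; step scaling and acceptance rules are omitted.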
Related papers
- Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms [65.42376001308064]
We propose two variance-reduced zeroth-order (ZO) gradient estimators for nonconvex problems.
We improve the state-of-the-art function query complexities from $\mathcal{O}\left(\min\left\{\frac{dn^{1/2}}{\epsilon^2}, \frac{d}{\epsilon^3}\right\}\right)$ to $\tilde{\mathcal{O}}\left(\frac{d}{\epsilon^2}\right)$.
arXiv Detail & Related papers (2024-10-03T15:04:01Z) - A Fully Parameter-Free Second-Order Algorithm for Convex-Concave Minimax Problems with Optimal Iteration Complexity [2.815239177328595]
We propose a Lipschitz-free cubic regularization (LF-CR) algorithm for solving the convex-concave minimax optimization problem.
We also propose a fully parameter-free cubic regularization (FF-CR) algorithm that does not require any parameters of the problem.
To the best of our knowledge, the proposed FF-CR algorithm is the first completely parameter-free second-order algorithm for solving convex-concave minimax optimization problems.
arXiv Detail & Related papers (2024-07-04T01:46:07Z) - Zeroth-Order primal-dual Alternating Projection Gradient Algorithms for
Nonconvex Minimax Problems with Coupled linear Constraints [3.898056433020865]
We propose two zeroth-order algorithms with iteration complexity guarantees for nonconvex minimax problems with coupled linear constraints.
To the best of our knowledge, they are the first two zeroth-order algorithms with the best-known iteration complexity for this class of nonconvex minimax problems.
arXiv Detail & Related papers (2024-01-26T11:22:13Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Accelerating Inexact HyperGradient Descent for Bilevel Optimization [84.00488779515206]
We present a method for solving general nonconvex-strongly-convex bilevel optimization problems.
Our results also improve upon the existing complexity for finding second-order stationary points in nonconvex-strongly-concave minimax problems.
arXiv Detail & Related papers (2023-06-30T20:36:44Z) - Zeroth-Order Alternating Gradient Descent Ascent Algorithms for a Class
of Nonconvex-Nonconcave Minimax Problems [9.792800635198935]
We propose a zeroth-order alternating gradient descent ascent (ZO-AGDA) algorithm and a variance-reduced variant for solving the NC-PL minimax problem.
To the best of our knowledge, they are the first two zeroth-order algorithms with a complexity guarantee for solving NC-PL minimax problems.
arXiv Detail & Related papers (2022-11-24T15:34:33Z) - The First Optimal Algorithm for Smooth and
Strongly-Convex-Strongly-Concave Minimax Optimization [88.91190483500932]
In this paper, we revisit the smooth and strongly-convex-strongly-concave minimax optimization problem.
Existing state-of-the-art methods do not match the lower bound $\Omega\left(\sqrt{\kappa_x\kappa_y}\log\frac{1}{\epsilon}\right)$; the best of them requires $\mathcal{O}\left(\sqrt{\kappa_x\kappa_y}\log^3(\kappa_x\kappa_y)\log\frac{1}{\epsilon}\right)$ gradient evaluations.
We fix this fundamental issue by providing the first algorithm with $\mathcal{O}\left(\sqrt{\kappa_x\kappa_y}\log\frac{1}{\epsilon}\right)$ gradient evaluation complexity.
arXiv Detail & Related papers (2022-05-11T17:33:07Z) - Derivative-free Alternating Projection Algorithms for General
Nonconvex-Concave Minimax Problems [9.173866646584031]
In this paper, we propose a zeroth-order alternating projection algorithm for smooth nonconvex-concave minimax problems.
We show that it can also be used to tackle nonsmooth nonconvex-concave minimax problems.
arXiv Detail & Related papers (2021-08-01T15:23:49Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm, the Momentum-assisted Single-Timescale Stochastic Approximation (MSTSA) algorithm, for tackling bilevel optimization problems.
MSTSA allows us to control the error in its iterations caused by inexact solutions of the lower-level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z) - Second-order Conditional Gradient Sliding [79.66739383117232]
We present the Second-Order Conditional Gradient Sliding (SOCGS) algorithm.
The SOCGS algorithm converges quadratically in primal gap after a finite number of linearly convergent iterations.
It is useful when the feasible region can only be accessed efficiently through a linear optimization oracle.
arXiv Detail & Related papers (2020-02-20T17:52:18Z)