Nonsmooth Nonconvex-Nonconcave Minimax Optimization: Primal-Dual Balancing and Iteration Complexity Analysis
- URL: http://arxiv.org/abs/2209.10825v4
- Date: Wed, 15 Jan 2025 20:43:18 GMT
- Title: Nonsmooth Nonconvex-Nonconcave Minimax Optimization: Primal-Dual Balancing and Iteration Complexity Analysis
- Authors: Jiajin Li, Linglingzhi Zhu, Anthony Man-Cho So,
- Abstract summary: We introduce a novel analysis for PLDA, the key are our newly developed nonsmooth and dual error iterations.
When $thetain [0,12]$, PLDA achieves the optimal $mathcalO()$ under mild assumptions.
- Score: 23.80683445944524
- License:
- Abstract: Nonconvex-nonconcave minimax optimization has gained widespread interest over the last decade. However, most existing works focus on variants of gradient descent-ascent (GDA) algorithms, which are only applicable to smooth nonconvex-concave settings. To address this limitation, we propose a novel algorithm named smoothed proximal linear descent-ascent (smoothed PLDA), which can effectively handle a broad range of structured nonsmooth nonconvex-nonconcave minimax problems. Specifically, we consider the setting where the primal function has a nonsmooth composite structure and the dual function possesses the Kurdyka-Lojasiewicz (KL) property with exponent $\theta \in [0,1)$. We introduce a novel convergence analysis framework for smoothed PLDA, the key components of which are our newly developed nonsmooth primal error bound and dual error bound. Using this framework, we show that smoothed PLDA can find both $\epsilon$-game-stationary points and $\epsilon$-optimization-stationary points of the problems of interest in $\mathcal{O}(\epsilon^{-2\max\{2\theta,1\}})$ iterations. Furthermore, when $\theta \in [0,\frac{1}{2}]$, smoothed PLDA achieves the optimal iteration complexity of $\mathcal{O}(\epsilon^{-2})$. To further demonstrate the effectiveness and wide applicability of our analysis framework, we show that certain max-structured problem possesses the KL property with exponent $\theta=0$ under mild assumptions. As a by-product, we establish algorithm-independent quantitative relationships among various stationarity concepts, which may be of independent interest.
Related papers
- Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability [49.96531901205305]
We propose the emphfirst algorithm with $tildeO(epsilon-1)$ sample complexity under single-policy concentrability for offline contextual bandits.
Our proof leverages the strong convexity of the KL regularization, and the conditional non-negativity of the gap between the true reward and its pessimistic estimator.
We extend our algorithm to contextual dueling bandits and achieve a similar nearly optimal sample complexity.
arXiv Detail & Related papers (2025-02-09T22:14:45Z) - Enhancing Convergence of Decentralized Gradient Tracking under the KL Property [10.925931212031692]
We establish convergence of the same type for the notorious decentralized gradient-tracking-based algorithm SONATA.
This matches the convergence behavior of centralized-gradient algorithms except when.
thetain (1/2,1)$, sub rate is certified; and.
textbf(iii)$ when $thetain (1/2,1)$, sub rate is certified; and.
textbf(iii)$ when $thetain (1/2,1)$, sub rate is certified.
arXiv Detail & Related papers (2024-12-12T18:44:36Z) - Two-Timescale Gradient Descent Ascent Algorithms for Nonconvex Minimax Optimization [77.3396841985172]
We provide a unified analysis of two-timescale gradient ascent (TTGDA) for solving structured non minimax optimization problems.
Our contribution is to design TTGDA algorithms are effective beyond the setting.
arXiv Detail & Related papers (2024-08-21T20:14:54Z) - Private Stochastic Convex Optimization with Heavy Tails: Near-Optimality from Simple Reductions [19.008521454738425]
We study the problem of differentially private convex optimization (DP-SCO) with heavy-tailed gradients, where we assume a $ktextth$-moment bound on the Lipschitz constants of sample functions rather than a uniform bound.
We propose a new reduction-based approach that enables us to obtain the first optimal rates (up to logarithmic factors) in the heavy-tailed setting, achieving error $Gcdot frac 1 sqrt n + G_k cdot (fracsqrt dnepsilon)1
arXiv Detail & Related papers (2024-06-04T21:26:29Z) - Decentralized Weakly Convex Optimization Over the Stiefel Manifold [28.427697270742947]
We focus on the Stiefel manifold in the decentralized setting, where a connected network of agents in $nMn log-1)$ are tested.
We propose an method called the decentralized subgradient method (DRSM)$ for forcing a natural station below $nMn log-1)$.
arXiv Detail & Related papers (2023-03-31T02:56:23Z) - Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation
Constrained Optimization [88.0031283949404]
Many real-world problems have complicated non functional constraints and use a large number of data points.
Our proposed method outperforms an existing method with the previously best-known result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z) - Adaptive Stochastic Variance Reduction for Non-convex Finite-Sum
Minimization [52.25843977506935]
We propose an adaptive variance method, called AdaSpider, for $L$-smooth, non-reduction functions with a finitesum structure.
In doing so, we are able to compute an $epsilon-stationary point with $tildeOleft + st/epsilon calls.
arXiv Detail & Related papers (2022-11-03T14:41:46Z) - Oracle Complexity in Nonsmooth Nonconvex Optimization [49.088972349825085]
It is well-known that given a smooth, bounded-from-below $$stationary points, Oracle-based methods can find smooth approximation of smoothness.
In this paper, we prove an inherent trade-off between optimization and smoothing dimension.
arXiv Detail & Related papers (2021-04-14T10:42:45Z) - The Complexity of Nonconvex-Strongly-Concave Minimax Optimization [43.07732143522183]
This paper establishes the complexity for finding approximate stationary points of non-strongly-concave (NC-SC) smooth minimax problems.
We deploy a proposed sequence of $Omega-strong$lyconcave sub-2 problems in both general complexity and averaged complexity.
In our proposed finite-sum setting, our proposed algorithm provides a nearly-tight dependence on the condition number.
arXiv Detail & Related papers (2021-03-29T18:53:57Z) - Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonsmooth functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.