Lifted Primal-Dual Method for Bilinearly Coupled Smooth Minimax
Optimization
- URL: http://arxiv.org/abs/2201.07427v1
- Date: Wed, 19 Jan 2022 05:56:19 GMT
- Title: Lifted Primal-Dual Method for Bilinearly Coupled Smooth Minimax
Optimization
- Authors: Kiran Koshy Thekumparampil, Niao He, Sewoong Oh
- Abstract summary: We study the bilinearly coupled minimax problem: $\min_{x} \max_{y} f(x) + y^\top A x - h(y)$, where $f$ and $h$ are both strongly convex smooth functions.
No known first-order algorithms have hitherto achieved the lower complexity bound of $\Omega((\sqrt{\frac{L_x}{\mu_x}} + \frac{\|A\|}{\sqrt{\mu_x \mu_y}} + \sqrt{\frac{L_y}{\mu_y}}) \log(\frac{1}{\varepsilon}))$.
- Score: 47.27237492375659
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the bilinearly coupled minimax problem: $\min_{x} \max_{y} f(x) +
y^\top A x - h(y)$, where $f$ and $h$ are both strongly convex smooth functions
and admit first-order gradient oracles. Surprisingly, no known first-order
algorithms have hitherto achieved the lower complexity bound of
$\Omega((\sqrt{\frac{L_x}{\mu_x}} + \frac{\|A\|}{\sqrt{\mu_x \mu_y}} +
\sqrt{\frac{L_y}{\mu_y}}) \log(\frac1{\varepsilon}))$ for solving this problem
up to an $\varepsilon$ primal-dual gap in the general parameter regime, where
$L_x, L_y,\mu_x,\mu_y$ are the corresponding smoothness and strong convexity
constants.
We close this gap by devising the first optimal algorithm, the Lifted
Primal-Dual (LPD) method. Our method lifts the objective into an extended form
that allows both the smooth terms and the bilinear term to be handled optimally
and seamlessly with the same primal-dual framework. Besides optimality, our
method yields a desirably simple single-loop algorithm that uses only one
gradient oracle call per iteration. Moreover, when $f$ is just convex, the same
algorithm applied to a smoothed objective achieves the nearly optimal iteration
complexity. We also provide a direct single-loop algorithm, using the LPD
method, that achieves the iteration complexity of
$O(\sqrt{\frac{L_x}{\varepsilon}} + \frac{\|A\|}{\sqrt{\mu_y \varepsilon}} +
\sqrt{\frac{L_y}{\varepsilon}})$. Numerical experiments on quadratic minimax
problems and policy evaluation problems further demonstrate the fast
convergence of our algorithm in practice.
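For intuition, below is a minimal NumPy sketch of a single-loop primal-dual gradient iteration for this problem class, echoing the one-gradient-oracle-call-per-iteration structure described in the abstract. It is a sketch under stated assumptions: the extrapolation and step sizes follow a standard Chambolle-Pock-style scheme, not the paper's actual LPD updates, and the names `primal_dual_bilinear`, `tau`, `sigma`, and `theta` are hypothetical, introduced here for illustration.

```python
import numpy as np

def primal_dual_bilinear(grad_f, grad_h, A, x0, y0, tau, sigma,
                         theta=1.0, iters=1000):
    """Single-loop primal-dual gradient iteration for
        min_x max_y  f(x) + y^T A x - h(y).

    Illustrative Chambolle-Pock-style scheme (NOT the paper's LPD
    updates): one gradient call on f and one on h per iteration.
    Step sizes must satisfy the usual stability conditions, e.g.
    tau * sigma * ||A||^2 < 1 with tau, sigma also small relative
    to 1/L_x and 1/L_y.
    """
    x, y = x0.astype(float).copy(), y0.astype(float).copy()
    x_prev = x.copy()
    for _ in range(iters):
        x_bar = x + theta * (x - x_prev)          # primal extrapolation
        y = y + sigma * (A @ x_bar - grad_h(y))   # dual ascent step
        x_prev = x.copy()
        x = x - tau * (grad_f(x) + A.T @ y)       # primal descent step
    return x, y

# Toy quadratic minimax instance, mirroring the paper's quadratic
# experiments only in spirit: f(x) = (mu_x/2)||x||^2, h(y) = (mu_y/2)||y||^2,
# whose unique saddle point is x* = 0, y* = 0.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 30))
    mu_x, mu_y = 1.0, 1.0
    grad_f = lambda x: mu_x * x
    grad_h = lambda y: mu_y * y
    op_norm = np.linalg.norm(A, 2)  # ||A|| = largest singular value
    x, y = primal_dual_bilinear(grad_f, grad_h, A,
                                np.zeros(30), np.zeros(20),
                                tau=0.5 / op_norm, sigma=0.5 / op_norm)
    print(np.linalg.norm(x), np.linalg.norm(y))  # both should be near 0
```

When $f$ is merely convex, the smoothing route mentioned in the abstract amounts to running the same iteration on a strongly convexified surrogate; one common choice (an assumption here, not necessarily the paper's exact construction) is $f(x) + \frac{\varepsilon}{2}\|x\|^2$, which is $\varepsilon$-strongly convex.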
Related papers
- A simple and improved algorithm for noisy, convex, zeroth-order optimisation [59.51990161522328]
We construct an algorithm that returns a point $\hat x \in \bar{\mathcal X}$ such that $f(\hat x)$ is as small as possible.
We prove that the gap $f(\hat x) - \min_{x \in \bar{\mathcal X}} f(x)$ is of smaller order than $d^2/\sqrt{n}$ up to poly-logarithmic terms.
arXiv Detail & Related papers (2024-06-26T18:19:10Z) - A Whole New Ball Game: A Primal Accelerated Method for Matrix Games and
Minimizing the Maximum of Smooth Functions [44.655316553524855]
We design algorithms for minimizing $\max_{i \in [n]} f_i(x)$ over a $d$-dimensional Euclidean or simplex domain.
When each $f_i$ is $1$-Lipschitz and $1$-smooth, our method computes an $\epsilon$-approximate solution.
arXiv Detail & Related papers (2023-11-17T22:07:18Z) - An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how our framework scales, and we demonstrate the efficiency and robustness of our methods on large-scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - Decomposable Non-Smooth Convex Optimization with Nearly-Linear Gradient
Oracle Complexity [15.18055488087588]
We give an algorithm that minimizes the above convex formulation to $\epsilon$-accuracy in $\widetilde{O}(\sum_{i=1}^{n} d_i \log(1/\epsilon))$ gradient computations.
Our main technical contribution is an adaptive procedure to select an $f_i$ term at every iteration via a novel combination of cutting-plane and interior-point methods.
arXiv Detail & Related papers (2022-08-07T20:53:42Z) - Optimal Gradient Sliding and its Application to Distributed Optimization
Under Similarity [121.83085611327654]
We study structured convex optimization problems with additive objective $r := p + q$, where $r$ is $\mu$-strongly convex, under a similarity assumption.
We propose a method that solves such problems in a distributed setup via communication between the master and the agents together with local gradient calls.
The proposed complexity is much sharper than the $\mathcal{O}(\sqrt{L_q/\mu})$ rate of existing methods.
arXiv Detail & Related papers (2022-05-30T14:28:02Z) - The First Optimal Algorithm for Smooth and
Strongly-Convex-Strongly-Concave Minimax Optimization [88.91190483500932]
In this paper, we revisit the smooth and strongly-convex-strongly-concave minimax optimization problem.
Existing state-of-the-art methods do not match the lower bound $\Omega\left(\sqrt{\kappa_x \kappa_y} \log\frac{1}{\epsilon}\right)$; the best known complexity is $\mathcal{O}\left(\sqrt{\kappa_x \kappa_y} \log^3(\kappa_x \kappa_y) \log\frac{1}{\epsilon}\right)$.
We fix this fundamental issue by providing the first algorithm with $\mathcal{O}\left(\sqrt{\kappa_x \kappa_y} \log\frac{1}{\epsilon}\right)$ gradient evaluation complexity.
arXiv Detail & Related papers (2022-05-11T17:33:07Z) - Sharper Rates for Separable Minimax and Finite Sum Optimization via
Primal-Dual Extragradient Methods [39.87865197769047]
We study separable minimax optimization problems $\min_x \max_y f(x) - g(y) + h(x, y)$, where $f$ and $g$ have smoothness and strong convexity parameters $(L^x, \mu^x)$, $(L^y, \mu^y)$, and $h$ is convex-concave with a $(\Lambda^{xx}, \Lambda^{xy}, \Lambda^{yy})$-blockwise operator norm bounded Hessian.
arXiv Detail & Related papers (2022-02-09T18:57:47Z) - Near Optimal Stochastic Algorithms for Finite-Sum Unbalanced
Convex-Concave Minimax Optimization [41.432757205864796]
This paper considers first-order algorithms for convex-concave minimax problems of the form $\min_{\mathbf{x}} \max_{\mathbf{y}} f(\mathbf{x}, \mathbf{y})$.
Our methods can be used to solve more general unbalanced minimax problems and are also near optimal.
arXiv Detail & Related papers (2021-06-03T11:30:32Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions $d = \frac{1}{\lambda\epsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)