A Whole New Ball Game: A Primal Accelerated Method for Matrix Games and
Minimizing the Maximum of Smooth Functions
- URL: http://arxiv.org/abs/2311.10886v1
- Date: Fri, 17 Nov 2023 22:07:18 GMT
- Title: A Whole New Ball Game: A Primal Accelerated Method for Matrix Games and
Minimizing the Maximum of Smooth Functions
- Authors: Yair Carmon, Arun Jambulapati, Yujia Jin, Aaron Sidford
- Abstract summary: We design algorithms for minimizing $\max_{i\in[n]} f_i(x)$ over a $d$-dimensional Euclidean or simplex domain.
When each $f_i$ is $1$-Lipschitz and $1$-smooth, our method computes an $\epsilon$-approximate solution.
- Score: 44.655316553524855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We design algorithms for minimizing $\max_{i\in[n]} f_i(x)$ over a
$d$-dimensional Euclidean or simplex domain. When each $f_i$ is $1$-Lipschitz
and $1$-smooth, our method computes an $\epsilon$-approximate solution using
$\widetilde{O}(n \epsilon^{-1/3} + \epsilon^{-2})$ gradient and function
evaluations, and $\widetilde{O}(n \epsilon^{-4/3})$ additional runtime. For
large $n$, our evaluation complexity is optimal up to polylogarithmic factors.
In the special case where each $f_i$ is linear -- which corresponds to finding
a near-optimal primal strategy in a matrix game -- our method finds an
$\epsilon$-approximate solution in runtime $\widetilde{O}(n (d/\epsilon)^{2/3}
+ nd + d\epsilon^{-2})$. For $n>d$ and $\epsilon=1/\sqrt{n}$ this improves over
all existing first-order methods. When additionally $d = \omega(n^{8/11})$ our
runtime also improves over all known interior point methods.
Our algorithm combines three novel primitives: (1) A dynamic data structure
which enables efficient stochastic gradient estimation in small $\ell_2$ or
$\ell_1$ balls. (2) A mirror descent algorithm tailored to our data structure
implementing an oracle which minimizes the objective over these balls. (3) A
simple ball oracle acceleration framework suitable for non-Euclidean geometry.
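To make the setup concrete, below is a minimal Python sketch of the matrix-game special case, where each $f_i(x) = a_i^\top x$ is linear: it minimizes $\max_{i\in[n]} a_i^\top x$ (unconstrained, for simplicity) by running plain gradient descent on the standard log-sum-exp smoothing. The temperature and step size are illustrative assumptions, and this baseline is not the paper's accelerated ball-oracle method.

```python
import numpy as np

# Illustrative baseline only -- NOT the paper's accelerated method.
# For linear f_i(x) = a_i^T x, we minimize the smooth surrogate
#   s_t(x) = t * log(sum_i exp(a_i^T x / t)),
# which satisfies max_i f_i(x) <= s_t(x) <= max_i f_i(x) + t*log(n).

def softmax_smoothed_descent(A, t=1e-2, lr=1e-1, iters=2000):
    """Gradient descent on the log-sum-exp smoothing of max_i a_i @ x.

    A: (n, d) array whose rows a_i define f_i(x) = a_i @ x.
    t: smoothing temperature (assumed value); smaller t = tighter surrogate.
    """
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        z = A @ x / t
        z -= z.max()              # stabilize exponentials before softmax
        w = np.exp(z)
        w /= w.sum()              # softmax weights over the n rows of A
        x -= lr * (A.T @ w)       # grad s_t(x) = sum_i softmax_i * a_i
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
x = softmax_smoothed_descent(A)
print("max_i f_i(x) =", (A @ x).max())
```

Note that this baseline touches all $n$ rows of $A$ at every step; the paper's contribution is precisely avoiding that cost via stochastic gradient estimation in small $\ell_2$ or $\ell_1$ balls.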
Related papers
- Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z) - Memory-Constrained Algorithms for Convex Optimization via Recursive
Cutting-Planes [23.94542304111204]
We give the first class of algorithms that provides a positive trade-off between gradient descent and cutting-plane methods in any regime with $\epsilon \leq 1/\sqrt{d}$.
In the regime $\epsilon \leq d^{-\Omega(d)}$, our algorithm with $p=d$ achieves the information-theoretic optimal memory usage and improves the oracle-complexity of gradient descent.
arXiv Detail & Related papers (2023-06-16T17:00:51Z) - An Optimal Algorithm for Strongly Convex Min-min Optimization [79.11017157526815]
Existing optimal first-order methods require $\mathcal{O}(\sqrt{\max\{\kappa_x,\kappa_y\}} \log(1/\epsilon))$ computations of both $\nabla_x f(x,y)$ and $\nabla_y f(x,y)$.
We propose a new algorithm that only requires $\mathcal{O}(\sqrt{\kappa_x} \log(1/\epsilon))$ computations of $\nabla_x f(x,y)$.
arXiv Detail & Related papers (2022-12-29T19:26:12Z) - Algorithms for Discrepancy, Matchings, and Approximations: Fast, Simple,
and Practical [1.2183405753834562]
Given a finite set system $(X,\mathcal{S})$, the discrepancy of a two-coloring $\chi : X \to \{-1,1\}$ is defined as $\max_{S \in \mathcal{S}} |\chi(S)|$; a minimal sketch evaluating this quantity appears after this list.
We propose a randomized algorithm which, for any $d>0$ and $(X,\mathcal{S})$ with dual shatter function $\pi^*(k)=O(k^d)$, returns a coloring with expected discrepancy
arXiv Detail & Related papers (2022-09-02T15:59:09Z) - Lifted Primal-Dual Method for Bilinearly Coupled Smooth Minimax
Optimization [47.27237492375659]
We study the bilinearly coupled minimax problem: $\min_x \max_y f(x) + y^\top A x - h(y)$, where $f$ and $h$ are both strongly convex smooth functions.
No known first-order algorithms have hitherto achieved the lower complexity bound of $\Omega((\sqrt{L_x/\mu_x} + \|A\|/\sqrt{\mu_x \mu_y}) \log(1/\varepsilon))$.
arXiv Detail & Related papers (2022-01-19T05:56:19Z) - Thinking Inside the Ball: Near-Optimal Minimization of the Maximal Loss [41.17536985461902]
We prove an oracle complexity lower bound scaling as $\Omega(N\epsilon^{-2/3})$, showing that our dependence on $N$ is optimal up to polylogarithmic factors.
We develop methods with improved complexity bounds of $\tilde{O}(N\epsilon^{-2/3} + \sqrt{N}\epsilon^{-8/3})$ in the non-smooth case and $\tilde{O}(N\epsilon^{-2/3} + \sqrt{N}\epsilon^{-1})$ in the smooth case.
arXiv Detail & Related papers (2021-05-04T21:49:15Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private optimizer is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + (\log(d)/(\varepsilon n))^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $\frac{1}{\lambda\epsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)
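As promised above, here is a minimal Python sketch that evaluates the discrepancy $\max_{S\in\mathcal{S}} |\chi(S)|$ of a given two-coloring; the random set system is an illustrative assumption, and this only checks the definition rather than implementing the randomized coloring algorithm from the discrepancy paper.

```python
import numpy as np

# Evaluates disc(chi) = max_{S in S} | sum_{x in S} chi(x) | for a
# finite set system (X, S) and a two-coloring chi: X -> {-1, +1}.
# Definition check only -- not the paper's coloring algorithm.

def discrepancy(sets, chi):
    """sets: list of index lists (the sets S); chi: array of +/-1 colors."""
    return max(abs(sum(int(chi[x]) for x in S)) for S in sets)

rng = np.random.default_rng(0)
n = 100
sets = [list(rng.choice(n, size=20, replace=False)) for _ in range(10)]
chi = rng.choice([-1, 1], size=n)   # a uniformly random coloring
print("discrepancy of a random coloring:", discrepancy(sets, chi))
```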
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.