Related papers: Efficient Convex Optimization Requires Superlinear Memory

Efficient Convex Optimization Requires Superlinear Memory

URL: http://arxiv.org/abs/2203.15260v2
Date: Wed, 24 Jul 2024 11:21:47 GMT
Title: Efficient Convex Optimization Requires Superlinear Memory
Authors: Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant,
Abstract summary: We show that any memory-constrained, first-order algorithm which minimizes $d$-dimensional, $1T-Lipschitz convex functions over the unit ball to $1/mathrmpoly(d)$ accuracy using at most $d1.25 - delta$ bits of memory must make at least $tildeOmega(d1 + (4/3)delta)$ first-order queries.
Score: 27.11113888243391
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We show that any memory-constrained, first-order algorithm which minimizes $d$-dimensional, $1$-Lipschitz convex functions over the unit ball to $1/\mathrm{poly}(d)$ accuracy using at most $d^{1.25 - \delta}$ bits of memory must make at least $\tilde{\Omega}(d^{1 + (4/3)\delta})$ first-order queries (for any constant $\delta \in [0, 1/4]$). Consequently, the performance of such memory-constrained algorithms are a polynomial factor worse than the optimal $\tilde{O}(d)$ query bound for this problem obtained by cutting plane methods that use $\tilde{O}(d^2)$ memory. This resolves a COLT 2019 open problem of Woodworth and Srebro.

Related papers

The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
We show that this problem has randomized communication complexity $Omega(frac1kcdot n2log|mathbbF|)$. As an application, we obtain an $Omega(frac1kcdot n2log|mathbbF|)$ space lower bound for any streaming algorithm with $k$ passes.
arXiv Detail & Related papers (2024-10-26T06:21:42Z)
Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch. We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work. We also show an $Omega(k2/pn1-2/p)$ lower bound for the sparse recovery problem, which is tight up to a $mathrmpoly(log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z)
Gradient Descent is Pareto-Optimal in the Oracle Complexity and Memory Tradeoff for Feasibility Problems [0.0]
We show that to solve feasibility problems with accuracy any deterministic algorithm either uses $d1+delta$ bits of memory or must make at least $1/(d0.01delta epsilon2frac1-delta1+1.01 delta-o(1))$ oracle queries. We also show that randomized algorithms either use $d1+delta$ memory or make at least $1/(d2delta epsilon2(1-4delta)-o(1))$ queries for any $deltain
arXiv Detail & Related papers (2024-04-10T04:15:50Z)
A Whole New Ball Game: A Primal Accelerated Method for Matrix Games and Minimizing the Maximum of Smooth Functions [44.655316553524855]
We design algorithms for minimizing $max_iin[n] f_i(x) over a $d$-dimensional Euclidean or simplex domain. When each $f_i$ is $1$-Lipschitz and $1$-smooth, our method computes an $epsilon-approximate solution.
arXiv Detail & Related papers (2023-11-17T22:07:18Z)
Memory-Query Tradeoffs for Randomized Convex Optimization [16.225462097812766]
We show that any randomized first-order algorithm which minimizes a $d$-dimensional, $1$-Lipschitz convex function over the unit ball must either use $Omega(d2-delta)$ bits of memory or make $Omega(d1+delta/6-o(1))$ queries.
arXiv Detail & Related papers (2023-06-21T19:48:58Z)
Memory-Constrained Algorithms for Convex Optimization via Recursive Cutting-Planes [23.94542304111204]
First class of algorithms that provides a positive trade-off between gradient descent and cutting-plane methods in any regime with $epsilonleq 1/sqrt d$. In the regime $epsilon leq d-Omega(d)$, our algorithm with $p=d$ achieves the information-theoretic optimal memory usage and improves the oracle-complexity of gradient descent.
arXiv Detail & Related papers (2023-06-16T17:00:51Z)
Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization [54.29685789885059]
We introduce efficient $(1+varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem. The goal is to approximate $mathbfA$ as a product of low-rank factors. Our techniques generalize to other common variants of the BMF problem.
arXiv Detail & Related papers (2023-06-02T18:55:27Z)
Quadratic Memory is Necessary for Optimal Query Complexity in Convex Optimization: Center-of-Mass is Pareto-Optimal [23.94542304111204]
We show that quadratic memory is necessary to achieve the optimal oracle complexity for first-order convex optimization. We prove that to minimize $1$-Lipschitz convex functions over the unit ball to $1/d4$ accuracy, any deterministic first-order algorithms using at most $d2-delta$ bits of memory must make $tildeOmega(d1+delta/3)$ queries.
arXiv Detail & Related papers (2023-02-09T22:37:27Z)
An Optimal Algorithm for Strongly Convex Min-min Optimization [79.11017157526815]
Existing optimal first-order methods require $mathcalO(sqrtmaxkappa_x,kappa_y log 1/epsilon)$ of computations of both $nabla_x f(x,y)$ and $nabla_y f(x,y)$. We propose a new algorithm that only requires $mathcalO(sqrtkappa_x log 1/epsilon)$ of computations of $nabla_x f(x,
arXiv Detail & Related papers (2022-12-29T19:26:12Z)
Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$ Geometry [69.24618367447101]
Up to logarithmic factors the optimal excess population loss of any $(varepsilon,delta)$-differently private is $sqrtlog(d)/n + sqrtd/varepsilon n.$ We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $sqrtlog(d)/n + (log(d)/varepsilon n)2/3.
arXiv Detail & Related papers (2021-03-02T06:53:44Z)
Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model. We show that for both problems, for dimensions of $frac1lambdaepsilon$, one can obtain streaming algorithms with spacely smaller than $frac1lambdaepsilon$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.