Quadratic Memory is Necessary for Optimal Query Complexity in Convex
Optimization: Center-of-Mass is Pareto-Optimal
- URL: http://arxiv.org/abs/2302.04963v2
- Date: Fri, 19 May 2023 01:24:24 GMT
- Title: Quadratic Memory is Necessary for Optimal Query Complexity in Convex
Optimization: Center-of-Mass is Pareto-Optimal
- Authors: Moïse Blanchard, Junhui Zhang and Patrick Jaillet
- Abstract summary: We show that quadratic memory is necessary to achieve the optimal oracle complexity for first-order convex optimization.
We prove that to minimize $1$-Lipschitz convex functions over the unit ball to $1/d^4$ accuracy, any deterministic first-order algorithm using at most $d^{2-\delta}$ bits of memory must make $\tilde\Omega(d^{1+\delta/3})$ queries.
- Score: 23.94542304111204
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We give query complexity lower bounds for convex optimization and the related
feasibility problem. We show that quadratic memory is necessary to achieve the
optimal oracle complexity for first-order convex optimization. In particular,
this shows that center-of-mass cutting-planes algorithms in dimension $d$ which
use $\tilde O(d^2)$ memory and $\tilde O(d)$ queries are Pareto-optimal for
both convex optimization and the feasibility problem, up to logarithmic
factors. Precisely, we prove that to minimize $1$-Lipschitz convex functions
over the unit ball to $1/d^4$ accuracy, any deterministic first-order
algorithm using at most $d^{2-\delta}$ bits of memory must make
$\tilde\Omega(d^{1+\delta/3})$ queries, for any $\delta\in[0,1]$. For the
feasibility problem, in which an algorithm only has access to a separation
oracle, we show a stronger trade-off: for at most $d^{2-\delta}$ memory, the
number of queries required is $\tilde\Omega(d^{1+\delta})$. This resolves a
COLT 2019 open problem of Woodworth and Srebro.
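To make the Pareto-optimal endpoint concrete, the sketch below runs a center-of-mass-style cutting-plane loop for the feasibility problem against a separation oracle. It is a toy, not the paper's construction: the centroid is estimated by rejection sampling (exact centers of mass are intractable to compute), and the function name, dimensions, and oracle are illustrative assumptions. The point is the memory profile: the stored cuts amount to $O(d)$ vectors of $d$ floats each, i.e. the $\tilde O(d^2)$-bit state that the lower bound says cannot be compressed to $d^{2-\delta}$ bits without paying extra queries.

```python
import numpy as np

def feasibility_center_of_mass(separation_oracle, d, max_queries=None,
                               n_samples=20_000, seed=0):
    """Query an (approximate) center of mass, cut, repeat.

    separation_oracle(x) returns None if x is feasible, otherwise a vector g
    with g @ y <= g @ x for every feasible y (a separating hyperplane).
    """
    rng = np.random.default_rng(seed)
    max_queries = max_queries or 8 * d          # ~O(d) queries, as in the tradeoff
    cuts = []                                   # up to O(d) cuts of d floats each:
                                                # this list is the O(d^2) memory
    for _ in range(max_queries):
        # Estimate the centroid of {x in unit ball : all cuts hold} by
        # rejection sampling (a crude stand-in for the exact center of mass).
        pts = rng.standard_normal((n_samples, d))
        pts /= np.linalg.norm(pts, axis=1, keepdims=True)
        pts *= rng.random((n_samples, 1)) ** (1.0 / d)  # uniform in the ball
        mask = np.ones(n_samples, dtype=bool)
        for g, b in cuts:
            mask &= pts @ g <= b
        if not mask.any():
            return None                         # sampling starved; give up
        x = pts[mask].mean(axis=0)
        g = separation_oracle(x)                # one oracle query
        if g is None:
            return x                            # x is feasible
        cuts.append((g, float(g @ x)))          # keep only {y : g.y <= g.x}
    return None

# Toy run: the hidden feasible set is a small ball around a hypothetical point.
d = 4
x_star = np.full(d, 0.2)                        # ||x_star|| = 0.4
oracle = lambda x: None if np.linalg.norm(x - x_star) <= 0.25 else x - x_star
print(feasibility_center_of_mass(oracle, d))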
Related papers
- Gradient Descent is Pareto-Optimal in the Oracle Complexity and Memory Tradeoff for Feasibility Problems [0.0]
We show that to solve feasibility problems with accuracy $\epsilon$, any deterministic algorithm either uses $d^{1+\delta}$ bits of memory or must make at least $1/(d^{0.01\delta}\epsilon^{2\frac{1-\delta}{1+1.01\delta}-o(1)})$ oracle queries.
We also show that randomized algorithms either use $d^{1+\delta}$ memory or make at least $1/(d^{2\delta}\epsilon^{2(1-4\delta)-o(1)})$ queries for any $\delta \in [0, 1/4]$.
arXiv Detail & Related papers (2024-04-10T04:15:50Z)
- The Computational Complexity of Finding Stationary Points in Non-Convex Optimization [53.86485757442486]
Finding approximate stationary points, i.e., points where the gradient is approximately zero, of non-convex but smooth objective functions is a fundamental computational problem.
We show that finding approximate KKT points in constrained optimization is reducible to finding approximate stationary points in unconstrained optimization but the converse is impossible.
arXiv Detail & Related papers (2023-10-13T14:52:46Z)
- An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale, and we demonstrate the efficiency and robustness of our methods on large-scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z)
- Memory-Query Tradeoffs for Randomized Convex Optimization [16.225462097812766]
We show that any randomized first-order algorithm which minimizes a $d$-dimensional, $1$-Lipschitz convex function over the unit ball must either use $\Omega(d^{2-\delta})$ bits of memory or make $\Omega(d^{1+\delta/6-o(1)})$ queries.
arXiv Detail & Related papers (2023-06-21T19:48:58Z)
- Memory-Constrained Algorithms for Convex Optimization via Recursive Cutting-Planes [23.94542304111204]
We give the first class of algorithms that provides a positive trade-off between gradient descent and cutting-plane methods in any regime with $\epsilon \leq 1/\sqrt{d}$.
In the regime $\epsilon \leq d^{-\Omega(d)}$, our algorithm with $p=d$ achieves the information-theoretically optimal memory usage and improves the oracle complexity of gradient descent.
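For contrast, the other endpoint of this trade-off is plain projected subgradient descent: it keeps only $O(d)$ numbers but needs on the order of $1/\epsilon^2$ first-order queries for $1$-Lipschitz functions over the unit ball. A minimal sketch under those assumptions (the test function and step size are illustrative, not from the paper):

```python
import numpy as np

def projected_subgradient(subgrad, d, eps):
    T = int(np.ceil(1.0 / eps ** 2))   # ~1/eps^2 queries for L = R = 1
    eta = 1.0 / np.sqrt(T)             # standard step size for this setting
    x = np.zeros(d)
    avg = np.zeros(d)                  # entire state: two d-dim vectors
    for t in range(T):
        avg += (x - avg) / (t + 1)     # running average of queried iterates
        g = subgrad(x)                 # one first-order oracle query
        x = x - eta * g
        norm = np.linalg.norm(x)
        if norm > 1.0:                 # project back onto the unit ball
            x /= norm
    return avg

# Toy run: f(x) = ||x - c||, a 1-Lipschitz convex function (c is hypothetical).
d = 10
c = np.full(d, 0.5) / np.sqrt(d)       # minimizer, inside the unit ball
subgrad = lambda x: (x - c) / max(np.linalg.norm(x - c), 1e-12)
x_hat = projected_subgradient(subgrad, d, eps=0.05)
print(np.linalg.norm(x_hat - c))       # suboptimality f(x_hat) - f*, below eps
```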
arXiv Detail & Related papers (2023-06-16T17:00:51Z)
- Mind the gap: Achieving a super-Grover quantum speedup by jumping to the end [114.3957763744719]
We present a quantum algorithm that has rigorous runtime guarantees for several families of binary optimization problems.
We show that the algorithm finds the optimal solution in time $O^*(2^{(0.5-c)n})$ for an $n$-independent constant $c$.
We also show that for a large fraction of random instances from the $k$-spin model and for any fully satisfiable or slightly frustrated $k$-CSP formula, this guarantee holds.
arXiv Detail & Related papers (2022-12-03T02:45:23Z)
- The First Optimal Acceleration of High-Order Methods in Smooth Convex Optimization [88.91190483500932]
We study the fundamental open question of finding the optimal high-order algorithm for solving smooth convex minimization problems.
Existing accelerated high-order methods require performing a complex binary search procedure, which makes them neither optimal nor practical.
We fix this fundamental issue by providing the first algorithm with $\mathcal{O}\left(\epsilon^{-2/(p+1)}\right)$ $p$th-order oracle complexity.
arXiv Detail & Related papers (2022-05-19T16:04:40Z)
- Efficient Convex Optimization Requires Superlinear Memory [27.11113888243391]
We show that any memory-constrained, first-order algorithm which minimizes $d$-dimensional, $1$-Lipschitz convex functions over the unit ball to $1/\mathrm{poly}(d)$ accuracy using at most $d^{1.25-\delta}$ bits of memory must make at least $\tilde\Omega(d^{1+(4/3)\delta})$ first-order queries.
arXiv Detail & Related papers (2022-03-29T06:15:54Z)
- Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimension $d = \frac{1}{\lambda\epsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
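For orientation in the streaming model, a one-pass Pegasos-style SGD keeps only the weight vector, i.e. $O(d)$ floats; the paper's algorithms use space polynomially smaller than this in the regime $d = \frac{1}{\lambda\epsilon}$. The sketch below is that baseline, not the paper's method; the synthetic stream and hyperparameters are assumptions.

```python
import numpy as np

def streaming_svm_sgd(stream, d, lam):
    """One-pass SGD for the lambda-regularized hinge loss (Pegasos updates)."""
    w = np.zeros(d)                          # entire state: one d-dim vector
    for t, (x, y) in enumerate(stream, start=1):
        eta = 1.0 / (lam * t)                # Pegasos step size
        active = y * (w @ x) < 1.0           # hinge subgradient at current w
        w *= 1.0 - eta * lam                 # shrink from the regularizer
        if active:
            w += eta * y * x                 # gradient step on the hinge term
    return w

# Toy run on a synthetic stream (labels from a hypothetical true direction).
rng = np.random.default_rng(0)
d, lam = 20, 0.01
w_true = rng.standard_normal(d)
stream = ((x, np.sign(x @ w_true)) for x in rng.standard_normal((5000, d)))
w = streaming_svm_sgd(stream, d, lam)
print(w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true)))  # near 1
```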
arXiv Detail & Related papers (2020-07-07T17:10:00Z)
- Maximizing Determinants under Matroid Constraints [69.25768526213689]
We study the problem of finding a basis $S$ of a matroid $M$ such that $\det\left(\sum_{i \in S} v_i v_i^\top\right)$ is maximized.
This problem appears in a diverse set of areas such as experimental design, fair allocation of goods, network design, and machine learning.
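To make the objective concrete, the following greedy heuristic handles the simplest matroid constraint, a cardinality bound (the uniform matroid); it is an illustration of the problem, not the paper's algorithm, and the instance is synthetic.

```python
import numpy as np

def greedy_logdet(V, k, ridge=1e-9):
    """Greedily pick k rows of V to (approximately) maximize
    det(sum_{i in S} v_i v_i^T) -- the uniform-matroid special case."""
    n, d = V.shape
    S = []
    A = ridge * np.eye(d)                    # tiny ridge keeps early log-dets finite
    for _ in range(k):
        best_i, best_val = -1, -np.inf
        for i in range(n):
            if i in S:
                continue
            _, logdet = np.linalg.slogdet(A + np.outer(V[i], V[i]))
            if logdet > best_val:            # marginal gain of adding v_i
                best_i, best_val = i, logdet
        S.append(best_i)
        A += np.outer(V[best_i], V[best_i])
    return S

# Toy run on random vectors (sizes are arbitrary).
rng = np.random.default_rng(0)
V = rng.standard_normal((30, 5))
print(greedy_logdet(V, k=5))
```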
arXiv Detail & Related papers (2020-04-16T19:16:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.