Related papers: Gradient Descent is Pareto-Optimal in the Oracle Complexity and Memory Tradeoff for Feasibility Problems

Gradient Descent is Pareto-Optimal in the Oracle Complexity and Memory Tradeoff for Feasibility Problems

URL: http://arxiv.org/abs/2404.06720v1
Date: Wed, 10 Apr 2024 04:15:50 GMT
Title: Gradient Descent is Pareto-Optimal in the Oracle Complexity and Memory Tradeoff for Feasibility Problems
Authors: Moise Blanchard,
Abstract summary: We show that to solve feasibility problems with accuracy any deterministic algorithm either uses $d1+delta$ bits of memory or must make at least $1/(d0.01delta epsilon2frac1-delta1+1.01 delta-o(1))$ oracle queries. We also show that randomized algorithms either use $d1+delta$ memory or make at least $1/(d2delta epsilon2(1-4delta)-o(1))$ queries for any $deltain
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper we provide oracle complexity lower bounds for finding a point in a given set using a memory-constrained algorithm that has access to a separation oracle. We assume that the set is contained within the unit $d$-dimensional ball and contains a ball of known radius $\epsilon>0$. This setup is commonly referred to as the feasibility problem. We show that to solve feasibility problems with accuracy $\epsilon \geq e^{-d^{o(1)}}$, any deterministic algorithm either uses $d^{1+\delta}$ bits of memory or must make at least $1/(d^{0.01\delta }\epsilon^{2\frac{1-\delta}{1+1.01 \delta}-o(1)})$ oracle queries, for any $\delta\in[0,1]$. Additionally, we show that randomized algorithms either use $d^{1+\delta}$ memory or make at least $1/(d^{2\delta} \epsilon^{2(1-4\delta)-o(1)})$ queries for any $\delta\in[0,\frac{1}{4}]$. Because gradient descent only uses linear memory $\mathcal O(d\ln 1/\epsilon)$ but makes $\Omega(1/\epsilon^2)$ queries, our results imply that it is Pareto-optimal in the oracle complexity/memory tradeoff. Further, our results show that the oracle complexity for deterministic algorithms is always polynomial in $1/\epsilon$ if the algorithm has less than quadratic memory in $d$. This reveals a sharp phase transition since with quadratic $\mathcal O(d^2 \ln1/\epsilon)$ memory, cutting plane methods only require $\mathcal O(d\ln 1/\epsilon)$ queries.

Related papers

The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
We show that this problem has randomized communication complexity $Omega(frac1kcdot n2log|mathbbF|)$. As an application, we obtain an $Omega(frac1kcdot n2log|mathbbF|)$ space lower bound for any streaming algorithm with $k$ passes.
arXiv Detail & Related papers (2024-10-26T06:21:42Z)
The Computational Complexity of Finding Stationary Points in Non-Convex Optimization [53.86485757442486]
Finding approximate stationary points, i.e., points where the gradient is approximately zero, of non-order but smooth objective functions is a computational problem. We show that finding approximate KKT points in constrained optimization is reducible to finding approximate stationary points in unconstrained optimization but the converse is impossible.
arXiv Detail & Related papers (2023-10-13T14:52:46Z)
Memory-Query Tradeoffs for Randomized Convex Optimization [16.225462097812766]
We show that any randomized first-order algorithm which minimizes a $d$-dimensional, $1$-Lipschitz convex function over the unit ball must either use $Omega(d2-delta)$ bits of memory or make $Omega(d1+delta/6-o(1))$ queries.
arXiv Detail & Related papers (2023-06-21T19:48:58Z)
Memory-Constrained Algorithms for Convex Optimization via Recursive Cutting-Planes [23.94542304111204]
First class of algorithms that provides a positive trade-off between gradient descent and cutting-plane methods in any regime with $epsilonleq 1/sqrt d$. In the regime $epsilon leq d-Omega(d)$, our algorithm with $p=d$ achieves the information-theoretic optimal memory usage and improves the oracle-complexity of gradient descent.
arXiv Detail & Related papers (2023-06-16T17:00:51Z)
Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization [54.29685789885059]
We introduce efficient $(1+varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem. The goal is to approximate $mathbfA$ as a product of low-rank factors. Our techniques generalize to other common variants of the BMF problem.
arXiv Detail & Related papers (2023-06-02T18:55:27Z)
Quadratic Memory is Necessary for Optimal Query Complexity in Convex Optimization: Center-of-Mass is Pareto-Optimal [23.94542304111204]
We show that quadratic memory is necessary to achieve the optimal oracle complexity for first-order convex optimization. We prove that to minimize $1$-Lipschitz convex functions over the unit ball to $1/d4$ accuracy, any deterministic first-order algorithms using at most $d2-delta$ bits of memory must make $tildeOmega(d1+delta/3)$ queries.
arXiv Detail & Related papers (2023-02-09T22:37:27Z)
Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
We give a sketching-based iterative algorithm that computes $1+varepsilon$ approximate solutions for the ridge regression problem. We also show that this algorithm can be used to give faster algorithms for kernel ridge regression.
arXiv Detail & Related papers (2022-04-13T22:18:47Z)
Efficient Convex Optimization Requires Superlinear Memory [27.11113888243391]
We show that any memory-constrained, first-order algorithm which minimizes $d$-dimensional, $1T-Lipschitz convex functions over the unit ball to $1/mathrmpoly(d)$ accuracy using at most $d1.25 - delta$ bits of memory must make at least $tildeOmega(d1 + (4/3)delta)$ first-order queries.
arXiv Detail & Related papers (2022-03-29T06:15:54Z)
Thinking Inside the Ball: Near-Optimal Minimization of the Maximal Loss [41.17536985461902]
We prove an oracle complexity lower bound scaling as $Omega(Nepsilon-2/3)$, showing that our dependence on $N$ is optimal up to polylogarithmic factors. We develop methods with improved complexity bounds of $tildeO(Nepsilon-2/3 + sqrtNepsilon-8/3)$ in the non-smooth case and $tildeO(Nepsilon-2/3 + sqrtNepsilon-1)$ in
arXiv Detail & Related papers (2021-05-04T21:49:15Z)
An Optimal Separation of Randomized and Quantum Query Complexity [67.19751155411075]
We prove that for every decision tree, the absolute values of the Fourier coefficients of a given order $ellsqrtbinomdell (1+log n)ell-1,$ sum to at most $cellsqrtbinomdell (1+log n)ell-1,$ where $n$ is the number of variables, $d$ is the tree depth, and $c>0$ is an absolute constant.
arXiv Detail & Related papers (2020-08-24T06:50:57Z)
Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model. We show that for both problems, for dimensions of $frac1lambdaepsilon$, one can obtain streaming algorithms with spacely smaller than $frac1lambdaepsilon$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.