Towards Tight Communication Lower Bounds for Distributed Optimisation
- URL: http://arxiv.org/abs/2010.08222v3
- Date: Tue, 7 Dec 2021 13:49:23 GMT
- Title: Towards Tight Communication Lower Bounds for Distributed Optimisation
- Authors: Dan Alistarh and Janne H. Korhonen
- Abstract summary: We consider a standard distributed optimisation setting where $N$ machines aim to jointly minimise the sum of the functions $\sum_{i = 1}^N f_i(x)$.
Our main result provides the first fully unconditional bounds on the total number of bits that need to be sent and received by the $N$ machines.
We show that $\Omega(Nd \log d / N\varepsilon)$ total bits need to be communicated between the machines to find an additive $\epsilon$-approximation to the minimum of $\sum_{i = 1}^N f_i(x)$.
- Score: 30.134447658488057
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a standard distributed optimisation setting where $N$ machines,
each holding a $d$-dimensional function $f_i$, aim to jointly minimise the sum
of the functions $\sum_{i = 1}^N f_i (x)$. This problem arises naturally in
large-scale distributed optimisation, where a standard solution is to apply
variants of (stochastic) gradient descent. We focus on the communication
complexity of this problem: our main result provides the first fully
unconditional bounds on the total number of bits that need to be sent and received
by the $N$ machines to solve this problem under point-to-point communication,
within a given error-tolerance. Specifically, we show that $\Omega( Nd \log d /
N\varepsilon)$ total bits need to be communicated between the machines to find
an additive $\epsilon$-approximation to the minimum of $\sum_{i = 1}^N f_i
(x)$. The result holds for both deterministic and randomised algorithms, and,
importantly, requires no assumptions on the algorithm structure. The lower
bound is tight under certain restrictions on parameter values, and is matched
within constant factors for quadratic objectives by a new variant of quantised
gradient descent, which we describe and analyse. Our results bring over tools
from communication complexity to distributed optimisation, which has potential
for further applications.
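To make the matching upper bound concrete, here is a minimal Python sketch of quantised gradient descent on a sum of quadratics, where each machine quantises its local gradient before communicating it. The uniform stochastic-rounding quantiser, the step size $1/L$, and all function names below are illustrative assumptions rather than the authors' exact scheme, which uses a more careful encoding to reach the stated bit complexity.

```python
import numpy as np

def quantize(v, levels=256):
    """Uniform stochastic quantisation (illustrative stand-in for the
    paper's scheme): each coordinate is randomly rounded to one of
    `levels` grid points, so transmitting it conceptually costs about
    log2(levels) bits; the dequantised float is returned for simplicity.
    Randomised rounding keeps the estimate unbiased: E[quantize(v)] = v."""
    scale = np.linalg.norm(v, np.inf)
    if scale == 0.0:
        return v
    u = (v / scale + 1.0) / 2.0 * (levels - 1)    # map coordinates into [0, levels-1]
    low = np.floor(u)
    q = low + (np.random.rand(v.size) < u - low)  # round up with prob = fractional part
    return (q / (levels - 1) * 2.0 - 1.0) * scale

def quantised_gd(A_list, b_list, steps=500):
    """Minimise sum_i f_i(x) for quadratics f_i(x) = 0.5 x^T A_i x - b_i^T x,
    where machine i only ever communicates the quantised gradient A_i x - b_i."""
    x = np.zeros(b_list[0].size)
    L = np.linalg.eigvalsh(sum(A_list)).max()     # smoothness of the summed objective
    for _ in range(steps):
        g = sum(quantize(A @ x - b) for A, b in zip(A_list, b_list))
        x = x - g / L                             # plain gradient step, step size 1/L
    return x
```

Stochastic rounding keeps each transmitted gradient unbiased, so the iterates behave like gradient descent with bounded noise while each coordinate costs roughly $\log_2(\text{levels})$ bits rather than a full-precision float.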
Related papers
- Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms [65.42376001308064]
We propose two variance-reduced zeroth-order (ZO) estimators for complex gradient problems.
We improve the state-of-the-art function query complexities from $\mathcal{O}\left(\min\left\{\frac{dn^{1/2}}{\epsilon^2}, \frac{d}{\epsilon^3}\right\}\right)$ to $\tilde{\mathcal{O}}\left(\frac{d}{\epsilon^2}\right)$.
arXiv Detail & Related papers (2024-10-03T15:04:01Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/\epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale and demonstrate the efficiency and robustness of our methods on large scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - Deterministic Nonsmooth Nonconvex Optimization [94.01526844386977]
We show that randomization is necessary to obtain a dimension-free algorithm.
Our algorithm yields the first deterministic dimension-free algorithm for optimizing ReLU networks.
arXiv Detail & Related papers (2023-02-16T13:57:19Z) - Near-Optimal Non-Convex Stochastic Optimization under Generalized
Smoothness [21.865728815935665]
Two recent works established the $O(\epsilon^{-3})$ sample complexity to obtain an $O(\epsilon)$-stationary point.
However, both require a large batch size on the order of $\mathrm{poly}(\epsilon^{-1})$, which is not only computationally burdensome but also unsuitable for streaming applications.
In this work, we solve the prior two problems simultaneously by revisiting a simple variant of the STORM algorithm (a sketch of the STORM-style update appears after this list).
arXiv Detail & Related papers (2023-02-13T00:22:28Z) - An Optimal Stochastic Algorithm for Decentralized Nonconvex Finite-sum
Optimization [25.21457349137344]
We prove that DEAREST requires at most $\mathcal{O}(mn + \sqrt{mn}\,L\varepsilon^{-2})$ incremental first-order oracle (IFO) calls and $\mathcal{O}(L\varepsilon^{-2}/\sqrt{1-\lambda_W})$ communication rounds.
arXiv Detail & Related papers (2022-10-25T11:37:11Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes in the fixed confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - A Projection-free Algorithm for Constrained Stochastic Multi-level
Composition Optimization [12.096252285460814]
We propose a projection-free conditional gradient-type algorithm for composition optimization.
We show that the number of calls to the stochastic oracle and the linear-minimization oracle required by the proposed algorithm are of order $\mathcal{O}_T(\epsilon^{-2})$ and $\mathcal{O}_T(\epsilon^{-3})$, respectively.
arXiv Detail & Related papers (2022-02-09T06:05:38Z) - DoCoM: Compressed Decentralized Optimization with Near-Optimal Sample
Complexity [25.775517797956237]
This paper proposes the Doubly Compressed Momentum-assisted tracking algorithm DoCoM for communication-efficient decentralized optimization.
We show that our algorithm outperforms several state-of-the-art algorithms in practice.
arXiv Detail & Related papers (2022-02-01T07:27:34Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions $d = 1, 2$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)
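As noted in the STORM entry above, the core of STORM-type methods is a recursive momentum-corrected gradient estimator. The sketch below shows that update in isolation; the function names, the fixed momentum parameter `a`, and the constant step size are illustrative assumptions, and the paper cited above analyses its own variant rather than this exact form.

```python
import random

def storm_step(grad_fn, x, x_prev, d_prev, sample, a=0.1, lr=0.01):
    """One STORM-style update with the recursive estimator
        d_t = g(x_t; xi_t) + (1 - a) * (d_{t-1} - g(x_{t-1}; xi_t)).
    Both gradients use the SAME sample xi_t, so the correction term
    cancels stale bias without ever needing a large batch."""
    g_new = grad_fn(x, sample)        # stochastic gradient at the current point
    g_old = grad_fn(x_prev, sample)   # same sample, evaluated at the previous point
    d = g_new + (1.0 - a) * (d_prev - g_old)
    return x - lr * d, d              # next iterate and updated estimator

# Minimal usage on f(x) = 0.5 * x**2 with Gaussian gradient noise (hypothetical):
grad = lambda x, s: x + s
x_prev = x = 5.0
d = grad(x, random.gauss(0.0, 0.1))   # initialise the estimator with one sample
for _ in range(1000):
    x_next, d = storm_step(grad, x, x_prev, d, sample=random.gauss(0.0, 0.1))
    x_prev, x = x, x_next
```

Because each step draws only a single fresh sample, the batch size stays $O(1)$, which is the streaming-friendliness the summary above contrasts with $\mathrm{poly}(\epsilon^{-1})$-sized batches.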