Towards Tight Communication Lower Bounds for Distributed Optimisation
- URL: http://arxiv.org/abs/2010.08222v3
- Date: Tue, 7 Dec 2021 13:49:23 GMT
- Title: Towards Tight Communication Lower Bounds for Distributed Optimisation
- Authors: Dan Alistarh and Janne H. Korhonen
- Abstract summary: We consider a standard distributed optimisation setting where $N$ machines aim to jointly minimise the sum of the functions $\sum_{i = 1}^N f_i(x)$.
Our main result provides the first fully unconditional bounds on the total number of bits that need to be sent and received by the $N$ machines.
We show that $\Omega(Nd \log d / N\varepsilon)$ total bits need to be communicated between the machines to find an additive $\epsilon$-approximation to the minimum of $\sum_{i = 1}^N f_i(x)$.
- Score: 30.134447658488057
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a standard distributed optimisation setting where $N$ machines,
each holding a $d$-dimensional function $f_i$, aim to jointly minimise the sum
of the functions $\sum_{i = 1}^N f_i (x)$. This problem arises naturally in
large-scale distributed optimisation, where a standard solution is to apply
variants of (stochastic) gradient descent. We focus on the communication
complexity of this problem: our main result provides the first fully
unconditional bounds on the total number of bits that need to be sent and received
by the $N$ machines to solve this problem under point-to-point communication,
within a given error-tolerance. Specifically, we show that $\Omega( Nd \log d /
N\varepsilon)$ total bits need to be communicated between the machines to find
an additive $\epsilon$-approximation to the minimum of $\sum_{i = 1}^N f_i
(x)$. The result holds for both deterministic and randomised algorithms, and,
importantly, requires no assumptions on the algorithm structure. The lower
bound is tight under certain restrictions on parameter values, and is matched
within constant factors for quadratic objectives by a new variant of quantised
gradient descent, which we describe and analyse. Our results bring over tools
from communication complexity to distributed optimisation, which has potential
for further applications.
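To make the matching upper bound concrete, here is a minimal Python sketch of quantised gradient descent on a sum of quadratics, where each machine quantises its local gradient before communicating it. The uniform stochastic-rounding quantiser, the step size $1/L$, and all function names below are illustrative assumptions rather than the authors' exact scheme, which uses a more careful encoding to reach the stated bit complexity.

```python
import numpy as np

def quantize(v, levels=256):
    """Uniform stochastic quantisation (illustrative stand-in for the
    paper's scheme): each coordinate is randomly rounded to one of
    `levels` grid points, so transmitting it conceptually costs about
    log2(levels) bits; the dequantised float is returned for simplicity.
    Randomised rounding keeps the estimate unbiased: E[quantize(v)] = v."""
    scale = np.linalg.norm(v, np.inf)
    if scale == 0.0:
        return v
    u = (v / scale + 1.0) / 2.0 * (levels - 1)    # map coordinates into [0, levels-1]
    low = np.floor(u)
    q = low + (np.random.rand(v.size) < u - low)  # round up with prob = fractional part
    return (q / (levels - 1) * 2.0 - 1.0) * scale

def quantised_gd(A_list, b_list, steps=500):
    """Minimise sum_i f_i(x) for quadratics f_i(x) = 0.5 x^T A_i x - b_i^T x,
    where machine i only ever communicates the quantised gradient A_i x - b_i."""
    x = np.zeros(b_list[0].size)
    L = np.linalg.eigvalsh(sum(A_list)).max()     # smoothness of the summed objective
    for _ in range(steps):
        g = sum(quantize(A @ x - b) for A, b in zip(A_list, b_list))
        x = x - g / L                             # plain gradient step, step size 1/L
    return x
```

Stochastic rounding keeps each transmitted gradient unbiased, so the iterates behave like gradient descent with bounded noise while each coordinate costs roughly $\log_2(\text{levels})$ bits rather than a full-precision float.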
Related papers
- Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms [65.42376001308064]
We propose two variance-reduced zeroth-order (ZO) estimators for complex gradient problems.
We improve the state-of-the-art function query complexities from $\mathcal{O}\left(\min\left\{\frac{dn^{1/2}}{\epsilon^2}, \frac{d}{\epsilon^3}\right\}\right)$ to $\tilde{\mathcal{O}}\left(\frac{d}{\epsilon^2}\right)$.
arXiv Detail & Related papers (2024-10-03T15:04:01Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/\epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale and demonstrate the efficiency and robustness of our methods on large scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - Deterministic Nonsmooth Nonconvex Optimization [94.01526844386977]
We show that randomization is necessary to obtain a dimension-free algorithm.
Our algorithm yields the first deterministic dimension-free algorithm for optimizing ReLU networks.
arXiv Detail & Related papers (2023-02-16T13:57:19Z) - Near-Optimal Non-Convex Stochastic Optimization under Generalized
Smoothness [21.865728815935665]
Two recent works established the $O(\epsilon^{-3})$ sample complexity to obtain an $O(\epsilon)$-stationary point.
However, both require a large batch size on the order of $\mathrm{poly}(\epsilon^{-1})$, which is not only computationally burdensome but also unsuitable for streaming applications.
In this work, we solve the prior two problems simultaneously by revisiting a simple variant of the STORM algorithm (a sketch of the STORM-style update appears after this list).
arXiv Detail & Related papers (2023-02-13T00:22:28Z) - An Optimal Stochastic Algorithm for Decentralized Nonconvex Finite-sum
Optimization [25.21457349137344]
We prove that DEAREST requires at most $\mathcal{O}(mn + \sqrt{mn}\,L\varepsilon^{-2})$ incremental first-order oracle (IFO) calls and $\mathcal{O}(L\varepsilon^{-2}/\sqrt{1-\lambda_W})$ communication rounds.
arXiv Detail & Related papers (2022-10-25T11:37:11Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes in the fixed confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - A Projection-free Algorithm for Constrained Stochastic Multi-level
Composition Optimization [12.096252285460814]
We propose a projection-free conditional gradient-type algorithm for composition optimization.
We show that the number of calls to the stochastic oracle and the linear-minimization oracle required by the proposed algorithm are of order $\mathcal{O}_T(\epsilon^{-2})$ and $\mathcal{O}_T(\epsilon^{-3})$, respectively.
arXiv Detail & Related papers (2022-02-09T06:05:38Z) - DoCoM: Compressed Decentralized Optimization with Near-Optimal Sample
Complexity [25.775517797956237]
This paper proposes the Doubly Compressed Momentum-assisted tracking algorithm DoCoM for communication-efficient decentralized optimization.
We show that our algorithm outperforms several state-of-the-art algorithms in practice.
arXiv Detail & Related papers (2022-02-01T07:27:34Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions $d = 1, 2$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)
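As noted in the STORM entry above, the core of STORM-type methods is a recursive momentum-corrected gradient estimator. The sketch below shows that update in isolation; the function names, the fixed momentum parameter `a`, and the constant step size are illustrative assumptions, and the paper cited above analyses its own variant rather than this exact form.

```python
import random

def storm_step(grad_fn, x, x_prev, d_prev, sample, a=0.1, lr=0.01):
    """One STORM-style update with the recursive estimator
        d_t = g(x_t; xi_t) + (1 - a) * (d_{t-1} - g(x_{t-1}; xi_t)).
    Both gradients use the SAME sample xi_t, so the correction term
    cancels stale bias without ever needing a large batch."""
    g_new = grad_fn(x, sample)        # stochastic gradient at the current point
    g_old = grad_fn(x_prev, sample)   # same sample, evaluated at the previous point
    d = g_new + (1.0 - a) * (d_prev - g_old)
    return x - lr * d, d              # next iterate and updated estimator

# Minimal usage on f(x) = 0.5 * x**2 with Gaussian gradient noise (hypothetical):
grad = lambda x, s: x + s
x_prev = x = 5.0
d = grad(x, random.gauss(0.0, 0.1))   # initialise the estimator with one sample
for _ in range(1000):
    x_next, d = storm_step(grad, x, x_prev, d, sample=random.gauss(0.0, 0.1))
    x_prev, x = x, x_next
```

Because each step draws only a single fresh sample, the batch size stays $O(1)$, which is the streaming-friendliness the summary above contrasts with $\mathrm{poly}(\epsilon^{-1})$-sized batches.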