Distributed Learning and Democratic Embeddings: Polynomial-Time Source
Coding Schemes Can Achieve Minimax Lower Bounds for Distributed Gradient
Descent under Communication Constraints
- URL: http://arxiv.org/abs/2103.07578v1
- Date: Sat, 13 Mar 2021 00:04:11 GMT
- Title: Distributed Learning and Democratic Embeddings: Polynomial-Time Source
Coding Schemes Can Achieve Minimax Lower Bounds for Distributed Gradient
Descent under Communication Constraints
- Authors: Rajarshi Saha, Mert Pilanci, Andrea J. Goldsmith
- Abstract summary: We consider the problem of compressing a vector in n-dimensional Euclidean space, subject to a bit-budget of R bits per dimension.
We show that Democratic and Near-Democratic source-coding schemes are (near) optimal in the sense that the covering efficiency of the resulting quantizer is either dimension independent, or has a very weak logarithmic dependence.
We propose a distributed optimization algorithm, DGD-DEF, which employs our coding strategy and achieves the minimax optimal convergence rate to within (near) constant factors.
- Score: 46.17631511884969
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we consider the distributed optimization setting where
information exchange between the computation nodes and the parameter server is
subject to a maximum bit-budget. We first consider the problem of compressing a
vector in n-dimensional Euclidean space, subject to a bit-budget of R bits
per dimension, for which we introduce Democratic and Near-Democratic
source-coding schemes. We show that these coding schemes are (near) optimal in
the sense that the covering efficiency of the resulting quantizer is either
dimension independent, or has a very weak logarithmic dependence. Subsequently,
we propose a distributed optimization algorithm: DGD-DEF, which employs our
proposed coding strategy, and achieves the minimax optimal convergence rate to
within (near) constant factors for a class of communication-constrained
distributed optimization algorithms. Furthermore, we extend the utility of our
proposed source coding scheme by showing that it can remarkably improve the
performance when used in conjunction with other compression schemes. We
validate our theoretical claims through numerical simulations.
Keywords: Fast democratic (Kashin) embeddings, Distributed optimization,
Data-rate constraint, Quantized gradient descent, Error feedback.
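The abstract combines two ingredients: a democratic (Kashin-type) embedding followed by uniform scalar quantization at R bits per dimension, and quantized distributed gradient descent with error feedback (DGD-DEF). The sketch below is a minimal numpy illustration of those ideas under simplifying assumptions, not the paper's exact construction: a random orthonormal frame stands in for the randomized-Hadamard near-democratic embedding, and `democratic_quantize`, `dgd_def`, `grad_fn`, and the redundancy factor `lam` are hypothetical names introduced only for this example.

```python
import numpy as np

def democratic_quantize(x, R, lam=2.0, rng=None):
    # Embed x into an m-dimensional frame (m = lam * n) so that its energy is
    # spread roughly evenly across coordinates, then quantize each coordinate
    # uniformly with R bits. A random orthonormal frame is used here as a
    # stand-in for the paper's randomized-Hadamard (near-democratic) frames.
    rng = np.random.default_rng() if rng is None else rng
    n = x.size
    m = int(np.ceil(lam * n))
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))   # U^T U = I_n
    y = U @ x                                           # spread-out coefficients
    c = np.max(np.abs(y)) + 1e-12                       # dynamic range of the embedding
    levels = 2 ** int(R)
    step = 2.0 * c / levels
    idx = np.clip(np.floor((y + c) / step), 0, levels - 1)
    y_hat = (idx + 0.5) * step - c                      # map indices to bin centers
    return U.T @ y_hat                                  # decode back to R^n

def dgd_def(grad_fn, x0, n_workers, R, lr=0.1, iters=200):
    # Toy distributed gradient descent with quantized uplinks and per-worker
    # error feedback. grad_fn(x, i), returning worker i's local gradient at x,
    # is a hypothetical interface used only for this illustration.
    x = x0.copy()
    memory = [np.zeros_like(x0) for _ in range(n_workers)]
    for _ in range(iters):
        sent = []
        for i in range(n_workers):
            v = grad_fn(x, i) + memory[i]      # fold in last round's residual
            v_hat = democratic_quantize(v, R)  # what worker i transmits
            memory[i] = v - v_hat              # error feedback: keep the residual
            sent.append(v_hat)
        x = x - lr * np.mean(sent, axis=0)     # server aggregates and steps
    return x
```

The error-feedback memory reinserts each round's quantization residual into the next transmission, which is the standard mechanism that lets aggressive compression coexist with convergence guarantees.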
Related papers
- Alternating Minimization Schemes for Computing Rate-Distortion-Perception Functions with $f$-Divergence Perception Constraints [10.564071872770146]
We study the computation of the rate-distortion-perception function (RDPF) for discrete memoryless sources.
We characterize the optimal parametric solutions.
We provide sufficient conditions on the distortion and the perception constraints.
arXiv Detail & Related papers (2024-08-27T12:50:12Z)
- Sparsity-Constraint Optimization via Splicing Iteration [1.3622424109977902]
We develop an algorithm named Sparsity-Constraint Optimization via sPlicing itEration (SCOPE)
SCOPE converges effectively without tuning parameters.
We apply SCOPE to solve quadratic optimization, learn sparse classifiers, and recover sparse Markov networks for binary variables.
Our open-source Python package skscope, based on a C++ implementation, is publicly available on GitHub.
arXiv Detail & Related papers (2024-06-17T18:34:51Z)
- Differentially Private Optimization with Sparse Gradients [60.853074897282625]
We study differentially private (DP) optimization problems under sparsity of individual gradients.
Building on this, we obtain pure- and approximate-DP algorithms with almost optimal rates for convex optimization with sparse gradients.
arXiv Detail & Related papers (2024-04-16T20:01:10Z)
- Semantic-Aware Remote Estimation of Multiple Markov Sources Under Constraints [9.514904359788156]
We study semantic-aware communication for remote estimation of Markov sources over a lossy and rate-constrained channel.
We find an optimal scheduling policy that minimizes the long-term state-dependent costs of estimation errors under a transmission frequency constraint.
arXiv Detail & Related papers (2024-03-25T15:18:23Z)
- Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O\big(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T}\big)$ with a communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
- Distributed Sketching for Randomized Optimization: Exact Characterization, Concentration and Lower Bounds [54.51566432934556]
We consider distributed optimization methods for problems where forming the Hessian is computationally challenging.
We leverage randomized sketches for reducing the problem dimensions as well as preserving privacy and improving straggler resilience in asynchronous distributed systems.
arXiv Detail & Related papers (2022-03-18T05:49:13Z)
- DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization [43.31016937305845]
Internet-of-things, networked sensing, autonomous systems and federated learning call for decentralized algorithms for finite-sum optimization.
We develop DEcentralized STochastic REcurSive gradient methodS (DESTRESS) for nonconvex finite-sum optimization.
Detailed theoretical and numerical comparisons show that DESTRESS improves upon prior decentralized algorithms.
arXiv Detail & Related papers (2021-10-04T03:17:41Z)
- Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We design two optimal algorithms that attain these lower bounds.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z)
- Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned Edge Learning Over Broadband Channels [69.18343801164741]
Partitioned edge learning (PARTEL) implements parameter-server training, a well-known distributed learning method, in a wireless network.
We consider the case of deep neural network (DNN) models which can be trained using PARTEL by introducing some auxiliary variables.
arXiv Detail & Related papers (2020-10-08T15:27:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences.