Moniqua: Modulo Quantized Communication in Decentralized SGD
- URL: http://arxiv.org/abs/2002.11787v3
- Date: Tue, 30 Jun 2020 04:12:51 GMT
- Title: Moniqua: Modulo Quantized Communication in Decentralized SGD
- Authors: Yucheng Lu and Christopher De Sa
- Abstract summary: Moniqua is a technique that allows decentralized algorithms to use quantized communication.
We show that Moniqua converges faster with respect to wall clock time than other quantized decentralized algorithms.
- Score: 45.468216452357375
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Running Stochastic Gradient Descent (SGD) in a decentralized fashion has
shown promising results. In this paper we propose Moniqua, a technique that
allows decentralized SGD to use quantized communication. We prove in theory
that Moniqua communicates a provably bounded number of bits per iteration,
while converging at the same asymptotic rate as the original algorithm does
with full-precision communication. Moniqua improves upon prior works in that it
(1) requires zero additional memory, (2) works with 1-bit quantization, and (3)
is applicable to a variety of decentralized algorithms. We demonstrate
empirically that Moniqua converges faster with respect to wall clock time than
other quantized decentralized algorithms. We also show that Moniqua is robust
to very low bit-budgets, allowing 1-bit-per-parameter communication without
compromising validation accuracy when training ResNet20 and ResNet110 on
CIFAR10.
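The core trick is that in decentralized SGD each worker already holds a local proxy that stays provably close to its neighbors' parameters, so a neighbor only needs to transmit its parameters modulo a small base theta, and the receiver can undo the wrap-around locally. Below is a minimal NumPy sketch of such a modulo encode/decode step; the function names, the choice of theta, and the 16-level uniform quantizer are illustrative assumptions, not the paper's exact construction.

```python
# Minimal sketch of modulo quantized communication (illustrative, not
# Moniqua's exact scheme): send x mod theta at low precision, recover x
# on the receiver using a local proxy known to be within theta/2 of x.
import numpy as np

def encode(x, theta, levels):
    """Transmit only x mod theta, quantized to `levels` uniform bins."""
    r = np.mod(x, theta)                      # wrap into [0, theta)
    step = theta / levels
    return np.round(r / step).astype(np.int64) % levels  # small ints on the wire

def decode(msg, proxy, theta, levels):
    """Recover x from its wrapped value via a proxy with |x - proxy| < theta/2."""
    r = msg * (theta / levels)                # dequantized residue in [0, theta)
    # pick the representative of r + theta*Z that is closest to the proxy
    return r + theta * np.round((proxy - r) / theta)

rng = np.random.default_rng(0)
x = rng.normal(size=5)                        # neighbor's true parameters
proxy = x + rng.uniform(-0.1, 0.1, size=5)    # receiver's local estimate of x
theta = 0.5                                   # modulo base (hypothetical value)
x_hat = decode(encode(x, theta, 16), proxy, theta, 16)
print(np.max(np.abs(x_hat - x)))              # error bounded by the quantization step
```

The recovery is exact up to quantization error whenever theta/2 exceeds the proxy error plus half a quantization step; the paper's analysis amounts to bounding that proxy error throughout training so that a small theta, and hence few bits per parameter, suffices.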
Related papers
- Communication-Efficient Decentralized Federated Learning via One-Bit Compressive Sensing [52.402550431781805]
Decentralized federated learning (DFL) has gained popularity due to its practicality across various applications.
Compared to the centralized version, training a shared model among a large number of nodes in DFL is more challenging.
We develop a novel algorithm based on the framework of the inexact alternating direction method (iADM).
arXiv Detail & Related papers (2023-08-31T12:22:40Z)
- DIGEST: Fast and Communication Efficient Decentralized Learning with Local Updates [4.3707341422218215]
Two widely considered decentralized learning approaches are gossip-based and random-walk-based learning.
We design a fast and communication-efficient asynchronous decentralized learning mechanism DIGEST.
We evaluate the performance of single- and multi-stream DIGEST for logistic regression and a deep neural network ResNet20.
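For context, the gossip primitive referenced here is repeated neighborhood averaging with a mixing matrix; a toy synchronous round on a ring (generic gossip, not DIGEST's asynchronous, local-update mechanism) looks like this:

```python
# Generic synchronous gossip averaging on a ring: every node repeatedly
# averages with its two neighbors, and all values converge to the mean.
import numpy as np

def gossip_round(x, W):
    """One round: node i updates to sum_j W[i, j] * x[j]."""
    return W @ x

n = 8
W = np.zeros((n, n))                 # doubly stochastic mixing matrix
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

x = np.arange(n, dtype=float)        # each node starts with a different value
for _ in range(100):
    x = gossip_round(x, W)
print(x)                             # all entries approach the global mean 3.5
```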
arXiv Detail & Related papers (2023-07-14T22:58:20Z)
- $\textbf{A}^2\textbf{CiD}^2$: Accelerating Asynchronous Communication in Decentralized Deep Learning [0.0]
We introduce a principled asynchronous, randomized, gossip-based optimization algorithm that relies on a continuous local momentum term named $\textbf{A}^2\textbf{CiD}^2$.
Our theoretical analysis proves accelerated rates compared to previous asynchronous decentralized baselines.
We show consistent improvement on the ImageNet dataset using up to 64 asynchronous workers.
arXiv Detail & Related papers (2023-06-14T06:52:07Z)
- Communication-Efficient Topologies for Decentralized Learning with $O(1)$ Consensus Rate [35.698182247676414]
Decentralized optimization is an emerging paradigm in distributed learning in which agents achieve network-wide solutions by peer-to-peer communication without a central server.
We show that the total number of iterations needed to reach a network-wide solution is affected by the speed at which the agents' information is "mixed" by communication.
We propose a new family of topologies, EquiTopo, which has an (almost) constant degree and a network-size-independent consensus rate.
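The consensus rate is governed by the second-largest singular value of the topology's mixing matrix, so the effect of topology can be checked numerically; the diagnostic below is standard linear algebra, not EquiTopo's construction:

```python
# Compare how fast two topologies mix: the second-largest singular value
# of the mixing matrix (smaller is faster) drives the consensus rate.
import numpy as np

def consensus_rate(W):
    return np.linalg.svd(W, compute_uv=False)[1]

def ring(n):
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i], W[i, (i - 1) % n], W[i, (i + 1) % n] = 0.5, 0.25, 0.25
    return W

def complete(n):
    return np.full((n, n), 1.0 / n)

for n in (16, 64, 256):
    print(n, consensus_rate(ring(n)), consensus_rate(complete(n)))
# The ring's rate degrades toward 1 as n grows, while the complete graph's
# stays at 0: that gap is what a network-size-independent rate eliminates.
```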
arXiv Detail & Related papers (2022-10-14T15:02:01Z)
- QuTE: decentralized multiple testing on sensor networks with false discovery rate control [130.7122910646076]
This paper designs methods for decentralized multiple hypothesis testing on graphs, with provable guarantees on the false discovery rate (FDR).
We consider the setting where distinct agents reside on the nodes of an undirected graph, and each agent possesses p-values corresponding to one or more hypotheses local to its node.
Each agent must individually decide whether to reject one or more of its local hypotheses by only communicating with its neighbors, with the joint aim that the global FDR over the entire graph must be controlled at a predefined level.
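For reference, the centralized building block such schemes extend is the Benjamini-Hochberg (BH) step-up procedure; a standard sketch follows (plain BH, not QuTE's neighborhood aggregation):

```python
# Classical Benjamini-Hochberg step-up procedure: reject the k smallest
# p-values, where k is the largest index with p_(k) <= alpha * k / m.
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of rejections with FDR controlled at alpha."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print(benjamini_hochberg(pvals))   # only the two smallest are rejected
```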
arXiv Detail & Related papers (2022-10-09T19:48:39Z)
- Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam [49.426602335460295]
1-bit communication is an effective method for scaling up model training and has been studied extensively for SGD.
We propose 0/1 Adam, which improves upon the state-of-the-art 1-bit Adam via two novel methods.
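The basic mechanism behind 1-bit methods is to transmit only the sign of each coordinate plus one scale factor, while feeding the quantization error back into the next message; a generic error-feedback sketch (not 0/1 Adam's exact update rule):

```python
# Generic 1-bit (sign) compression with error feedback: the residual that
# the 1 bit per coordinate cannot express is remembered and re-added later.
import numpy as np

class OneBitCompressor:
    def __init__(self, dim):
        self.residual = np.zeros(dim)       # locally stored quantization error

    def compress(self, grad):
        corrected = grad + self.residual
        scale = np.mean(np.abs(corrected))  # one float per message
        msg = np.sign(corrected)            # one bit per parameter
        self.residual = corrected - scale * msg   # error fed back next round
        return scale, msg                   # receiver reconstructs scale * msg

comp = OneBitCompressor(dim=4)
rng = np.random.default_rng(1)
for step in range(3):
    scale, msg = comp.compress(rng.normal(size=4))
    print(step, scale, msg)
```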
arXiv Detail & Related papers (2022-02-12T08:02:23Z)
- Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis [27.21581944906418]
Actor-critic (AC) algorithms have been widely adopted in decentralized multi-agent systems.
We develop two decentralized AC and natural AC (NAC) algorithms that are private, sample-efficient, and communication-efficient.
arXiv Detail & Related papers (2021-09-08T15:02:21Z)
- Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including data from applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones, and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
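For reference, the global-variable consensus ADMM template that such mini-batch variants refine, instantiated for distributed least squares (a textbook sketch, not the paper's coded scheme):

```python
# Textbook consensus ADMM for distributed least squares: each agent solves
# a local subproblem, then all agents agree on a global variable z.
import numpy as np

rng = np.random.default_rng(2)
n_agents, dim, rho = 4, 3, 1.0
A = [rng.normal(size=(10, dim)) for _ in range(n_agents)]
b = [rng.normal(size=10) for _ in range(n_agents)]

x = [np.zeros(dim) for _ in range(n_agents)]   # local primal variables
u = [np.zeros(dim) for _ in range(n_agents)]   # scaled dual variables
z = np.zeros(dim)                              # global consensus variable

for _ in range(50):
    for i in range(n_agents):
        # x-update: argmin 0.5*||A_i x - b_i||^2 + (rho/2)*||x - z + u_i||^2
        x[i] = np.linalg.solve(A[i].T @ A[i] + rho * np.eye(dim),
                               A[i].T @ b[i] + rho * (z - u[i]))
    z = np.mean([x[i] + u[i] for i in range(n_agents)], axis=0)  # consensus
    for i in range(n_agents):
        u[i] += x[i] - z                       # dual ascent on x_i = z
print(z)   # approaches the centralized least-squares solution
```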
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
- Quantized Decentralized Stochastic Learning over Directed Graphs [52.94011236627326]
We consider a decentralized learning problem where data points are distributed among computing nodes communicating over a directed graph.
As models grow large, decentralized learning faces a major communication bottleneck: each node must transmit messages (model updates) to its neighbors.
We propose a quantized decentralized learning algorithm over directed graphs, based on the push-sum algorithm from decentralized consensus optimization.
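Push-sum matters here because directed graphs generally admit no doubly stochastic mixing matrix: each node instead tracks a value and a weight, pushes equal shares along its out-edges, and uses their ratio as its estimate. A plain, unquantized sketch of the primitive:

```python
# Push-sum averaging on a directed ring with self-loops: the ratio x / w
# at every node converges to the global average of the initial values.
import numpy as np

out_neighbors = {0: [0, 1], 1: [1, 2], 2: [2, 3], 3: [3, 0]}  # directed edges
n = 4
x = np.array([1.0, 2.0, 3.0, 10.0])   # initial values, global average = 4.0
w = np.ones(n)                         # push-sum weights

for _ in range(100):
    new_x, new_w = np.zeros(n), np.zeros(n)
    for i in range(n):
        share = 1.0 / len(out_neighbors[i])
        for j in out_neighbors[i]:
            new_x[j] += share * x[i]   # push an equal share of the value
            new_w[j] += share * w[i]   # ...and of the weight
    x, w = new_x, new_w
print(x / w)                           # every ratio approaches 4.0
```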
arXiv Detail & Related papers (2020-02-23T18:25:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.