Quantized Decentralized Stochastic Learning over Directed Graphs
- URL: http://arxiv.org/abs/2002.09964v5
- Date: Mon, 28 Dec 2020 10:02:25 GMT
- Title: Quantized Decentralized Stochastic Learning over Directed Graphs
- Authors: Hossein Taheri, Aryan Mokhtari, Hamed Hassani, Ramtin Pedarsani
- Abstract summary: We consider a decentralized learning problem where data points are distributed among computing nodes communicating over a directed graph.
As the model size gets large, decentralized learning faces a major bottleneck: the communication load of each node transmitting messages (model updates) to its neighbors.
We propose a quantized decentralized stochastic learning algorithm over directed graphs, based on the push-sum algorithm from decentralized consensus optimization.
- Score: 52.94011236627326
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a decentralized stochastic learning problem where data points are
distributed among computing nodes communicating over a directed graph. As the
model size gets large, decentralized learning faces a major bottleneck: the
heavy communication load due to each node transmitting large messages (model
updates) to its neighbors. To tackle this bottleneck, we propose a quantized
decentralized stochastic learning algorithm over directed graphs, based on
the push-sum algorithm from decentralized consensus optimization.
More importantly, we prove that our algorithm achieves the same convergence
rates as the exact-communication decentralized stochastic learning algorithm,
for both convex and non-convex losses. Numerical evaluations corroborate our
main theoretical results and illustrate significant speed-ups compared to
exact-communication methods.
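To make the push-sum idea concrete, below is a minimal, self-contained simulation of quantized stochastic gradient push on a directed ring. The toy quadratic objectives, the dithered uniform quantizer, and the mixing matrix P are illustrative assumptions, not the paper's exact algorithm or quantization scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(v, levels=16):
    # Unbiased dithered uniform quantizer (illustrative stand-in for the
    # paper's quantization scheme): E[quantize(v)] = v.
    scale = np.max(np.abs(v)) + 1e-12
    y = np.abs(v) / scale * levels                   # in [0, levels]
    lower = np.floor(y)
    q = lower + (rng.random(v.shape) < (y - lower))  # randomized rounding
    return np.sign(v) * q * scale / levels

n, d, lr, steps = 8, 10, 0.05, 300
# Column-stochastic mixing matrix for a directed ring: each node keeps half
# of its mass and pushes half to its single out-neighbor.
P = 0.5 * (np.eye(n) + np.roll(np.eye(n), 1, axis=0))

B = rng.normal(size=(n, d))  # toy local objectives f_i(x) = 0.5*||x - b_i||^2
x = np.zeros((n, d))         # push-sum numerators
w = np.ones(n)               # push-sum weights

for _ in range(steps):
    z = x / w[:, None]                             # de-biased local models
    grads = z - B + 0.1 * rng.normal(size=(n, d))  # stochastic gradients at z
    msgs = np.stack([quantize(m) for m in x - lr * grads])
    x = P @ msgs  # each node aggregates the quantized numerators it receives
    w = P @ w     # push-sum weights (scalars, assumed sent exactly)

# All de-biased models should end up near the global minimizer, mean of b_i.
print("max distance to optimum:",
      np.linalg.norm(x / w[:, None] - B.mean(0), axis=1).max())
```

The push-sum weights w correct the bias that a merely column-stochastic (rather than doubly stochastic) mixing matrix would otherwise introduce, which is what makes the scheme work on directed graphs.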
Related papers
- Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst acceleration and propose an accelerated decentralized federated learning algorithm called DFedCata.
DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase.
Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
arXiv Detail & Related papers (2024-10-09T06:17:16Z)
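The Moreau envelope and extrapolation components are specific to DFedCata; purely as an illustration of the "accelerated aggregation" idea, here is a generic Nesterov-style look-ahead applied after a neighborhood-averaging step (the function name and the momentum value beta are assumptions, not the paper's recursion).

```python
import numpy as np

def extrapolated_aggregation(models, prev_mixed, W, beta=0.9):
    """Generic Nesterov-style extrapolation after decentralized averaging.

    models:     (n, d) current local models
    prev_mixed: (n, d) mixed models from the previous round
    W:          (n, n) mixing (gossip) matrix
    beta:       extrapolation/momentum coefficient (illustrative value)

    Returns the extrapolated models plus the plain mixed models to carry
    into the next round. DFedCata's actual update rule differs in detail.
    """
    mixed = W @ models                               # aggregation phase
    lookahead = mixed + beta * (mixed - prev_mixed)  # Nesterov look-ahead
    return lookahead, mixed
```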
- Communication-Efficient Decentralized Federated Learning via One-Bit Compressive Sensing [52.402550431781805]
Decentralized federated learning (DFL) has gained popularity due to its practicality across various applications.
Compared to the centralized version, training a shared model among a large number of nodes in DFL is more challenging.
We develop a novel algorithm based on the framework of the inexact alternating direction method (iADM).
arXiv Detail & Related papers (2023-08-31T12:22:40Z)
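One-bit compressive sensing itself is specific to that paper, but the bandwidth arithmetic behind one-bit schemes is easy to see with the standard scaled-sign compressor below: one bit per coordinate plus a single float, instead of 32 bits per coordinate (the helper names are hypothetical).

```python
import numpy as np

def one_bit_compress(v):
    """Scaled sign compression: transmit sign bits plus one scalar.
    A standard one-bit compressor shown for intuition only; the paper's
    iADM scheme built on one-bit compressive sensing is more involved."""
    scale = np.mean(np.abs(v))  # one float sent alongside the sign bits
    return scale, np.sign(v)

def one_bit_decompress(scale, signs):
    return scale * signs

v = np.random.default_rng(0).normal(size=5)
print(v, one_bit_decompress(*one_bit_compress(v)))
```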
- Data-heterogeneity-aware Mixing for Decentralized Learning [63.83913592085953]
We characterize how convergence depends on the relationship between the graph's mixing weights and the data heterogeneity across nodes.
We propose a metric that quantifies the ability of a graph to mix the current gradients.
Motivated by our analysis, we propose an approach that periodically and efficiently optimizes this metric.
arXiv Detail & Related papers (2022-04-13T15:54:35Z)
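The paper's metric is data-dependent (it weighs the graph against the gradients nodes currently hold); the classical, data-agnostic quantity it refines is the spectral gap of the mixing matrix, computed below as a point of reference (the function name is an assumption).

```python
import numpy as np

def spectral_gap(W):
    """1 minus the contraction factor of W on the disagreement subspace.
    Larger gap -> faster mixing. This is the classical graph-only quantity;
    the paper's metric additionally accounts for data heterogeneity."""
    n = W.shape[0]
    J = np.ones((n, n)) / n  # projector onto the consensus direction
    return 1.0 - np.linalg.norm(W - J, 2)

print(spectral_gap(np.full((4, 4), 0.25)))  # complete graph: gap = 1.0
```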
- A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve these problems without any increase in the number of communications compared to the baseline.
arXiv Detail & Related papers (2020-11-03T13:35:53Z)
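As one concrete example of a randomized compression operator of the kind the entry above applies, here is unbiased rand-k sparsification (keep k random coordinates, rescale so the expectation is preserved); the operator class used in the paper itself may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_k(v, k):
    """Unbiased rand-k sparsification: E[rand_k(v)] = v.
    Only k indices and k values need to be communicated."""
    d = v.size
    out = np.zeros_like(v)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = v[idx] * (d / k)  # rescale to remove selection bias
    return out
```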
- Optimal Complexity in Decentralized Training [45.468216452357375]
We present DeTAG, a practical gossip-style decentralized algorithm that achieves the lower bound up to a logarithmic gap.
We show that DeTAG enjoys faster convergence than baselines, especially on unshuffled data and in sparse networks.
arXiv Detail & Related papers (2020-06-15T02:03:04Z)
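For readers unfamiliar with the "gossip-style" primitive mentioned above, the sketch below runs several synchronous gossip-averaging rounds; DeTAG's actual schedule and acceleration are specified in the paper.

```python
import numpy as np

def gossip(x, W, rounds):
    """Synchronous gossip averaging: x <- W @ x, repeated.
    x: (n, d) local vectors; W: doubly stochastic mixing matrix.
    Each round contracts disagreement by roughly the spectral gap of W."""
    for _ in range(rounds):
        x = W @ x
    return x
```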
- Communication-efficient Variance-reduced Stochastic Gradient Descent [0.0]
We consider the problem of communication-efficient distributed optimization.
In particular, we focus on the variance-reduced gradient and propose a novel approach to make it communication-efficient.
Comprehensive theoretical and numerical analyses on real datasets reveal that our algorithm can significantly reduce the communication complexity, by as much as 95%, with almost no noticeable penalty.
arXiv Detail & Related papers (2020-03-10T13:22:16Z)
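The variance-reduced gradient referenced above is, in its textbook (SVRG-style) form, the estimator below; how the paper compresses it for communication efficiency is specific to that work and not shown here.

```python
def svrg_gradient(grad_batch, x, x_snap, full_grad_snap, batch):
    """SVRG-style variance-reduced stochastic gradient:
        g = grad_batch(x, batch) - grad_batch(x_snap, batch) + full_grad_snap
    Unbiased, with variance shrinking as x and the snapshot x_snap approach
    the optimum. grad_batch and the snapshot arguments are placeholders."""
    return grad_batch(x, batch) - grad_batch(x_snap, batch) + full_grad_snap
```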
- Gradient tracking and variance reduction for decentralized optimization and machine learning [19.54092620537586]
Decentralized methods to solve finite-sum problems are important in many signal processing and machine learning tasks.
We provide a unified algorithmic framework that combines variance reduction with gradient tracking to achieve robust performance.
arXiv Detail & Related papers (2020-02-13T07:17:07Z)
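A generic form of the gradient-tracking recursion that the framework above combines with variance reduction is sketched below; each y_i tracks the network-average gradient, which is what keeps such methods robust to data heterogeneity (exact recursions vary across the methods surveyed).

```python
import numpy as np

def gradient_tracking_step(x, y, g_prev, grad_fn, W, lr):
    """One generic gradient-tracking round:
        x_{t+1} = W @ x_t - lr * y_t
        y_{t+1} = W @ y_t + g(x_{t+1}) - g(x_t)
    x, y: (n, d); W: doubly stochastic mixing matrix; grad_fn returns the
    stacked local (possibly variance-reduced) gradients at the given models."""
    x_next = W @ x - lr * y
    g_next = grad_fn(x_next)
    y_next = W @ y + g_next - g_prev  # tracker of the average gradient
    return x_next, y_next, g_next
```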
- Distributed Learning in the Non-Convex World: From Batch to Streaming Data, and Beyond [73.03743482037378]
Distributed learning has become a critical direction of the massively connected world envisioned by many.
This article discusses four key elements of scalable distributed processing and real-time data computation problems.
Practical issues and future research will also be discussed.
arXiv Detail & Related papers (2020-01-14T14:11:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.