Quantized Decentralized Stochastic Learning over Directed Graphs
- URL: http://arxiv.org/abs/2002.09964v5
- Date: Mon, 28 Dec 2020 10:02:25 GMT
- Title: Quantized Decentralized Stochastic Learning over Directed Graphs
- Authors: Hossein Taheri, Aryan Mokhtari, Hamed Hassani, Ramtin Pedarsani
- Abstract summary: We consider a decentralized learning problem where data points are distributed among computing nodes communicating over a directed graph.
As the model size gets large, decentralized learning faces a major bottleneck: the communication load of each node transmitting messages (model updates) to its neighbors.
We propose a quantized decentralized stochastic learning algorithm over directed graphs, based on the push-sum algorithm from decentralized consensus optimization.
- Score: 52.94011236627326
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a decentralized stochastic learning problem where data points are
distributed among computing nodes communicating over a directed graph. As the
model size gets large, decentralized learning faces a major bottleneck: the
heavy communication load due to each node transmitting large messages (model
updates) to its neighbors. To tackle this bottleneck, we propose a quantized
decentralized stochastic learning algorithm over directed graphs, based on
the push-sum algorithm from decentralized consensus optimization.
More importantly, we prove that our algorithm achieves the same convergence
rates as the exact-communication decentralized stochastic learning algorithm,
for both convex and non-convex losses. Numerical evaluations corroborate our
main theoretical results and illustrate significant speed-ups compared to
exact-communication methods.
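To make the push-sum idea concrete, below is a minimal, self-contained simulation of quantized stochastic gradient push on a directed ring. The toy quadratic objectives, the dithered uniform quantizer, and the mixing matrix P are illustrative assumptions, not the paper's exact algorithm or quantization scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(v, levels=16):
    # Unbiased dithered uniform quantizer (illustrative stand-in for the
    # paper's quantization scheme): E[quantize(v)] = v.
    scale = np.max(np.abs(v)) + 1e-12
    y = np.abs(v) / scale * levels                   # in [0, levels]
    lower = np.floor(y)
    q = lower + (rng.random(v.shape) < (y - lower))  # randomized rounding
    return np.sign(v) * q * scale / levels

n, d, lr, steps = 8, 10, 0.05, 300
# Column-stochastic mixing matrix for a directed ring: each node keeps half
# of its mass and pushes half to its single out-neighbor.
P = 0.5 * (np.eye(n) + np.roll(np.eye(n), 1, axis=0))

B = rng.normal(size=(n, d))  # toy local objectives f_i(x) = 0.5*||x - b_i||^2
x = np.zeros((n, d))         # push-sum numerators
w = np.ones(n)               # push-sum weights

for _ in range(steps):
    z = x / w[:, None]                             # de-biased local models
    grads = z - B + 0.1 * rng.normal(size=(n, d))  # stochastic gradients at z
    msgs = np.stack([quantize(m) for m in x - lr * grads])
    x = P @ msgs  # each node aggregates the quantized numerators it receives
    w = P @ w     # push-sum weights (scalars, assumed sent exactly)

# All de-biased models should end up near the global minimizer, mean of b_i.
print("max distance to optimum:",
      np.linalg.norm(x / w[:, None] - B.mean(0), axis=1).max())
```

The push-sum weights w correct the bias that a merely column-stochastic (rather than doubly stochastic) mixing matrix would otherwise introduce, which is what makes the scheme work on directed graphs.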
Related papers
- Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst acceleration and propose an accelerated decentralized federated learning algorithm called DFedCata.
DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase.
Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
arXiv Detail & Related papers (2024-10-09T06:17:16Z)
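The Moreau envelope and extrapolation components are specific to DFedCata; purely as an illustration of the "accelerated aggregation" idea, here is a generic Nesterov-style look-ahead applied after a neighborhood-averaging step (the function name and the momentum value beta are assumptions, not the paper's recursion).

```python
import numpy as np

def extrapolated_aggregation(models, prev_mixed, W, beta=0.9):
    """Generic Nesterov-style extrapolation after decentralized averaging.

    models:     (n, d) current local models
    prev_mixed: (n, d) mixed models from the previous round
    W:          (n, n) mixing (gossip) matrix
    beta:       extrapolation/momentum coefficient (illustrative value)

    Returns the extrapolated models plus the plain mixed models to carry
    into the next round. DFedCata's actual update rule differs in detail.
    """
    mixed = W @ models                               # aggregation phase
    lookahead = mixed + beta * (mixed - prev_mixed)  # Nesterov look-ahead
    return lookahead, mixed
```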
- Communication-Efficient Decentralized Federated Learning via One-Bit Compressive Sensing [52.402550431781805]
Decentralized federated learning (DFL) has gained popularity due to its practicality across various applications.
Compared to the centralized version, training a shared model among a large number of nodes in DFL is more challenging.
We develop a novel algorithm based on the framework of the inexact alternating direction method (iADM).
arXiv Detail & Related papers (2023-08-31T12:22:40Z)
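One-bit compressive sensing itself is specific to that paper, but the bandwidth arithmetic behind one-bit schemes is easy to see with the standard scaled-sign compressor below: one bit per coordinate plus a single float, instead of 32 bits per coordinate (the helper names are hypothetical).

```python
import numpy as np

def one_bit_compress(v):
    """Scaled sign compression: transmit sign bits plus one scalar.
    A standard one-bit compressor shown for intuition only; the paper's
    iADM scheme built on one-bit compressive sensing is more involved."""
    scale = np.mean(np.abs(v))  # one float sent alongside the sign bits
    return scale, np.sign(v)

def one_bit_decompress(scale, signs):
    return scale * signs

v = np.random.default_rng(0).normal(size=5)
print(v, one_bit_decompress(*one_bit_compress(v)))
```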
- Data-heterogeneity-aware Mixing for Decentralized Learning [63.83913592085953]
We characterize how convergence depends on the relationship between the graph's mixing weights and the data heterogeneity across nodes.
We propose a metric that quantifies the ability of a graph to mix the current gradients.
Motivated by our analysis, we propose an approach that periodically and efficiently optimizes this metric.
arXiv Detail & Related papers (2022-04-13T15:54:35Z)
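The paper's metric is data-dependent (it weighs the graph against the gradients nodes currently hold); the classical, data-agnostic quantity it refines is the spectral gap of the mixing matrix, computed below as a point of reference (the function name is an assumption).

```python
import numpy as np

def spectral_gap(W):
    """1 minus the contraction factor of W on the disagreement subspace.
    Larger gap -> faster mixing. This is the classical graph-only quantity;
    the paper's metric additionally accounts for data heterogeneity."""
    n = W.shape[0]
    J = np.ones((n, n)) / n  # projector onto the consensus direction
    return 1.0 - np.linalg.norm(W - J, 2)

print(spectral_gap(np.full((4, 4), 0.25)))  # complete graph: gap = 1.0
```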
- A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve these problems without any increase in the number of communications compared to the baseline.
arXiv Detail & Related papers (2020-11-03T13:35:53Z)
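As one concrete example of a randomized compression operator of the kind the entry above applies, here is unbiased rand-k sparsification (keep k random coordinates, rescale so the expectation is preserved); the operator class used in the paper itself may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_k(v, k):
    """Unbiased rand-k sparsification: E[rand_k(v)] = v.
    Only k indices and k values need to be communicated."""
    d = v.size
    out = np.zeros_like(v)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = v[idx] * (d / k)  # rescale to remove selection bias
    return out
```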
- Optimal Complexity in Decentralized Training [45.468216452357375]
We present DeTAG, a practical gossip-style decentralized algorithm that achieves the lower bound up to a logarithmic gap.
We show that DeTAG enjoys faster convergence than baselines, especially on unshuffled data and in sparse networks.
arXiv Detail & Related papers (2020-06-15T02:03:04Z)
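For readers unfamiliar with the "gossip-style" primitive mentioned above, the sketch below runs several synchronous gossip-averaging rounds; DeTAG's actual schedule and acceleration are specified in the paper.

```python
import numpy as np

def gossip(x, W, rounds):
    """Synchronous gossip averaging: x <- W @ x, repeated.
    x: (n, d) local vectors; W: doubly stochastic mixing matrix.
    Each round contracts disagreement by roughly the spectral gap of W."""
    for _ in range(rounds):
        x = W @ x
    return x
```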
- Communication-efficient Variance-reduced Stochastic Gradient Descent [0.0]
We consider the problem of communication-efficient distributed optimization.
In particular, we focus on the variance-reduced gradient and propose a novel approach to make it communication-efficient.
Comprehensive theoretical and numerical analyses on real datasets reveal that our algorithm can significantly reduce the communication complexity, by as much as 95%, with almost no noticeable penalty.
arXiv Detail & Related papers (2020-03-10T13:22:16Z)
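The variance-reduced gradient referenced above is, in its textbook (SVRG-style) form, the estimator below; how the paper compresses it for communication efficiency is specific to that work and not shown here.

```python
def svrg_gradient(grad_batch, x, x_snap, full_grad_snap, batch):
    """SVRG-style variance-reduced stochastic gradient:
        g = grad_batch(x, batch) - grad_batch(x_snap, batch) + full_grad_snap
    Unbiased, with variance shrinking as x and the snapshot x_snap approach
    the optimum. grad_batch and the snapshot arguments are placeholders."""
    return grad_batch(x, batch) - grad_batch(x_snap, batch) + full_grad_snap
```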
- Gradient tracking and variance reduction for decentralized optimization and machine learning [19.54092620537586]
Decentralized methods to solve finite-sum problems are important in many signal processing and machine learning tasks.
We provide a unified algorithmic framework that combines variance reduction with gradient tracking to achieve robust performance.
arXiv Detail & Related papers (2020-02-13T07:17:07Z)
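A generic form of the gradient-tracking recursion that the framework above combines with variance reduction is sketched below; each y_i tracks the network-average gradient, which is what keeps such methods robust to data heterogeneity (exact recursions vary across the methods surveyed).

```python
import numpy as np

def gradient_tracking_step(x, y, g_prev, grad_fn, W, lr):
    """One generic gradient-tracking round:
        x_{t+1} = W @ x_t - lr * y_t
        y_{t+1} = W @ y_t + g(x_{t+1}) - g(x_t)
    x, y: (n, d); W: doubly stochastic mixing matrix; grad_fn returns the
    stacked local (possibly variance-reduced) gradients at the given models."""
    x_next = W @ x - lr * y
    g_next = grad_fn(x_next)
    y_next = W @ y + g_next - g_prev  # tracker of the average gradient
    return x_next, y_next, g_next
```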
- Distributed Learning in the Non-Convex World: From Batch to Streaming Data, and Beyond [73.03743482037378]
Distributed learning has become a critical direction of the massively connected world envisioned by many.
This article discusses four key elements of scalable distributed processing and real-time data computation problems.
Practical issues and future research will also be discussed.
arXiv Detail & Related papers (2020-01-14T14:11:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.