DIGEST: Fast and Communication Efficient Decentralized Learning with Local Updates
- URL: http://arxiv.org/abs/2307.07652v2
- Date: Fri, 10 May 2024 23:28:59 GMT
- Title: DIGEST: Fast and Communication Efficient Decentralized Learning with Local Updates
- Authors: Peyman Gholami, Hulya Seferoglu
- Abstract summary: Two widely considered decentralized learning algorithms are Gossip and random walk-based learning.
We design a fast and communication-efficient asynchronous decentralized learning mechanism DIGEST.
We evaluate the performance of single- and multi-stream DIGEST for logistic regression and a deep neural network ResNet20.
- Score: 4.3707341422218215
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Two widely considered decentralized learning algorithms are Gossip and random walk-based learning. Gossip algorithms (both synchronous and asynchronous versions) suffer from high communication cost, while random walk-based learning experiences increased convergence time. In this paper, we design a fast and communication-efficient asynchronous decentralized learning mechanism, DIGEST, by taking advantage of both Gossip and random-walk ideas, and focusing on stochastic gradient descent (SGD). DIGEST is an asynchronous decentralized algorithm building on local-SGD algorithms, which were originally designed for communication-efficient centralized learning. We design both single-stream and multi-stream DIGEST, where the communication overhead may increase as the number of streams increases, yielding a convergence versus communication-overhead trade-off that can be leveraged. We analyze the convergence of single- and multi-stream DIGEST, and prove that both algorithms approach the optimal solution asymptotically for both iid and non-iid data distributions. We evaluate the performance of single- and multi-stream DIGEST for logistic regression and a deep neural network, ResNet20. The simulation results confirm that multi-stream DIGEST has favorable convergence properties; i.e., its convergence time is better than or comparable to the baselines in the iid setting, and it outperforms the baselines in the non-iid setting.
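To make the single-stream idea above concrete, here is a minimal Python sketch, assuming logistic-regression nodes, a simple averaging merge when the stream visits a node, and a random visit schedule; these details are illustrative assumptions, not the paper's exact algorithm. Multi-stream DIGEST would run several such streams in parallel over different parts of the network.

```python
import numpy as np

def local_sgd_step(w, X, y, lr=0.1):
    """One local SGD step for logistic regression on a random minibatch."""
    idx = np.random.randint(0, len(y), size=16)
    Xb, yb = X[idx], y[idx]
    p = 1.0 / (1.0 + np.exp(-Xb @ w))           # sigmoid predictions
    grad = Xb.T @ (p - yb) / len(yb)
    return w - lr * grad

def single_stream_digest(datasets, rounds=50, local_steps=5, dim=10):
    """Sketch: nodes keep taking local SGD steps; a single stream carries a
    shared model from node to node and merges with the visited node's model."""
    n = len(datasets)
    local_models = [np.zeros(dim) for _ in range(n)]
    stream = np.zeros(dim)                      # model carried by the stream
    node = 0
    for _ in range(rounds):
        # every node continues local updates (asynchrony abstracted away)
        for i, (X, y) in enumerate(datasets):
            for _ in range(local_steps):
                local_models[i] = local_sgd_step(local_models[i], X, y)
        # the stream visits one node, merges, and moves on (random-walk style)
        stream = 0.5 * (stream + local_models[node])
        local_models[node] = stream.copy()
        node = np.random.randint(n)             # assumed random visit schedule
    return stream
```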
Related papers
- Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst Acceleration and propose an accelerated Decentralized Federated Learning algorithm called DFedCata.
DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase.
Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
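As a rough illustration of the extrapolation component (not DFedCata's actual update, and omitting the Moreau-envelope part), the snippet below applies a Nesterov-style extrapolation to the freshly aggregated model using the previous aggregate; the coefficient beta is an assumed hyperparameter.

```python
import numpy as np

def accelerated_aggregate(w_avg_new, w_avg_prev, beta=0.9):
    """Nesterov-style extrapolation applied after neighbor averaging (illustrative)."""
    return w_avg_new + beta * (w_avg_new - w_avg_prev)
```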
arXiv Detail & Related papers (2024-10-09T06:17:16Z) - Faster Convergence with Less Communication: Broadcast-Based Subgraph Sampling for Decentralized Learning over Wireless Networks [32.914407967052114]
BASS is a broadcast-based subgraph sampling method designed to accelerate the convergence of D-SGD.
We show that BASS enables faster convergence with fewer transmission slots compared to existing link-based scheduling methods.
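A simplified caricature of the broadcast-then-mix pattern is sketched below (it is not the BASS schedule itself); the sampling probability, mixing weights, and function names are assumptions.

```python
import numpy as np

def dsgd_broadcast_sampling_step(models, adjacency, grads, p=0.3, lr=0.05):
    """One D-SGD step where a random subset of nodes broadcasts its model;
    neighbors that hear a broadcast average it into their own model, then
    every node takes a local gradient step (all choices illustrative)."""
    n = len(models)
    broadcasters = [i for i in range(n) if np.random.rand() < p]
    mixed = [m.copy() for m in models]
    for i in broadcasters:
        for j in range(n):
            if j != i and adjacency[i][j]:      # node j hears node i's broadcast
                mixed[j] = 0.5 * (mixed[j] + models[i])
    return [m - lr * g for m, g in zip(mixed, grads)]
```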
arXiv Detail & Related papers (2024-01-24T20:00:23Z) - Asynchronous Local Computations in Distributed Bayesian Learning [8.516532665507835]
We propose gossip-based communication to leverage fast computations and reduce communication overhead simultaneously.
We observe faster initial convergence and improved performance accuracy, especially in the low data range.
We achieve on average 78% and over 90% classification accuracy respectively on the Gamma Telescope and mHealth data sets from the UCI ML repository.
arXiv Detail & Related papers (2023-11-06T20:11:41Z) - Communication-Efficient Decentralized Federated Learning via One-Bit Compressive Sensing [52.402550431781805]
Decentralized federated learning (DFL) has gained popularity due to its practicality across various applications.
Compared to the centralized version, training a shared model among a large number of nodes in DFL is more challenging.
We develop a novel algorithm based on the framework of the inexact alternating direction method (iADM).
arXiv Detail & Related papers (2023-08-31T12:22:40Z) - Optimal Complexity in Non-Convex Decentralized Learning over Time-Varying Networks [8.860889476382594]
Decentralized optimization with time-varying networks is an emerging paradigm in machine learning.
It saves considerable communication overhead in large-scale deep training and is more robust in wireless scenarios, especially when nodes are moving.
arXiv Detail & Related papers (2022-11-01T15:37:54Z) - Locally Asynchronous Stochastic Gradient Descent for Decentralised Deep Learning [0.0]
Local Asynchronous SGD (LASGD) is an asynchronous decentralized algorithm that relies on All Reduce for model synchronization.
We empirically validate LASGD's performance on image classification tasks on the ImageNet dataset.
arXiv Detail & Related papers (2022-03-24T14:25:15Z) - Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
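As a minimal illustration of the alternating-direction structure (omitting the coding/redundancy and stochastic mini-batching of the paper), here is a plain consensus ADMM sketch for distributed least squares; the objective and all names are assumptions.

```python
import numpy as np

def consensus_admm(parts, dim, rho=1.0, iters=50):
    """parts: list of (A_i, b_i) held by edge nodes; returns the consensus model z."""
    n = len(parts)
    x = [np.zeros(dim) for _ in range(n)]       # local primal variables
    u = [np.zeros(dim) for _ in range(n)]       # scaled dual variables
    z = np.zeros(dim)                           # consensus variable
    for _ in range(iters):
        for i, (A, b) in enumerate(parts):      # local primal updates (closed form)
            x[i] = np.linalg.solve(A.T @ A + rho * np.eye(dim),
                                   A.T @ b + rho * (z - u[i]))
        z = np.mean([x[i] + u[i] for i in range(n)], axis=0)   # consensus step
        for i in range(n):                      # dual ascent
            u[i] = u[i] + x[i] - z
    return z
```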
arXiv Detail & Related papers (2020-10-02T10:41:59Z) - A Low Complexity Decentralized Neural Net with Centralized Equivalence using Layer-wise Learning [49.15799302636519]
We design a low complexity decentralized learning algorithm to train a recently proposed large neural network in distributed processing nodes (workers).
In our setup, the training data is distributed among the workers but is not shared in the training process due to privacy and security concerns.
We show that it is possible to achieve equivalent learning performance as if the data is available in a single place.
arXiv Detail & Related papers (2020-09-29T13:08:12Z) - FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
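A minimal sketch of a CTA round in the FedAvg style is given below (it is not FedPD itself); the local step count, learning rate, and least-squares loss are assumptions.

```python
import numpy as np

def cta_round(global_w, device_data, local_steps=5, lr=0.05):
    """One 'computation then aggregation' round: each device runs local SGD
    from the global model, then the results are averaged (illustrative)."""
    updated = []
    for X, y in device_data:
        w = global_w.copy()
        for _ in range(local_steps):            # local computation
            idx = np.random.randint(0, len(y), size=16)
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            w = w - lr * grad
        updated.append(w)
    return np.mean(updated, axis=0)             # aggregation
```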
arXiv Detail & Related papers (2020-05-22T23:07:42Z) - Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires far fewer communication rounds while retaining its theoretical guarantees.
Our experiments on several benchmark datasets show the effectiveness of the proposed algorithm and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD [32.03967072200476]
We propose an algorithmic approach named Overlap-Local-SGD (and its local momentum variant).
We achieve this by adding an anchor model on each node.
After multiple local updates, locally trained models will be pulled back towards the anchor model rather than communicating with others.
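A minimal sketch of this pull-back idea follows, assuming a convex combination with the anchor and an averaged-model anchor refresh; the coefficient and refresh rule are not the paper's exact choices.

```python
import numpy as np

def pull_to_anchor(local_w, anchor_w, alpha=0.5):
    """After several local updates, pull the local model toward the anchor
    instead of synchronizing with other nodes (alpha is an assumed weight)."""
    return (1 - alpha) * local_w + alpha * anchor_w

def refresh_anchor(all_local_models):
    """When the overlapped all-reduce completes, its average becomes the new anchor."""
    return np.mean(all_local_models, axis=0)
```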
arXiv Detail & Related papers (2020-02-21T20:33:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.