CoDeC: Communication-Efficient Decentralized Continual Learning
- URL: http://arxiv.org/abs/2303.15378v1
- Date: Mon, 27 Mar 2023 16:52:17 GMT
- Title: CoDeC: Communication-Efficient Decentralized Continual Learning
- Authors: Sakshi Choudhary, Sai Aparna Aketi, Gobinda Saha and Kaushik Roy
- Abstract summary: Training at the edge utilizes continuously evolving data generated at different locations.
Privacy concerns prohibit the co-location of this spatially as well as temporally distributed data.
We propose CoDeC, a novel communication-efficient decentralized continual learning algorithm.
- Score: 6.663641564969944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training at the edge utilizes continuously evolving data generated at
different locations. Privacy concerns prohibit the co-location of this
spatially and temporally distributed data, making it crucial to design
training algorithms that enable efficient continual learning over decentralized
private data. Decentralized learning allows serverless training with spatially
distributed data. A fundamental barrier in such distributed learning is the
high bandwidth cost of communicating model updates between agents. Moreover,
existing works under this training paradigm are not inherently suitable for
learning a temporal sequence of tasks while retaining the previously acquired
knowledge. In this work, we propose CoDeC, a novel communication-efficient
decentralized continual learning algorithm which addresses these challenges. We
mitigate catastrophic forgetting while learning a task sequence in a
decentralized learning setup by combining orthogonal gradient projection with
gossip averaging across decentralized agents. Further, CoDeC includes a novel
lossless communication compression scheme based on the gradient subspaces. We
express layer-wise gradients as a linear combination of the basis vectors of
these gradient subspaces and communicate the associated coefficients. We
theoretically analyze the convergence rate for our algorithm and demonstrate
through an extensive set of experiments that CoDeC successfully learns
distributed continual tasks with minimal forgetting. The proposed compression
scheme results in up to a 4.8x reduction in communication costs with
iso-performance relative to the full-communication baseline.
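To make the abstract's mechanism concrete, the following is a minimal NumPy sketch, not the authors' implementation, of the three ingredients it describes: orthogonal gradient projection against previously learned tasks, gossip averaging over a ring of agents, and communicating subspace coefficients instead of full gradients. The ring topology, uniform mixing weights, and the random placeholder bases `old_basis` and `basis` are illustrative assumptions; in CoDeC the bases are derived from layer-wise representations, and the projected gradients lie inside the communicated subspace, which is what makes the compression lossless (with the random tensors used here the reconstruction is only approximate).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: 4 agents on a ring, one flattened "layer" per agent.
n_agents, dim = 4, 400
weights = rng.standard_normal((n_agents, dim))

# Placeholder orthonormal bases: `old_basis` spans gradient directions important
# for previous tasks, `basis` spans the subspace used to compress communication
# for the current task. In CoDeC both come from layer-wise representations
# (e.g. via SVD); random bases are used here purely for illustration.
old_basis, _ = np.linalg.qr(rng.standard_normal((dim, 20)))   # (dim, 20)
basis, _ = np.linalg.qr(rng.standard_normal((dim, 50)))       # (dim, 50)

def project_out(grad, B):
    """Orthogonal gradient projection: remove the component of `grad` lying in
    span(B) so updates do not interfere with previously acquired knowledge."""
    return grad - B @ (B.T @ grad)

def compress(grad, B):
    """Communicate only the coefficients of `grad` in the basis B (50 floats
    here instead of 400). When grad lies in span(B) the reconstruction is
    exact, which is what makes the scheme lossless in CoDeC."""
    return B.T @ grad

def decompress(coeffs, B):
    return B @ coeffs

neighbors = {i: [(i - 1) % n_agents, (i + 1) % n_agents] for i in range(n_agents)}
lr = 0.1

# One illustrative round: local gradients, projection, then a compressed gossip step.
grads = rng.standard_normal((n_agents, dim))
payloads = [compress(project_out(grads[i], old_basis), basis) for i in range(n_agents)]

new_weights = weights.copy()
for i in range(n_agents):
    # Gossip averaging: mix the agent's own update with the reconstructed updates
    # received from its neighbors (uniform mixing weights, for simplicity).
    updates = [decompress(payloads[j], basis) for j in [i] + neighbors[i]]
    new_weights[i] = weights[i] - lr * np.mean(updates, axis=0)
weights = new_weights

print("floats sent per agent per round:", payloads[0].size, "instead of", dim)
```

In this toy setting the saving is 400/50 = 8x per layer; the 4.8x figure reported in the abstract comes from the actual task subspaces used by the paper, not from this sketch.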
Related papers
- DRACO: Decentralized Asynchronous Federated Learning over Continuous Row-Stochastic Network Matrices [7.389425875982468]
We propose DRACO, a novel method for decentralized asynchronous stochastic gradient descent (SGD) over row-stochastic gossip wireless networks.
Our approach enables edge devices within decentralized networks to perform local training and model exchanging along a continuous timeline.
Our numerical experiments corroborate the efficacy of the proposed technique.
arXiv Detail & Related papers (2024-06-19T13:17:28Z)
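As a point of reference for the DRACO entry above, here is a hedged sketch of synchronous decentralized SGD with a row-stochastic mixing matrix built from a directed graph. DRACO's defining features (asynchronous updates along a continuous timeline, wireless links) are deliberately not modeled; the graph, objective, and step size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n_agents, dim = 5, 10

# Directed communication graph: adjacency[i, j] = 1 means agent j sends to agent i.
adjacency = (rng.random((n_agents, n_agents)) < 0.4).astype(float)
np.fill_diagonal(adjacency, 1.0)                  # every agent keeps its own model

# Row-stochastic mixing matrix: each row sums to 1, so agent i averages only
# over the models it actually received (no symmetry or column sums required).
mixing = adjacency / adjacency.sum(axis=1, keepdims=True)

models = rng.standard_normal((n_agents, dim))
targets = rng.standard_normal((n_agents, dim))    # stand-in for heterogeneous local data
lr = 0.05

for step in range(200):
    grads = models - targets                      # gradient of 0.5 * ||x_i - target_i||^2
    models = mixing @ models - lr * grads         # mix neighbors' models, then step locally

print("max disagreement across agents:", float(np.abs(models - models.mean(axis=0)).max()))
```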
- Communication-Efficient Decentralized Federated Learning via One-Bit Compressive Sensing [52.402550431781805]
Decentralized federated learning (DFL) has gained popularity due to its practicality across various applications.
Compared to the centralized version, training a shared model among a large number of nodes in DFL is more challenging.
We develop a novel algorithm based on the framework of the inexact alternating direction method (iADM).
arXiv Detail & Related papers (2023-08-31T12:22:40Z)
- Online Distributed Learning with Quantized Finite-Time Coordination [0.4910937238451484]
In our setting, a set of agents needs to cooperatively train a learning model from streaming data.
We propose a distributed algorithm that relies on a quantized, finite-time coordination protocol.
We analyze the performance of the proposed algorithm in terms of the mean distance from the online solution.
arXiv Detail & Related papers (2023-07-13T08:36:15Z)
- Magnitude Matters: Fixing SIGNSGD Through Magnitude-Aware Sparsification in the Presence of Data Heterogeneity [60.791736094073]
Communication overhead has become one of the major bottlenecks in the distributed training of deep neural networks.
We propose a magnitude-driven sparsification scheme, which addresses the non-convergence issue of SIGNSGD.
The proposed scheme is validated through experiments on Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets.
arXiv Detail & Related papers (2023-02-19T17:42:35Z)
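The entry above names a magnitude-driven sparsification scheme for SIGNSGD without spelling it out; the sketch below shows one plausible reading, assumed purely for illustration: transmit signs for the k largest-magnitude coordinates and rescale them by their mean magnitude. The choice of k and the rescaling are assumptions, not the construction from the paper.

```python
import numpy as np

def magnitude_aware_sign_compress(grad, k):
    """Keep the k largest-magnitude coordinates and transmit only their signs plus
    a single scalar (their mean magnitude). This is an illustrative reading of
    'magnitude-aware sparsification' for SIGNSGD, not the paper's exact scheme."""
    flat = grad.reshape(-1)
    idx = np.argpartition(np.abs(flat), -k)[-k:]      # indices of the top-k magnitudes
    scale = np.abs(flat[idx]).mean()                  # one float shared by all k entries
    signs = np.sign(flat[idx]).astype(np.int8)        # roughly one bit per kept entry
    return idx, signs, scale

def decompress(idx, signs, scale, shape):
    out = np.zeros(int(np.prod(shape)))
    out[idx] = scale * signs
    return out.reshape(shape)

rng = np.random.default_rng(3)
g = rng.standard_normal((256, 128))
idx, signs, scale = magnitude_aware_sign_compress(g, k=1000)
g_hat = decompress(idx, signs, scale, g.shape)
cosine = float((g * g_hat).sum() / (np.linalg.norm(g) * np.linalg.norm(g_hat)))
print("kept fraction:", len(idx) / g.size, "cosine similarity:", round(cosine, 3))
```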
- On Generalization of Decentralized Learning with Separable Data [37.908159361149835]
We study algorithmic and generalization properties of decentralized learning with gradient descent on separable data.
Specifically, for decentralized gradient descent and a variety of loss functions that asymptote to zero at infinity, we derive novel finite-time generalization bounds.
arXiv Detail & Related papers (2022-09-15T07:59:05Z)
- QC-ODKLA: Quantized and Communication-Censored Online Decentralized Kernel Learning via Linearized ADMM [30.795725108364724]
This paper focuses on online kernel learning over a decentralized network.
We propose a novel learning framework named Online Decentralized Kernel Learning via Linearized ADMM (ODKLA).
arXiv Detail & Related papers (2022-08-04T17:16:27Z)
- RelaySum for Decentralized Deep Learning on Heterogeneous Data [71.36228931225362]
In decentralized machine learning, workers compute model updates on their local data.
Because the workers only communicate with a few neighbors without central coordination, these updates propagate progressively over the network.
This paradigm enables distributed training on networks without all-to-all connectivity, helping to protect data privacy as well as to reduce the communication cost of distributed training in data centers.
arXiv Detail & Related papers (2021-10-08T14:55:32Z)
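The RelaySum entry above highlights that updates propagate progressively when workers talk only to a few neighbors. The toy sketch below illustrates just that propagation effect with plain neighbor averaging on a ring; it is not RelaySum's relay mechanism, and the topology and worker count are arbitrary.

```python
import numpy as np

n_workers = 8
# Ring topology: each worker averages with itself and its two neighbours.
mix = np.zeros((n_workers, n_workers))
for i in range(n_workers):
    mix[i, i] = mix[i, (i - 1) % n_workers] = mix[i, (i + 1) % n_workers] = 1 / 3

# A scalar "update" that only worker 0 has seen so far.
x = np.zeros(n_workers)
x[0] = 1.0

for rnd in range(1, 6):
    x = mix @ x                                   # one round of neighbour averaging
    reached = np.flatnonzero(x > 0).tolist()      # workers influenced by worker 0
    print(f"round {rnd}: update has reached workers {reached}")
```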
- Sparse-Push: Communication- & Energy-Efficient Decentralized Distributed Learning over Directed & Time-Varying Graphs with non-IID Datasets [2.518955020930418]
We propose Sparse-Push, a communication efficient decentralized distributed training algorithm.
The proposed algorithm enables a 466x reduction in communication with only 1% degradation in performance.
We demonstrate how communication compression can lead to significant performance degradation in the case of non-IID datasets.
arXiv Detail & Related papers (2021-02-10T19:41:11Z)
- CosSGD: Nonlinear Quantization for Communication-efficient Federated Learning [62.65937719264881]
Federated learning facilitates learning across clients without transferring local data on these clients to a central server.
We propose a nonlinear quantization for compressed gradient descent, which can be easily utilized in federated learning.
Our system significantly reduces the communication cost by up to three orders of magnitude, while maintaining convergence and accuracy of the training process.
arXiv Detail & Related papers (2020-12-15T12:20:28Z)
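The CosSGD entry above proposes a nonlinear quantizer for gradient compression. As a hedged illustration of why a nonlinear codebook helps (more resolution near zero, where most gradient entries concentrate), here is a generic logarithmic quantizer; it is not the specific quantization function used in CosSGD, and the bit width is an arbitrary choice.

```python
import numpy as np

BITS = 4
LEVELS = 2 ** (BITS - 1) - 1          # magnitude levels; the sign is sent separately

def nonlinear_quantize(grad):
    """Log-spaced quantization of gradient magnitudes: small values get finer
    resolution than large ones. Illustrative only, not CosSGD's quantizer."""
    flat = grad.reshape(-1)
    scale = np.abs(flat).max() + 1e-12
    normed = np.abs(flat) / scale                             # in [0, 1]
    code = np.round(LEVELS * np.log1p(normed * (np.e - 1)))   # integers in 0..LEVELS
    return np.sign(flat).astype(np.int8), code.astype(np.int8), scale

def dequantize(signs, code, scale, shape):
    normed = np.expm1(code / LEVELS) / (np.e - 1)             # invert the log mapping
    return (signs * normed * scale).reshape(shape)

rng = np.random.default_rng(4)
g = rng.standard_normal((512, 64)) * 0.01
signs, code, scale = nonlinear_quantize(g)
g_hat = dequantize(signs, code, scale, g.shape)
rel_err = float(np.linalg.norm(g - g_hat) / np.linalg.norm(g))
print(f"~{BITS} bits per entry instead of 32, relative error: {rel_err:.3f}")
```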
- A Low Complexity Decentralized Neural Net with Centralized Equivalence using Layer-wise Learning [49.15799302636519]
We design a low-complexity decentralized learning algorithm to train a recently proposed large neural network in distributed processing nodes (workers).
In our setup, the training data is distributed among the workers but is not shared in the training process due to privacy and security concerns.
We show that it is possible to achieve equivalent learning performance as if the data is available in a single place.
arXiv Detail & Related papers (2020-09-29T13:08:12Z)
- Adaptive Serverless Learning [114.36410688552579]
We propose a novel adaptive decentralized training approach, which can compute the learning rate from data dynamically.
Our theoretical results reveal that the proposed algorithm can achieve linear speedup with respect to the number of workers.
To reduce the communication overhead, we further propose a communication-efficient adaptive decentralized training approach.
arXiv Detail & Related papers (2020-08-24T13:23:02Z)