Cross-Gradient Aggregation for Decentralized Learning from Non-IID data
- URL: http://arxiv.org/abs/2103.02051v1
- Date: Tue, 2 Mar 2021 21:58:12 GMT
- Title: Cross-Gradient Aggregation for Decentralized Learning from Non-IID data
- Authors: Yasaman Esfandiari, Sin Yong Tan, Zhanhong Jiang, Aditya Balu, Ethan
Herron, Chinmay Hegde, Soumik Sarkar
- Abstract summary: Decentralized learning enables a group of collaborative agents to learn models using a distributed dataset without the need for a central parameter server.
We propose Cross-Gradient Aggregation (CGA), a novel decentralized learning algorithm.
We show superior learning performance of CGA over existing state-of-the-art decentralized learning algorithms.
- Score: 34.23789472226752
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decentralized learning enables a group of collaborative agents to learn
models using a distributed dataset without the need for a central parameter
server. Recently, decentralized learning algorithms have demonstrated
state-of-the-art results on benchmark data sets, comparable with centralized
algorithms. However, the key assumption to achieve competitive performance is
that the data is independently and identically distributed (IID) among the
agents which, in real-life applications, is often not applicable. Inspired by
ideas from continual learning, we propose Cross-Gradient Aggregation (CGA), a
novel decentralized learning algorithm where (i) each agent aggregates
cross-gradient information, i.e., derivatives of its model with respect to its
neighbors' datasets, and (ii) updates its model using a projected gradient
based on quadratic programming (QP). We theoretically analyze the convergence
characteristics of CGA and demonstrate its efficiency on non-IID data
distributions sampled from the MNIST and CIFAR-10 datasets. Our empirical
comparisons show superior learning performance of CGA over existing
state-of-the-art decentralized learning algorithms, and show that this
improved performance is maintained under information compression to reduce
peer-to-peer communication overhead.
Related papers
- Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst Acceleration and propose an accelerated decentralized federated learning algorithm called DFedCata.
DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase.
Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-IID data distributions.
arXiv Detail & Related papers (2024-10-09T06:17:16Z)
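The DFedCata summary above names Nesterov's extrapolation step as the component that accelerates aggregation. As a purely illustrative sketch (the paper's exact placement of the step and its coefficients may differ), extrapolation pushes the neighborhood-averaged parameters along the direction of recent progress:

```python
import numpy as np

def nesterov_extrapolate(x_curr, x_prev, gamma=0.9):
    """Extrapolate along recent progress: y = x_t + gamma * (x_t - x_{t-1})."""
    return x_curr + gamma * (x_curr - x_prev)

# Hypothetical usage: after an aggregation round, a client extrapolates its
# neighborhood average before the next local training phase.
x_prev = np.zeros(4)                        # last round's aggregated model
x_curr = np.array([1.0, 0.5, -0.2, 0.3])    # this round's aggregated model
print(nesterov_extrapolate(x_curr, x_prev))
```

Loosely, the Moreau-envelope component corresponds to optimizing each local objective with a proximal term (1/2λ)||w - x||², which pulls clients toward a common anchor and damps parameter inconsistency.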
- DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models [21.85879890198875]
Decentralized Iterative Merging-And-Training (DIMAT) is a novel decentralized deep learning algorithm.
We show that DIMAT attains a faster and higher initial gain in accuracy on both independent and identically distributed (IID) and non-IID data, while incurring lower communication overhead.
The DIMAT paradigm presents a new opportunity for future decentralized learning, enhancing its adaptability to real-world applications through sparse, lightweight communication and computation.
arXiv Detail & Related papers (2024-04-11T18:34:29Z)
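The DIMAT entry describes an iterative merging-and-training loop. The schematic below only illustrates that control flow, with plain parameter averaging standing in for DIMAT's actual merging operator, which the summary does not specify:

```python
import numpy as np

def local_train(w, grad_fn, steps=5, lr=0.1):
    """Plain gradient descent as a stand-in for each agent's local training."""
    for _ in range(steps):
        w = w - lr * grad_fn(w)
    return w

def merge(params):
    """Placeholder merging operator: parameter averaging."""
    return np.mean(params, axis=0)

# Two agents with shifted quadratic objectives as a non-IID surrogate.
grad_fns = [lambda w: 2 * (w - 1.0), lambda w: 2 * (w + 1.0)]
models = [np.full(3, 2.0), np.full(3, 2.0)]
for _ in range(10):                         # alternate training and merging
    models = [local_train(w, g) for w, g in zip(models, grad_fns)]
    merged = merge(np.stack(models))
    models = [merged.copy() for _ in models]
print(models[0])                            # settles near the consensus optimum 0
```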
- Deep Contrastive Graph Learning with Clustering-Oriented Guidance [61.103996105756394]
Graph Convolutional Network (GCN) has exhibited remarkable potential in improving graph-based clustering.
Existing models estimate an initial graph beforehand in order to apply GCN.
A Deep Contrastive Graph Learning (DCGL) model is proposed for general data clustering.
arXiv Detail & Related papers (2024-02-25T07:03:37Z)
- Cross-feature Contrastive Loss for Decentralized Deep Learning on Heterogeneous Data [8.946847190099206]
We present a novel approach for decentralized learning on heterogeneous data.
Cross-features for a pair of neighboring agents are the features obtained from the data of an agent with respect to the model parameters of the other agent.
Our experiments show that the proposed method achieves superior performance (0.2-4% improvement in test accuracy) compared to other existing techniques for decentralized learning on heterogeneous data.
arXiv Detail & Related papers (2023-10-24T14:48:23Z)
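The cross-feature definition above is easy to state in code: pass agent i's data through neighbor j's model. The sketch below uses a toy linear encoder and a plain mean-squared distance; the paper's actual contrastive loss also contrasts negative pairs, which is omitted here:

```python
import numpy as np

def features(W, x):
    """Toy linear encoder; in practice this would be a deep network's
    representation layer."""
    return x @ W

def cross_feature_distance(W_i, W_j, x_i):
    """Distance between agent i's features on its own data (self-features)
    and the features of the same data under neighbor j's model
    (cross-features)."""
    return np.mean((features(W_i, x_i) - features(W_j, x_i)) ** 2)

rng = np.random.default_rng(0)
W_i, W_j = rng.normal(size=(8, 4)), rng.normal(size=(8, 4))
x_i = rng.normal(size=(16, 8))              # agent i's local batch
print(cross_feature_distance(W_i, W_j, x_i))
```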
- Structured Cooperative Learning with Graphical Model Priors [98.53322192624594]
We study how to train personalized models for different tasks on decentralized devices with limited local data.
We propose "Structured Cooperative Learning (SCooL)", in which a cooperation graph across devices is generated by a graphical model.
We evaluate SCooL and compare it with existing decentralized learning methods on an extensive set of benchmarks.
arXiv Detail & Related papers (2023-06-16T02:41:31Z)
- Global Update Tracking: A Decentralized Learning Algorithm for Heterogeneous Data [14.386062807300666]
In this paper, we focus on designing a decentralized learning algorithm that is less susceptible to variations in data distribution across devices.
We propose Global Update Tracking (GUT), a novel tracking-based method that aims to mitigate the impact of heterogeneous data in decentralized learning without introducing any communication overhead.
Our experiments show that the proposed method achieves state-of-the-art performance for decentralized learning on heterogeneous data via a 1-6% improvement in test accuracy compared to other existing techniques.
arXiv Detail & Related papers (2023-05-08T15:48:53Z)
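GUT is described as a tracking-based method. For intuition, here is the classic gradient-tracking template it builds on: each agent maintains a tracker y_i of the global gradient that is refreshed with the change in its local gradient every round. GUT itself tracks model updates and, per the summary, avoids the extra communication this auxiliary variable normally costs; the sketch below shows only the generic template:

```python
import numpy as np

def tracking_round(X, Y, G, grad_fn, W, lr=0.1):
    """One round of gradient tracking.
    X: (n, d) agent models, Y: (n, d) trackers of the global gradient,
    G: (n, d) last local gradients, W: (n, n) doubly stochastic mixing."""
    X_new = W @ X - lr * Y                  # consensus step + tracked descent
    G_new = np.stack([grad_fn(i, X_new[i]) for i in range(len(X_new))])
    Y_new = W @ Y + G_new - G               # refresh with the gradient change
    return X_new, Y_new, G_new

targets = np.array([[1.0, 1.0], [-1.0, -1.0]])   # heterogeneous local optima
grad_fn = lambda i, w: 2.0 * (w - targets[i])
W = np.full((2, 2), 0.5)                         # two fully connected agents
X = np.zeros((2, 2))
G = np.stack([grad_fn(i, X[i]) for i in range(2)])
Y = G.copy()                                     # trackers start at local grads
for _ in range(50):
    X, Y, G = tracking_round(X, Y, G, grad_fn, W)
print(X)                                 # both rows near the global optimum (0, 0)
```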
- Neighborhood Gradient Clustering: An Efficient Decentralized Learning Method for Non-IID Data Distributions [5.340730281227837]
The current state-of-the-art decentralized algorithms mostly assume the data distributions to be Independent and Identically Distributed.
We propose Neighborhood Gradient Clustering (NGC), a novel decentralized learning algorithm that modifies the local gradients of each agent using self- and cross-gradient information.
arXiv Detail & Related papers (2022-09-28T19:28:54Z)
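Where CGA (the main paper above) projects the stacked gradients through a QP, the NGC summary describes directly modifying each local gradient with self- and cross-gradient information. A minimal sketch of that blending idea, with a uniform mixing weight as an assumed stand-in for NGC's actual weighting:

```python
import numpy as np

def blended_gradient(self_grad, cross_grads, alpha=0.5):
    """Replace the local gradient by a convex combination of the
    self-gradient and the mean of the neighborhood cross-gradients."""
    return alpha * self_grad + (1.0 - alpha) * np.mean(cross_grads, axis=0)

rng = np.random.default_rng(0)
g_self = rng.normal(size=4)
g_cross = rng.normal(size=(3, 4))           # cross-gradients from 3 neighbors
print(blended_gradient(g_self, g_cross))
```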
- Federated XGBoost on Sample-Wise Non-IID Data [8.49189353769386]
Decision tree-based models, in particular XGBoost, can handle non-IID data.
This paper investigates how Federated XGBoost is impacted by non-IID distributions.
arXiv Detail & Related papers (2022-09-03T06:14:20Z)
- Data-heterogeneity-aware Mixing for Decentralized Learning [63.83913592085953]
We characterize the dependence of convergence on the relationship between the mixing weights of the graph and the data heterogeneity across nodes.
We propose a metric that quantifies the ability of a graph to mix the current gradients.
Motivated by our analysis, we propose an approach that periodically and efficiently optimizes the metric.
arXiv Detail & Related papers (2022-04-13T15:54:35Z)
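A hedged illustration of the kind of metric the summary above describes: score how far one mixing step W @ G leaves the agents' gradients from their global mean. The exact metric and the optimization procedure in the paper differ; this only conveys the idea of evaluating a mixing matrix against the current gradients.

```python
import numpy as np

def mixing_score(W, G):
    """Frobenius distance between mixed gradients and perfect averaging.
    Lower is better: W mixes the *current* gradients more completely."""
    g_bar = G.mean(axis=0, keepdims=True)
    return np.linalg.norm(W @ G - np.ones((len(G), 1)) @ g_bar) ** 2

# Comparing two doubly stochastic matrices on heterogeneous gradients.
G = np.array([[2.0, 0.0], [0.0, -2.0], [1.0, 1.0]])
lazy = np.array([[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]])
full = np.full((3, 3), 1.0 / 3.0)
print(mixing_score(lazy, G), mixing_score(full, G))  # full averaging scores 0
```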
- Reinforcement Learning for Datacenter Congestion Control [50.225885814524304]
Successful congestion control algorithms can dramatically improve latency and overall network throughput.
To date, no learning-based algorithms have shown practical potential in this domain.
We devise an RL-based algorithm with the aim of generalizing to different configurations of real-world datacenter networks.
We show that this scheme outperforms alternative popular RL approaches, and generalizes to scenarios that were not seen during training.
arXiv Detail & Related papers (2021-02-18T13:49:28Z)
- Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data [77.88594632644347]
Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks.
In realistic learning scenarios, the presence of heterogeneity across different clients' local datasets poses an optimization challenge.
We propose a novel momentum-based method to mitigate this decentralized training difficulty.
arXiv Detail & Related papers (2021-02-09T11:27:14Z)
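The summary above does not spell out the momentum update, so the following is only one plausible reading of the idea: drive the momentum buffer with the displacement of the locally averaged model, which reflects global progress, instead of the heterogeneity-biased local gradient. Coefficients, ordering, and normalization here are assumptions, not the paper's exact rule.

```python
import numpy as np

def quasi_global_step(w_mixed, w_mixed_prev, m, local_grad, beta=0.9, lr=0.1):
    """w_mixed / w_mixed_prev: models after neighborhood averaging in the
    current and previous rounds; m: the quasi-global momentum buffer."""
    d = (w_mixed_prev - w_mixed) / lr       # implied global descent direction
    m = beta * m + (1.0 - beta) * d         # momentum driven by model movement
    w_next = w_mixed - lr * (local_grad + beta * m)
    return w_next, m

# Toy usage with made-up values.
w_prev, w_curr = np.ones(4), 0.9 * np.ones(4)
m = np.zeros(4)
g_local = 0.5 * np.ones(4)
w_next, m = quasi_global_step(w_curr, w_prev, m, g_local)
print(w_next)
```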