Neighborhood Gradient Clustering: An Efficient Decentralized Learning
Method for Non-IID Data Distributions
- URL: http://arxiv.org/abs/2209.14390v6
- Date: Mon, 20 Mar 2023 20:05:33 GMT
- Title: Neighborhood Gradient Clustering: An Efficient Decentralized Learning
Method for Non-IID Data Distributions
- Authors: Sai Aparna Aketi, Sangamesh Kodge, Kaushik Roy
- Abstract summary: The current state-of-the-art decentralized algorithms mostly assume the data distributions to be Independent and Identically Distributed.
We propose Neighborhood Gradient Clustering (NGC), a novel decentralized learning algorithm that modifies the local gradients of each agent using self- and cross-gradient information.
- Score: 5.340730281227837
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decentralized learning over distributed datasets can have significantly
different data distributions across the agents. The current state-of-the-art
decentralized algorithms mostly assume the data distributions to be Independent
and Identically Distributed. This paper focuses on improving decentralized
learning over non-IID data. We propose \textit{Neighborhood Gradient Clustering
(NGC)}, a novel decentralized learning algorithm that modifies the local
gradients of each agent using self- and cross-gradient information.
Cross-gradients for a pair of neighboring agents are the derivatives of the
model parameters of an agent with respect to the dataset of the other agent. In
particular, the proposed method replaces the local gradients of the model with
the weighted mean of the self-gradients, model-variant cross-gradients
(derivatives of the neighbors' parameters with respect to the local dataset),
and data-variant cross-gradients (derivatives of the local model with respect
to its neighbors' datasets). The data-variant cross-gradients are aggregated
through an additional communication round without breaking the privacy
constraints. Further, we present \textit{CompNGC}, a compressed version of
\textit{NGC} that reduces the communication overhead by $32 \times$. We
theoretically analyze the convergence rate of the proposed algorithm and
demonstrate its efficiency over non-IID data sampled from various vision and
language datasets. Our experiments demonstrate that \textit{NGC} and
\textit{CompNGC} outperform (by $0-6\%$) the existing SoTA decentralized
learning algorithm over non-IID data with significantly lower compute and
memory requirements. Further, our experiments show that the model-variant
cross-gradient information available locally at each agent can improve the
performance over non-IID data by $1-35\%$ without additional communication
cost.
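
To make the update above concrete, here is a minimal sketch of an NGC-style gradient step on a toy least-squares problem. The uniform mixing weight, the grad/ngc_gradient helper names, and the direct access to neighbor datasets are illustrative assumptions; in NGC itself the data-variant terms arrive as communicated cross-gradients, so raw neighbor data is never read.

    import numpy as np

    rng = np.random.default_rng(0)
    n_agents, dim = 3, 5
    # Each agent holds a private least-squares dataset and its own model.
    data = [(rng.normal(size=(20, dim)), rng.normal(size=20)) for _ in range(n_agents)]
    weights = [rng.normal(size=dim) for _ in range(n_agents)]

    def grad(w, X, y):
        # Gradient of 0.5 * ||X w - y||^2 / n with respect to w.
        return X.T @ (X @ w - y) / len(y)

    def ngc_gradient(i, neighbors, alpha=1.0 / 3):
        X_i, y_i = data[i]
        g_self = grad(weights[i], X_i, y_i)
        # Model-variant cross-gradients: neighbors' models on local data
        # (computable locally once neighbor parameters are received).
        g_model = np.mean([grad(weights[j], X_i, y_i) for j in neighbors], axis=0)
        # Data-variant cross-gradients: local model on neighbors' data; in
        # NGC these come via an extra communication round rather than the
        # direct dataset access used in this sketch.
        g_data = np.mean([grad(weights[i], *data[j]) for j in neighbors], axis=0)
        return alpha * (g_self + g_model + g_data)

    # One modified local step for agent 0 on a fully connected topology.
    weights[0] = weights[0] - 0.1 * ngc_gradient(0, neighbors=[1, 2])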
Related papers
- Accelerating Federated Learning by Selecting Beneficial Herd of Local Gradients [40.84399531998246]
Federated Learning (FL) is a distributed machine learning framework in communication network systems.
Non-Independent and Identically Distributed (Non-IID) data negatively affect the convergence efficiency of the global model.
We propose the BHerd strategy which selects a beneficial herd of local gradients to accelerate the convergence of the FL model.
arXiv Detail & Related papers (2024-03-25T09:16:59Z)
- Cross-feature Contrastive Loss for Decentralized Deep Learning on Heterogeneous Data [8.946847190099206]
We present a novel approach for decentralized learning on heterogeneous data.
Cross-features for a pair of neighboring agents are the features obtained from the data of an agent with respect to the model parameters of the other agent.
Our experiments show that the proposed method achieves superior performance (0.2-4% improvement in test accuracy) compared to other existing techniques for decentralized learning on heterogeneous data.
arXiv Detail & Related papers (2023-10-24T14:48:23Z)
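
As a rough illustration of the cross-feature idea in the paper above: local data is encoded once with the local model and once with a neighbor's model, and a loss term pulls the two feature sets together. The one-layer encoder and the cosine-alignment term are simplified stand-ins for the paper's actual contrastive loss.

    import numpy as np

    rng = np.random.default_rng(1)
    X_local = rng.normal(size=(8, 5))    # agent i's minibatch
    W_local = rng.normal(size=(5, 4))    # agent i's encoder parameters
    W_neigh = rng.normal(size=(5, 4))    # neighbor j's encoder parameters

    def features(X, W):
        return np.tanh(X @ W)            # toy one-layer encoder

    # Cross-features: the local data encoded by the neighbor's model.
    z_self = features(X_local, W_local)
    z_cross = features(X_local, W_neigh)

    def cosine(a, b):
        num = (a * b).sum(axis=-1)
        den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
        return num / den

    # Simplified alignment term standing in for the contrastive loss:
    # pull each sample's self-feature toward its cross-feature.
    loss = -cosine(z_self, z_cross).mean()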
- CADIS: Handling Cluster-skewed Non-IID Data in Federated Learning with Clustered Aggregation and Knowledge DIStilled Regularization [3.3711670942444014]
Federated learning enables edge devices to train a global model collaboratively without exposing their data.
We tackle a new type of Non-IID data, called cluster-skewed non-IID, discovered in actual data sets.
We propose an aggregation scheme that guarantees equality between clusters.
arXiv Detail & Related papers (2023-02-21T02:53:37Z)
- Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning [112.69497636932955]
Federated learning aims to train models across different clients without the sharing of data for privacy considerations.
We study how data heterogeneity affects the representations of the globally aggregated models.
We propose FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning.
arXiv Detail & Related papers (2022-10-01T09:04:17Z)
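
Dimensional collapse means the learned representations concentrate in a few directions. One standard way to counter it, sketched below, is a regularizer that penalizes off-diagonal correlations between the feature dimensions of a batch; whether this matches FedDecorr's exact formulation is an assumption.

    import numpy as np

    def decorrelation_penalty(Z):
        # Penalize off-diagonal correlations between the feature
        # dimensions of a batch Z (n_samples x n_dims); spreading
        # variance across dimensions counteracts collapse.
        Zc = Z - Z.mean(axis=0)
        Zc = Zc / (Zc.std(axis=0) + 1e-8)
        corr = Zc.T @ Zc / len(Z)
        off_diag = corr - np.diag(np.diag(corr))
        return (off_diag ** 2).sum()

    rng = np.random.default_rng(2)
    Z = rng.normal(size=(32, 16))         # a batch of representations
    reg = 0.1 * decorrelation_penalty(Z)  # added to the local task loss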
- Rethinking Data Heterogeneity in Federated Learning: Introducing a New Notion and Standard Benchmarks [65.34113135080105]
We show that data heterogeneity in current setups is not necessarily a problem and can in fact be beneficial for the FL participants.
Our observations are intuitive.
Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z)
- Robust Convergence in Federated Learning through Label-wise Clustering [6.693651193181458]
Non-IID datasets and the heterogeneous environments of local clients are regarded as a major issue in Federated Learning (FL).
We propose a novel Label-wise clustering algorithm that guarantees the trainability among geographically heterogeneous local clients.
Our paper shows that proposed Label-wise clustering demonstrates prompt and robust convergence compared to other FL algorithms.
arXiv Detail & Related papers (2021-12-28T18:13:09Z)
- Cross-Gradient Aggregation for Decentralized Learning from Non-IID data [34.23789472226752]
Decentralized learning enables a group of collaborative agents to learn models using a distributed dataset without the need for a central parameter server.
We propose Cross-Gradient Aggregation (CGA), a novel decentralized learning algorithm.
We show superior learning performance of CGA over existing state-of-the-art decentralized learning algorithms.
arXiv Detail & Related papers (2021-03-02T21:58:12Z)
- Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data [77.88594632644347]
Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks.
In realistic learning scenarios, the presence of heterogeneity across different clients' local datasets poses an optimization challenge.
We propose a novel momentum-based method to mitigate this decentralized training difficulty.
arXiv Detail & Related papers (2021-02-09T11:27:14Z)
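
A baseline sketch of the setting above: decentralized SGD where each agent gossip-averages with its neighbors and keeps a momentum buffer. The buffer here is driven by plain local gradients; Quasi-Global Momentum's contribution is a quasi-global estimate feeding this buffer, whose exact form is not reproduced here.

    import numpy as np

    rng = np.random.default_rng(3)
    n_agents, dim = 4, 6
    W = rng.normal(size=(n_agents, dim))   # one model (row) per agent
    M = np.zeros_like(W)                   # per-agent momentum buffers
    mix = np.full((n_agents, n_agents), 1.0 / n_agents)  # doubly stochastic mixing

    def local_grads(W):
        # Toy heterogeneous objective: agent i pulls toward target i.
        targets = np.arange(n_agents)[:, None] * np.ones((1, dim))
        return W - targets

    beta, lr = 0.9, 0.05
    for _ in range(200):
        W = mix @ W                        # gossip averaging with neighbors
        M = beta * M + local_grads(W)      # momentum on local gradients
        W = W - lr * M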
- Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including data from applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
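
For reference, a minimal consensus ADMM loop on a toy least-squares problem in the standard (Boyd-style) scaled form: local subproblem, consensus averaging, dual update. The paper's coded, mini-batch, edge-computing variant adds stochastic updates and coding against stragglers, which this sketch omits.

    import numpy as np

    rng = np.random.default_rng(4)
    n_agents, dim = 3, 4
    probs = [(rng.normal(size=(15, dim)), rng.normal(size=15)) for _ in range(n_agents)]

    rho = 1.0
    z = np.zeros(dim)               # consensus variable
    U = np.zeros((n_agents, dim))   # scaled dual variables
    Wl = np.zeros((n_agents, dim))  # local primal variables

    for _ in range(50):
        for i, (X, y) in enumerate(probs):
            # Local subproblem min ||X w - y||^2/2 + rho/2 ||w - z + u_i||^2
            # has a closed-form least-squares solution.
            A = X.T @ X + rho * np.eye(dim)
            Wl[i] = np.linalg.solve(A, X.T @ y + rho * (z - U[i]))
        z = (Wl + U).mean(axis=0)   # consensus (aggregation among nodes)
        U += Wl - z                 # dual ascent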
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
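
One common instantiation of the distributionally robust idea above is to solve an inner maximization over perturbed inputs and train on the worst case. The penalized ascent below (loss minus a quadratic transport penalty) is a standard Wasserstein-DRO surrogate; whether it matches the paper's formulation is an assumption.

    import numpy as np

    def loss_and_input_grad(w, x, y):
        # Toy squared loss 0.5 * (w.x - y)^2 and its gradient w.r.t. x.
        r = w @ x - y
        return 0.5 * r * r, r * w

    def worst_case_input(w, x, y, gamma=5.0, steps=10, eta=0.1):
        # Ascend loss(x') - gamma/2 * ||x' - x||^2 over perturbed inputs.
        xp = x.copy()
        for _ in range(steps):
            _, gx = loss_and_input_grad(w, xp, y)
            xp = xp + eta * (gx - gamma * (xp - x))
        return xp

    w = np.array([0.5, -1.0, 0.8])   # toy linear model
    x = np.array([1.0, 2.0, -1.0])   # clean input with label y
    x_adv = worst_case_input(w, x, y=1.0)
    # Training then uses x_adv in place of x for this sample.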
- FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z)
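
The CTA pattern mentioned above is easy to see in a minimal FedAvg-style loop: each client runs several local SGD steps ("computation"), then the models are averaged ("aggregation"). FedPD itself builds on a primal-dual reformulation of the local problem; the plain-averaging sketch below only illustrates the baseline CTA structure.

    import numpy as np

    rng = np.random.default_rng(6)
    n_clients, dim = 5, 4
    data = [(rng.normal(size=(30, dim)), rng.normal(size=30)) for _ in range(n_clients)]
    w_global = np.zeros(dim)

    def local_sgd(w, X, y, steps=10, lr=0.05):
        # "Computation": a few local least-squares SGD steps on client data.
        w = w.copy()
        for _ in range(steps):
            idx = rng.integers(0, len(y), size=8)
            Xb, yb = X[idx], y[idx]
            w -= lr * Xb.T @ (Xb @ w - yb) / len(yb)
        return w

    for _ in range(20):
        # "Aggregation": average the locally updated client models.
        w_global = np.mean([local_sgd(w_global, X, y) for X, y in data], axis=0)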