Neighborhood Gradient Clustering: An Efficient Decentralized Learning
Method for Non-IID Data Distributions
- URL: http://arxiv.org/abs/2209.14390v6
- Date: Mon, 20 Mar 2023 20:05:33 GMT
- Title: Neighborhood Gradient Clustering: An Efficient Decentralized Learning
Method for Non-IID Data Distributions
- Authors: Sai Aparna Aketi, Sangamesh Kodge, Kaushik Roy
- Abstract summary: The current state-of-the-art decentralized algorithms mostly assume the data distributions to be Independent and Identically Distributed.
We propose Neighborhood Gradient Clustering (NGC), a novel decentralized learning algorithm that modifies the local gradients of each agent using self- and cross-gradient information.
- Score: 5.340730281227837
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decentralized learning over distributed datasets can have significantly
different data distributions across the agents. The current state-of-the-art
decentralized algorithms mostly assume the data distributions to be Independent
and Identically Distributed. This paper focuses on improving decentralized
learning over non-IID data. We propose \textit{Neighborhood Gradient Clustering
(NGC)}, a novel decentralized learning algorithm that modifies the local
gradients of each agent using self- and cross-gradient information.
Cross-gradients for a pair of neighboring agents are the derivatives of the
model parameters of an agent with respect to the dataset of the other agent. In
particular, the proposed method replaces the local gradients of the model with
the weighted mean of the self-gradients, model-variant cross-gradients
(derivatives of the neighbors' parameters with respect to the local dataset),
and data-variant cross-gradients (derivatives of the local model with respect
to its neighbors' datasets). The data-variant cross-gradients are aggregated
through an additional communication round without breaking the privacy
constraints. Further, we present \textit{CompNGC}, a compressed version of
\textit{NGC} that reduces the communication overhead by $32 \times$. We
theoretically analyze the convergence rate of the proposed algorithm and
demonstrate its efficiency over non-IID data sampled from various vision and
language datasets. Our experiments demonstrate that \textit{NGC} and
\textit{CompNGC} outperform (by $0-6\%$) the existing SoTA decentralized
learning algorithm over non-IID data with significantly lower compute and
memory requirements. Further, our experiments show that the model-variant
cross-gradient information available locally at each agent can improve the
performance over non-IID data by $1-35\%$ without additional communication
cost.
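
To make the update above concrete, here is a minimal sketch of an NGC-style gradient step on a toy least-squares problem. The uniform mixing weight, the grad/ngc_gradient helper names, and the direct access to neighbor datasets are illustrative assumptions; in NGC itself the data-variant terms arrive as communicated cross-gradients, so raw neighbor data is never read.

    import numpy as np

    rng = np.random.default_rng(0)
    n_agents, dim = 3, 5
    # Each agent holds a private least-squares dataset and its own model.
    data = [(rng.normal(size=(20, dim)), rng.normal(size=20)) for _ in range(n_agents)]
    weights = [rng.normal(size=dim) for _ in range(n_agents)]

    def grad(w, X, y):
        # Gradient of 0.5 * ||X w - y||^2 / n with respect to w.
        return X.T @ (X @ w - y) / len(y)

    def ngc_gradient(i, neighbors, alpha=1.0 / 3):
        X_i, y_i = data[i]
        g_self = grad(weights[i], X_i, y_i)
        # Model-variant cross-gradients: neighbors' models on local data
        # (computable locally once neighbor parameters are received).
        g_model = np.mean([grad(weights[j], X_i, y_i) for j in neighbors], axis=0)
        # Data-variant cross-gradients: local model on neighbors' data; in
        # NGC these come via an extra communication round rather than the
        # direct dataset access used in this sketch.
        g_data = np.mean([grad(weights[i], *data[j]) for j in neighbors], axis=0)
        return alpha * (g_self + g_model + g_data)

    # One modified local step for agent 0 on a fully connected topology.
    weights[0] = weights[0] - 0.1 * ngc_gradient(0, neighbors=[1, 2])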
Related papers
- Accelerating Federated Learning by Selecting Beneficial Herd of Local Gradients [40.84399531998246]
Federated Learning (FL) is a distributed machine learning framework in communication network systems.
Non-Independent and Identically Distributed (Non-IID) data negatively affect the convergence efficiency of the global model.
We propose the BHerd strategy which selects a beneficial herd of local gradients to accelerate the convergence of the FL model.
arXiv Detail & Related papers (2024-03-25T09:16:59Z)
- Cross-feature Contrastive Loss for Decentralized Deep Learning on Heterogeneous Data [8.946847190099206]
We present a novel approach for decentralized learning on heterogeneous data.
Cross-features for a pair of neighboring agents are the features obtained from the data of an agent with respect to the model parameters of the other agent.
Our experiments show that the proposed method achieves superior performance (0.2-4% improvement in test accuracy) compared to other existing techniques for decentralized learning on heterogeneous data.
arXiv Detail & Related papers (2023-10-24T14:48:23Z)
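
As a rough illustration of the cross-feature idea in the paper above: local data is encoded once with the local model and once with a neighbor's model, and a loss term pulls the two feature sets together. The one-layer encoder and the cosine-alignment term are simplified stand-ins for the paper's actual contrastive loss.

    import numpy as np

    rng = np.random.default_rng(1)
    X_local = rng.normal(size=(8, 5))    # agent i's minibatch
    W_local = rng.normal(size=(5, 4))    # agent i's encoder parameters
    W_neigh = rng.normal(size=(5, 4))    # neighbor j's encoder parameters

    def features(X, W):
        return np.tanh(X @ W)            # toy one-layer encoder

    # Cross-features: the local data encoded by the neighbor's model.
    z_self = features(X_local, W_local)
    z_cross = features(X_local, W_neigh)

    def cosine(a, b):
        num = (a * b).sum(axis=-1)
        den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
        return num / den

    # Simplified alignment term standing in for the contrastive loss:
    # pull each sample's self-feature toward its cross-feature.
    loss = -cosine(z_self, z_cross).mean()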
- CADIS: Handling Cluster-skewed Non-IID Data in Federated Learning with Clustered Aggregation and Knowledge DIStilled Regularization [3.3711670942444014]
Federated learning enables edge devices to train a global model collaboratively without exposing their data.
We tackle a new type of Non-IID data, called cluster-skewed non-IID, discovered in actual data sets.
We propose an aggregation scheme that guarantees equality between clusters.
arXiv Detail & Related papers (2023-02-21T02:53:37Z)
- Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning [112.69497636932955]
Federated learning aims to train models across different clients without the sharing of data for privacy considerations.
We study how data heterogeneity affects the representations of the globally aggregated models.
We propose FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning.
arXiv Detail & Related papers (2022-10-01T09:04:17Z)
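
Dimensional collapse means the learned representations concentrate in a few directions. One standard way to counter it, sketched below, is a regularizer that penalizes off-diagonal correlations between the feature dimensions of a batch; whether this matches FedDecorr's exact formulation is an assumption.

    import numpy as np

    def decorrelation_penalty(Z):
        # Penalize off-diagonal correlations between the feature
        # dimensions of a batch Z (n_samples x n_dims); spreading
        # variance across dimensions counteracts collapse.
        Zc = Z - Z.mean(axis=0)
        Zc = Zc / (Zc.std(axis=0) + 1e-8)
        corr = Zc.T @ Zc / len(Z)
        off_diag = corr - np.diag(np.diag(corr))
        return (off_diag ** 2).sum()

    rng = np.random.default_rng(2)
    Z = rng.normal(size=(32, 16))         # a batch of representations
    reg = 0.1 * decorrelation_penalty(Z)  # added to the local task loss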
- Rethinking Data Heterogeneity in Federated Learning: Introducing a New Notion and Standard Benchmarks [65.34113135080105]
We show that data heterogeneity in current setups is not necessarily a problem and can in fact be beneficial for the FL participants.
Our observations are intuitive.
Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z)
- Robust Convergence in Federated Learning through Label-wise Clustering [6.693651193181458]
Non-IID datasets and the heterogeneous environments of local clients are regarded as a major issue in Federated Learning (FL).
We propose a novel Label-wise clustering algorithm that guarantees the trainability among geographically heterogeneous local clients.
Our paper shows that proposed Label-wise clustering demonstrates prompt and robust convergence compared to other FL algorithms.
arXiv Detail & Related papers (2021-12-28T18:13:09Z)
- Cross-Gradient Aggregation for Decentralized Learning from Non-IID data [34.23789472226752]
Decentralized learning enables a group of collaborative agents to learn models using a distributed dataset without the need for a central parameter server.
We propose Cross-Gradient Aggregation (CGA), a novel decentralized learning algorithm.
We show superior learning performance of CGA over existing state-of-the-art decentralized learning algorithms.
arXiv Detail & Related papers (2021-03-02T21:58:12Z)
- Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data [77.88594632644347]
Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks.
In realistic learning scenarios, the presence of heterogeneity across different clients' local datasets poses an optimization challenge.
We propose a novel momentum-based method to mitigate this decentralized training difficulty.
arXiv Detail & Related papers (2021-02-09T11:27:14Z)
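
A baseline sketch of the setting above: decentralized SGD where each agent gossip-averages with its neighbors and keeps a momentum buffer. The buffer here is driven by plain local gradients; Quasi-Global Momentum's contribution is a quasi-global estimate feeding this buffer, whose exact form is not reproduced here.

    import numpy as np

    rng = np.random.default_rng(3)
    n_agents, dim = 4, 6
    W = rng.normal(size=(n_agents, dim))   # one model (row) per agent
    M = np.zeros_like(W)                   # per-agent momentum buffers
    mix = np.full((n_agents, n_agents), 1.0 / n_agents)  # doubly stochastic mixing

    def local_grads(W):
        # Toy heterogeneous objective: agent i pulls toward target i.
        targets = np.arange(n_agents)[:, None] * np.ones((1, dim))
        return W - targets

    beta, lr = 0.9, 0.05
    for _ in range(200):
        W = mix @ W                        # gossip averaging with neighbors
        M = beta * M + local_grads(W)      # momentum on local gradients
        W = W - lr * M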
- Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including data from applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
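
For reference, a minimal consensus ADMM loop on a toy least-squares problem in the standard (Boyd-style) scaled form: local subproblem, consensus averaging, dual update. The paper's coded, mini-batch, edge-computing variant adds stochastic updates and coding against stragglers, which this sketch omits.

    import numpy as np

    rng = np.random.default_rng(4)
    n_agents, dim = 3, 4
    probs = [(rng.normal(size=(15, dim)), rng.normal(size=15)) for _ in range(n_agents)]

    rho = 1.0
    z = np.zeros(dim)               # consensus variable
    U = np.zeros((n_agents, dim))   # scaled dual variables
    Wl = np.zeros((n_agents, dim))  # local primal variables

    for _ in range(50):
        for i, (X, y) in enumerate(probs):
            # Local subproblem min ||X w - y||^2/2 + rho/2 ||w - z + u_i||^2
            # has a closed-form least-squares solution.
            A = X.T @ X + rho * np.eye(dim)
            Wl[i] = np.linalg.solve(A, X.T @ y + rho * (z - U[i]))
        z = (Wl + U).mean(axis=0)   # consensus (aggregation among nodes)
        U += Wl - z                 # dual ascent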
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
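
One common instantiation of the distributionally robust idea above is to solve an inner maximization over perturbed inputs and train on the worst case. The penalized ascent below (loss minus a quadratic transport penalty) is a standard Wasserstein-DRO surrogate; whether it matches the paper's formulation is an assumption.

    import numpy as np

    def loss_and_input_grad(w, x, y):
        # Toy squared loss 0.5 * (w.x - y)^2 and its gradient w.r.t. x.
        r = w @ x - y
        return 0.5 * r * r, r * w

    def worst_case_input(w, x, y, gamma=5.0, steps=10, eta=0.1):
        # Ascend loss(x') - gamma/2 * ||x' - x||^2 over perturbed inputs.
        xp = x.copy()
        for _ in range(steps):
            _, gx = loss_and_input_grad(w, xp, y)
            xp = xp + eta * (gx - gamma * (xp - x))
        return xp

    w = np.array([0.5, -1.0, 0.8])   # toy linear model
    x = np.array([1.0, 2.0, -1.0])   # clean input with label y
    x_adv = worst_case_input(w, x, y=1.0)
    # Training then uses x_adv in place of x for this sample.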
- FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z)
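
The CTA pattern mentioned above is easy to see in a minimal FedAvg-style loop: each client runs several local SGD steps ("computation"), then the models are averaged ("aggregation"). FedPD itself builds on a primal-dual reformulation of the local problem; the plain-averaging sketch below only illustrates the baseline CTA structure.

    import numpy as np

    rng = np.random.default_rng(6)
    n_clients, dim = 5, 4
    data = [(rng.normal(size=(30, dim)), rng.normal(size=30)) for _ in range(n_clients)]
    w_global = np.zeros(dim)

    def local_sgd(w, X, y, steps=10, lr=0.05):
        # "Computation": a few local least-squares SGD steps on client data.
        w = w.copy()
        for _ in range(steps):
            idx = rng.integers(0, len(y), size=8)
            Xb, yb = X[idx], y[idx]
            w -= lr * Xb.T @ (Xb @ w - yb) / len(yb)
        return w

    for _ in range(20):
        # "Aggregation": average the locally updated client models.
        w_global = np.mean([local_sgd(w_global, X, y) for X, y in data], axis=0)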