FedClust: Tackling Data Heterogeneity in Federated Learning through Weight-Driven Client Clustering
- URL: http://arxiv.org/abs/2407.07124v1
- Date: Tue, 9 Jul 2024 02:47:16 GMT
- Title: FedClust: Tackling Data Heterogeneity in Federated Learning through Weight-Driven Client Clustering
- Authors: Md Sirajul Islam, Simin Javaherian, Fei Xu, Xu Yuan, Li Chen, Nian-Feng Tzeng,
- Abstract summary: Federated learning (FL) is an emerging distributed machine learning paradigm.
One of the major challenges in FL is the presence of uneven data distributions across client devices.
We propose em FedClust, a novel approach for CFL that leverages the correlation between local model weights and the data distribution of clients.
- Score: 26.478852701376294
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning (FL) is an emerging distributed machine learning paradigm that enables collaborative training of machine learning models over decentralized devices without exposing their local data. One of the major challenges in FL is the presence of uneven data distributions across client devices, violating the well-known assumption of independent-and-identically-distributed (IID) training samples in conventional machine learning. To address the performance degradation issue incurred by such data heterogeneity, clustered federated learning (CFL) shows its promise by grouping clients into separate learning clusters based on the similarity of their local data distributions. However, state-of-the-art CFL approaches require a large number of communication rounds to learn the distribution similarities during training until the formation of clusters is stabilized. Moreover, some of these algorithms heavily rely on a predefined number of clusters, thus limiting their flexibility and adaptability. In this paper, we propose {\em FedClust}, a novel approach for CFL that leverages the correlation between local model weights and the data distribution of clients. {\em FedClust} groups clients into clusters in a one-shot manner by measuring the similarity degrees among clients based on the strategically selected partial weights of locally trained models. We conduct extensive experiments on four benchmark datasets with different non-IID data settings. Experimental results demonstrate that {\em FedClust} achieves higher model accuracy up to $\sim$45\% as well as faster convergence with a significantly reduced communication cost up to 2.7$\times$ compared to its state-of-the-art counterparts.
Related papers
- Federated Clustering: An Unsupervised Cluster-Wise Training for Decentralized Data Distributions [1.6385815610837167]
Federated Cluster-Wise Refinement (FedCRef) involves clients that collaboratively train models on clusters with similar data distributions.
In these groups, clients collaboratively train a shared model representing each data distribution, while continuously refining their local clusters to enhance data association accuracy.
This iterative process allows our system to identify all potential data distributions across the network and develop robust representation models for each.
arXiv Detail & Related papers (2024-08-20T09:05:44Z) - An Aggregation-Free Federated Learning for Tackling Data Heterogeneity [50.44021981013037]
Federated Learning (FL) relies on the effectiveness of utilizing knowledge from distributed datasets.
Traditional FL methods adopt an aggregate-then-adapt framework, where clients update local models based on a global model aggregated by the server from the previous training round.
We introduce FedAF, a novel aggregation-free FL algorithm.
arXiv Detail & Related papers (2024-04-29T05:55:23Z) - FedClust: Optimizing Federated Learning on Non-IID Data through
Weight-Driven Client Clustering [28.057411252785176]
Federated learning (FL) is an emerging distributed machine learning paradigm enabling collaborative model training on decentralized devices without exposing their local data.
This paper proposes FedClust, a novel CFL approach leveraging correlations between local model weights and client data distributions.
arXiv Detail & Related papers (2024-03-07T01:50:36Z) - Contrastive encoder pre-training-based clustered federated learning for
heterogeneous data [17.580390632874046]
Federated learning (FL) enables distributed clients to collaboratively train a global model while preserving their data privacy.
We propose contrastive pre-training-based clustered federated learning (CP-CFL) to improve the model convergence and overall performance of FL systems.
arXiv Detail & Related papers (2023-11-28T05:44:26Z) - Tackling Computational Heterogeneity in FL: A Few Theoretical Insights [68.8204255655161]
We introduce and analyse a novel aggregation framework that allows for formalizing and tackling computational heterogeneous data.
Proposed aggregation algorithms are extensively analyzed from a theoretical, and an experimental prospective.
arXiv Detail & Related papers (2023-07-12T16:28:21Z) - Stochastic Clustered Federated Learning [21.811496586350653]
This paper proposes StoCFL, a novel clustered federated learning approach for generic Non-IID issues.
In detail, StoCFL implements a flexible CFL framework that supports an arbitrary proportion of client participation and newly joined clients.
The results show that StoCFL could obtain promising cluster results even when the number of clusters is unknown.
arXiv Detail & Related papers (2023-03-02T01:39:16Z) - Fed-CBS: A Heterogeneity-Aware Client Sampling Mechanism for Federated
Learning via Class-Imbalance Reduction [76.26710990597498]
We show that the class-imbalance of the grouped data from randomly selected clients can lead to significant performance degradation.
Based on our key observation, we design an efficient client sampling mechanism, i.e., Federated Class-balanced Sampling (Fed-CBS)
In particular, we propose a measure of class-imbalance and then employ homomorphic encryption to derive this measure in a privacy-preserving way.
arXiv Detail & Related papers (2022-09-30T05:42:56Z) - Efficient Distribution Similarity Identification in Clustered Federated
Learning via Principal Angles Between Client Data Subspaces [59.33965805898736]
Clustered learning has been shown to produce promising results by grouping clients into clusters.
Existing FL algorithms are essentially trying to group clients together with similar distributions.
Prior FL algorithms attempt similarities indirectly during training.
arXiv Detail & Related papers (2022-09-21T17:37:54Z) - FedDC: Federated Learning with Non-IID Data via Local Drift Decoupling
and Correction [48.85303253333453]
Federated learning (FL) allows multiple clients to collectively train a high-performance global model without sharing their private data.
We propose a novel federated learning algorithm with local drift decoupling and correction (FedDC)
Our FedDC only introduces lightweight modifications in the local training phase, in which each client utilizes an auxiliary local drift variable to track the gap between the local model parameter and the global model parameters.
Experiment results and analysis demonstrate that FedDC yields expediting convergence and better performance on various image classification tasks.
arXiv Detail & Related papers (2022-03-22T14:06:26Z) - Robust Convergence in Federated Learning through Label-wise Clustering [6.693651193181458]
Non-IID dataset and heterogeneous environment of the local clients are regarded as a major issue in Federated Learning (FL)
We propose a novel Label-wise clustering algorithm that guarantees the trainability among geographically heterogeneous local clients.
Our paper shows that proposed Label-wise clustering demonstrates prompt and robust convergence compared to other FL algorithms.
arXiv Detail & Related papers (2021-12-28T18:13:09Z) - Towards Fair Federated Learning with Zero-Shot Data Augmentation [123.37082242750866]
Federated learning has emerged as an important distributed learning paradigm, where a server aggregates a global model from many client-trained models while having no access to the client data.
We propose a novel federated learning system that employs zero-shot data augmentation on under-represented data to mitigate statistical heterogeneity and encourage more uniform accuracy performance across clients in federated networks.
We study two variants of this scheme, Fed-ZDAC (federated learning with zero-shot data augmentation at the clients) and Fed-ZDAS (federated learning with zero-shot data augmentation at the server).
arXiv Detail & Related papers (2021-04-27T18:23:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.