CADIS: Handling Cluster-skewed Non-IID Data in Federated Learning with
Clustered Aggregation and Knowledge DIStilled Regularization
- URL: http://arxiv.org/abs/2302.10413v3
- Date: Sat, 15 Apr 2023 04:06:52 GMT
- Title: CADIS: Handling Cluster-skewed Non-IID Data in Federated Learning with
Clustered Aggregation and Knowledge DIStilled Regularization
- Authors: Nang Hung Nguyen, Duc Long Nguyen, Trong Bang Nguyen, Thanh-Hung
Nguyen, Huy Hieu Pham, Truong Thao Nguyen, Phi Le Nguyen
- Abstract summary: Federated learning enables edge devices to train a global model collaboratively without exposing their data.
We tackle a new type of Non-IID data, called cluster-skewed non-IID, discovered in actual data sets.
We propose an aggregation scheme that guarantees equality between clusters.
- Score: 3.3711670942444014
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Federated learning enables edge devices to train a global model
collaboratively without exposing their data. Despite achieving outstanding
advantages in computing efficiency and privacy protection, federated learning
faces a significant challenge when dealing with non-IID data, i.e., data
generated by clients that are typically not independent and identically
distributed. In this paper, we tackle a new type of Non-IID data, called
cluster-skewed non-IID, discovered in actual data sets. The cluster-skewed
non-IID is a phenomenon in which clients can be grouped into clusters with
similar data distributions. By performing an in-depth analysis of the behavior
of a classification model's penultimate layer, we introduce a metric that
quantifies the similarity between two clients' data distributions without
violating their privacy. We then propose an aggregation scheme that guarantees
equality between clusters. In addition, we offer a novel local training
regularization based on the knowledge-distillation technique that reduces the
overfitting problem at clients and dramatically boosts the training scheme's
performance. We theoretically prove the superiority of the proposed aggregation
over the benchmark FedAvg. Extensive experimental results on both standard
public datasets and our in-house real-world dataset demonstrate that the
proposed approach improves accuracy by up to 16% compared to the FedAvg
algorithm.
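The abstract contrasts the proposed cluster-equal aggregation with FedAvg and mentions a knowledge-distillation regularizer, but gives no formulas. The following is only a minimal sketch of those two ideas under stated assumptions, not the authors' method: cluster assignments are taken as given (the paper derives them from a penultimate-layer similarity metric that is not reproduced here), model updates are flat lists of floats, and the names `cluster_balanced_average` and `distillation_penalty` are hypothetical.

```python
import math
from collections import defaultdict


def fedavg(updates, sizes):
    """Standard FedAvg: average client updates weighted by sample count."""
    total = sum(sizes)
    dim = len(updates[0])
    return [sum(u[i] * n for u, n in zip(updates, sizes)) / total
            for i in range(dim)]


def cluster_balanced_average(updates, sizes, cluster_ids):
    """Assumed cluster-equal scheme: FedAvg inside each cluster,
    then an unweighted mean over the per-cluster models."""
    groups = defaultdict(list)
    for update, size, cid in zip(updates, sizes, cluster_ids):
        groups[cid].append((update, size))
    cluster_models = [
        fedavg([u for u, _ in members], [n for _, n in members])
        for members in groups.values()
    ]
    dim = len(updates[0])
    return [sum(m[i] for m in cluster_models) / len(cluster_models)
            for i in range(dim)]


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_norm for z in scaled]


def distillation_penalty(student_logits, teacher_logits, temperature=2.0):
    """Generic distillation term: KL(teacher || student) on softened outputs.
    Here the global model would play the teacher and the local model the
    student; that pairing is an assumption, not taken from the abstract."""
    log_t = log_softmax(teacher_logits, temperature)
    log_s = log_softmax(student_logits, temperature)
    return sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))


# Three clients share one cluster, a fourth forms its own cluster.
updates = [[1.0], [1.0], [1.0], [5.0]]
sizes = [10, 10, 10, 10]
print(fedavg(updates, sizes))
print(cluster_balanced_average(updates, sizes, [0, 0, 0, 1]))
```

With this toy input, FedAvg yields 2.0 because the three similar clients dominate, while the cluster-balanced average yields 3.0 because each cluster counts once. The distillation penalty is zero when the local and global logits agree and grows as they diverge, so adding it to the local loss discourages the local model from drifting away from the global one.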
Related papers
- Dataset Distillation-based Hybrid Federated Learning on Non-IID Data [19.01147151081893]
We propose a hybrid federated learning framework called HFLDD, which integrates dataset distillation to generate independent and identically distributed (IID) data.
We partition the clients into heterogeneous clusters, where the data labels among different clients within a cluster are unbalanced.
This training process is like traditional federated learning on IID data, and hence effectively alleviates the impact of Non-IID data on model training.
arXiv Detail & Related papers (2024-09-26T03:52:41Z)
- Federated Clustering: An Unsupervised Cluster-Wise Training for Decentralized Data Distributions [1.6385815610837167]
Federated Cluster-Wise Refinement (FedCRef) involves clients that collaboratively train models on clusters with similar data distributions.
In these groups, clients collaboratively train a shared model representing each data distribution, while continuously refining their local clusters to enhance data association accuracy.
This iterative process allows our system to identify all potential data distributions across the network and develop robust representation models for each.
arXiv Detail & Related papers (2024-08-20T09:05:44Z)
- Federated Two Stage Decoupling With Adaptive Personalization Layers [5.69361786082969]
Federated learning has gained significant attention due to its ability to enable distributed learning while maintaining privacy constraints.
It inherently experiences significant learning degradation and slow convergence speed.
It is natural to employ the concept of clustering homogeneous clients into the same group, allowing only the model weights within each group to be aggregated.
arXiv Detail & Related papers (2023-08-30T07:46:32Z)
- Rethinking Data Heterogeneity in Federated Learning: Introducing a New
Notion and Standard Benchmarks [65.34113135080105]
We show that data heterogeneity in current setups is not necessarily a problem and can in fact be beneficial for the FL participants.
Our observations are intuitive.
Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z)
- Fed-CBS: A Heterogeneity-Aware Client Sampling Mechanism for Federated
Learning via Class-Imbalance Reduction [76.26710990597498]
We show that the class-imbalance of the grouped data from randomly selected clients can lead to significant performance degradation.
Based on our key observation, we design an efficient client sampling mechanism, i.e., Federated Class-balanced Sampling (Fed-CBS).
In particular, we propose a measure of class-imbalance and then employ homomorphic encryption to derive this measure in a privacy-preserving way.
arXiv Detail & Related papers (2022-09-30T05:42:56Z)
- Efficient Distribution Similarity Identification in Clustered Federated
Learning via Principal Angles Between Client Data Subspaces [59.33965805898736]
Clustered learning has been shown to produce promising results by grouping clients into clusters.
Existing FL algorithms essentially try to group together clients with similar distributions.
Prior FL algorithms attempt to learn these similarities indirectly during training.
arXiv Detail & Related papers (2022-09-21T17:37:54Z)
- FedDRL: Deep Reinforcement Learning-based Adaptive Aggregation for
Non-IID Data in Federated Learning [4.02923738318937]
Uneven distribution of local data across different edge devices (clients) results in slow model training and accuracy reduction in federated learning.
This work introduces a novel non-IID type encountered in real-world datasets, namely cluster-skew.
We propose FedDRL, a novel FL model that employs deep reinforcement learning to adaptively determine each client's impact factor.
arXiv Detail & Related papers (2022-08-04T04:24:16Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data assigned to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Towards Fair Federated Learning with Zero-Shot Data Augmentation [123.37082242750866]
Federated learning has emerged as an important distributed learning paradigm, where a server aggregates a global model from many client-trained models while having no access to the client data.
We propose a novel federated learning system that employs zero-shot data augmentation on under-represented data to mitigate statistical heterogeneity and encourage more uniform accuracy performance across clients in federated networks.
We study two variants of this scheme, Fed-ZDAC (federated learning with zero-shot data augmentation at the clients) and Fed-ZDAS (federated learning with zero-shot data augmentation at the server).
arXiv Detail & Related papers (2021-04-27T18:23:54Z)
- ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for
Semi-supervised Continual Learning [52.831894583501395]
Continual learning assumes the incoming data are fully labeled, which might not be applicable in real applications.
We propose deep Online Replay with Discriminator Consistency (ORDisCo) to interdependently learn a classifier with a conditional generative adversarial network (GAN).
We show ORDisCo achieves significant performance improvement on various semi-supervised learning benchmark datasets for SSCL.
arXiv Detail & Related papers (2021-01-02T09:04:14Z)