Rethinking Data Heterogeneity in Federated Learning: Introducing a New
Notion and Standard Benchmarks
- URL: http://arxiv.org/abs/2209.15595v1
- Date: Fri, 30 Sep 2022 17:15:19 GMT
- Title: Rethinking Data Heterogeneity in Federated Learning: Introducing a New
Notion and Standard Benchmarks
- Authors: Mahdi Morafah, Saeed Vahidian, Chen Chen, Mubarak Shah, Bill Lin
- Abstract summary: We show that the data heterogeneity in current setups is not necessarily a problem; in fact, it can be beneficial for the FL participants.
Our observations are intuitive.
Our code is available at https://github.com/MMorafah/FL-SC-NIID.
- Score: 65.34113135080105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Though successful, federated learning presents new challenges for machine
learning, especially when data heterogeneity, also known as non-IID data, arises.
To cope with statistical heterogeneity, previous works have incorporated a proximal
term into local optimization, modified the model aggregation scheme at the server
side, or advocated clustered federated learning approaches in which the central
server groups the agent population into clusters with jointly trainable data
distributions to exploit a certain level of personalization. While effective, these
approaches lack a deep elaboration on what kind of data heterogeneity matters and
how it impacts the accuracy of the participating clients. In contrast to many prior
federated learning approaches, we demonstrate that the data heterogeneity in current
setups is not necessarily a problem and can in fact be beneficial for the FL
participants. Our observations are intuitive: (1) dissimilar client labels (label
skew) are not necessarily a form of data heterogeneity, and (2) the principal angles
between the agents' data subspaces, each spanned by the principal vectors of the
corresponding client's data, are a better estimate of the data heterogeneity. Our
code is available at https://github.com/MMorafah/FL-SC-NIID.
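
The subspace-angle notion in observation (2) is easy to make concrete. Below is a
minimal sketch (not the authors' implementation; the function names and the choice
k=5 are illustrative): each client's subspace is spanned by the top-k right singular
vectors of its centered data matrix, and the principal angles between two clients
follow from the singular values of the product of their bases.

    import numpy as np

    def client_subspace(X, k=5):
        """Orthonormal basis for the top-k principal directions of a
        client's data matrix X with shape (n_samples, n_features)."""
        Xc = X - X.mean(axis=0)                      # center the features
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        return Vt[:k].T                              # (n_features, k)

    def principal_angles(U1, U2):
        """Principal angles (radians) between two subspaces given
        orthonormal bases U1, U2 of shape (n_features, k)."""
        s = np.linalg.svd(U1.T @ U2, compute_uv=False)
        return np.arccos(np.clip(s, 0.0, 1.0))       # clip guards round-off

    # Toy example: two clients with partially overlapping structure
    rng = np.random.default_rng(0)
    X_a = rng.normal(size=(200, 32))
    X_b = 0.5 * X_a + rng.normal(size=(200, 32))
    angles = principal_angles(client_subspace(X_a), client_subspace(X_b))
    print("principal angles (degrees):", np.degrees(angles))

Smaller principal angles mean more closely aligned data subspaces, i.e. less
heterogeneity between the two clients, regardless of how dissimilar their label
sets are.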
Related papers
- Dataset Distillation-based Hybrid Federated Learning on Non-IID Data [19.01147151081893]
We propose a hybrid federated learning framework called HFLDD, which integrates dataset distillation to generate independent and identically distributed (IID) data.
We partition the clients into heterogeneous clusters, where the data labels among different clients within a cluster are unbalanced.
This training process is like traditional federated learning on IID data, and hence effectively alleviates the impact of Non-IID data on model training.
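
The summary does not spell out HFLDD's partitioning rule; as one hedged reading,
clients can be grouped so that each cluster's pooled label histogram is close to
uniform even though every member is label-skewed. The greedy heuristic below is
purely illustrative, with all names hypothetical:

    import numpy as np

    def greedy_iid_clusters(label_hists, n_clusters):
        """Assign clients to clusters so pooled label histograms are near-uniform.
        label_hists: (n_clients, n_classes) array of per-client label counts."""
        n_clients, n_classes = label_hists.shape
        pooled = np.zeros((n_clusters, n_classes))
        clusters = [[] for _ in range(n_clusters)]
        # Largest clients first; give each to the cluster whose pooled
        # histogram it flattens the most (std as a flatness proxy).
        for c in np.argsort(-label_hists.sum(axis=1)):
            best = min(range(n_clusters),
                       key=lambda j: np.std(pooled[j] + label_hists[c]))
            pooled[best] += label_hists[c]
            clusters[best].append(int(c))
        return clusters

    # Toy example: 8 label-skewed clients over 4 classes, 2 clusters
    rng = np.random.default_rng(1)
    probs = rng.dirichlet(np.ones(4) * 0.3, size=8)
    hists = np.stack([rng.multinomial(100, p) for p in probs])
    print(greedy_iid_clusters(hists, n_clusters=2))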
arXiv Detail & Related papers (2024-09-26T03:52:41Z)
- FedLF: Adaptive Logit Adjustment and Feature Optimization in Federated Long-Tailed Learning [5.23984567704876]
Federated learning offers a paradigm for privacy-preserving distributed machine learning.
Traditional approaches fail to address the phenomenon of class-wise bias in globally long-tailed data.
The new method, FedLF, introduces three modifications in the local training phase: adaptive logit adjustment, continuous class-centered optimization, and feature decorrelation.
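
The exact form of FedLF's adaptive logit adjustment is not given here; the sketch
below shows generic class-prior logit adjustment, a standard long-tail technique
in the same spirit, with tau and the count-based prior as assumptions:

    import torch
    import torch.nn.functional as F

    def logit_adjusted_loss(logits, targets, class_counts, tau=1.0):
        """Cross-entropy with class-prior logit adjustment: adding the
        log-prior during training counteracts the head-class bias that a
        long-tailed label distribution induces in the classifier."""
        prior = class_counts / class_counts.sum()
        adjusted = logits + tau * torch.log(prior + 1e-12)   # log-prior offset
        return F.cross_entropy(adjusted, targets)

    # Toy example: 4 classes with a long-tailed count vector
    counts = torch.tensor([1000., 100., 10., 1.])
    logits = torch.randn(8, 4)
    targets = torch.randint(0, 4, (8,))
    print(logit_adjusted_loss(logits, targets, counts).item())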
arXiv Detail & Related papers (2024-09-18T16:25:29Z)
- FLASH: Federated Learning Across Simultaneous Heterogeneities [54.80435317208111]
FLASH (Federated Learning Across Simultaneous Heterogeneities) is a lightweight and flexible client-selection algorithm.
It outperforms state-of-the-art FL frameworks under extensive sources of heterogeneity, achieving substantial and consistent improvements over state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-13T20:04:39Z)
- Fake It Till Make It: Federated Learning with Consensus-Oriented Generation [52.82176415223988]
We propose federated learning with consensus-oriented generation (FedCOG).
FedCOG consists of two key components at the client side: complementary data generation and knowledge-distillation-based model training.
Experiments on classical and real-world FL datasets show that FedCOG consistently outperforms state-of-the-art methods.
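
FedCOG's distillation component is only named in this summary; as a hedged
illustration, a typical knowledge-distillation objective blends hard-label
cross-entropy with a softened KL term against a teacher (e.g. the global model),
with temperature T and mixing weight alpha as assumed hyperparameters:

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
        """Hard-label cross-entropy blended with a temperature-softened KL
        term that pulls the student toward the teacher's predictions."""
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                        F.softmax(teacher_logits / T, dim=1),
                        reduction="batchmean") * (T * T)
        hard = F.cross_entropy(student_logits, targets)
        return alpha * soft + (1.0 - alpha) * hard

    # Toy usage
    s, t = torch.randn(8, 10), torch.randn(8, 10)
    y = torch.randint(0, 10, (8,))
    print(distillation_loss(s, t, y).item())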
arXiv Detail & Related papers (2023-12-10T18:49:59Z)
- Federated Learning Empowered by Generative Content [55.576885852501775]
Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way.
We propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.
We conduct a systematic empirical study on FedGC, covering diverse baselines, datasets, scenarios, and modalities.
arXiv Detail & Related papers (2023-12-10T07:38:56Z)
- Benchmarking FedAvg and FedCurv for Image Classification Tasks [1.376408511310322]
This paper focuses on the problem of statistical heterogeneity of the data in the same federated network.
Several federated learning algorithms, such as FedAvg, FedProx, and Federated Curvature (FedCurv), have already been proposed.
As a side product of this work, we release the non-IID versions of the datasets we used, so as to facilitate further comparisons from the FL community.
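
For reference, the FedAvg server step that this benchmark (and several methods
above) builds on is a sample-count-weighted average of the client models; a
minimal sketch with hypothetical names:

    import torch

    def fedavg(client_states, client_sizes):
        """Sample-count-weighted average of client state_dicts (FedAvg server
        step). Assumes all entries are floating-point tensors; integer buffers
        (e.g. BatchNorm's num_batches_tracked) would need special handling."""
        total = float(sum(client_sizes))
        avg = {k: torch.zeros_like(v) for k, v in client_states[0].items()}
        for state, n in zip(client_states, client_sizes):
            for k, v in state.items():
                avg[k] += v * (n / total)
        return avg

FedProx keeps this server step and instead adds a proximal penalty
(mu/2)*||w - w_global||^2 to each client's local objective, which is the
"proximal term" mentioned in the main abstract above.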
arXiv Detail & Related papers (2023-03-31T10:13:01Z)
- CADIS: Handling Cluster-skewed Non-IID Data in Federated Learning with Clustered Aggregation and Knowledge DIStilled Regularization [3.3711670942444014]
Federated learning enables edge devices to train a global model collaboratively without exposing their data.
We tackle a new type of non-IID data, called cluster-skewed non-IID, discovered in real-world datasets.
We propose an aggregation scheme that guarantees equality between clusters.
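
The paper's precise scheme is not detailed in this summary; one natural reading of
"equality between clusters" is two-level averaging, sketched below with
illustrative names: uniform averaging inside each cluster, then uniform averaging
across clusters, so a large cluster of similar clients cannot dominate the global
model.

    import torch

    def cluster_equal_aggregate(client_states, cluster_ids):
        """Uniform average of state_dicts inside each cluster, then uniform
        across clusters. Assumes floating-point parameter tensors only."""
        clusters = {}
        for state, cid in zip(client_states, cluster_ids):
            clusters.setdefault(cid, []).append(state)
        keys = client_states[0].keys()
        cluster_means = [
            {k: torch.stack([s[k] for s in members]).mean(dim=0) for k in keys}
            for members in clusters.values()
        ]
        return {k: torch.stack([m[k] for m in cluster_means]).mean(dim=0)
                for k in keys}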
arXiv Detail & Related papers (2023-02-21T02:53:37Z)
- Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
- No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
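
A sketch of the calibration idea as described, with simplified statistics (in
practice the per-class means and covariances would be aggregated from client
feature statistics; all names here are illustrative): sample virtual
representations from one Gaussian per class and use them to fine-tune only the
classifier head.

    import torch

    def sample_virtual_features(class_means, class_covs, n_per_class):
        """Draw virtual representations from one Gaussian per class
        (the approximated Gaussian mixture over feature space)."""
        feats, labels = [], []
        for c, (mu, cov) in enumerate(zip(class_means, class_covs)):
            dist = torch.distributions.MultivariateNormal(mu, covariance_matrix=cov)
            feats.append(dist.sample((n_per_class,)))
            labels.append(torch.full((n_per_class,), c, dtype=torch.long))
        return torch.cat(feats), torch.cat(labels)

    # Toy usage: the sampled pairs would then fine-tune only the classifier head.
    d = 16
    means = [torch.zeros(d), torch.ones(d)]
    covs = [torch.eye(d), 0.5 * torch.eye(d)]
    virtual_x, virtual_y = sample_virtual_features(means, covs, n_per_class=100)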
arXiv Detail & Related papers (2021-06-09T12:02:29Z)
- Federated Visual Classification with Real-World Data Distribution [9.564468846277366]
We characterize the effect real-world data distributions have on distributed learning, using as a benchmark the standard Federated Averaging (FedAvg) algorithm.
We introduce two new large-scale datasets for species and landmark classification, with realistic per-user data splits.
We also develop two new algorithms (FedVC, FedIR) that intelligently resample and reweight over the client pool, bringing large improvements in accuracy and stability in training.
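
The summary does not give FedIR's estimator; a common importance-reweighting
scheme consistent with the description weights each local example by the ratio of
the global to the local label probability. A hedged sketch with hypothetical
names:

    import torch

    def importance_weights(labels, local_prior, global_prior):
        """Per-example weights p_global(y) / p_local(y), so the local loss
        approximates an expectation under the global label distribution."""
        w = global_prior[labels] / local_prior[labels].clamp_min(1e-12)
        return w / w.mean()   # renormalize to keep the loss scale stable

    # Usage inside a client update (illustrative):
    #   w = importance_weights(y, local_prior, global_prior)
    #   loss = (w * torch.nn.functional.cross_entropy(
    #       logits, y, reduction="none")).mean()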
arXiv Detail & Related papers (2020-03-18T07:55:49Z)