Diversity-Driven Learning: Tackling Spurious Correlations and Data Heterogeneity in Federated Models
- URL: http://arxiv.org/abs/2504.11216v1
- Date: Tue, 15 Apr 2025 14:20:42 GMT
- Title: Diversity-Driven Learning: Tackling Spurious Correlations and Data Heterogeneity in Federated Models
- Authors: Gergely D. Németh, Eros Fanì, Yeat Jeng Ng, Barbara Caputo, Miguel Ángel Lozano, Nuria Oliver, Novi Quadrianto
- Abstract summary: Federated Learning (FL) enables decentralized training of machine learning models on distributed data. In real-world FL settings, client data is often non-identically distributed and imbalanced. We propose FedDiverse, a novel client selection algorithm in FL which is designed to manage and leverage data heterogeneity.
- Score: 21.672445835824053
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Federated Learning (FL) enables decentralized training of machine learning models on distributed data while preserving privacy. However, in real-world FL settings, client data is often non-identically distributed and imbalanced, resulting in statistical data heterogeneity, which impacts the generalization capabilities of the server's model across clients, slows convergence, and reduces performance. In this paper, we address this challenge by first proposing a characterization of statistical data heterogeneity by means of 6 metrics of global and client attribute imbalance, class imbalance, and spurious correlations. Next, we create and share 7 computer vision datasets for binary and multiclass image classification tasks in Federated Learning that cover a broad range of statistical data heterogeneity and hence simulate real-world situations. Finally, we propose FedDiverse, a novel client selection algorithm in FL designed to manage and leverage data heterogeneity across clients by promoting collaboration between clients with complementary data distributions. Experiments on the 7 proposed FL datasets demonstrate FedDiverse's effectiveness in enhancing the performance and robustness of a variety of FL methods while having low communication and computational overhead.
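To make the selection idea concrete, here is a minimal sketch of diversity-driven client selection. It is not the FedDiverse algorithm: the use of label histograms as the heterogeneity signal, the greedy complementarity criterion, and all names below are illustrative assumptions.

```python
import numpy as np

def label_histogram(labels, num_classes):
    """Normalized class histogram of one client's local labels."""
    counts = np.bincount(labels, minlength=num_classes)
    return counts / max(counts.sum(), 1)

def select_diverse_clients(client_labels, num_classes, k):
    """Greedily pick k clients with complementary label histograms:
    each pick maximizes distance to the running mixture of the
    already-selected clients (an illustrative criterion, not the
    paper's 6 heterogeneity metrics)."""
    hists = [label_histogram(y, num_classes) for y in client_labels]
    selected = [int(np.argmax([h.std() for h in hists]))]  # start with the most skewed client
    while len(selected) < k:
        mixture = np.mean([hists[i] for i in selected], axis=0)
        # L1 distance to the current mixture rewards complementary data
        scores = [np.abs(h - mixture).sum() if i not in selected else -1.0
                  for i, h in enumerate(hists)]
        selected.append(int(np.argmax(scores)))
    return selected

# Toy example: 4 clients with skewed binary-label data
rng = np.random.default_rng(0)
clients = [rng.choice(2, size=50, p=p) for p in
           [(0.9, 0.1), (0.85, 0.15), (0.1, 0.9), (0.5, 0.5)]]
print(select_diverse_clients(clients, num_classes=2, k=2))  # e.g. [0, 2]
```

The greedy step rewards clients whose local distribution differs most from the mixture already selected, which is the intuition behind pairing clients with complementary data.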
Related papers
- Enhancing Federated Learning Through Secure Cluster-Weighted Client Aggregation [4.869042695112397]
Federated learning (FL) has emerged as a promising paradigm in machine learning. In FL, a global model is trained iteratively on local datasets residing on individual devices. This paper introduces a novel FL framework, ClusterGuardFL, that employs dissimilarity scores, k-means clustering, and reconciliation confidence scores to dynamically assign weights to client updates.
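A rough sketch of this style of cluster-weighted aggregation follows; the distance-based confidence score and all names are illustrative assumptions, not the actual ClusterGuardFL scoring.

```python
import numpy as np
from sklearn.cluster import KMeans

def weighted_aggregate(updates, n_clusters=2):
    """Cluster flattened client updates with k-means, treat the most
    populated cluster as the benign consensus, and down-weight updates
    far from its centroid. Illustrative only."""
    X = np.stack(updates)                       # (n_clients, n_params)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    majority = np.bincount(km.labels_).argmax() # assume the largest cluster is benign
    center = km.cluster_centers_[majority]
    dists = np.linalg.norm(X - center, axis=1)
    conf = 1.0 / (1.0 + dists)                  # a stand-in "confidence" score
    weights = conf / conf.sum()
    return (weights[:, None] * X).sum(axis=0)

# Three benign updates near zero and one outlier: the outlier is down-weighted
updates = [np.zeros(10), np.zeros(10) + 0.1, np.zeros(10) - 0.1, np.ones(10) * 5]
print(weighted_aggregate(updates))
```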
arXiv Detail & Related papers (2025-03-29T04:29:24Z)
- Asynchronous Personalized Federated Learning through Global Memorization [16.630360485032163]
Federated Learning offers a privacy-preserving solution by enabling collaborative model training across decentralized devices without centralizing sensitive data. We propose the Asynchronous Personalized Federated Learning framework, which empowers clients to develop personalized models using a server-side semantic generator. This generator, trained via data-free knowledge transfer under global model supervision, enhances client data diversity by producing both seen and unseen samples. To counter the risks of synthetic data impairing training, we introduce a decoupled model method, ensuring robust personalization.
arXiv Detail & Related papers (2025-03-01T09:00:33Z)
- An Aggregation-Free Federated Learning for Tackling Data Heterogeneity [50.44021981013037]
Federated Learning (FL) relies on the effectiveness of utilizing knowledge from distributed datasets.
Traditional FL methods adopt an aggregate-then-adapt framework, where clients update local models based on a global model aggregated by the server from the previous training round.
We introduce FedAF, a novel aggregation-free FL algorithm.
arXiv Detail & Related papers (2024-04-29T05:55:23Z)
- FLASH: Federated Learning Across Simultaneous Heterogeneities [54.80435317208111]
FLASH (Federated Learning Across Simultaneous Heterogeneities) is a lightweight and flexible client selection algorithm.
It outperforms state-of-the-art FL frameworks under extensive sources of heterogeneity.
It achieves substantial and consistent improvements over state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-13T20:04:39Z)
- Leveraging Foundation Models to Improve Lightweight Clients in Federated Learning [16.684749528240587]
Federated Learning (FL) is a distributed training paradigm that enables clients scattered across the world to cooperatively learn a global model without divulging confidential data.
FL faces a significant challenge in the form of heterogeneous data distributions among clients, which leads to a reduction in performance and robustness.
We introduce foundation model distillation to assist in the federated training of lightweight client models and increase their performance under heterogeneous data settings while keeping inference costs low.
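The distillation recipe this summary alludes to can be sketched with the standard knowledge-distillation objective; the hyperparameters and function names below are illustrative assumptions rather than this paper's exact setup.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard KD objective: cross-entropy on true labels plus
    KL divergence to the (frozen) foundation-model teacher, softened
    by temperature T. Illustrative of the general recipe only."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients softened by 1/T^2
    return alpha * ce + (1 - alpha) * kd

# Toy shapes: batch of 8, 10 classes
student = torch.randn(8, 10, requires_grad=True)
teacher = torch.randn(8, 10)          # teacher logits, no grad needed
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student, teacher, labels)
loss.backward()
```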
arXiv Detail & Related papers (2023-11-14T19:10:56Z)
- PFL-GAN: When Client Heterogeneity Meets Generative Models in Personalized Federated Learning [55.930403371398114]
We propose a novel generative adversarial network (GAN) sharing and aggregation strategy for personalized federated learning (PFL).
PFL-GAN addresses client heterogeneity in different scenarios. More specifically, we first learn the similarity among clients and then develop a weighted collaborative data aggregation.
Empirical results from rigorous experimentation on several well-known datasets demonstrate the effectiveness of PFL-GAN.
arXiv Detail & Related papers (2023-08-23T22:38:35Z)
- Towards Instance-adaptive Inference for Federated Learning [80.38701896056828]
Federated learning (FL) is a distributed learning paradigm that enables multiple clients to learn a powerful global model by aggregating local training updates.
In this paper, we present a novel FL algorithm, i.e., FedIns, to handle intra-client data heterogeneity by enabling instance-adaptive inference in the FL framework.
Our experiments show that FedIns outperforms state-of-the-art FL algorithms, e.g., a 6.64% improvement against the top-performing method with less than 15% communication cost on Tiny-ImageNet.
arXiv Detail & Related papers (2023-08-11T09:58:47Z)
- Straggler-Resilient Personalized Federated Learning [55.54344312542944]
Federated learning allows training models from samples distributed across a large network of clients while respecting privacy and communication restrictions.
We develop a novel algorithmic procedure with theoretical speedup guarantees that simultaneously handles two of these hurdles, namely straggling clients and statistical data heterogeneity.
Our method relies on ideas from representation learning theory to find a global common representation using all clients' data and learn a user-specific set of parameters leading to a personalized solution for each client.
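A minimal sketch of this shared-representation-plus-personal-head split follows; the dimensions, helper names, and averaging step are hypothetical, not the paper's actual procedure.

```python
import torch
import torch.nn as nn

class ClientModel(nn.Module):
    """Shared representation (aggregated across clients) plus a
    user-specific head (kept local)."""
    def __init__(self, in_dim=32, rep_dim=16, num_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, rep_dim), nn.ReLU())
        self.head = nn.Linear(rep_dim, num_classes)  # personalized part

    def forward(self, x):
        return self.head(self.encoder(x))

def average_encoders(models):
    """Server step: average only the shared encoder parameters;
    each client's head is left untouched."""
    avg = {k: sum(m.encoder.state_dict()[k] for m in models) / len(models)
           for k in models[0].encoder.state_dict()}
    for m in models:
        m.encoder.load_state_dict(avg)

clients = [ClientModel() for _ in range(3)]
average_encoders(clients)  # encoders now identical; heads stay personal
```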
arXiv Detail & Related papers (2022-06-05T01:14:46Z)
- Towards Fair Federated Learning with Zero-Shot Data Augmentation [123.37082242750866]
Federated learning has emerged as an important distributed learning paradigm, where a server aggregates a global model from many client-trained models while having no access to the client data.
We propose a novel federated learning system that employs zero-shot data augmentation on under-represented data to mitigate statistical heterogeneity and encourage more uniform accuracy across clients in federated networks.
We study two variants of this scheme, Fed-ZDAC (federated learning with zero-shot data augmentation at the clients) and Fed-ZDAS (federated learning with zero-shot data augmentation at the server).
arXiv Detail & Related papers (2021-04-27T18:23:54Z)