Clust-PSI-PFL: A Population Stability Index Approach for Clustered Non-IID Personalized Federated Learning
- URL: http://arxiv.org/abs/2512.20363v1
- Date: Tue, 23 Dec 2025 13:46:38 GMT
- Title: Clust-PSI-PFL: A Population Stability Index Approach for Clustered Non-IID Personalized Federated Learning
- Authors: Daniel M. Jimenez-Gutierrez, Mehrdad Hassanzadeh, Aris Anagnostopoulos, Ioannis Chatzigiannakis, Andrea Vitaletti
- Abstract summary: Federated learning (FL) supports privacy-preserving, decentralized machine learning (ML) model training by keeping data on client devices. We propose Clust-PSI-PFL, a clustering-based personalized FL framework that uses the Population Stability Index (PSI) to quantify the level of non-IID data. Clust-PSI-PFL delivers up to 18% higher global accuracy than state-of-the-art baselines and improves client fairness by a relative 37% under severe non-IID data.
- Score: 1.8257614612363051
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Federated learning (FL) supports privacy-preserving, decentralized machine learning (ML) model training by keeping data on client devices. However, non-independent and identically distributed (non-IID) data across clients biases updates and degrades performance. To alleviate these issues, we propose Clust-PSI-PFL, a clustering-based personalized FL framework that uses the Population Stability Index (PSI) to quantify the level of non-IID data. We compute a weighted PSI metric, $WPSI^L$, which we show to be more informative than common non-IID metrics (Hellinger, Jensen-Shannon, and Earth Mover's distance). Using PSI features, we form distributionally homogeneous groups of clients via K-means++; the optimal number of clusters is chosen by a systematic silhouette-based procedure, typically yielding few clusters with modest overhead. Across six datasets (tabular, image, and text modalities), two partition protocols (Dirichlet with parameter $\alpha$ and Similarity with parameter $S$), and multiple client sizes, Clust-PSI-PFL delivers up to 18% higher global accuracy than state-of-the-art baselines and improves client fairness by a relative 37% under severe non-IID data. These results establish PSI-guided clustering as a principled, lightweight mechanism for robust PFL under label skew.
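To make the PSI idea in the abstract concrete, the following is a minimal sketch of how label skew across clients could be scored with the standard PSI formula, $\sum_i (a_i - e_i)\ln(a_i / e_i)$, comparing each client's label distribution with the pooled one. The function names, the smoothing constant `eps`, and the sample-share weighting in `weighted_psi` are illustrative assumptions; the abstract does not define $WPSI^L$, so this should not be read as the authors' exact metric.

```python
import math


def label_distribution(labels, num_classes, eps=1e-6):
    """Smoothed relative frequency of each class; eps (assumed) avoids log(0)."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    total = len(labels) + eps * num_classes
    return [(c + eps) / total for c in counts]


def per_class_psi(actual, expected):
    """Per-class PSI terms (a_i - e_i) * ln(a_i / e_i); each term is >= 0."""
    return [(a - e) * math.log(a / e) for a, e in zip(actual, expected)]


def psi(actual, expected):
    """Standard Population Stability Index between two discrete distributions."""
    return sum(per_class_psi(actual, expected))


def weighted_psi(client_labels, num_classes):
    """Hypothetical weighted PSI: client PSI weighted by its share of samples."""
    pooled = [y for labels in client_labels for y in labels]
    global_dist = label_distribution(pooled, num_classes)
    n = len(pooled)
    return sum(
        len(labels) / n * psi(label_distribution(labels, num_classes), global_dist)
        for labels in client_labels
    )


# Two toy federations with two classes: balanced clients vs. fully skewed clients.
iid_clients = [[0, 1, 0, 1], [1, 0, 1, 0]]
skewed_clients = [[0, 0, 0, 0], [1, 1, 1, 1]]
iid_score = weighted_psi(iid_clients, num_classes=2)
skewed_score = weighted_psi(skewed_clients, num_classes=2)
```

Here `iid_score` is close to zero while `skewed_score` is large, matching the intuition that PSI grows with label skew. The per-class terms returned by `per_class_psi` are one plausible choice of per-client feature vector to pass to a K-means++ clustering step with silhouette-based selection of the number of clusters, though the abstract does not specify which PSI features are used.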
Related papers
- CO-PFL: Contribution-Oriented Personalized Federated Learning for Heterogeneous Networks [51.43780477302533]
Contribution-Oriented PFL (CO-PFL) is a novel algorithm that dynamically estimates each client's contribution for global aggregation. CO-PFL consistently surpasses state-of-the-art methods in personalization accuracy, robustness, scalability, and convergence stability.
arXiv Detail & Related papers (2025-10-23T05:10:06Z)
- DPMM-CFL: Clustered Federated Learning via Dirichlet Process Mixture Model Nonparametric Clustering [8.645893961456801]
Clustered Federated Learning improves performance under non-IID client heterogeneity. Most CFL methods require the number of clusters K to be fixed a priori. We propose DPMM-CFL, a CFL algorithm that places a Dirichlet Process (DP) prior over the distribution of cluster parameters.
arXiv Detail & Related papers (2025-10-08T15:27:08Z)
- PSI-PFL: Population Stability Index for Client Selection in non-IID Personalized Federated Learning [1.8777876049719082]
Federated Learning (FL) enables decentralized machine learning (ML) model training while preserving data privacy by keeping data localized across clients. We propose PSI-PFL, a novel client selection framework for Personalized Federated Learning (PFL). Our approach selects more homogeneous clients based on PSI, reducing the impact of label skew, one of the most detrimental factors in FL performance.
arXiv Detail & Related papers (2025-05-31T07:41:42Z)
- Privacy Protection in Prosumer Energy Management Based on Federated Learning [0.6963971634605796]
Prosumers' information can efficiently participate in the intelligent decision making of the system without revealing private data. The accuracy of the model in the non-IID case is improved through clustering and weighted parameter averaging. Multiple local iterations and a three-tier framework can effectively reduce communication rounds.
arXiv Detail & Related papers (2025-03-09T05:29:29Z)
- FedAPA: Server-side Gradient-Based Adaptive Personalized Aggregation for Federated Learning on Heterogeneous Data [5.906966694759679]
FedAPA is a novel PFL method featuring a server-side, gradient-based adaptive aggregation strategy to generate personalized models. FedAPA guarantees theoretical convergence and achieves superior accuracy and computational efficiency compared to 10 PFL competitors across three datasets.
arXiv Detail & Related papers (2025-02-11T11:00:58Z)
- Client-Centric Federated Adaptive Optimization [78.30827455292827]
Federated Learning (FL) is a distributed learning paradigm where clients collaboratively train a model while keeping their own data private. We propose Client-Centric Federated Adaptive Optimization, which is a class of novel federated optimization approaches.
arXiv Detail & Related papers (2025-01-17T04:00:50Z)
- TPFL: Tsetlin-Personalized Federated Learning with Confidence-Based Clustering [0.0]
We propose a novel approach called Tsetlin-Personalized Federated Learning.
In this way, models are grouped into clusters based on their confidence towards a specific class.
Clients share only what they are confident about, resulting in the elimination of wrongful weight aggregation.
Results demonstrated that TPFL performs better than baseline methods, with 98.94% accuracy on MNIST, 98.52% accuracy on FashionMNIST, and 91.16% accuracy on the FEMNIST dataset.
arXiv Detail & Related papers (2024-09-16T15:27:35Z)
- Towards Instance-adaptive Inference for Federated Learning [80.38701896056828]
Federated learning (FL) is a distributed learning paradigm that enables multiple clients to learn a powerful global model by aggregating local training.
In this paper, we present a novel FL algorithm, i.e., FedIns, to handle intra-client data heterogeneity by enabling instance-adaptive inference in the FL framework.
Our experiments show that our FedIns outperforms state-of-the-art FL algorithms, e.g., a 6.64% improvement against the top-performing method with less than 15% communication cost on Tiny-ImageNet.
arXiv Detail & Related papers (2023-08-11T09:58:47Z)
- Personalized Federated Learning under Mixture of Distributions [98.25444470990107]
We propose a novel approach to Personalized Federated Learning (PFL), which utilizes Gaussian mixture models (GMM) to fit the input data distributions across diverse clients.
FedGMM possesses an additional advantage of adapting to new clients with minimal overhead, and it also enables uncertainty quantification.
Empirical evaluations on synthetic and benchmark datasets demonstrate the superior performance of our method in both PFL classification and novel sample detection.
arXiv Detail & Related papers (2023-05-01T20:04:46Z)
- Optimizing Server-side Aggregation For Robust Federated Learning via Subspace Training [80.03567604524268]
Non-IID data distribution across clients and poisoning attacks are two main challenges in real-world federated learning systems.
We propose SmartFL, a generic approach that optimizes the server-side aggregation process.
We provide theoretical analyses of the convergence and generalization capacity for SmartFL.
arXiv Detail & Related papers (2022-11-10T13:20:56Z)
- Efficient Distribution Similarity Identification in Clustered Federated Learning via Principal Angles Between Client Data Subspaces [59.33965805898736]
Clustered learning has been shown to produce promising results by grouping clients into clusters.
Existing FL algorithms are essentially trying to group clients together with similar distributions.
Prior FL algorithms attempt to infer these similarities indirectly during training.
arXiv Detail & Related papers (2022-09-21T17:37:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.