Stratify: Rethinking Federated Learning for Non-IID Data through Balanced Sampling
- URL: http://arxiv.org/abs/2504.13462v1
- Date: Fri, 18 Apr 2025 04:44:41 GMT
- Title: Stratify: Rethinking Federated Learning for Non-IID Data through Balanced Sampling
- Authors: Hui Yeok Wong, Chee Kau Lim, Chee Seng Chan,
- Abstract summary: Stratify is a novel FL framework designed to systematically manage class and feature distributions throughout training.<n>Inspired by classical stratified sampling, our approach employs a Stratified Label Schedule (SLS) to ensure balanced exposure across labels.<n>To uphold privacy, we implement a secure client selection protocol leveraging homomorphic encryption.
- Score: 9.774529150331297
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated Learning (FL) on non-independently and identically distributed (non-IID) data remains a critical challenge, as existing approaches struggle with severe data heterogeneity. Current methods primarily address symptoms of non-IID by applying incremental adjustments to Federated Averaging (FedAvg), rather than directly resolving its inherent design limitations. Consequently, performance significantly deteriorates under highly heterogeneous conditions, as the fundamental issue of imbalanced exposure to diverse class and feature distributions remains unresolved. This paper introduces Stratify, a novel FL framework designed to systematically manage class and feature distributions throughout training, effectively tackling the root cause of non-IID challenges. Inspired by classical stratified sampling, our approach employs a Stratified Label Schedule (SLS) to ensure balanced exposure across labels, significantly reducing bias and variance in aggregated gradients. Complementing SLS, we propose a label-aware client selection strategy, restricting participation exclusively to clients possessing data relevant to scheduled labels. Additionally, Stratify incorporates a fine-grained, high-frequency update scheme, accelerating convergence and further mitigating data heterogeneity. To uphold privacy, we implement a secure client selection protocol leveraging homomorphic encryption, enabling precise global label statistics without disclosing sensitive client information. Extensive evaluations on MNIST, CIFAR-10, CIFAR-100, Tiny-ImageNet, COVTYPE, PACS, and Digits-DG demonstrate that Stratify attains performance comparable to IID baselines, accelerates convergence, and reduces client-side computation compared to state-of-the-art methods, underscoring its practical effectiveness in realistic federated learning scenarios.
Related papers
- Privacy Preserving and Robust Aggregation for Cross-Silo Federated Learning in Non-IID Settings [1.8434042562191815]
Federated Averaging remains the most widely used aggregation strategy in federated learning.<n>Our method relies solely on gradient updates, eliminating the need for any additional client metadata.<n>Our results establish the effectiveness of gradient masking as a practical and secure solution for federated learning.
arXiv Detail & Related papers (2025-03-06T14:06:20Z) - Propensity-driven Uncertainty Learning for Sample Exploration in Source-Free Active Domain Adaptation [19.620523416385346]
Source-free active domain adaptation (SFADA) addresses the challenge of adapting a pre-trained model to new domains without access to source data.<n>This scenario is particularly relevant in real-world applications where data privacy, storage limitations, or labeling costs are significant concerns.<n>We propose the Propensity-driven Uncertainty Learning (ProULearn) framework to effectively select more informative samples without frequently requesting human annotations.
arXiv Detail & Related papers (2025-01-23T10:05:25Z) - Robust Federated Learning in the Face of Covariate Shift: A Magnitude Pruning with Hybrid Regularization Framework for Enhanced Model Aggregation [1.519321208145928]
Federated Learning (FL) offers a promising framework for individuals aiming to collaboratively develop a shared model.
variations in data distribution among clients can profoundly affect FL methodologies, primarily due to instabilities in the aggregation process.
We propose a novel FL framework, combining individual parameter pruning and regularization techniques to improve the robustness of individual clients' models to aggregate.
arXiv Detail & Related papers (2024-12-19T16:22:37Z) - A Channel-ensemble Approach: Unbiased and Low-variance Pseudo-labels is Critical for Semi-supervised Classification [61.473485511491795]
Semi-supervised learning (SSL) is a practical challenge in computer vision.
Pseudo-label (PL) methods, e.g., FixMatch and FreeMatch, obtain the State Of The Art (SOTA) performances in SSL.
We propose a lightweight channel-based ensemble method to consolidate multiple inferior PLs into the theoretically guaranteed unbiased and low-variance one.
arXiv Detail & Related papers (2024-03-27T09:49:37Z) - FedAnchor: Enhancing Federated Semi-Supervised Learning with Label
Contrastive Loss for Unlabeled Clients [19.3885479917635]
Federated learning (FL) is a distributed learning paradigm that facilitates collaborative training of a shared global model across devices.
We propose FedAnchor, an innovative FSSL method that introduces a unique double-head structure, called anchor head, paired with the classification head trained exclusively on labeled anchor data on the server.
Our approach mitigates the confirmation bias and overfitting issues associated with pseudo-labeling techniques based on high-confidence model prediction samples.
arXiv Detail & Related papers (2024-02-15T18:48:21Z) - Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity [80.16488817177182]
GNNs are vulnerable to the model stealing attack, a nefarious endeavor geared towards duplicating the target model via query permissions.
We introduce three model stealing attacks to adapt to different actual scenarios.
arXiv Detail & Related papers (2023-12-18T05:42:31Z) - Exploiting Low-confidence Pseudo-labels for Source-free Object Detection [54.98300313452037]
Source-free object detection (SFOD) aims to adapt a source-trained detector to an unlabeled target domain without access to the labeled source data.
Current SFOD methods utilize a threshold-based pseudo-label approach in the adaptation phase.
We propose a new approach to take full advantage of pseudo-labels by introducing high and low confidence thresholds.
arXiv Detail & Related papers (2023-10-19T12:59:55Z) - Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer [60.31021888394358]
Unsupervised Domain Adaptation (UDA) can effectively address domain gap issues in real-world image Super-Resolution (SR)
We propose a SOurce-free Domain Adaptation framework for image SR (SODA-SR) to address this issue, i.e., adapt a source-trained model to a target domain with only unlabeled target data.
arXiv Detail & Related papers (2023-03-31T03:14:44Z) - FedCC: Robust Federated Learning against Model Poisoning Attacks [0.0]
Federated learning is a distributed framework designed to address privacy concerns.<n>It introduces new attack surfaces, which are especially prone when data is non-Independently and Identically Distributed.<n>We present FedCC, a simple yet effective novel defense algorithm against model poisoning attacks.
arXiv Detail & Related papers (2022-12-05T01:52:32Z) - Cluster-level pseudo-labelling for source-free cross-domain facial
expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER)
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z) - Fed-CBS: A Heterogeneity-Aware Client Sampling Mechanism for Federated
Learning via Class-Imbalance Reduction [76.26710990597498]
We show that the class-imbalance of the grouped data from randomly selected clients can lead to significant performance degradation.
Based on our key observation, we design an efficient client sampling mechanism, i.e., Federated Class-balanced Sampling (Fed-CBS)
In particular, we propose a measure of class-imbalance and then employ homomorphic encryption to derive this measure in a privacy-preserving way.
arXiv Detail & Related papers (2022-09-30T05:42:56Z) - Uncertainty Minimization for Personalized Federated Semi-Supervised
Learning [15.123493340717303]
We propose a novel semi-supervised learning paradigm which allows partial-labeled or unlabeled clients to seek labeling assistance from data-related clients (helper agents)
Experiments show that our proposed method can obtain superior performance and more stable convergence than other related works with partial labeled data.
arXiv Detail & Related papers (2022-05-05T04:41:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.