You Are Your Own Best Teacher: Achieving Centralized-level Performance in Federated Learning under Heterogeneous and Long-tailed Data
- URL: http://arxiv.org/abs/2503.06916v1
- Date: Mon, 10 Mar 2025 04:57:20 GMT
- Title: You Are Your Own Best Teacher: Achieving Centralized-level Performance in Federated Learning under Heterogeneous and Long-tailed Data
- Authors: Shanshan Yan, Zexi Li, Chao Wu, Meng Pang, Yang Lu, Yan Yan, Hanzi Wang,
- Abstract summary: Data heterogeneity, stemming from local non-IID data and global long-tailed distributions, is a major challenge in federated learning (FL)<n>We propose FedYoYo to improve representation learning by distilling knowledge between weakly and strongly augmented local samples.<n>We show FedYoYo achieves state-of-the-art results, even surpassing centralized logit adjustment methods by 5.4% under global long-tailed settings.
- Score: 32.24154235507531
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data heterogeneity, stemming from local non-IID data and global long-tailed distributions, is a major challenge in federated learning (FL), leading to significant performance gaps compared to centralized learning. Previous research found that poor representations and biased classifiers are the main problems and proposed neural-collapse-inspired synthetic simplex ETF to help representations be closer to neural collapse optima. However, we find that the neural-collapse-inspired methods are not strong enough to reach neural collapse and still have huge gaps to centralized training. In this paper, we rethink this issue from a self-bootstrap perspective and propose FedYoYo (You Are Your Own Best Teacher), introducing Augmented Self-bootstrap Distillation (ASD) to improve representation learning by distilling knowledge between weakly and strongly augmented local samples, without needing extra datasets or models. We further introduce Distribution-aware Logit Adjustment (DLA) to balance the self-bootstrap process and correct biased feature representations. FedYoYo nearly eliminates the performance gap, achieving centralized-level performance even under mixed heterogeneity. It enhances local representation learning, reducing model drift and improving convergence, with feature prototypes closer to neural collapse optimality. Extensive experiments show FedYoYo achieves state-of-the-art results, even surpassing centralized logit adjustment methods by 5.4\% under global long-tailed settings.
Related papers
- Learning Critically: Selective Self Distillation in Federated Learning on Non-IID Data [17.624808621195978]
We propose a Selective Self-Distillation method for Federated learning (FedSSD)
FedSSD imposes adaptive constraints on the local updates by self-distilling the global model's knowledge.
It achieves better generalization and robustness in fewer communication rounds, compared with other state-of-the-art FL methods.
arXiv Detail & Related papers (2025-04-20T18:06:55Z) - NTK-DFL: Enhancing Decentralized Federated Learning in Heterogeneous Settings via Neural Tangent Kernel [27.92271597111756]
Decentralized federated learning (DFL) is a collaborative machine learning framework for training a model across participants without a central server or raw data exchange.
Recent work has shown that the neural tangent kernel (NTK) approach, when applied to federated learning in a centralized framework, can lead to improved performance.
We propose an approach leveraging the NTK to train client models in the decentralized setting, while introducing a synergy between NTK-based evolution and model averaging.
arXiv Detail & Related papers (2024-10-02T18:19:28Z) - FedLF: Adaptive Logit Adjustment and Feature Optimization in Federated Long-Tailed Learning [5.23984567704876]
Federated learning offers a paradigm to the challenge of preserving privacy in distributed machine learning.
Traditional approach fails to address the phenomenon of class-wise bias in global long-tailed data.
New method FedLF introduces three modifications in the local training phase: adaptive logit adjustment, continuous class centred optimization, and feature decorrelation.
arXiv Detail & Related papers (2024-09-18T16:25:29Z) - No Fear of Classifier Biases: Neural Collapse Inspired Federated
Learning with Synthetic and Fixed Classifier [10.491645205483051]
We propose a solution to the FL's classifier bias problem by utilizing a synthetic and fixed ETF classifier during training.
We devise several effective modules to better adapt the ETF structure in FL, achieving both high generalization and personalization.
Our method achieves state-of-the-art performances on CIFAR-10, CIFAR-100, and Tiny-ImageNet.
arXiv Detail & Related papers (2023-03-17T15:38:39Z) - Stabilizing and Improving Federated Learning with Non-IID Data and
Client Dropout [15.569507252445144]
Label distribution skew induced data heterogeniety has been shown to be a significant obstacle that limits the model performance in federated learning.
We propose a simple yet effective framework by introducing a prior-calibrated softmax function for computing the cross-entropy loss.
The improved model performance over existing baselines in the presence of non-IID data and client dropout is demonstrated.
arXiv Detail & Related papers (2023-03-11T05:17:59Z) - Integrating Local Real Data with Global Gradient Prototypes for
Classifier Re-Balancing in Federated Long-Tailed Learning [60.41501515192088]
Federated Learning (FL) has become a popular distributed learning paradigm that involves multiple clients training a global model collaboratively.
The data samples usually follow a long-tailed distribution in the real world, and FL on the decentralized and long-tailed data yields a poorly-behaved global model.
In this work, we integrate the local real data with the global gradient prototypes to form the local balanced datasets.
arXiv Detail & Related papers (2023-01-25T03:18:10Z) - Auxo: Efficient Federated Learning via Scalable Client Clustering [22.323057948281644]
Federated learning (FL) enables edge devices to collaboratively train ML models without revealing their raw data to a logically centralized server.
We propose Auxo to gradually identify clients with statistically similar data distributions (cohorts) in large-scale, low-availability, and resource-constrained FL populations.
We show Auxo boosts various existing FL solutions in terms of final accuracy (2.1% - 8.2%), convergence time (up to 2.2x), and model bias (4.8% - 53.8%)
arXiv Detail & Related papers (2022-10-29T17:36:51Z) - Does Decentralized Learning with Non-IID Unlabeled Data Benefit from
Self Supervision? [51.00034621304361]
We study decentralized learning with unlabeled data through the lens of self-supervised learning (SSL)
We study the effectiveness of contrastive learning algorithms under decentralized learning settings.
arXiv Detail & Related papers (2022-10-20T01:32:41Z) - Fine-tuning Global Model via Data-Free Knowledge Distillation for
Non-IID Federated Learning [86.59588262014456]
Federated Learning (FL) is an emerging distributed learning paradigm under privacy constraint.
We propose a data-free knowledge distillation method to fine-tune the global model in the server (FedFTG)
Our FedFTG significantly outperforms the state-of-the-art (SOTA) FL algorithms and can serve as a strong plugin for enhancing FedAvg, FedProx, FedDyn, and SCAFFOLD.
arXiv Detail & Related papers (2022-03-17T11:18:17Z) - Class Balancing GAN with a Classifier in the Loop [58.29090045399214]
We introduce a novel theoretically motivated Class Balancing regularizer for training GANs.
Our regularizer makes use of the knowledge from a pre-trained classifier to ensure balanced learning of all the classes in the dataset.
We demonstrate the utility of our regularizer in learning representations for long-tailed distributions via achieving better performance than existing approaches over multiple datasets.
arXiv Detail & Related papers (2021-06-17T11:41:30Z) - No Fear of Heterogeneity: Classifier Calibration for Federated Learning
with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated ssian mixture model.
Experimental results demonstrate that CCVR state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z) - Quasi-Global Momentum: Accelerating Decentralized Deep Learning on
Heterogeneous Data [77.88594632644347]
Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks.
In realistic learning scenarios, the presence of heterogeneity across different clients' local datasets poses an optimization challenge.
We propose a novel momentum-based method to mitigate this decentralized training difficulty.
arXiv Detail & Related papers (2021-02-09T11:27:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.