Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning
- URL: http://arxiv.org/abs/2507.10348v1
- Date: Mon, 14 Jul 2025 14:51:18 GMT
- Title: Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning
- Authors: Yichen Li,
- Abstract summary: Model-heterogeneous Federated Learning (Hetero-FL) has attracted growing attention for its ability to aggregate knowledge from heterogeneous models while keeping private data locally. To better aggregate knowledge from clients, ensemble distillation, a widely used and effective technique, is often employed after global aggregation to enhance the performance of the global model. A new feature-based ensemble federated knowledge distillation paradigm is proposed.
- Score: 8.04716022048554
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model-Heterogeneous Federated Learning (Hetero-FL) has attracted growing attention for its ability to aggregate knowledge from heterogeneous models while keeping private data locally. To better aggregate knowledge from clients, ensemble distillation, a widely used and effective technique, is often employed after global aggregation to enhance the performance of the global model. However, simply combining Hetero-FL and ensemble distillation does not always yield promising results and can make the training process unstable. The reason is that existing methods primarily focus on logit distillation, which, while model-agnostic thanks to softmax predictions, fails to compensate for the knowledge bias arising from heterogeneous models. To tackle this challenge, we propose a stable and efficient Feature Distillation for model-heterogeneous Federated learning, dubbed FedFD, which incorporates aligned feature information via orthogonal projection to better integrate knowledge from heterogeneous models. Specifically, a new feature-based ensemble federated knowledge distillation paradigm is proposed. The global model on the server maintains a projection layer for each client-side model architecture to align the features separately. Orthogonal techniques are employed to re-parameterize the projection layer to mitigate knowledge bias from heterogeneous models and thus maximize the distilled knowledge. Extensive experiments show that FedFD achieves superior performance compared to state-of-the-art methods.
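The mechanism the abstract describes (one server-side projection layer per client architecture, re-parameterized to be orthogonal, trained by matching projected client features against the server's features) can be sketched in a few lines. The following is a minimal PyTorch sketch under stated assumptions, not the authors' implementation: `ProjectionHead`, `feature_distill`, the feature dimensions, and the MSE matching loss are all illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    # One head per client architecture. The orthogonal parametrization
    # (torch.nn.utils.parametrizations.orthogonal) keeps the projection
    # weight (semi-)orthogonal throughout training, which is one way to
    # realize the re-parameterization described in the abstract.
    def __init__(self, client_dim: int, server_dim: int):
        super().__init__()
        linear = nn.Linear(client_dim, server_dim, bias=False)
        self.proj = nn.utils.parametrizations.orthogonal(linear)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.proj(h)

def feature_distill(server_feats: torch.Tensor,
                    client_feats: dict[str, torch.Tensor],
                    heads: nn.ModuleDict) -> torch.Tensor:
    # Align each heterogeneous client's features to the server's feature
    # space via that architecture's own head, then penalize the mismatch.
    # Client features act as frozen teachers, so they are detached;
    # gradients flow into the server model and the projection heads.
    loss = torch.zeros((), device=server_feats.device)
    for arch_id, h_client in client_feats.items():
        h_aligned = heads[arch_id](h_client.detach())
        loss = loss + F.mse_loss(server_feats, h_aligned)
    return loss / len(client_feats)

# Hypothetical usage: two client architectures with different feature
# widths, both projected into a 768-dimensional server feature space.
heads = nn.ModuleDict({
    "resnet18": ProjectionHead(512, 768),
    "mobilenet_v2": ProjectionHead(1280, 768),
})
```

One could equally project in the other direction (server features into each client's space); the abstract does not pin this down, so the direction above is an assumption.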
Related papers
- CLIP Meets Diffusion: A Synergistic Approach to Anomaly Detection [54.85000884785013]
Anomaly detection is a complex problem due to the ambiguity in defining anomalies, the diversity of anomaly types, and the scarcity of training data. We propose CLIPfusion, a method that leverages both discriminative and generative foundation models. We believe that our method underscores the effectiveness of multi-modal and multi-model fusion in tackling the multifaceted challenges of anomaly detection.
arXiv Detail & Related papers (2025-06-13T13:30:15Z) - Consistent World Models via Foresight Diffusion [56.45012929930605]
We argue that a key bottleneck in learning consistent diffusion-based world models lies in suboptimal predictive ability. We propose Foresight Diffusion (ForeDiff), a diffusion-based world modeling framework that enhances consistency by decoupling condition understanding from target denoising.
arXiv Detail & Related papers (2025-05-22T10:01:59Z) - FedSKD: Aggregation-free Model-heterogeneous Federated Learning using Multi-dimensional Similarity Knowledge Distillation [7.944298319589845]
Federated learning (FL) enables privacy-preserving collaborative model training without direct data sharing. Model-heterogeneous FL (MHFL) allows clients to train personalized models with heterogeneous architectures tailored to their computational resources and application-specific needs. While peer-to-peer (P2P) FL removes server dependence, it suffers from model drift and knowledge dilution, limiting its effectiveness in heterogeneous settings. We propose FedSKD, a novel MHFL framework that facilitates direct knowledge exchange through round-robin model circulation.
arXiv Detail & Related papers (2025-03-23T05:33:10Z) - pFedAFM: Adaptive Feature Mixture for Batch-Level Personalization in Heterogeneous Federated Learning [34.01721941230425]
We propose a model-heterogeneous personalized Federated learning approach with Adaptive Feature Mixture (pFedAFM) for supervised learning tasks.
It significantly outperforms 7 state-of-the-art MHPFL methods, achieving up to 7.93% accuracy improvement.
arXiv Detail & Related papers (2024-04-27T09:52:59Z) - Spectral Co-Distillation for Personalized Federated Learning [69.97016362754319]
We propose a novel distillation method based on model spectrum information to better capture generic versus personalized representations.
We also introduce a co-distillation framework that establishes a two-way bridge between generic and personalized model training.
We demonstrate the superior performance and efficacy of our proposed spectral co-distillation method, as well as our wait-free training protocol.
arXiv Detail & Related papers (2024-01-29T16:01:38Z) - Fake It Till Make It: Federated Learning with Consensus-Oriented Generation [52.82176415223988]
We propose federated learning with consensus-oriented generation (FedCOG).
FedCOG consists of two key components at the client side: complementary data generation and knowledge-distillation-based model training.
Experiments on classical and real-world FL datasets show that FedCOG consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-12-10T18:49:59Z) - Selective Knowledge Sharing for Privacy-Preserving Federated Distillation without A Good Teacher [52.2926020848095]
Federated learning is vulnerable to white-box attacks and struggles to adapt to heterogeneous clients.
This paper proposes a selective knowledge sharing mechanism for FD, termed Selective-FD.
arXiv Detail & Related papers (2023-04-04T12:04:19Z) - HierarchyFL: Heterogeneous Federated Learning via Hierarchical Self-Distillation [12.409497615805797]
Federated learning (FL) has been recognized as a privacy-preserving distributed machine learning paradigm.
FL suffers from model inaccuracy and slow convergence due to the model heterogeneity of the AIoT devices involved.
We propose an efficient framework named HierarchyFL, which uses a small amount of public data for efficient and scalable knowledge sharing.
arXiv Detail & Related papers (2022-12-05T03:32:10Z) - Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning [86.59588262014456]
Federated Learning (FL) is an emerging distributed learning paradigm under privacy constraints.
We propose a data-free knowledge distillation method to fine-tune the global model on the server (FedFTG).
Our FedFTG significantly outperforms the state-of-the-art (SOTA) FL algorithms and can serve as a strong plugin for enhancing FedAvg, FedProx, FedDyn, and SCAFFOLD.
arXiv Detail & Related papers (2022-03-17T11:18:17Z) - Self-Feature Regularization: Self-Feature Distillation Without Teacher Models [0.0]
Self-Feature Regularization (SFR) is proposed, which uses features in the deep layers to supervise feature learning in the shallow layers.
We first use a generalized L2 loss to match local features and a many-to-one approach to distill more intensively along the channel dimension.
arXiv Detail & Related papers (2021-03-12T15:29:00Z)