The Other Side of the Coin: Unveiling the Downsides of Model Aggregation in Federated Learning from a Layer-peeled Perspective
- URL: http://arxiv.org/abs/2502.03231v1
- Date: Wed, 05 Feb 2025 14:45:56 GMT
- Title: The Other Side of the Coin: Unveiling the Downsides of Model Aggregation in Federated Learning from a Layer-peeled Perspective
- Authors: Guogang Zhu, Xuefeng Liu, Jianwei Niu, Shaojie Tang, Xinghao Wu,
- Abstract summary: In federated learning (FL), model aggregation is a critical step by which multiple clients share their knowledge with one another. This temporary performance drop can potentially slow down the convergence of the FL model. We propose several simple yet effective strategies to mitigate the negative impacts of model aggregation.
- Score: 12.916988821333124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In federated learning (FL), model aggregation is a critical step by which multiple clients share their knowledge with one another. However, it is also widely recognized that the aggregated model, when sent back to each client, performs poorly on local data until after several rounds of local training. This temporary performance drop can potentially slow down the convergence of the FL model. Most research in FL regards this performance drop as an inherent cost of knowledge sharing among clients and does not give it special attention. While some studies directly focus on designing techniques to alleviate the issue, an in-depth investigation of the reasons behind this performance drop has yet to be conducted. To address this gap, we conduct a layer-peeled analysis of model aggregation across various datasets and model architectures. Our findings reveal that the performance drop can be attributed to two major consequences of the aggregation process: (1) it disrupts feature variability suppression in deep neural networks (DNNs), and (2) it weakens the coupling between features and subsequent parameters. Based on these findings, we propose several simple yet effective strategies to mitigate the negative impacts of model aggregation while still enjoying the benefit it brings. To the best of our knowledge, our work is the first to conduct a layer-peeled analysis of model aggregation, potentially paving the way for the development of more effective FL algorithms.
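For context, the aggregation step analysed here is typically a (weighted) average of the clients' model parameters, as in FedAvg. The sketch below is a minimal illustration of that step, assuming PyTorch state_dicts and a hypothetical `client_weights` vector proportional to local dataset sizes; it is not the authors' exact procedure.

```python
import torch

def aggregate(client_states, client_weights):
    """FedAvg-style aggregation: weighted average of client parameters.

    client_states  -- list of state_dicts produced by the clients' local training
    client_weights -- relative weights (e.g. local dataset sizes), summing to 1
    """
    global_state = {}
    for name in client_states[0]:
        # Stack each client's copy of this parameter, scaled by its weight,
        # then sum to obtain the aggregated parameter.
        stacked = torch.stack(
            [w * s[name].float() for s, w in zip(client_states, client_weights)]
        )
        global_state[name] = stacked.sum(dim=0)
    return global_state

# Each client then reloads the same aggregated parameters before the next
# local round, e.g.: client_model.load_state_dict(global_state)
```

It is this averaged model, sent back to every client, whose temporarily degraded local performance the paper analyses layer by layer.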
Related papers
- Multi-Level Collaboration in Model Merging [56.31088116526825]
This paper explores the intrinsic connections between model merging and model ensembling.
We find that even when previous restrictions are not met, model merging can still attain performance near-identical, or even superior, to that of ensembling.
arXiv Detail & Related papers (2025-03-03T07:45:04Z)
- Federated Testing (FedTest): A New Scheme to Enhance Convergence and Mitigate Adversarial Attacks in Federated Learning [35.14491996649841]
We introduce a novel federated learning framework, which we call federated testing for federated learning (FedTest). In FedTest, the local data of a specific user is used to train the model of that user and test the models of the other users. Our numerical results reveal that the proposed method not only accelerates convergence rates but also diminishes the potential influence of malicious users.
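Based only on the summary above, the core of FedTest is a cross-testing step in which every client's model is evaluated on every other client's local data. The following sketch is a rough, hypothetical rendering of that idea; the `evaluate` helper and any downstream weighting of the scores are assumptions, not the paper's exact mechanism.

```python
def cross_test(client_models, client_loaders, evaluate):
    """Score every client's model on every other client's local test data.

    evaluate(model, loader) -> accuracy is assumed to be supplied by the caller.
    """
    n = len(client_models)
    scores = [[None] * n for _ in range(n)]
    for i, model in enumerate(client_models):
        for j, loader in enumerate(client_loaders):
            if i != j:  # a client does not grade its own model
                scores[i][j] = evaluate(model, loader)
    # Models that score poorly across many peers could be down-weighted or
    # excluded before aggregation, which is how such a scheme might curb
    # malicious or low-quality updates.
    return scores
```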
arXiv Detail & Related papers (2025-01-19T21:01:13Z)
- ConsistentFeature: A Plug-and-Play Component for Neural Network Regularization [0.32885740436059047]
Over-parameterized neural network models often lead to significant performance discrepancies between training and test sets. We introduce a simple perspective on overfitting: models learn different representations in different i.i.d. datasets. We propose an adaptive method, ConsistentFeature, that regularizes the model by constraining feature differences across random subsets of the same training set.
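Read literally, such a regularizer penalizes the gap between feature statistics computed on two random subsets of the same training set. The sketch below is a minimal, hypothetical version of that penalty (the actual ConsistentFeature loss may differ), assuming a PyTorch feature extractor.

```python
import torch.nn.functional as F

def consistency_penalty(feature_extractor, batch_a, batch_b):
    """Illustrative regularizer: keep the mean feature vectors of two
    random subsets of the same training set close to each other."""
    feats_a = feature_extractor(batch_a).mean(dim=0)
    feats_b = feature_extractor(batch_b).mean(dim=0)
    return F.mse_loss(feats_a, feats_b)

# Hypothetical usage inside a training step:
# total_loss = task_loss + reg_weight * consistency_penalty(backbone, xa, xb)
```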
arXiv Detail & Related papers (2024-12-02T13:21:31Z)
- Unraveling the Temporal Dynamics of the Unet in Diffusion Models [33.326244121918634]
Diffusion models introduce Gaussian noise into training data and reconstruct the original data iteratively.
Central to this iterative process is a single Unet, adapting across time steps to facilitate generation.
Recent work revealed the presence of composition and denoising phases in this generation process.
arXiv Detail & Related papers (2023-12-17T04:40:33Z)
- Lightweight Diffusion Models with Distillation-Based Block Neural Architecture Search [55.41583104734349]
We propose to automatically remove structural redundancy in diffusion models with our Diffusion Distillation-based Block-wise Neural Architecture Search (DiffNAS).
Given a larger pretrained teacher, we leverage DiffNAS to search for the smallest architecture which can achieve on-par or even better performance than the teacher.
Different from previous block-wise NAS methods, DiffNAS contains a block-wise local search strategy and a retraining strategy with a joint dynamic loss.
arXiv Detail & Related papers (2023-11-08T12:56:59Z)
- Jointly Exploring Client Drift and Catastrophic Forgetting in Dynamic Learning [1.0808810256442274]
Client Drift and Catastrophic Forgetting are fundamental obstacles to guaranteeing consistent performance.
We propose a unified framework for building a controlled test environment for Client Drift and Catastrophic Forgetting.
We show that a combination of moderate Client Drift and Catastrophic Forgetting can even improve the performance of the resulting model.
arXiv Detail & Related papers (2023-09-01T18:14:30Z)
- Uncovering the Hidden Cost of Model Compression [43.62624133952414]
Visual Prompting has emerged as a pivotal method for transfer learning in computer vision.
Model compression detrimentally impacts the performance of visual prompting-based transfer.
However, negative effects on calibration are not present when models are compressed via quantization.
arXiv Detail & Related papers (2023-08-29T01:47:49Z)
- Layer-wise Linear Mode Connectivity [52.6945036534469]
Averaging neural network parameters is an intuitive method for fusing the knowledge of two independent models.
It is most prominently used in federated learning.
We analyse the performance of the models that result from averaging single layers, or groups of layers.
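Layer-wise averaging differs from whole-model averaging in that only a chosen subset of layers is interpolated while the remaining parameters are taken from one model. The sketch below illustrates this, assuming PyTorch state_dicts; the parameter-name prefixes such as "layer4." are purely illustrative.

```python
def average_layers(state_a, state_b, layer_prefixes, alpha=0.5):
    """Interpolate only the parameters whose names start with one of
    layer_prefixes; all other parameters are kept from state_a."""
    merged = {}
    for name, param_a in state_a.items():
        if any(name.startswith(p) for p in layer_prefixes):
            merged[name] = alpha * param_a + (1.0 - alpha) * state_b[name]
        else:
            merged[name] = param_a.clone()
    return merged

# Example: interpolate only the last residual block and the classifier head
# merged = average_layers(model_a.state_dict(), model_b.state_dict(),
#                         layer_prefixes=("layer4.", "fc."))
```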
arXiv Detail & Related papers (2023-07-13T09:39:10Z)
- Unlocking the Potential of Federated Learning for Deeper Models [24.875271131226707]
Federated learning (FL) is a new paradigm for distributed machine learning that allows a global model to be trained across multiple clients.
We propose several technical guidelines based on reducing divergence, such as using wider models and reducing the receptive field.
These approaches can greatly improve the accuracy of FL on deeper models.
arXiv Detail & Related papers (2023-06-05T08:45:44Z)
- A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers.
Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module.
Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z)
- Federated Learning with Unreliable Clients: Performance Analysis and Mechanism Design [76.29738151117583]
Federated Learning (FL) has become a promising tool for training effective machine learning models among distributed clients.
However, low quality models could be uploaded to the aggregator server by unreliable clients, leading to a degradation or even a collapse of training.
We model these unreliable behaviors of clients and propose a defensive mechanism to mitigate such a security risk.
arXiv Detail & Related papers (2021-05-10T08:02:27Z)
- Toward Understanding the Influence of Individual Clients in Federated Learning [52.07734799278535]
Federated learning allows clients to jointly train a global model without sending their private data to a central server.
We define a new notion called Influence, quantify this influence over the model parameters, and propose an effective and efficient method to estimate this metric.
arXiv Detail & Related papers (2020-12-20T14:34:36Z)
- One-Shot Object Detection without Fine-Tuning [62.39210447209698]
We introduce a two-stage model consisting of a first stage Matching-FCOS network and a second stage Structure-Aware Relation Module.
We also propose novel training strategies that effectively improve detection performance.
Our method exceeds the state-of-the-art one-shot performance consistently on multiple datasets.
arXiv Detail & Related papers (2020-05-08T01:59:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.