Making Batch Normalization Great in Federated Deep Learning
- URL: http://arxiv.org/abs/2303.06530v4
- Date: Fri, 29 Mar 2024 03:37:04 GMT
- Title: Making Batch Normalization Great in Federated Deep Learning
- Authors: Jike Zhong, Hong-You Chen, Wei-Lun Chao
- Abstract summary: Batch Normalization (BN) is widely used in centralized deep learning to improve convergence and generalization.
Prior work has observed that training with BN could hinder performance and suggested replacing it with Group Normalization (GN).
- Score: 32.81480654534734
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Batch Normalization (BN) is widely used in centralized deep learning to improve convergence and generalization. However, in federated learning (FL) with decentralized data, prior work has observed that training with BN could hinder performance and suggested replacing it with Group Normalization (GN). In this paper, we revisit this substitution by expanding the empirical study conducted in prior work. Surprisingly, we find that BN outperforms GN in many FL settings. The exceptions are high-frequency communication and extreme non-IID regimes. We reinvestigate factors that are believed to cause this problem, including the mismatch of BN statistics across clients and the deviation of gradients during local training. We empirically identify a simple practice that could reduce the impacts of these factors while maintaining the strength of BN. Our approach, which we named FixBN, is fairly easy to implement, without any additional training or communication costs, and performs favorably across a wide range of FL settings. We hope that our study could serve as a valuable reference for future practical usage and theoretical analysis in FL.
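The abstract does not spell out the FixBN recipe, so the following is only a minimal sketch of one way to "keep BN while limiting the statistics mismatch": after a chosen communication round, each client stops updating and using mini-batch statistics and instead normalizes with the frozen running statistics it received from the server. The helper names and the freeze_round threshold are illustrative assumptions, not the paper's exact procedure.
```python
# Illustrative sketch only -- not the exact FixBN procedure. It shows how a
# client could keep BatchNorm layers but, after a chosen round, normalize with
# the frozen (server-aggregated) running statistics instead of per-batch ones.
import torch.nn as nn

_BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)

def freeze_bn_statistics(model: nn.Module) -> None:
    """Put every BatchNorm layer into eval mode so it normalizes with its
    running_mean / running_var and stops updating them; the affine
    parameters (weight, bias) continue to be trained as usual."""
    for module in model.modules():
        if isinstance(module, _BN_TYPES):
            module.eval()

def local_training_epoch(model, loader, optimizer, round_idx, freeze_round=50):
    model.train()                       # normal training mode for all layers
    if round_idx >= freeze_round:       # hypothetical switch point
        freeze_bn_statistics(model)     # BN now uses shared frozen statistics
    for x, y in loader:
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
```
Note that this adds no extra communication: the only change is whether BN layers normalize with per-batch or with frozen running statistics during local training.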
Related papers
- BN-SCAFFOLD: controlling the drift of Batch Normalization statistics in Federated Learning [2.563180814294141]
Federated Learning (FL) is gaining traction as a learning paradigm for training Machine Learning (ML) models in a decentralized way.
Batch Normalization (BN) is ubiquitous in Deep Neural Networks (DNNs).
BN has been reported to hinder the performance of DNNs in heterogeneous FL.
We introduce a unified theoretical framework for analyzing the convergence of variance reduction algorithms in the BN-DNN setting.
arXiv Detail & Related papers (2024-10-04T09:53:20Z) - Can We Theoretically Quantify the Impacts of Local Updates on the Generalization Performance of Federated Learning? [50.03434441234569]
Federated Learning (FL) has gained significant popularity due to its effectiveness in training machine learning models across diverse sites without requiring direct data sharing.
While various algorithms have shown that FL with local updates is a communication-efficient distributed learning framework, the generalization performance of FL with local updates has received comparatively less attention.
arXiv Detail & Related papers (2024-09-05T19:00:18Z) - Harnessing the Power of Federated Learning in Federated Contextual Bandits [20.835106310302876]
Federated contextual bandits (FCB) are a pivotal integration of FL and sequential decision-making.
FCB approaches have largely employed their tailored FL components, often deviating from the canonical FL framework.
In particular, a novel FCB design, termed FedIGW, is proposed to leverage a regression-based CB algorithm.
arXiv Detail & Related papers (2023-12-26T21:44:09Z) - Overcoming Recency Bias of Normalization Statistics in Continual Learning: Balance and Adaptation [67.77048565738728]
Continual learning involves learning a sequence of tasks and balancing their knowledge appropriately.
We propose Adaptive Balance of BN (AdaB²N), which appropriately incorporates a Bayesian-based strategy to adapt task-wise contributions.
Our approach achieves significant performance gains across a wide range of benchmarks.
arXiv Detail & Related papers (2023-10-13T04:50:40Z) - Understanding How Consistency Works in Federated Learning via Stage-wise Relaxed Initialization [84.42306265220274]
Federated learning (FL) is a distributed paradigm that coordinates massive local clients to collaboratively train a global model.
Previous works have observed that FL suffers from the "client drift" problem, which is caused by the inconsistent optimum across local clients.
To alleviate the negative impact of "client drift" and explore its substance in FL, we first design an efficient FL algorithm, FedInit.
arXiv Detail & Related papers (2023-06-09T06:55:15Z) - Why Batch Normalization Damage Federated Learning on Non-IID Data? [34.06900591666005]
Federated learning (FL) involves training deep neural network (DNN) models at the network edge while protecting the privacy of the edge clients.
Batch normalization (BN) has been regarded as a simple and effective means to accelerate training and improve the generalization capability.
Recent findings indicate that BN can significantly impair the performance of FL in the presence of non-i.i.d. data.
We present the first convergence analysis to show that, under non-i.i.d. data, the mismatch between the local and global statistical parameters in BN causes a gradient deviation between the local and global models.
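To make the mismatch concrete, here is a small NumPy illustration (ours, not from the paper): when client data are non-IID, each client's feature statistics can sit far from the pooled statistics the global model would see, so normalizing locally and globally gives different results (and hence different gradients). The two Gaussian "clients" below are purely hypothetical.
```python
# Toy NumPy illustration (not from the paper) of local-vs-global BN statistics
# under non-IID data: each client's mean/variance differs sharply from the
# pooled ("global") values, so local and global normalization disagree.
import numpy as np

rng = np.random.default_rng(0)
client_a = rng.normal(loc=-2.0, scale=1.0, size=10_000)  # client A's feature values
client_b = rng.normal(loc=+2.0, scale=1.0, size=10_000)  # client B sees a different mix
pooled = np.concatenate([client_a, client_b])             # what a pooled/global view sees

print("client A mean/var:", client_a.mean(), client_a.var())   # ~ -2.0 / 1.0
print("client B mean/var:", client_b.mean(), client_b.var())   # ~ +2.0 / 1.0
print("pooled   mean/var:", pooled.mean(), pooled.var())       # ~  0.0 / 5.0

# The same samples are normalized very differently by local vs. pooled statistics:
x = client_a[:5]
local_norm = (x - client_a.mean()) / client_a.std()
global_norm = (x - pooled.mean()) / pooled.std()
print("locally normalized :", local_norm)
print("globally normalized:", global_norm)
```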
arXiv Detail & Related papers (2023-01-08T05:24:12Z) - Rethinking Normalization Methods in Federated Learning [92.25845185724424]
Federated learning (FL) is a popular distributed learning framework that can reduce privacy risks by not explicitly sharing private data.
We show that external covariate shifts will lead to the obliteration of some devices' contributions to the global model.
arXiv Detail & Related papers (2022-10-07T01:32:24Z) - Batch Normalization Preconditioning for Neural Network Training [7.709342743709842]
Batch normalization (BN) is a popular and ubiquitous method in deep learning.
BN is not suitable for use with very small mini-batch sizes or online learning.
We propose a new method called Batch Normalization Preconditioning (BNP).
arXiv Detail & Related papers (2021-08-02T18:17:26Z) - Double Forward Propagation for Memorized Batch Normalization [68.34268180871416]
Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs).
We propose a memorized batch normalization (MBN) which considers multiple recent batches to obtain more accurate and robust statistics.
Compared to related methods, the proposed MBN exhibits consistent behaviors in both training and inference.
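As a rough sketch of the idea summarized above (our reading, not the authors' implementation), a "memorized" normalizer could keep the statistics of the last few mini-batches and normalize with their average, which is more stable than relying on a single small batch; the class and parameter names below are illustrative.
```python
# Conceptual sketch of memorized batch statistics (illustrative, not the MBN
# implementation): remember the last `memory_size` batch means/variances and
# normalize with their average rather than the current batch alone.
from collections import deque
import numpy as np

class MemorizedBatchStats:
    def __init__(self, memory_size: int = 8, eps: float = 1e-5):
        self.means = deque(maxlen=memory_size)  # recent per-batch means
        self.vars = deque(maxlen=memory_size)   # recent per-batch variances
        self.eps = eps

    def normalize(self, batch: np.ndarray) -> np.ndarray:
        # Record the current batch's statistics, then normalize with the
        # average over all remembered batches.
        self.means.append(batch.mean(axis=0))
        self.vars.append(batch.var(axis=0))
        mean = np.mean(np.stack(list(self.means)), axis=0)
        var = np.mean(np.stack(list(self.vars)), axis=0)
        return (batch - mean) / np.sqrt(var + self.eps)

# Usage: with batch size 4, statistics are smoothed over up to 8 recent batches.
norm = MemorizedBatchStats(memory_size=8)
out = norm.normalize(np.random.default_rng(1).normal(size=(4, 16)))
```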
arXiv Detail & Related papers (2020-10-10T08:48:41Z) - PowerNorm: Rethinking Batch Normalization in Transformers [96.14956636022957]
The standard normalization method for neural network (NN) models used in Natural Language Processing (NLP) is layer normalization (LN).
LN is preferred due to the empirical observation that a (naive/vanilla) use of BN leads to significant performance degradation for NLP tasks.
We propose Power Normalization (PN), a novel normalization scheme that resolves this issue.
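As a very rough sketch of what such a scheme can look like (our paraphrase, under the assumption that PN relaxes zero-mean normalization and relies on a running estimate of the second moment; consult the paper for the actual formulation and its gradient corrections), a layer might scale activations by a running quadratic mean instead of per-batch mean and variance.
```python
# Simplified sketch (our assumption of the general idea, not the paper's exact
# PowerNorm): scale activations by a running estimate of the second moment
# ("quadratic mean") instead of per-batch mean/variance. The gradient
# corrections described in the paper are omitted here.
import numpy as np

class RunningQuadraticNorm:
    def __init__(self, dim: int, momentum: float = 0.1, eps: float = 1e-5):
        self.psi2 = np.ones(dim)      # running estimate of E[x^2]
        self.momentum = momentum
        self.eps = eps
        self.gamma = np.ones(dim)     # learnable scale (trained elsewhere)
        self.beta = np.zeros(dim)     # learnable shift (trained elsewhere)

    def forward(self, x: np.ndarray, training: bool = True) -> np.ndarray:
        if training:
            batch_psi2 = np.mean(x ** 2, axis=0)
            self.psi2 = (1 - self.momentum) * self.psi2 + self.momentum * batch_psi2
        return self.gamma * x / np.sqrt(self.psi2 + self.eps) + self.beta
```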
arXiv Detail & Related papers (2020-03-17T17:50:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.