BN-SCAFFOLD: controlling the drift of Batch Normalization statistics in Federated Learning
- URL: http://arxiv.org/abs/2410.03281v1
- Date: Fri, 4 Oct 2024 09:53:20 GMT
- Title: BN-SCAFFOLD: controlling the drift of Batch Normalization statistics in Federated Learning
- Authors: Gonzalo Iñaki Quintana, Laurence Vancamberg, Vincent Jugnon, Mathilde Mougeot, Agnès Desolneux,
- Abstract summary: Federated Learning (FL) is gaining traction as a learning paradigm for training Machine Learning (ML) models in a decentralized way.
Batch Normalization (BN) is ubiquitous in Deep Neural Networks (DNN)
BN has been reported to hinder performance of DNNs in heterogeneous FL.
We introduce a unified theoretical framework for analyzing the convergence of variance reduction algorithms in the BN-DNN setting.
- Score: 2.563180814294141
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Federated Learning (FL) is gaining traction as a learning paradigm for training Machine Learning (ML) models in a decentralized way. Batch Normalization (BN) is ubiquitous in Deep Neural Networks (DNN), as it improves convergence and generalization. However, BN has been reported to hinder performance of DNNs in heterogeneous FL. Recently, the FedTAN algorithm has been proposed to mitigate the effect of heterogeneity on BN, by aggregating BN statistics and gradients from all the clients. However, it has a high communication cost, that increases linearly with the depth of the DNN. SCAFFOLD is a variance reduction algorithm, that estimates and corrects the client drift in a communication-efficient manner. Despite its promising results in heterogeneous FL settings, it has been reported to underperform for models with BN. In this work, we seek to revive SCAFFOLD, and more generally variance reduction, as an efficient way of training DNN with BN in heterogeneous FL. We introduce a unified theoretical framework for analyzing the convergence of variance reduction algorithms in the BN-DNN setting, inspired of by the work of Wang et al. 2023, and show that SCAFFOLD is unable to remove the bias introduced by BN. We thus propose the BN-SCAFFOLD algorithm, which extends the client drift correction of SCAFFOLD to BN statistics. We prove convergence using the aforementioned framework and validate the theoretical results with experiments on MNIST and CIFAR-10. BN-SCAFFOLD equals the performance of FedTAN, without its high communication cost, outperforming Federated Averaging (FedAvg), SCAFFOLD, and other FL algorithms designed to mitigate BN heterogeneity.
Related papers
- FedNAR: Federated Optimization with Normalized Annealing Regularization [54.42032094044368]
We explore the choices of weight decay and identify that weight decay value appreciably influences the convergence of existing FL algorithms.
We develop Federated optimization with Normalized Annealing Regularization (FedNAR), a plug-in that can be seamlessly integrated into any existing FL algorithms.
arXiv Detail & Related papers (2023-10-04T21:11:40Z) - Making Batch Normalization Great in Federated Deep Learning [32.81480654534734]
Batch Normalization (BN) is widely used in centralized deep learning to improve convergence and generalization.
Prior work has observed that training with BN could hinder performance and suggested replacing it with Group Normalization (GN)
arXiv Detail & Related papers (2023-03-12T01:12:43Z) - Why Batch Normalization Damage Federated Learning on Non-IID Data? [34.06900591666005]
Federated learning (FL) involves training deep neural network (DNN) models at the network edge while protecting the privacy of the edge clients.
Batch normalization (BN) has been regarded as a simple and effective means to accelerate the training and improve the capability generalization.
Recent findings indicate that BN can significantly impair the performance of FL in the presence of non-i.i.d. data.
We present the first convergence analysis to show that under the non-i.i.d. data, the mismatch between the local and global statistical parameters in BN causes the gradient deviation between the local and global models
arXiv Detail & Related papers (2023-01-08T05:24:12Z) - An Adaptive Batch Normalization in Deep Learning [0.0]
Batch Normalization (BN) is a way to accelerate and stabilize training in deep convolutional neural networks.
We propose a threshold-based adaptive BN approach that separates the data that requires the BN and data that does not require it.
arXiv Detail & Related papers (2022-11-03T12:12:56Z) - Batch Normalization Explained [31.66311831317311]
We show that batch normalization (BN) boosts DN learning and inference performance.
BN is an unsupervised learning technique that adapts the geometry of a DN's spline partition to match the data.
We also show that the variation of BN statistics between mini-batches introduces a dropout-like random perturbation to the partition boundaries.
arXiv Detail & Related papers (2022-09-29T13:41:27Z) - "BNN - BN = ?": Training Binary Neural Networks without Batch
Normalization [92.23297927690149]
Batch normalization (BN) is a key facilitator and considered essential for state-of-the-art binary neural networks (BNN)
We extend their framework to training BNNs, and for the first time demonstrate that BNs can be completed removed from BNN training and inference regimes.
arXiv Detail & Related papers (2021-04-16T16:46:57Z) - MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch
Normalization [60.36100335878855]
We propose a novel normalization method, named MimicNorm, to improve the convergence and efficiency in network training.
We leverage the neural kernel (NTK) theory to prove that our weight mean operation whitens activations and transits network into the chaotic regime like BN layer.
MimicNorm achieves similar accuracy for various network structures, including ResNets and lightweight networks like ShuffleNet, with a reduction of about 20% memory consumption.
arXiv Detail & Related papers (2020-10-19T07:42:41Z) - Batch Normalization Increases Adversarial Vulnerability and Decreases
Adversarial Transferability: A Non-Robust Feature Perspective [91.5105021619887]
Batch normalization (BN) has been widely used in modern deep neural networks (DNNs)
BN is observed to increase the model accuracy while at the cost of adversarial robustness.
It remains unclear whether BN mainly favors learning robust features (RFs) or non-robust features (NRFs)
arXiv Detail & Related papers (2020-10-07T10:24:33Z) - PowerNorm: Rethinking Batch Normalization in Transformers [96.14956636022957]
normalization method for neural network (NN) models used in Natural Language Processing (NLP) is layer normalization (LN)
LN is preferred due to the empirical observation that a (naive/vanilla) use of BN leads to significant performance degradation for NLP tasks.
We propose Power Normalization (PN), a novel normalization scheme that resolves this issue.
arXiv Detail & Related papers (2020-03-17T17:50:26Z) - Towards Stabilizing Batch Statistics in Backward Propagation of Batch
Normalization [126.6252371899064]
Moving Average Batch Normalization (MABN) is a novel normalization method.
We show that MABN can completely restore the performance of vanilla BN in small batch cases.
Our experiments demonstrate the effectiveness of MABN in multiple computer vision tasks including ImageNet and COCO.
arXiv Detail & Related papers (2020-01-19T14:41:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.