Why Batch Normalization Damage Federated Learning on Non-IID Data?
- URL: http://arxiv.org/abs/2301.02982v3
- Date: Thu, 9 Nov 2023 11:56:46 GMT
- Title: Why Batch Normalization Damage Federated Learning on Non-IID Data?
- Authors: Yanmeng Wang, Qingjiang Shi, Tsung-Hui Chang
- Abstract summary: Federated learning (FL) involves training deep neural network (DNN) models at the network edge while protecting the privacy of the edge clients.
Batch normalization (BN) has been regarded as a simple and effective means to accelerate training and improve generalization capability.
Recent findings indicate that BN can significantly impair the performance of FL in the presence of non-i.i.d. data.
We present the first convergence analysis showing that, under non-i.i.d. data, the mismatch between the local and global statistical parameters in BN causes gradient deviation between the local and global models.
- Score: 34.06900591666005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a promising distributed learning paradigm, federated learning (FL)
involves training deep neural network (DNN) models at the network edge while
protecting the privacy of the edge clients. To train a large-scale DNN model,
batch normalization (BN) has been regarded as a simple and effective means to
accelerate the training and improve the generalization capability. However,
recent findings indicate that BN can significantly impair the performance of FL
in the presence of non-i.i.d. data. While several FL algorithms have been
proposed to address this issue, their performance still falls significantly
when compared to the centralized scheme. Furthermore, none of them have
provided a theoretical explanation of how the BN damages the FL convergence. In
this paper, we present the first convergence analysis to show that under the
non-i.i.d. data, the mismatch between the local and global statistical
parameters in BN causes the gradient deviation between the local and global
models, which, as a result, slows down and biases the FL convergence. In view
of this, we develop a new FL algorithm that is tailored to BN, called FedTAN,
which is capable of achieving robust FL performance under a variety of data
distributions via iterative layer-wise parameter aggregation. Comprehensive
experimental results demonstrate the superiority of the proposed FedTAN over
existing baselines for training BN-based DNN models.
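The statistics mismatch described in the abstract can be illustrated with a small numerical sketch. This is not FedTAN and not the authors' code; it only assumes two hypothetical clients of equal size whose features are drawn from different Gaussians, and compares the BN mean and variance computed locally with those computed over the pooled data. All names (`client_a`, `client_b`, `bn_stats`, `bn_forward`) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical clients with different feature distributions (non-i.i.d.).
client_a = rng.normal(loc=-2.0, scale=1.0, size=(1000, 4))
client_b = rng.normal(loc=2.0, scale=1.0, size=(1000, 4))
pooled = np.concatenate([client_a, client_b])  # what a centralized scheme would see


def bn_stats(x):
    """Per-feature mean and biased variance, as BN computes over a batch."""
    return x.mean(axis=0), x.var(axis=0)


def bn_forward(x, mean, var, eps=1e-5):
    """BN normalization without the learnable scale and shift."""
    return (x - mean) / np.sqrt(var + eps)


mean_a, var_a = bn_stats(client_a)
mean_b, var_b = bn_stats(client_b)
mean_g, var_g = bn_stats(pooled)

# Averaging local statistics recovers the global mean (equal client sizes),
# but the averaged variance misses the between-client term.
avg_mean = (mean_a + mean_b) / 2
avg_var = (var_a + var_b) / 2
between = ((mean_a - mean_g) ** 2 + (mean_b - mean_g) ** 2) / 2

# The same client data normalized with local vs. global statistics.
out_local = bn_forward(client_a, mean_a, var_a)
out_global = bn_forward(client_a, mean_g, var_g)
mismatch = np.abs(out_local - out_global).mean()

print("averaged local variance:", avg_var.round(2))
print("global variance:        ", var_g.round(2))
print("mean |local - global| BN output on client A:", round(float(mismatch), 3))
```

In this toy setting the gap between the averaged local variance and the global variance is exactly the variance of the client means, so it grows with the degree of heterogeneity. Since each client's forward pass then differs from the one a centralized model would compute, its gradients differ as well, which is the deviation the abstract refers to.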
Related papers
- DFA-GNN: Forward Learning of Graph Neural Networks by Direct Feedback Alignment [57.62885438406724]
Graph neural networks are recognized for their strong performance across various applications.
BP has limitations that challenge its biological plausibility and affect the efficiency, scalability and parallelism of training neural networks for graph-based tasks.
We propose DFA-GNN, a novel forward learning framework tailored for GNNs with a case study of semi-supervised learning.
arXiv Detail & Related papers (2024-06-04T07:24:51Z)
- FedNAR: Federated Optimization with Normalized Annealing Regularization [54.42032094044368]
We explore the choices of weight decay and identify that weight decay value appreciably influences the convergence of existing FL algorithms.
We develop Federated optimization with Normalized Annealing Regularization (FedNAR), a plug-in that can be seamlessly integrated into any existing FL algorithms.
arXiv Detail & Related papers (2023-10-04T21:11:40Z)
- FedDIP: Federated Learning with Extreme Dynamic Pruning and Incremental Regularization [5.182014186927254]
Federated Learning (FL) has been successfully adopted for distributed training and inference of large-scale Deep Neural Networks (DNNs).
We contribute with a novel FL framework (coined FedDIP) which combines (i) dynamic model pruning with error feedback to eliminate redundant information exchange.
We provide convergence analysis of FedDIP and report on a comprehensive performance and comparative assessment against state-of-the-art methods.
arXiv Detail & Related papers (2023-09-13T08:51:19Z)
- Making Batch Normalization Great in Federated Deep Learning [32.81480654534734]
Batch Normalization (BN) is widely used in centralized deep learning to improve convergence and generalization.
Prior work has observed that training with BN could hinder performance and suggested replacing it with Group Normalization (GN).
arXiv Detail & Related papers (2023-03-12T01:12:43Z)
- Magnitude Matters: Fixing SIGNSGD Through Magnitude-Aware Sparsification in the Presence of Data Heterogeneity [60.791736094073]
Communication overhead has become one of the major bottlenecks in the distributed training of deep neural networks.
We propose a magnitude-driven sparsification scheme, which addresses the non-convergence issue of SIGNSGD.
The proposed scheme is validated through experiments on Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets.
arXiv Detail & Related papers (2023-02-19T17:42:35Z)
- Batch Normalization Explained [31.66311831317311]
We show that batch normalization (BN) boosts DN learning and inference performance.
BN is an unsupervised learning technique that adapts the geometry of a DN's spline partition to match the data.
We also show that the variation of BN statistics between mini-batches introduces a dropout-like random perturbation to the partition boundaries.
arXiv Detail & Related papers (2022-09-29T13:41:27Z)
- NET-FLEET: Achieving Linear Convergence Speedup for Fully Decentralized Federated Learning with Heterogeneous Data [12.701031075169887]
Federated learning (FL) has received a surge of interest in recent years thanks to its benefits in data privacy protection, efficient communication, and parallel data processing.
Most existing works on FL are limited to systems with i.i.d. data and centralized parameter servers.
We propose a new algorithm, called NET-FLEET, for fully decentralized FL systems with data heterogeneity.
arXiv Detail & Related papers (2022-08-17T19:17:23Z)
- Acceleration of Federated Learning with Alleviated Forgetting in Local Training [61.231021417674235]
Federated learning (FL) enables distributed optimization of machine learning models while protecting privacy.
We propose FedReg, an algorithm to accelerate FL with alleviated knowledge forgetting in the local training stage.
Our experiments demonstrate that FedReg significantly improves the convergence rate of FL, especially when the neural network architecture is deep.
arXiv Detail & Related papers (2022-03-05T02:31:32Z)
- FL-NTK: A Neural Tangent Kernel-based Framework for Federated Learning Convergence Analysis [27.022551495550676]
This paper presents a new class of convergence analysis for FL, the Federated Learning Neural Tangent Kernel (FL-NTK), which corresponds to overparameterized ReLU neural networks trained by gradient descent in FL.
Theoretically, FL-NTK converges to a global-optimal solution at a linear rate with properly tuned learning parameters.
arXiv Detail & Related papers (2021-05-11T13:05:53Z)
- Double Forward Propagation for Memorized Batch Normalization [68.34268180871416]
Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs).
We propose a memorized batch normalization (MBN) which considers multiple recent batches to obtain more accurate and robust statistics.
Compared to related methods, the proposed MBN exhibits consistent behaviors in both training and inference.
arXiv Detail & Related papers (2020-10-10T08:48:41Z)
- Delay Minimization for Federated Learning Over Wireless Communication Networks [172.42768672943365]
The problem of delay minimization for federated learning (FL) over wireless communication networks is investigated.
A bisection search algorithm is proposed to obtain the optimal solution.
Simulation results show that the proposed algorithm can reduce delay by up to 27.3% compared to conventional FL methods.
arXiv Detail & Related papers (2020-07-05T19:00:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.