Why Batch Normalization Damage Federated Learning on Non-IID Data?
- URL: http://arxiv.org/abs/2301.02982v3
- Date: Thu, 9 Nov 2023 11:56:46 GMT
- Title: Why Batch Normalization Damage Federated Learning on Non-IID Data?
- Authors: Yanmeng Wang, Qingjiang Shi, Tsung-Hui Chang
- Abstract summary: Federated learning (FL) involves training deep neural network (DNN) models at the network edge while protecting the privacy of the edge clients.
Batch normalization (BN) has been regarded as a simple and effective means to accelerate training and improve the generalization capability.
Recent findings indicate that BN can significantly impair the performance of FL in the presence of non-i.i.d. data.
We present the first convergence analysis to show that, under non-i.i.d. data, the mismatch between the local and global statistical parameters in BN causes gradient deviation between the local and global models, which slows down and biases the FL convergence.
- Score: 34.06900591666005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a promising distributed learning paradigm, federated learning (FL)
involves training deep neural network (DNN) models at the network edge while
protecting the privacy of the edge clients. To train a large-scale DNN model,
batch normalization (BN) has been regarded as a simple and effective means to
accelerate the training and improve the generalization capability. However,
recent findings indicate that BN can significantly impair the performance of FL
in the presence of non-i.i.d. data. While several FL algorithms have been
proposed to address this issue, their performance still falls significantly
short of the centralized scheme. Furthermore, none of them have provided a
theoretical explanation of how BN damages the FL convergence. In
this paper, we present the first convergence analysis to show that under the
non-i.i.d. data, the mismatch between the local and global statistical
parameters in BN causes the gradient deviation between the local and global
models, which, as a result, slows down and biases the FL convergence. In view
of this, we develop a new FL algorithm that is tailored to BN, called FedTAN,
which is capable of achieving robust FL performance under a variety of data
distributions via iterative layer-wise parameter aggregation. Comprehensive
experimental results demonstrate the superiority of the proposed FedTAN over
existing baselines for training BN-based DNN models.
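To make the analyzed mismatch concrete, the following minimal NumPy sketch (an illustration only, not FedTAN or the paper's analysis) simulates two clients with different feature distributions and compares the per-client BN batch statistics with the global, pooled statistics, showing that the same sample is normalized differently under local and global parameters.

```python
# Illustration only (not FedTAN or the paper's analysis): two clients with
# different feature distributions estimate different BN batch statistics, so the
# same sample is normalized differently under local vs. global parameters.
import numpy as np

rng = np.random.default_rng(0)

# Non-i.i.d. features: client B's data is shifted and scaled relative to client A's.
client_a = rng.normal(loc=0.0, scale=1.0, size=(256, 16))
client_b = rng.normal(loc=3.0, scale=2.0, size=(256, 16))

def bn_stats(x):
    """Per-feature mean and variance, as a BN layer estimates from a mini-batch."""
    return x.mean(axis=0), x.var(axis=0)

mu_b, var_b = bn_stats(client_b)                              # local statistics
mu_g, var_g = bn_stats(np.concatenate([client_a, client_b]))  # global (pooled) statistics

x = client_b[:1]                                              # one sample from client B
z_local = (x - mu_b) / np.sqrt(var_b + 1e-5)
z_global = (x - mu_g) / np.sqrt(var_g + 1e-5)

print("mean mismatch  ||mu_local - mu_global|| =", np.linalg.norm(mu_b - mu_g))
print("normalization gap for one sample        =", np.linalg.norm(z_local - z_global))
```

Per the analysis above, it is exactly this statistic mismatch that induces the gradient deviation between the local and global models.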
Related papers
- FedEP: Tailoring Attention to Heterogeneous Data Distribution with Entropy Pooling for Decentralized Federated Learning [8.576433180938004]
This paper proposes a novel decentralized FL (DFL) aggregation algorithm, Federated Entropy Pooling (FedEP).
FedEP mitigates the client drift problem by incorporating the statistical characteristics of local distributions instead of any actual data.
Experiments have demonstrated that FedEP can achieve faster convergence and show higher test performance than state-of-the-art approaches.
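The exact FedEP pooling rule is defined in the paper; the toy sketch below only illustrates one hypothetical way a statistic of each client's local label distribution (its entropy) could weight aggregation without sharing raw data. The weighting rule and all names here are illustrative assumptions, not FedEP itself.

```python
# Hypothetical sketch only: weight client updates by the entropy of their local
# label distribution rather than by any raw data.  This is not the FedEP rule.
import numpy as np

def label_entropy(label_counts):
    p = np.asarray(label_counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return -(p * np.log(p)).sum()

# Per-client label histograms (non-i.i.d.: client 0 is highly skewed).
counts = [np.array([90, 5, 5]), np.array([30, 40, 30]), np.array([33, 33, 34])]
updates = [np.full(4, 1.0), np.full(4, 2.0), np.full(4, 3.0)]   # dummy model deltas

ent = np.array([label_entropy(c) for c in counts])
weights = ent / ent.sum()               # more balanced clients get larger weight here
aggregated = sum(w * u for w, u in zip(weights, updates))
print("weights:", weights, "aggregated update:", aggregated)
```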
arXiv Detail & Related papers (2024-10-10T07:39:15Z)
- BN-SCAFFOLD: controlling the drift of Batch Normalization statistics in Federated Learning [2.563180814294141]
Federated Learning (FL) is gaining traction as a learning paradigm for training Machine Learning (ML) models in a decentralized way.
Batch Normalization (BN) is ubiquitous in Deep Neural Networks (DNNs), but it has been reported to hinder the performance of DNNs in heterogeneous FL.
We introduce a unified theoretical framework for analyzing the convergence of variance reduction algorithms in the BN-DNN setting.
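As a reference point for what variance reduction means in this setting, the sketch below shows a plain SCAFFOLD-style control-variate correction on a toy quadratic problem with no BN layers; it does not reproduce BN-SCAFFOLD's handling of BN statistics.

```python
# Minimal SCAFFOLD-style sketch: local steps are corrected with control variates
# c_global and c_local[i] to reduce the drift caused by heterogeneous clients.
import numpy as np

def local_grad(w, target):                 # gradient of 0.5 * ||w - target||^2
    return w - target

targets = [np.array([1.0, 0.0]), np.array([-1.0, 2.0])]   # heterogeneous clients
w_global = np.zeros(2)
c_global = np.zeros(2)
c_local = [np.zeros(2) for _ in targets]
lr, local_steps = 0.1, 5

for rnd in range(20):
    new_ws, new_cs = [], []
    for i, t in enumerate(targets):
        w = w_global.copy()
        for _ in range(local_steps):
            g = local_grad(w, t)
            w -= lr * (g - c_local[i] + c_global)          # control-variate correction
        c_new = c_local[i] - c_global + (w_global - w) / (local_steps * lr)
        new_ws.append(w)
        new_cs.append(c_new)
    w_global = np.mean(new_ws, axis=0)
    c_global = c_global + np.mean([cn - co for cn, co in zip(new_cs, c_local)], axis=0)
    c_local = new_cs

print("global model:", w_global, "(average of the two targets is [0, 1])")
```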
arXiv Detail & Related papers (2024-10-04T09:53:20Z)
- Can We Theoretically Quantify the Impacts of Local Updates on the Generalization Performance of Federated Learning? [50.03434441234569]
Federated Learning (FL) has gained significant popularity due to its effectiveness in training machine learning models across diverse sites without requiring direct data sharing.
While various algorithms have shown that FL with local updates is a communication-efficient distributed learning framework, the generalization performance of FL with local updates has received comparatively less attention.
arXiv Detail & Related papers (2024-09-05T19:00:18Z)
- DFA-GNN: Forward Learning of Graph Neural Networks by Direct Feedback Alignment [57.62885438406724]
Graph neural networks are recognized for their strong performance across various applications.
Backpropagation (BP) has limitations that challenge its biological plausibility and affect the efficiency, scalability and parallelism of training neural networks for graph-based tasks.
We propose DFA-GNN, a novel forward learning framework tailored for GNNs with a case study of semi-supervised learning.
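As a minimal reference for direct feedback alignment itself (on a tiny MLP rather than a GNN, and without DFA-GNN's graph-specific design), the sketch below replaces backpropagation's transposed-weight error path with a fixed random feedback matrix.

```python
# Minimal direct feedback alignment (DFA) sketch on a tiny MLP, not a GNN: the
# output error is sent to the hidden layer through a fixed random matrix B
# instead of being backpropagated through W2.T.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8))                          # 64 samples, 8 features
y = (x.sum(axis=1, keepdims=True) > 0).astype(float)  # simple binary target

W1 = rng.normal(scale=0.1, size=(8, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16, 1)); b2 = np.zeros(1)
B = rng.normal(scale=0.1, size=(1, 16))               # fixed random feedback matrix
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(200):
    h = np.tanh(x @ W1 + b1)                          # forward pass
    p = sigmoid(h @ W2 + b2)
    e = p - y                                         # output error (sigmoid + BCE)
    dh = (e @ B) * (1.0 - h ** 2)                     # DFA: random projection, not W2.T
    W2 -= lr * h.T @ e / len(x); b2 -= lr * e.mean(axis=0)
    W1 -= lr * x.T @ dh / len(x); b1 -= lr * dh.mean(axis=0)

print("final training accuracy:", ((p > 0.5) == (y > 0.5)).mean())
```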
arXiv Detail & Related papers (2024-06-04T07:24:51Z)
- FedNAR: Federated Optimization with Normalized Annealing Regularization [54.42032094044368]
We explore the choice of weight decay and identify that its value appreciably influences the convergence of existing FL algorithms.
We develop Federated optimization with Normalized Annealing Regularization (FedNAR), a plug-in that can be seamlessly integrated into any existing FL algorithms.
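FedNAR's precise normalization rule is given in the paper; the sketch below is only a hypothetical illustration of the general idea the summary describes, namely a weight-decay coefficient that is annealed over communication rounds and applied inside each local step.

```python
# Hypothetical sketch only (not the FedNAR rule): anneal the weight-decay
# strength over communication rounds and apply it in every local SGD step.
import numpy as np

def local_grad(w, target):
    return w - target

targets = [np.array([1.0, 0.0]), np.array([-1.0, 2.0])]
w_global, lr, local_steps, rounds = np.zeros(2), 0.1, 5, 30
wd0 = 0.1                                          # initial weight-decay strength

for rnd in range(rounds):
    wd = wd0 * (1.0 - rnd / rounds)                # annealed over rounds (illustrative)
    new_ws = []
    for t in targets:
        w = w_global.copy()
        for _ in range(local_steps):
            w -= lr * (local_grad(w, t) + wd * w)  # decayed regularization term
        new_ws.append(w)
    w_global = np.mean(new_ws, axis=0)

print("global model with annealed weight decay:", w_global)
```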
arXiv Detail & Related papers (2023-10-04T21:11:40Z)
- FedDIP: Federated Learning with Extreme Dynamic Pruning and Incremental Regularization [5.182014186927254]
Federated Learning (FL) has been successfully adopted for distributed training and inference of large-scale Deep Neural Networks (DNNs).
We contribute a novel FL framework (coined FedDIP) which combines (i) dynamic model pruning with error feedback to eliminate redundant information exchange and (ii) incremental regularization.
We provide convergence analysis of FedDIP and report on a comprehensive performance and comparative assessment against state-of-the-art methods.
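The sketch below illustrates only the pruning-with-error-feedback mechanism named above: just the top-k entries of an update are communicated, and the pruned residual is fed back into the next round. FedDIP's full scheme (extreme sparsity schedule and incremental regularization) is not reproduced.

```python
# Illustrative pruning with error feedback: transmit the top-k entries of an
# update and re-inject the pruned residual into the next round's update.
import numpy as np

rng = np.random.default_rng(0)

def top_k_sparsify(v, k):
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]        # keep the k largest-magnitude entries
    out[idx] = v[idx]
    return out

dim, k = 10, 3
error = np.zeros(dim)                        # error-feedback memory

for rnd in range(5):
    update = rng.normal(size=dim)            # stand-in for a local model update
    corrected = update + error               # re-inject previously pruned mass
    sent = top_k_sparsify(corrected, k)      # what is actually communicated
    error = corrected - sent                 # remember what was pruned
    print(f"round {rnd}: sent {np.count_nonzero(sent)}/{dim} entries, "
          f"residual norm {np.linalg.norm(error):.3f}")
```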
arXiv Detail & Related papers (2023-09-13T08:51:19Z)
- Making Batch Normalization Great in Federated Deep Learning [32.81480654534734]
Batch Normalization (BN) is widely used in centralized deep learning to improve convergence and generalization.
Prior work has observed that training with BN could hinder performance and suggested replacing it with Group Normalization (GN).
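The BN-to-GN swap mentioned above is easy to show in PyTorch; the snippet below contrasts a small convolutional block using BatchNorm (batch-dependent statistics) with one using GroupNorm (per-sample group statistics). It does not reproduce this paper's own recipe for making BN work in FL.

```python
# Contrast BatchNorm (depends on batch statistics) with GroupNorm (per-sample
# group statistics), the workaround mentioned in the summary above.
import torch
import torch.nn as nn

def conv_block(out_channels: int, use_group_norm: bool) -> nn.Sequential:
    norm = (nn.GroupNorm(num_groups=8, num_channels=out_channels)
            if use_group_norm
            else nn.BatchNorm2d(out_channels))
    return nn.Sequential(
        nn.Conv2d(3, out_channels, kernel_size=3, padding=1),
        norm,
        nn.ReLU(inplace=True),
    )

x = torch.randn(4, 3, 32, 32)
print(conv_block(32, use_group_norm=False)(x).shape)  # BN: stats depend on the batch
print(conv_block(32, use_group_norm=True)(x).shape)   # GN: stats are per sample/group
```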
arXiv Detail & Related papers (2023-03-12T01:12:43Z)
- NET-FLEET: Achieving Linear Convergence Speedup for Fully Decentralized Federated Learning with Heterogeneous Data [12.701031075169887]
Federated learning (FL) has received a surge of interest in recent years thanks to its benefits in data privacy protection, efficient communication, and parallel data processing.
Most existing works on FL are limited to systems with i.i.d. data and centralized parameter servers.
We propose a new algorithm, called NET-FLEET, for fully decentralized FL systems with data heterogeneity.
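For context on what fully decentralized means here, the sketch below runs a generic server-free update in which each client mixes its model with its neighbours' through a doubly stochastic matrix before taking a local gradient step; NET-FLEET's specific correction for data heterogeneity is not reproduced.

```python
# Generic decentralized (server-free) update: mix with neighbours via a doubly
# stochastic matrix W, then take a local gradient step on heterogeneous objectives.
targets = [1.0, 2.0, 3.0]            # heterogeneous local optima (scalar models)
models = [0.0, 0.0, 0.0]

# Fully connected 3-node topology with self-weight 0.5 (rows sum to 1).
W = [[0.50, 0.25, 0.25],
     [0.25, 0.50, 0.25],
     [0.25, 0.25, 0.50]]
lr = 0.1

for _ in range(300):
    mixed = [sum(W[i][j] * models[j] for j in range(3)) for i in range(3)]
    models = [m - lr * (m - t) for m, t in zip(mixed, targets)]   # local gradient step

print("clients approach the network-wide optimum 2.0:",
      [round(m, 3) for m in models])
```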
arXiv Detail & Related papers (2022-08-17T19:17:23Z)
- Acceleration of Federated Learning with Alleviated Forgetting in Local Training [61.231021417674235]
Federated learning (FL) enables distributed optimization of machine learning models while protecting privacy.
We propose FedReg, an algorithm to accelerate FL with alleviated knowledge forgetting in the local training stage.
Our experiments demonstrate that FedReg significantly improves the convergence rate of FL, especially when the neural network architecture is deep.
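FedReg's actual mechanism is not reproduced below; the sketch only illustrates a common, FedProx-style way to limit forgetting of global knowledge during local training, namely a proximal penalty that pulls the local model back toward the received global model.

```python
# FedProx-style proximal regularization (illustrative only, not FedReg): the local
# update is penalized for drifting away from the received global model.
import numpy as np

def local_grad(w, target):
    return w - target

w_global = np.array([0.0, 1.0])       # model received from the server
target = np.array([5.0, -3.0])        # this client's (very different) local optimum
mu, lr = 1.0, 0.1                     # proximal strength and step size

w = w_global.copy()
for _ in range(100):
    g = local_grad(w, target) + mu * (w - w_global)   # proximal pull toward global
    w -= lr * g

print("local model stays between the global model and the local optimum:", w)
```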
arXiv Detail & Related papers (2022-03-05T02:31:32Z)
- Double Forward Propagation for Memorized Batch Normalization [68.34268180871416]
Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs).
We propose a memorized batch normalization (MBN) which considers multiple recent batches to obtain more accurate and robust statistics.
Compared to related methods, the proposed MBN exhibits consistent behaviors in both training and inference.
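The idea named above, normalizing with statistics remembered from several recent batches rather than the current batch alone, can be sketched as follows; the exact MBN formulation from the paper is not reproduced.

```python
# Sketch: keep statistics from several recent batches and normalize with their
# average instead of the current batch's statistics alone.
from collections import deque
import numpy as np

class MemorizedNorm:
    def __init__(self, memory=4, eps=1e-5):
        self.means = deque(maxlen=memory)   # per-feature means of recent batches
        self.vars = deque(maxlen=memory)    # per-feature variances of recent batches
        self.eps = eps

    def __call__(self, x):
        self.means.append(x.mean(axis=0))
        self.vars.append(x.var(axis=0))
        mu = np.mean(list(self.means), axis=0)   # averaged over remembered batches
        var = np.mean(list(self.vars), axis=0)
        return (x - mu) / np.sqrt(var + self.eps)

rng = np.random.default_rng(0)
norm = MemorizedNorm(memory=4)
for step in range(6):
    batch = rng.normal(loc=0.1 * step, scale=1.0, size=(32, 8))  # slowly drifting data
    out = norm(batch)
    print(f"step {step}: normalized output mean {out.mean():+.3f}")
```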
arXiv Detail & Related papers (2020-10-10T08:48:41Z)
- Delay Minimization for Federated Learning Over Wireless Communication Networks [172.42768672943365]
The problem of delay minimization for federated learning (FL) over wireless communication networks is investigated.
A bisection search algorithm is proposed to obtain the optimal solution.
Simulation results show that the proposed algorithm can reduce delay by up to 27.3% compared to conventional FL methods.
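The bisection idea is generic enough to sketch: if feasibility of a target delay is monotone, the minimum achievable delay can be found by halving the search interval. The wireless resource-allocation feasibility check from the paper is replaced by a stand-in predicate here.

```python
# Generic bisection search: find the smallest target delay T that is feasible,
# assuming feasibility is monotone in T.  `is_feasible` is a stand-in for the
# paper's resource-allocation feasibility check.
def bisect_min_delay(is_feasible, lo, hi, tol=1e-6):
    """Smallest T in [lo, hi] with is_feasible(T) True, assuming monotonicity."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_feasible(mid):
            hi = mid          # mid is achievable, try a smaller delay
        else:
            lo = mid          # mid is too aggressive, relax the target
    return hi

# Toy stand-in: delays of at least 0.273 s are achievable.
print(bisect_min_delay(lambda t: t >= 0.273, lo=0.0, hi=10.0))
```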
arXiv Detail & Related papers (2020-07-05T19:00:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.