Related papers: Why Batch Normalization Damage Federated Learning on Non-IID Data?

URL: http://arxiv.org/abs/2301.02982v3
Date: Thu, 9 Nov 2023 11:56:46 GMT
Title: Why Batch Normalization Damage Federated Learning on Non-IID Data?
Authors: Yanmeng Wang, Qingjiang Shi, Tsung-Hui Chang
Abstract summary: Federated learning (FL) involves training deep neural network (DNN) models at the network edge while protecting the privacy of the edge clients. Batch normalization (BN) has been regarded as a simple and effective means to accelerate training and improve generalization capability. Recent findings indicate that BN can significantly impair the performance of FL in the presence of non-i.i.d. data. We present the first convergence analysis to show that under non-i.i.d. data, the mismatch between the local and global statistical parameters in BN causes gradient deviation between the local and global models.
Score: 34.06900591666005
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As a promising distributed learning paradigm, federated learning (FL) involves training deep neural network (DNN) models at the network edge while protecting the privacy of the edge clients. To train a large-scale DNN model, batch normalization (BN) has been regarded as a simple and effective means to accelerate the training and improve the generalization capability. However, recent findings indicate that BN can significantly impair the performance of FL in the presence of non-i.i.d. data. While several FL algorithms have been proposed to address this issue, their performance still falls significantly when compared to the centralized scheme. Furthermore, none of them have provided a theoretical explanation of how the BN damages the FL convergence. In this paper, we present the first convergence analysis to show that under the non-i.i.d. data, the mismatch between the local and global statistical parameters in BN causes the gradient deviation between the local and global models, which, as a result, slows down and biases the FL convergence. In view of this, we develop a new FL algorithm that is tailored to BN, called FedTAN, which is capable of achieving robust FL performance under a variety of data distributions via iterative layer-wise parameter aggregation. Comprehensive experimental results demonstrate the superiority of the proposed FedTAN over existing baselines for training BN-based DNN models.
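
The statistics mismatch described in the abstract can be illustrated with a small numeric sketch. This is not the paper's analysis or the FedTAN algorithm; it only shows, for one hypothetical feature on two hypothetical clients (`client_a`, `client_b`), how BN statistics computed locally on non-i.i.d. data can differ from the statistics of the pooled data, and how the normalized activations differ as a result. All names and values are assumptions chosen for illustration.

```python
from statistics import fmean, pvariance


def bn_normalize(values, mean, var, eps=1e-5):
    """Normalize values with the given mean and variance (BN without scale/shift)."""
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


# Hypothetical activations of a single feature on two clients whose
# data distributions differ (non-i.i.d.).
client_a = [-2.5, -2.0, -1.5, -2.0]
client_b = [2.5, 3.0, 3.5, 3.0]

# "Global" statistics: what a centralized trainer would see on pooled data.
pooled = client_a + client_b
global_mean, global_var = fmean(pooled), pvariance(pooled)

# Local statistics: what each client computes from its own data.
local_mean_a, local_var_a = fmean(client_a), pvariance(client_a)
local_mean_b, local_var_b = fmean(client_b), pvariance(client_b)

# Averaging local variances ignores the spread between the client means,
# so it underestimates the pooled variance.
avg_local_var = (local_var_a + local_var_b) / 2

# The same activations normalize differently under local vs. global statistics.
out_local = bn_normalize(client_a, local_mean_a, local_var_a)
out_global = bn_normalize(client_a, global_mean, global_var)
max_gap = max(abs(l - g) for l, g in zip(out_local, out_global))

print(f"global mean={global_mean}, global var={global_var}")
print(f"client A mean={local_mean_a}, var={local_var_a}")
print(f"client B mean={local_mean_b}, var={local_var_b}")
print(f"average of local variances={avg_local_var}")
print(f"largest gap in normalized outputs on client A={max_gap:.3f}")
```

In this toy setting the pooled variance equals the average local variance plus the variance of the client means, which is why simple averaging of per-client BN statistics does not recover the global ones when client means differ.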

Related papers

Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum [78.27945336558987]
Decentralized federated learning (DFL) eliminates reliance on the server-client architecture. Non-smooth regularization is often incorporated into machine learning tasks. We propose a novel DNCFL algorithm to solve these problems.
arXiv Detail & Related papers (2025-04-17T08:32:25Z)
FedEP: Tailoring Attention to Heterogeneous Data Distribution with Entropy Pooling for Decentralized Federated Learning [8.576433180938004]
This paper proposes a novel DFL aggregation algorithm, Federated Entropy Pooling (FedEP). FedEP mitigates the client drift problem by incorporating the statistical characteristics of local distributions instead of any actual data. Experiments have demonstrated that FedEP can achieve faster convergence and higher test performance than state-of-the-art approaches.
arXiv Detail & Related papers (2024-10-10T07:39:15Z)
BN-SCAFFOLD: controlling the drift of Batch Normalization statistics in Federated Learning [2.563180814294141]
Federated Learning (FL) is gaining traction as a learning paradigm for training Machine Learning (ML) models in a decentralized way. Batch Normalization (BN) is ubiquitous in Deep Neural Networks (DNNs). BN has been reported to hinder the performance of DNNs in heterogeneous FL. We introduce a unified theoretical framework for analyzing the convergence of variance reduction algorithms in the BN-DNN setting.
arXiv Detail & Related papers (2024-10-04T09:53:20Z)
Can We Theoretically Quantify the Impacts of Local Updates on the Generalization Performance of Federated Learning? [50.03434441234569]
Federated Learning (FL) has gained significant popularity due to its effectiveness in training machine learning models across diverse sites without requiring direct data sharing. While various algorithms have shown that FL with local updates is a communication-efficient distributed learning framework, the generalization performance of FL with local updates has received comparatively less attention.
arXiv Detail & Related papers (2024-09-05T19:00:18Z)
DFA-GNN: Forward Learning of Graph Neural Networks by Direct Feedback Alignment [57.62885438406724]
Graph neural networks (GNNs) are recognized for their strong performance across various applications. Backpropagation (BP) has limitations that challenge its biological plausibility and affect the efficiency, scalability and parallelism of training neural networks for graph-based tasks. We propose DFA-GNN, a novel forward learning framework tailored for GNNs with a case study of semi-supervised learning.
arXiv Detail & Related papers (2024-06-04T07:24:51Z)
FedNAR: Federated Optimization with Normalized Annealing Regularization [54.42032094044368]
We explore the choices of weight decay and identify that the weight decay value appreciably influences the convergence of existing FL algorithms. We develop Federated optimization with Normalized Annealing Regularization (FedNAR), a plug-in that can be seamlessly integrated into any existing FL algorithm.
arXiv Detail & Related papers (2023-10-04T21:11:40Z)
FedDIP: Federated Learning with Extreme Dynamic Pruning and Incremental Regularization [5.182014186927254]
Federated Learning (FL) has been successfully adopted for distributed training and inference of large-scale Deep Neural Networks (DNNs). We contribute a novel FL framework (coined FedDIP) which combines (i) dynamic model pruning with error feedback to eliminate redundant information exchange. We provide convergence analysis of FedDIP and report on a comprehensive performance and comparative assessment against state-of-the-art methods.
arXiv Detail & Related papers (2023-09-13T08:51:19Z)
Making Batch Normalization Great in Federated Deep Learning [32.81480654534734]
Batch Normalization (BN) is widely used in centralized deep learning to improve convergence and generalization. Prior work has observed that training with BN could hinder performance and suggested replacing it with Group Normalization (GN).
arXiv Detail & Related papers (2023-03-12T01:12:43Z)
Batch Normalization Explained [31.66311831317311]
We show that batch normalization (BN) boosts DN learning and inference performance. BN is an unsupervised learning technique that adapts the geometry of a DN's spline partition to match the data. We also show that the variation of BN statistics between mini-batches introduces a dropout-like random perturbation to the partition boundaries.
arXiv Detail & Related papers (2022-09-29T13:41:27Z)
NET-FLEET: Achieving Linear Convergence Speedup for Fully Decentralized Federated Learning with Heterogeneous Data [12.701031075169887]
Federated learning (FL) has received a surge of interest in recent years thanks to its benefits in data privacy protection, efficient communication, and parallel data processing. Most existing works on FL are limited to systems with i.i.d. data and centralized parameter servers. We propose a new algorithm, called NET-FLEET, for fully decentralized FL systems with data heterogeneity.
arXiv Detail & Related papers (2022-08-17T19:17:23Z)
Acceleration of Federated Learning with Alleviated Forgetting in Local Training [61.231021417674235]
Federated learning (FL) enables distributed optimization of machine learning models while protecting privacy. We propose FedReg, an algorithm to accelerate FL with alleviated knowledge forgetting in the local training stage. Our experiments demonstrate that FedReg significantly improves the convergence rate of FL, especially when the neural network architecture is deep.
arXiv Detail & Related papers (2022-03-05T02:31:32Z)
Double Forward Propagation for Memorized Batch Normalization [68.34268180871416]
Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs). We propose a memorized batch normalization (MBN), which considers multiple recent batches to obtain more accurate and robust statistics. Compared to related methods, the proposed MBN exhibits consistent behaviors in both training and inference.
arXiv Detail & Related papers (2020-10-10T08:48:41Z)
Delay Minimization for Federated Learning Over Wireless Communication Networks [172.42768672943365]
The problem of delay minimization for federated learning (FL) over wireless communication networks is investigated. A bisection search algorithm is proposed to obtain the optimal solution. Simulation results show that the proposed algorithm can reduce delay by up to 27.3% compared to conventional FL methods.
arXiv Detail & Related papers (2020-07-05T19:00:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.