Understanding the Role of Layer Normalization in Label-Skewed Federated Learning
- URL: http://arxiv.org/abs/2308.09565v2
- Date: Thu, 15 Feb 2024 02:43:25 GMT
- Title: Understanding the Role of Layer Normalization in Label-Skewed Federated Learning
- Authors: Guojun Zhang, Mahdi Beitollahi, Alex Bie, Xi Chen
- Abstract summary: Layer normalization (LN) is a widely adopted deep learning technique especially in the era of foundation models.
In this work, we reveal the profound connection between layer normalization and the label shift problem in federated learning.
Our results verify that FN is an essential ingredient inside LN to significantly improve the convergence of FL while remaining robust to learning rate choices.
- Score: 15.19762600396105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Layer normalization (LN) is a widely adopted deep learning technique
especially in the era of foundation models. Recently, LN has been shown to be
surprisingly effective in federated learning (FL) with non-i.i.d. data.
However, exactly why and how it works remains mysterious. In this work, we
reveal the profound connection between layer normalization and the label shift
problem in federated learning. To understand layer normalization better in FL,
we identify the key contributing mechanism of normalization methods in FL,
called feature normalization (FN), which applies normalization to the latent
feature representation before the classifier head. Although LN and FN do not
improve expressive power, they control feature collapse and local overfitting
to heavily skewed datasets, and thus accelerate global training. Empirically,
we show that normalization leads to drastic improvements on standard benchmarks
under extreme label shift. Moreover, we conduct extensive ablation studies to
understand the critical factors of layer normalization in FL. Our results
verify that FN is an essential ingredient inside LN to significantly improve
the convergence of FL while remaining robust to learning rate choices,
especially under extreme label shift where each client has access to few
classes. Our code is available at
https://github.com/huawei-noah/Federated-Learning/tree/main/Layer_Normalization.
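
To make the FN mechanism concrete, here is a minimal PyTorch sketch of applying normalization to the latent feature representation just before the classifier head. The module names, layer sizes, and the use of a parameter-free layer normalization are illustrative assumptions, not the authors' implementation (see their repository above for the real code).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FNClassifier(nn.Module):
    """Toy classifier with feature normalization (FN) before the head.

    FN here means standardizing the latent feature vector across the
    feature dimension (LayerNorm without affine parameters) right
    before the linear classifier. Sizes are illustrative only.
    """

    def __init__(self, in_dim: int = 784, hidden: int = 256, num_classes: int = 10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
        )
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.encoder(x)
        # Feature normalization: normalize the latent representation
        # before it reaches the classifier head.
        z = F.layer_norm(z, z.shape[-1:])
        return self.head(z)


if __name__ == "__main__":
    model = FNClassifier()
    logits = model(torch.randn(8, 784))
    print(logits.shape)  # torch.Size([8, 10])
```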
Related papers
- Can We Theoretically Quantify the Impacts of Local Updates on the Generalization Performance of Federated Learning? [50.03434441234569]
Federated Learning (FL) has gained significant popularity due to its effectiveness in training machine learning models across diverse sites without requiring direct data sharing.
While various algorithms have shown that FL with local updates is a communication-efficient distributed learning framework, the generalization performance of FL with local updates has received comparatively less attention.
arXiv Detail & Related papers (2024-09-05T19:00:18Z)
- Revisiting Early-Learning Regularization When Federated Learning Meets Noisy Labels [27.777781072683986]
This paper revisits early-learning regularization and introduces Federated Label-mixture Regularization (FLR).
FLR adapts to FL's complexities by generating new pseudo labels that blend local and global model predictions.
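
As a hedged illustration of that pseudo-label idea, the sketch below blends the softmax outputs of a client's local model and the global model into soft pseudo labels; the convex mixing weight `beta` is an assumption for illustration, not FLR's exact rule.

```python
import torch
import torch.nn.functional as F


def blended_pseudo_labels(local_logits: torch.Tensor,
                          global_logits: torch.Tensor,
                          beta: float = 0.5) -> torch.Tensor:
    """Mix local and global model predictions into soft pseudo labels.

    `beta` weights the local prediction; the exact mixing rule and any
    scheduling of beta in FLR may differ from this simple convex blend.
    """
    local_probs = F.softmax(local_logits, dim=-1)
    global_probs = F.softmax(global_logits, dim=-1)
    return beta * local_probs + (1.0 - beta) * global_probs
```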
arXiv Detail & Related papers (2024-02-08T02:21:33Z)
- FedNAR: Federated Optimization with Normalized Annealing Regularization [54.42032094044368]
We explore choices of weight decay and find that the weight decay value appreciably influences the convergence of existing FL algorithms.
We develop Federated optimization with Normalized Annealing Regularization (FedNAR), a plug-in that can be seamlessly integrated into any existing FL algorithm.
arXiv Detail & Related papers (2023-10-04T21:11:40Z)
- Federated Neuro-Symbolic Learning [39.04545654772026]
We present the Federated Neuro-Symbolic Learning framework (FedNSL), which uses latent variables as the FL communication medium.
FedNSL is capable of identifying and addressing rule distribution through a simple and effective Kullback-Leibler (KL) divergence constraint.
Extensive experiments based on both synthetic and real-world data demonstrate significant advantages of FedNSL compared to five state-of-the-art methods.
arXiv Detail & Related papers (2023-08-29T14:20:17Z)
- Making Batch Normalization Great in Federated Deep Learning [32.81480654534734]
Batch Normalization (BN) is widely used in centralized deep learning to improve convergence and generalization.
Prior work has observed that training with BN could hinder performance and suggested replacing it with Group Normalization (GN).
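
A common way to act on this suggestion is to swap BatchNorm layers for GroupNorm in an existing model; the recursive replacement below is a generic sketch (the default number of groups is an illustrative choice, not taken from the paper).

```python
import math

import torch.nn as nn


def replace_bn_with_gn(module: nn.Module, num_groups: int = 32) -> nn.Module:
    """Recursively swap BatchNorm2d layers for GroupNorm.

    Using gcd(num_groups, num_features) guarantees the group count
    divides the channel count, which GroupNorm requires.
    """
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            groups = math.gcd(num_groups, child.num_features)
            setattr(module, name, nn.GroupNorm(groups, child.num_features))
        else:
            replace_bn_with_gn(child, num_groups)
    return module
```

Note that the new GroupNorm layers are freshly initialized, so this swap is meant for building the model before training rather than converting an already trained one.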
arXiv Detail & Related papers (2023-03-12T01:12:43Z)
- Rethinking Normalization Methods in Federated Learning [92.25845185724424]
Federated learning (FL) is a popular distributed learning framework that can reduce privacy risks by not explicitly sharing private data.
We show that external covariate shifts will lead to the obliteration of some devices' contributions to the global model.
arXiv Detail & Related papers (2022-10-07T01:32:24Z)
- Federated Learning with Label Distribution Skew via Logits Calibration [26.98248192651355]
In this paper, we investigate the label distribution skew in FL, where the distribution of labels varies across clients.
We propose FedLC, which calibrates the logits before softmax cross-entropy according to the probability of occurrence of each class.
Experiments on federated datasets and real-world datasets demonstrate that FedLC leads to a more accurate global model.
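
The sketch below shows one standard form of logit calibration by class occurrence probability (a logit-adjustment-style shift by the log class prior); FedLC's exact calibration term may differ, so treat the temperature `tau` and the formula as assumptions.

```python
import torch
import torch.nn.functional as F


def calibrated_cross_entropy(logits: torch.Tensor,
                             targets: torch.Tensor,
                             class_counts: torch.Tensor,
                             tau: float = 1.0) -> torch.Tensor:
    """Cross-entropy on logits shifted by the log class occurrence probability.

    Shifting each logit by tau * log(prior) forces rare classes to earn
    larger raw logits (larger margins) during training. This follows the
    generic logit-adjustment recipe; FedLC's calibration may differ.
    """
    priors = class_counts.float() / class_counts.sum()
    adjusted = logits + tau * torch.log(priors + 1e-12)
    return F.cross_entropy(adjusted, targets)
```

With `tau = 0` this reduces to ordinary softmax cross-entropy.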
arXiv Detail & Related papers (2022-09-01T02:56:39Z)
- Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
- The Role of Global Labels in Few-Shot Classification and How to Infer Them [55.64429518100676]
Few-shot learning is a central problem in meta-learning, where learners must quickly adapt to new tasks.
We propose Meta Label Learning (MeLa), a novel algorithm that infers global labels and obtains robust few-shot models via standard classification.
arXiv Detail & Related papers (2021-08-09T14:07:46Z)
- Improving Semi-supervised Federated Learning by Reducing the Gradient Diversity of Models [67.66144604972052]
Federated learning (FL) is a promising way to use the computing power of mobile devices while maintaining privacy of users.
We show that a critical issue that affects the test accuracy is the large gradient diversity of the models from different users.
We propose a novel grouping-based model averaging method to replace FedAvg.
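
As a rough, hedged sketch of grouping-based averaging, the snippet below averages client models within externally supplied groups and then averages the group models; the actual grouping criterion and weighting in the paper may differ.

```python
from collections import OrderedDict
from typing import Dict, List

import torch


def average_state_dicts(state_dicts: List[Dict[str, torch.Tensor]]) -> Dict[str, torch.Tensor]:
    """Uniform parameter average of a list of model state_dicts."""
    avg = OrderedDict()
    for key in state_dicts[0]:
        avg[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return avg


def grouped_average(client_states: List[Dict[str, torch.Tensor]],
                    groups: List[List[int]]) -> Dict[str, torch.Tensor]:
    """Average within each client group first, then across group averages.

    The grouping criterion (here supplied externally as index lists) is
    where the paper's method would plug in; this two-level averaging is
    only an illustration, not the paper's algorithm.
    """
    group_models = [average_state_dicts([client_states[i] for i in g]) for g in groups]
    return average_state_dicts(group_models)
```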
arXiv Detail & Related papers (2020-08-26T03:36:07Z)