FedNAR: Federated Optimization with Normalized Annealing Regularization
- URL: http://arxiv.org/abs/2310.03163v1
- Date: Wed, 4 Oct 2023 21:11:40 GMT
- Title: FedNAR: Federated Optimization with Normalized Annealing Regularization
- Authors: Junbo Li, Ang Li, Chong Tian, Qirong Ho, Eric P. Xing, Hongyi Wang
- Abstract summary: We explore the choices of weight decay and identify that the weight decay value appreciably influences the convergence of existing FL algorithms.
We develop Federated optimization with Normalized Annealing Regularization (FedNAR), a plug-in that can be seamlessly integrated into any existing FL algorithm.
- Score: 54.42032094044368
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weight decay is a standard technique to improve generalization performance in
modern deep neural network optimization, and is also widely adopted in
federated learning (FL) to prevent overfitting in local clients. In this paper,
we first explore the choices of weight decay and identify that weight decay
value appreciably influences the convergence of existing FL algorithms. While
preventing overfitting is crucial, weight decay can introduce a different
optimization goal towards the global objective, which is further amplified in
FL due to multiple local updates and heterogeneous data distribution. To
address this challenge, we develop {\it Federated optimization with Normalized
Annealing Regularization} (FedNAR), a simple yet effective and versatile
algorithmic plug-in that can be seamlessly integrated into any existing FL
algorithms. Essentially, we regulate the magnitude of each update by performing
co-clipping of the gradient and weight decay. We provide a comprehensive
theoretical analysis of FedNAR's convergence rate and conduct extensive
experiments on both vision and language datasets with different backbone
federated optimization algorithms. Our experimental results consistently
demonstrate that incorporating FedNAR into existing FL algorithms leads to
accelerated convergence and heightened model accuracy. Moreover, FedNAR
exhibits resilience in the face of various hyperparameter configurations.
Specifically, FedNAR has the ability to self-adjust the weight decay when the
initial specification is not optimal, while the accuracy of traditional FL
algorithms would markedly decline. Our codes are released at
\href{https://github.com/ljb121002/fednar}{https://github.com/ljb121002/fednar}.
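To make the co-clipping idea above concrete, the following is a minimal sketch of one local client step. The annealing schedule, threshold, and hyperparameter names are illustrative assumptions rather than the paper's exact formulation; the released repository contains the actual implementation.

```python
import numpy as np

def fednar_local_step(w, grad, lr=0.1, weight_decay=1e-3,
                      step=0, gamma0=1.0, anneal=0.99):
    """One local step with co-clipped gradient and weight decay (sketch only).

    The clipping threshold gamma0 * anneal**step is an assumed annealing
    schedule for illustration; see the official FedNAR code for the real rule.
    """
    update = grad + weight_decay * w        # gradient and weight-decay term combined
    max_norm = gamma0 * (anneal ** step)    # annealed clipping threshold (assumption)
    norm = np.linalg.norm(update)
    if norm > max_norm:                     # co-clip the joint update magnitude
        update *= max_norm / norm
    return w - lr * update

# Toy usage on f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.ones(5)
for t in range(10):
    w = fednar_local_step(w, grad=w, step=t)
```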
Related papers
- Faster Adaptive Federated Learning [84.38913517122619]
Federated learning has attracted increasing attention with the emergence of distributed data.
In this paper, we propose an efficient adaptive algorithm (i.e., FAFED) based on a momentum-based variance-reduction technique in cross-silo FL.
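For context, the building block of such methods is a momentum-based variance-reduced gradient estimator. The sketch below shows the generic STORM-style recursion; the interface and the constant a are illustrative assumptions, not FAFED's exact update.

```python
import numpy as np

def vr_momentum_direction(grad_fn, x, x_prev, d_prev, sample, a=0.1):
    """STORM-style momentum-based variance-reduced direction (generic sketch).

    grad_fn(x, sample) returns a stochastic gradient; the same sample is used
    at the current and previous iterates so that their noise largely cancels.
    """
    g_new = grad_fn(x, sample)
    g_old = grad_fn(x_prev, sample)
    return g_new + (1.0 - a) * (d_prev - g_old)

# Toy usage with noisy gradients of f(x) = 0.5 * ||x||^2.
rng = np.random.default_rng(0)
grad_fn = lambda x, s: x + 0.01 * s
x_prev, x = np.ones(3), 0.9 * np.ones(3)
d = grad_fn(x_prev, rng.normal(size=3))
d = vr_momentum_direction(grad_fn, x, x_prev, d, rng.normal(size=3))
```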
arXiv Detail & Related papers (2022-12-02T05:07:50Z)
- Taming Hyperparameter Tuning in Continuous Normalizing Flows Using the JKO Scheme [60.79981399724534]
A normalizing flow (NF) is a mapping that transforms a chosen probability distribution to a normal distribution.
We present JKO-Flow, an algorithm to solve OT-based CNF without the need to tune $\alpha$.
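For readers unfamiliar with normalizing flows, the sketch below illustrates the underlying change-of-variables formula with a toy one-dimensional affine map; it is generic background, not JKO-Flow's OT-based continuous flow.

```python
import numpy as np

def affine_flow_logprob(x, scale=2.0, shift=0.5):
    """Log-density of x under a toy affine flow z = scale * x + shift.

    log p(x) = log N(z; 0, 1) + log |dz/dx| -- generic change of variables,
    not the JKO-Flow algorithm itself.
    """
    z = scale * x + shift                            # map data toward the normal base
    log_base = -0.5 * (z ** 2 + np.log(2 * np.pi))   # standard normal log-density
    log_det = np.log(np.abs(scale))                  # Jacobian term of the affine map
    return log_base + log_det

print(affine_flow_logprob(np.array([0.0, 1.0, -2.0])))
```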
arXiv Detail & Related papers (2022-11-30T05:53:21Z)
- NET-FLEET: Achieving Linear Convergence Speedup for Fully Decentralized Federated Learning with Heterogeneous Data [12.701031075169887]
Federated learning (FL) has received a surge of interest in recent years thanks to its benefits in data privacy protection, efficient communication, and parallel data processing.
Most existing works on FL are limited to systems with i.i.d. data and centralized parameter servers.
We propose a new algorithm, called NET-FLEET, for fully decentralized FL systems with data heterogeneity.
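The essential difference from server-based FL is that clients average only with their graph neighbors. A generic gossip-style round is sketched below; the mixing matrix and the single local step are simplifying assumptions, not NET-FLEET's actual recursion.

```python
import numpy as np

def decentralized_round(params, mixing_matrix, grads, lr=0.1):
    """One generic fully decentralized round: local step, then neighbor averaging.

    params: (n_clients, dim); mixing_matrix: doubly stochastic weights over the
    communication graph. Illustrative sketch, not the NET-FLEET update.
    """
    local = params - lr * grads            # each client's local gradient step
    return mixing_matrix @ local           # average with neighbors, no central server

# Toy usage: three clients on a fully connected graph.
W = np.full((3, 3), 1.0 / 3.0)
params = np.random.default_rng(1).normal(size=(3, 4))
params = decentralized_round(params, W, grads=params)
```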
arXiv Detail & Related papers (2022-08-17T19:17:23Z)
- Communication-Efficient Stochastic Zeroth-Order Optimization for Federated Learning [28.65635956111857]
Federated learning (FL) enables edge devices to collaboratively train a global model without sharing their private data.
To enhance the training efficiency of FL, various algorithms have been proposed, ranging from first-order to zeroth-order methods.
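Zeroth-order methods replace backpropagation with function evaluations only, which is what makes them attractive on constrained edge devices. A standard two-point estimator is sketched below; the smoothing parameter and Gaussian direction are illustrative choices, not the paper's specific scheme.

```python
import numpy as np

def zeroth_order_gradient(f, x, mu=1e-4, rng=None):
    """Two-point zeroth-order estimate of the gradient of f at x (generic sketch)."""
    rng = rng or np.random.default_rng()
    u = rng.normal(size=x.shape)                 # random search direction
    return (f(x + mu * u) - f(x)) / mu * u       # finite-difference estimate along u

# Toy usage: f(x) = ||x||^2 has true gradient 2x.
f = lambda x: float(np.dot(x, x))
print(zeroth_order_gradient(f, np.array([1.0, -2.0])))
```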
arXiv Detail & Related papers (2022-01-24T08:56:06Z)
- Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
- FedFog: Network-Aware Optimization of Federated Learning over Wireless Fog-Cloud Systems [40.421253127588244]
Federated learning (FL) is capable of performing large distributed machine learning tasks across multiple edge users by periodically aggregating trained local parameters.
We first propose an efficient FL algorithm (called FedFog) to perform the local aggregation of gradient parameters at fog servers and global training update at the cloud.
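The two-level aggregation pattern described above can be sketched as follows: fog servers average their own clients, and the cloud then averages the fog-level models. Uniform weighting and the function interface are simplifying assumptions, not FedFog's network-aware optimization.

```python
import numpy as np

def fog_cloud_aggregate(client_params, fog_assignment, n_fogs):
    """Average clients at each fog server, then average fog models at the cloud.

    client_params: (n_clients, dim); fog_assignment[i] is client i's fog index.
    Uniform averaging is an illustrative simplification.
    """
    fog_models = [client_params[fog_assignment == k].mean(axis=0)
                  for k in range(n_fogs)]
    return np.mean(fog_models, axis=0)           # global model formed at the cloud

# Toy usage: four clients split across two fog servers.
params = np.arange(8.0).reshape(4, 2)
assignment = np.array([0, 0, 1, 1])
print(fog_cloud_aggregate(params, assignment, n_fogs=2))
```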
arXiv Detail & Related papers (2021-07-04T08:03:15Z)
- FL-NTK: A Neural Tangent Kernel-based Framework for Federated Learning Convergence Analysis [27.022551495550676]
This paper presents a new class of convergence analysis for FL, Federated Learning Neural Tangent Kernel (FL-NTK), which corresponds to over-parameterized ReLU neural networks trained by gradient descent in FL.
Theoretically, FL-NTK converges to a global-optimal solution at a linear rate with properly tuned learning parameters.
arXiv Detail & Related papers (2021-05-11T13:05:53Z)
- Stragglers Are Not Disaster: A Hybrid Federated Learning Algorithm with Delayed Gradients [21.63719641718363]
Federated learning (FL) is a new machine learning framework which trains a joint model across a large number of decentralized computing devices.
This paper presents a novel FL algorithm, namely Hybrid Federated Learning (HFL), to achieve a learning balance in efficiency and effectiveness.
arXiv Detail & Related papers (2021-02-12T02:27:44Z)
- FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
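The "computation then aggregation" pattern is the familiar FedAvg loop: each client runs several local steps, then the server averages the resulting models. A minimal sketch follows; client sampling, data-size weighting, and FedPD's primal-dual machinery are all omitted.

```python
import numpy as np

def fedavg_round(global_w, client_grad_fns, local_steps=5, lr=0.1):
    """One FedAvg-style round: local computation on clients, then aggregation.

    client_grad_fns: one callable per client returning its gradient at w.
    Uniform averaging is a simplification of the usual weighted scheme.
    """
    client_models = []
    for grad_fn in client_grad_fns:
        w = global_w.copy()
        for _ in range(local_steps):             # computation: local gradient steps
            w = w - lr * grad_fn(w)
        client_models.append(w)
    return np.mean(client_models, axis=0)        # aggregation at the server

# Toy usage: two clients whose local optima sit at +1 and -1.
client_grads = [lambda w: w - 1.0, lambda w: w + 1.0]
w = fedavg_round(np.zeros(3), client_grads)
```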
arXiv Detail & Related papers (2020-05-22T23:07:42Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
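One common way to track the Hessian norm without materializing the matrix is power iteration on Hessian-vector products. The sketch below uses a finite-difference approximation of those products; it is a generic diagnostic, not necessarily the estimator used in that paper.

```python
import numpy as np

def hessian_norm_estimate(grad_fn, w, iters=20, eps=1e-4, rng=None):
    """Estimate the top Hessian eigenvalue magnitude at w via power iteration.

    Uses finite-difference Hessian-vector products H v ~ (g(w + eps*v) - g(w)) / eps.
    Generic sketch; the paper's exact curvature estimate may differ.
    """
    rng = rng or np.random.default_rng()
    v = rng.normal(size=w.shape)
    v /= np.linalg.norm(v)
    g0 = grad_fn(w)
    for _ in range(iters):
        hv = (grad_fn(w + eps * v) - g0) / eps   # approximate H @ v
        v = hv / (np.linalg.norm(hv) + 1e-12)
    return np.linalg.norm((grad_fn(w + eps * v) - g0) / eps)

# Toy usage: f(w) = 0.5 * w^T diag(1, 2, 3) w has largest eigenvalue 3.
grad_fn = lambda w: np.array([1.0, 2.0, 3.0]) * w
print(hessian_norm_estimate(grad_fn, np.ones(3)))
```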
arXiv Detail & Related papers (2020-04-20T18:12:56Z)