FedNAR: Federated Optimization with Normalized Annealing Regularization
- URL: http://arxiv.org/abs/2310.03163v1
- Date: Wed, 4 Oct 2023 21:11:40 GMT
- Title: FedNAR: Federated Optimization with Normalized Annealing Regularization
- Authors: Junbo Li, Ang Li, Chong Tian, Qirong Ho, Eric P. Xing, Hongyi Wang
- Abstract summary: We explore the choices of weight decay and identify that the weight decay value appreciably influences the convergence of existing FL algorithms.
We develop Federated optimization with Normalized Annealing Regularization (FedNAR), a plug-in that can be seamlessly integrated into any existing FL algorithm.
- Score: 54.42032094044368
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weight decay is a standard technique to improve generalization performance in
modern deep neural network optimization, and is also widely adopted in
federated learning (FL) to prevent overfitting in local clients. In this paper,
we first explore the choices of weight decay and identify that weight decay
value appreciably influences the convergence of existing FL algorithms. While
preventing overfitting is crucial, weight decay can introduce a different
optimization goal towards the global objective, which is further amplified in
FL due to multiple local updates and heterogeneous data distribution. To
address this challenge, we develop {\it Federated optimization with Normalized
Annealing Regularization} (FedNAR), a simple yet effective and versatile
algorithmic plug-in that can be seamlessly integrated into any existing FL
algorithms. Essentially, we regulate the magnitude of each update by performing
co-clipping of the gradient and weight decay. We provide a comprehensive
theoretical analysis of FedNAR's convergence rate and conduct extensive
experiments on both vision and language datasets with different backbone
federated optimization algorithms. Our experimental results consistently
demonstrate that incorporating FedNAR into existing FL algorithms leads to
accelerated convergence and heightened model accuracy. Moreover, FedNAR
exhibits resilience in the face of various hyperparameter configurations.
Specifically, FedNAR has the ability to self-adjust the weight decay when the
initial specification is not optimal, while the accuracy of traditional FL
algorithms would markedly decline. Our codes are released at
\href{https://github.com/ljb121002/fednar}{https://github.com/ljb121002/fednar}.
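To make the co-clipping idea above concrete, the following is a minimal sketch of one local client step. The annealing schedule, threshold, and hyperparameter names are illustrative assumptions rather than the paper's exact formulation; the released repository contains the actual implementation.

```python
import numpy as np

def fednar_local_step(w, grad, lr=0.1, weight_decay=1e-3,
                      step=0, gamma0=1.0, anneal=0.99):
    """One local step with co-clipped gradient and weight decay (sketch only).

    The clipping threshold gamma0 * anneal**step is an assumed annealing
    schedule for illustration; see the official FedNAR code for the real rule.
    """
    update = grad + weight_decay * w        # gradient and weight-decay term combined
    max_norm = gamma0 * (anneal ** step)    # annealed clipping threshold (assumption)
    norm = np.linalg.norm(update)
    if norm > max_norm:                     # co-clip the joint update magnitude
        update *= max_norm / norm
    return w - lr * update

# Toy usage on f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.ones(5)
for t in range(10):
    w = fednar_local_step(w, grad=w, step=t)
```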
Related papers
- Faster Adaptive Federated Learning [84.38913517122619]
Federated learning has attracted increasing attention with the emergence of distributed data.
In this paper, we propose an efficient adaptive algorithm (i.e., FAFED) based on a momentum-based variance-reduction technique in cross-silo FL.
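For context, the building block of such methods is a momentum-based variance-reduced gradient estimator. The sketch below shows the generic STORM-style recursion; the interface and the constant a are illustrative assumptions, not FAFED's exact update.

```python
import numpy as np

def vr_momentum_direction(grad_fn, x, x_prev, d_prev, sample, a=0.1):
    """STORM-style momentum-based variance-reduced direction (generic sketch).

    grad_fn(x, sample) returns a stochastic gradient; the same sample is used
    at the current and previous iterates so that their noise largely cancels.
    """
    g_new = grad_fn(x, sample)
    g_old = grad_fn(x_prev, sample)
    return g_new + (1.0 - a) * (d_prev - g_old)

# Toy usage with noisy gradients of f(x) = 0.5 * ||x||^2.
rng = np.random.default_rng(0)
grad_fn = lambda x, s: x + 0.01 * s
x_prev, x = np.ones(3), 0.9 * np.ones(3)
d = grad_fn(x_prev, rng.normal(size=3))
d = vr_momentum_direction(grad_fn, x, x_prev, d, rng.normal(size=3))
```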
arXiv Detail & Related papers (2022-12-02T05:07:50Z)
- Taming Hyperparameter Tuning in Continuous Normalizing Flows Using the JKO Scheme [60.79981399724534]
A normalizing flow (NF) is a mapping that transforms a chosen probability distribution to a normal distribution.
We present JKO-Flow, an algorithm to solve OT-based CNF without the need to tune $\alpha$.
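For readers unfamiliar with normalizing flows, the sketch below illustrates the underlying change-of-variables formula with a toy one-dimensional affine map; it is generic background, not JKO-Flow's OT-based continuous flow.

```python
import numpy as np

def affine_flow_logprob(x, scale=2.0, shift=0.5):
    """Log-density of x under a toy affine flow z = scale * x + shift.

    log p(x) = log N(z; 0, 1) + log |dz/dx| -- generic change of variables,
    not the JKO-Flow algorithm itself.
    """
    z = scale * x + shift                            # map data toward the normal base
    log_base = -0.5 * (z ** 2 + np.log(2 * np.pi))   # standard normal log-density
    log_det = np.log(np.abs(scale))                  # Jacobian term of the affine map
    return log_base + log_det

print(affine_flow_logprob(np.array([0.0, 1.0, -2.0])))
```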
arXiv Detail & Related papers (2022-11-30T05:53:21Z)
- NET-FLEET: Achieving Linear Convergence Speedup for Fully Decentralized Federated Learning with Heterogeneous Data [12.701031075169887]
Federated learning (FL) has received a surge of interest in recent years thanks to its benefits in data privacy protection, efficient communication, and parallel data processing.
Most existing works on FL are limited to systems with i.i.d. data and centralized parameter servers.
We propose a new algorithm, called NET-FLEET, for fully decentralized FL systems with data heterogeneity.
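The essential difference from server-based FL is that clients average only with their graph neighbors. A generic gossip-style round is sketched below; the mixing matrix and the single local step are simplifying assumptions, not NET-FLEET's actual recursion.

```python
import numpy as np

def decentralized_round(params, mixing_matrix, grads, lr=0.1):
    """One generic fully decentralized round: local step, then neighbor averaging.

    params: (n_clients, dim); mixing_matrix: doubly stochastic weights over the
    communication graph. Illustrative sketch, not the NET-FLEET update.
    """
    local = params - lr * grads            # each client's local gradient step
    return mixing_matrix @ local           # average with neighbors, no central server

# Toy usage: three clients on a fully connected graph.
W = np.full((3, 3), 1.0 / 3.0)
params = np.random.default_rng(1).normal(size=(3, 4))
params = decentralized_round(params, W, grads=params)
```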
arXiv Detail & Related papers (2022-08-17T19:17:23Z)
- Communication-Efficient Stochastic Zeroth-Order Optimization for Federated Learning [28.65635956111857]
Federated learning (FL) enables edge devices to collaboratively train a global model without sharing their private data.
To enhance the training efficiency of FL, various algorithms have been proposed, ranging from first-order to zeroth-order methods.
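Zeroth-order methods replace backpropagation with function evaluations only, which is what makes them attractive on constrained edge devices. A standard two-point estimator is sketched below; the smoothing parameter and Gaussian direction are illustrative choices, not the paper's specific scheme.

```python
import numpy as np

def zeroth_order_gradient(f, x, mu=1e-4, rng=None):
    """Two-point zeroth-order estimate of the gradient of f at x (generic sketch)."""
    rng = rng or np.random.default_rng()
    u = rng.normal(size=x.shape)                 # random search direction
    return (f(x + mu * u) - f(x)) / mu * u       # finite-difference estimate along u

# Toy usage: f(x) = ||x||^2 has true gradient 2x.
f = lambda x: float(np.dot(x, x))
print(zeroth_order_gradient(f, np.array([1.0, -2.0])))
```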
arXiv Detail & Related papers (2022-01-24T08:56:06Z)
- Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
- FedFog: Network-Aware Optimization of Federated Learning over Wireless Fog-Cloud Systems [40.421253127588244]
Federated learning (FL) is capable of performing large distributed machine learning tasks across multiple edge users by periodically aggregating trained local parameters.
We first propose an efficient FL algorithm (called FedFog) to perform the local aggregation of gradient parameters at fog servers and global training update at the cloud.
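The two-level aggregation pattern described above can be sketched as follows: fog servers average their own clients, and the cloud then averages the fog-level models. Uniform weighting and the function interface are simplifying assumptions, not FedFog's network-aware optimization.

```python
import numpy as np

def fog_cloud_aggregate(client_params, fog_assignment, n_fogs):
    """Average clients at each fog server, then average fog models at the cloud.

    client_params: (n_clients, dim); fog_assignment[i] is client i's fog index.
    Uniform averaging is an illustrative simplification.
    """
    fog_models = [client_params[fog_assignment == k].mean(axis=0)
                  for k in range(n_fogs)]
    return np.mean(fog_models, axis=0)           # global model formed at the cloud

# Toy usage: four clients split across two fog servers.
params = np.arange(8.0).reshape(4, 2)
assignment = np.array([0, 0, 1, 1])
print(fog_cloud_aggregate(params, assignment, n_fogs=2))
```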
arXiv Detail & Related papers (2021-07-04T08:03:15Z)
- FL-NTK: A Neural Tangent Kernel-based Framework for Federated Learning Convergence Analysis [27.022551495550676]
This paper presents a new class of convergence analysis for FL, Federated Learning Neural Tangent Kernel (FL-NTK), which corresponds to over-parameterized ReLU neural networks trained by gradient descent in FL.
Theoretically, FL-NTK converges to a global-optimal solution at a linear rate with properly tuned learning parameters.
arXiv Detail & Related papers (2021-05-11T13:05:53Z)
- Stragglers Are Not Disaster: A Hybrid Federated Learning Algorithm with Delayed Gradients [21.63719641718363]
Federated learning (FL) is a new machine learning framework which trains a joint model across a large number of decentralized computing devices.
This paper presents a novel FL algorithm, namely Hybrid Federated Learning (HFL), to achieve a learning balance in efficiency and effectiveness.
arXiv Detail & Related papers (2021-02-12T02:27:44Z)
- FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
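The "computation then aggregation" pattern is the familiar FedAvg loop: each client runs several local steps, then the server averages the resulting models. A minimal sketch follows; client sampling, data-size weighting, and FedPD's primal-dual machinery are all omitted.

```python
import numpy as np

def fedavg_round(global_w, client_grad_fns, local_steps=5, lr=0.1):
    """One FedAvg-style round: local computation on clients, then aggregation.

    client_grad_fns: one callable per client returning its gradient at w.
    Uniform averaging is a simplification of the usual weighted scheme.
    """
    client_models = []
    for grad_fn in client_grad_fns:
        w = global_w.copy()
        for _ in range(local_steps):             # computation: local gradient steps
            w = w - lr * grad_fn(w)
        client_models.append(w)
    return np.mean(client_models, axis=0)        # aggregation at the server

# Toy usage: two clients whose local optima sit at +1 and -1.
client_grads = [lambda w: w - 1.0, lambda w: w + 1.0]
w = fedavg_round(np.zeros(3), client_grads)
```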
arXiv Detail & Related papers (2020-05-22T23:07:42Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
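One common way to track the Hessian norm without materializing the matrix is power iteration on Hessian-vector products. The sketch below uses a finite-difference approximation of those products; it is a generic diagnostic, not necessarily the estimator used in that paper.

```python
import numpy as np

def hessian_norm_estimate(grad_fn, w, iters=20, eps=1e-4, rng=None):
    """Estimate the top Hessian eigenvalue magnitude at w via power iteration.

    Uses finite-difference Hessian-vector products H v ~ (g(w + eps*v) - g(w)) / eps.
    Generic sketch; the paper's exact curvature estimate may differ.
    """
    rng = rng or np.random.default_rng()
    v = rng.normal(size=w.shape)
    v /= np.linalg.norm(v)
    g0 = grad_fn(w)
    for _ in range(iters):
        hv = (grad_fn(w + eps * v) - g0) / eps   # approximate H @ v
        v = hv / (np.linalg.norm(hv) + 1e-12)
    return np.linalg.norm((grad_fn(w + eps * v) - g0) / eps)

# Toy usage: f(w) = 0.5 * w^T diag(1, 2, 3) w has largest eigenvalue 3.
grad_fn = lambda w: np.array([1.0, 2.0, 3.0]) * w
print(hessian_norm_estimate(grad_fn, np.ones(3)))
```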
arXiv Detail & Related papers (2020-04-20T18:12:56Z)