Federated Optimization with Doubly Regularized Drift Correction
- URL: http://arxiv.org/abs/2404.08447v1
- Date: Fri, 12 Apr 2024 12:57:43 GMT
- Title: Federated Optimization with Doubly Regularized Drift Correction
- Authors: Xiaowen Jiang, Anton Rodomanov, Sebastian U. Stich
- Abstract summary: Federated learning is a distributed optimization paradigm that allows training machine learning models across decentralized devices while keeping the data localized.
Previous works proposed various strategies to mitigate drift, yet none have shown uniformly improved communication-computation trade-offs over vanilla gradient descent.
We show that (i) DANE can achieve the desired communication reduction under Hessian similarity constraints, and (ii) we present an extension, DANE+, which supports arbitrary inexact local solvers.
We propose (iii) a novel method, FedRed, which has improved local computational complexity while retaining the same communication complexity as DANE/DANE+.
- Score: 20.30761752651984
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning is a distributed optimization paradigm that allows training machine learning models across decentralized devices while keeping the data localized. The standard method, FedAvg, suffers from client drift which can hamper performance and increase communication costs over centralized methods. Previous works proposed various strategies to mitigate drift, yet none have shown uniformly improved communication-computation trade-offs over vanilla gradient descent. In this work, we revisit DANE, an established method in distributed optimization. We show that (i) DANE can achieve the desired communication reduction under Hessian similarity constraints. Furthermore, (ii) we present an extension, DANE+, which supports arbitrary inexact local solvers and has more freedom to choose how to aggregate the local updates. We propose (iii) a novel method, FedRed, which has improved local computational complexity and retains the same communication complexity compared to DANE/DANE+. This is achieved by using doubly regularized drift correction.
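For intuition, below is a minimal sketch of the DANE-style drift-corrected local subproblem that the paper builds on; the oracle names, step sizes, and the inexact gradient solver are illustrative assumptions, and FedRed's additional ("double") regularization term is not reproduced here.

```python
import numpy as np

# Sketch of a DANE-style local update with drift correction (assumptions:
# access to gradient oracles only; a few plain gradient steps serve as the
# inexact local solver, which DANE+ explicitly allows).
def drift_corrected_local_step(x_t, grad_fi, grad_f, mu=1.0, lr=0.1, local_steps=10):
    """Approximately solve the local subproblem
        min_x  f_i(x) - <grad_fi(x_t) - grad_f(x_t), x> + (mu/2) * ||x - x_t||^2,
    whose linear correction term cancels client i's drift at the reference point x_t."""
    correction = grad_fi(x_t) - grad_f(x_t)            # client i's drift at x_t
    x = x_t.copy()
    for _ in range(local_steps):
        g = grad_fi(x) - correction + mu * (x - x_t)   # gradient of the regularized subproblem
        x = x - lr * g
    return x
```

The server would then aggregate the points returned by the clients (e.g., by averaging) to form the next iterate.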
Related papers
- Over-the-Air Federated Learning and Optimization [52.5188988624998]
We focus on federated learning (FL) via over-the-air computation (AirComp).
We describe the convergence of AirComp-based FedAvg (AirFedAvg) algorithms under both convex and non-convex settings.
For different types of local updates that can be transmitted by edge devices (i.e., model, gradient, model difference), we reveal that transmitting in AirFedAvg may cause an aggregation error.
In addition, we consider more practical signal processing schemes to improve the communication efficiency and extend the convergence analysis to different forms of model aggregation error caused by these signal processing schemes.
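As a toy illustration of the aggregation error discussed above (not the paper's system model), the sketch below perturbs the ideal average of the transmitted client updates with additive channel noise; the noise model and scale are assumptions.

```python
import numpy as np

# Toy AirComp-style aggregation: the server observes the ideal average of
# the transmitted updates corrupted by additive noise, i.e. an aggregation error.
def aircomp_average(client_updates, noise_std=0.01, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    ideal = np.stack(client_updates).mean(axis=0)            # error-free FedAvg aggregate
    return ideal + rng.normal(0.0, noise_std, ideal.shape)   # received, noisy aggregate
```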
arXiv Detail & Related papers (2023-10-16T05:49:28Z) - FedDA: Faster Framework of Local Adaptive Gradient Methods via Restarted
Dual Averaging [104.41634756395545]
Federated learning (FL) is an emerging learning paradigm to tackle massively distributed data.
We propose FedDA, a novel framework for local adaptive gradient methods.
We show that FedDA-MVR is the first adaptive FL algorithm that achieves this rate.
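For background, the sketch below shows a generic adaptive dual-averaging step, the building block that FedDA restarts and runs locally; it is not the FedDA algorithm itself, and the AdaGrad-style scaling is an illustrative choice.

```python
import numpy as np

# Generic adaptive dual averaging: iterates are recomputed from an anchor
# point and the running sum of gradients (the "dual" state).
class DualAveraging:
    def __init__(self, x0, lr=0.1, eps=1e-8):
        self.x0 = x0.copy()
        self.grad_sum = np.zeros_like(x0)   # running sum of gradients
        self.sq_sum = np.zeros_like(x0)     # sum of squared gradients (AdaGrad scaling)
        self.lr, self.eps = lr, eps

    def step(self, grad):
        self.grad_sum += grad
        self.sq_sum += grad ** 2
        return self.x0 - self.lr * self.grad_sum / (np.sqrt(self.sq_sum) + self.eps)
```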
arXiv Detail & Related papers (2023-02-13T05:10:30Z) - AdaBest: Minimizing Client Drift in Federated Learning via Adaptive Bias
Estimation [12.62716075696359]
In Federated Learning (FL), a number of clients or devices collaborate to train a model without sharing their data.
In order to estimate and remove the resulting client drift, variance reduction techniques have recently been incorporated into FL optimization.
We propose an adaptive algorithm that accurately estimates drift across clients.
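The sketch below illustrates the general control-variate idea behind such variance reduction (in the style of SCAFFOLD); it is not AdaBest's adaptive estimator, and the control-variate update rule shown is one common choice.

```python
import numpy as np

# Drift-corrected local training with control variates: each local gradient
# step is corrected by the estimated drift, c_global - c_i.
def corrected_local_training(x_global, grad_fi, c_i, c_global, lr=0.1, local_steps=10):
    x = x_global.copy()
    for _ in range(local_steps):
        x = x - lr * (grad_fi(x) - c_i + c_global)
    # one common rule for refreshing the client's control variate
    c_i_new = c_i - c_global + (x_global - x) / (lr * local_steps)
    return x, c_i_new
```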
arXiv Detail & Related papers (2022-04-27T20:04:24Z) - FedCos: A Scene-adaptive Federated Optimization Enhancement for
Performance Improvement [11.687451505965655]
We propose FedCos, which reduces the directional inconsistency of local models by introducing a cosine-similarity penalty.
We show that FedCos outperforms the well-known baselines and can enhance them under a variety of FL scenes.
With the help of FedCos, multiple FL methods require significantly fewer communication rounds than before to obtain a model with comparable performance.
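A minimal sketch of such a cosine-similarity penalty is shown below; the choice of reference direction and the penalty weight are assumptions, and FedCos's exact formulation may differ in detail.

```python
import numpy as np

# Penalize local updates whose direction disagrees with a shared global
# direction, discouraging directional inconsistency across clients.
def cosine_penalty(x_local, x_global, global_direction, weight=0.1, eps=1e-12):
    local_direction = x_local - x_global
    cos = np.dot(local_direction, global_direction) / (
        np.linalg.norm(local_direction) * np.linalg.norm(global_direction) + eps)
    return weight * (1.0 - cos)   # term added to the client's training loss
```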
arXiv Detail & Related papers (2022-04-07T02:59:54Z) - Acceleration of Federated Learning with Alleviated Forgetting in Local
Training [61.231021417674235]
Federated learning (FL) enables distributed optimization of machine learning models while protecting privacy.
We propose FedReg, an algorithm to accelerate FL with alleviated knowledge forgetting in the local training stage.
Our experiments demonstrate that FedReg significantly improves the convergence rate of FL, especially when the neural network architecture is deep.
arXiv Detail & Related papers (2022-03-05T02:31:32Z) - Parallel Successive Learning for Dynamic Distributed Model Training over
Heterogeneous Wireless Networks [50.68446003616802]
Federated learning (FedL) has emerged as a popular technique for distributing model training over a set of wireless devices.
We develop parallel successive learning (PSL), which expands the FedL architecture along three dimensions.
Our analysis sheds light on the notion of cold vs. warmed up models, and model inertia in distributed machine learning.
arXiv Detail & Related papers (2022-02-07T05:11:01Z) - Communication-Compressed Adaptive Gradient Method for Distributed
Nonconvex Optimization [21.81192774458227]
One of the major bottlenecks is the large communication cost between the central server and the local workers.
Our proposed distributed learning framework features an effective gradient compression strategy.
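As an illustration of gradient compression in general (the paper's specific compressor is not reproduced here), a top-k sparsifier keeps only the largest-magnitude entries, so only k values and their indices need to be communicated.

```python
import numpy as np

# Top-k sparsification of a flattened gradient vector.
def top_k_compress(grad, k):
    idx = np.argpartition(np.abs(grad), -k)[-k:]   # indices of the k largest-magnitude entries
    compressed = np.zeros_like(grad)
    compressed[idx] = grad[idx]
    return compressed, idx                          # only grad[idx] and idx are transmitted
```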
arXiv Detail & Related papers (2021-11-01T04:54:55Z) - Local Adaptivity in Federated Learning: Convergence and Consistency [25.293584783673413]
The federated learning (FL) framework trains a machine learning model using decentralized data stored at edge client devices by periodically aggregating locally trained models.
We show in both theory and practice that while local adaptive methods can accelerate convergence, they can cause a non-vanishing solution bias.
We propose correction techniques to overcome this inconsistency and complement the local adaptive methods for FL.
arXiv Detail & Related papers (2021-06-04T07:36:59Z) - Smoothness Matrices Beat Smoothness Constants: Better Communication
Compression Techniques for Distributed Optimization [10.592277756185046]
Large-scale distributed optimization has become the default tool for the training of supervised machine learning models.
We propose a novel communication sparsification strategy that can take full advantage of the smoothness matrices associated with local losses.
arXiv Detail & Related papers (2021-02-14T20:55:02Z) - Faster Non-Convex Federated Learning via Global and Local Momentum [57.52663209739171]
FedGLOMO is the first (first-order) FL algorithm of its kind.
The algorithm is provably optimal even with compressed communication between the clients and the server.
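The sketch below shows the generic pattern of combining client-side updates with server-side momentum; FedGLOMO's actual variance-reduced updates and compression are not reproduced, and the coefficients are illustrative.

```python
import numpy as np

# Server-side momentum applied to the averaged client model differences
# (each delta is x_global - x_local after local training).
def server_round(x_global, client_deltas, server_momentum, beta=0.9, server_lr=1.0):
    avg_delta = np.mean(client_deltas, axis=0)
    server_momentum = beta * server_momentum + avg_delta
    return x_global - server_lr * server_momentum, server_momentum
```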
arXiv Detail & Related papers (2020-12-07T21:05:31Z) - FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity
to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
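For reference, a minimal sketch of the CTA pattern that FedAvg follows is given below; client weighting by data size and client sampling are omitted, and the gradient oracles and step sizes are illustrative.

```python
import numpy as np

# "Computation then aggregation": each client runs local gradient steps on
# its own data, then the server averages the resulting models.
def fedavg_round(x_global, client_grad_oracles, lr=0.1, local_steps=5):
    local_models = []
    for grad_fi in client_grad_oracles:        # computation: local training
        x = x_global.copy()
        for _ in range(local_steps):
            x = x - lr * grad_fi(x)
        local_models.append(x)
    return np.mean(local_models, axis=0)       # aggregation: simple averaging
```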
arXiv Detail & Related papers (2020-05-22T23:07:42Z)