Fed-Sophia: A Communication-Efficient Second-Order Federated Learning Algorithm
- URL: http://arxiv.org/abs/2406.06655v1
- Date: Mon, 10 Jun 2024 09:57:30 GMT
- Title: Fed-Sophia: A Communication-Efficient Second-Order Federated Learning Algorithm
- Authors: Ahmed Elbakary, Chaouki Ben Issaid, Mohammad Shehab, Karim Seddik, Tamer ElBatt, Mehdi Bennis
- Abstract summary: Federated learning is a machine learning approach where multiple devices collaboratively learn with the help of a parameter server by sharing only their local updates.
While gradient-based optimization techniques are widely adopted in this domain, the curvature information that second-order methods exhibit is crucial to guide and speed up the convergence.
This paper introduces a scalable second-order method, allowing the adoption of curvature information in federated large models.
- Score: 28.505671833986067
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Federated learning is a machine learning approach where multiple devices collaboratively learn with the help of a parameter server by sharing only their local updates. While gradient-based optimization techniques are widely adopted in this domain, the curvature information that second-order methods exhibit is crucial to guide and speed up the convergence. This paper introduces a scalable second-order method, allowing the adoption of curvature information in federated large models. Our method, coined Fed-Sophia, combines a weighted moving average of the gradient with a clipping operation to find the descent direction. In addition to that, a lightweight estimation of the Hessian's diagonal is used to incorporate the curvature information. Numerical evaluation shows the superiority, robustness, and scalability of the proposed Fed-Sophia scheme compared to first and second-order baselines.
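To make the ingredients named in the abstract concrete (a weighted moving average of the gradient, element-wise clipping, and a lightweight Hessian-diagonal preconditioner, combined with parameter-server aggregation), here is a minimal Python sketch of a Sophia-style local step and a FedAvg-style server average. This is an illustrative sketch only: the function names, hyper-parameters (beta1, beta2, gamma, rho, lr), and the averaging rule are assumptions, not the authors' exact Fed-Sophia protocol.
```python
import numpy as np

def sophia_step(theta, m, h, grad, hess_diag=None, lr=1e-3,
                beta1=0.9, beta2=0.95, gamma=0.01, rho=1.0, eps=1e-12):
    """One Sophia-style local update (illustrative sketch, not the paper's exact rule).

    theta     : current parameter vector
    m         : moving average of gradients
    h         : moving average of the Hessian diagonal
    grad      : fresh mini-batch gradient
    hess_diag : fresh Hessian-diagonal estimate (e.g. a Hutchinson probe),
                or None on steps without a curvature refresh
    """
    m = beta1 * m + (1.0 - beta1) * grad            # weighted moving average of the gradient
    if hess_diag is not None:                       # lightweight, periodic curvature refresh
        h = beta2 * h + (1.0 - beta2) * hess_diag
    precond = m / np.maximum(gamma * h, eps)        # curvature-scaled descent direction
    update = np.clip(precond, -rho, rho)            # element-wise clipping
    return theta - lr * update, m, h

def server_average(client_params, client_weights):
    """Parameter-server step: weighted average of client models (FedAvg-style)."""
    w = np.asarray(client_weights, dtype=float)
    w = w / w.sum()
    return sum(wi * p for wi, p in zip(w, client_params))
```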
Related papers
- A Historical Trajectory Assisted Optimization Method for Zeroth-Order Federated Learning [24.111048817721592]
Federated learning heavily relies on distributed gradient descent techniques.
When gradient information is not available, gradients must be estimated from zeroth-order (function-value) information.
We propose a non-isotropic sampling method to improve the gradient estimation procedure (a minimal sketch of two-point zeroth-order estimation appears after this list).
arXiv Detail & Related papers (2024-09-24T10:36:40Z)
- Towards Hyper-parameter-free Federated Learning [1.3682156035049038]
We introduce algorithms for automated scaling of global model updates.
In the first algorithm, we establish that a descent-ensuring step-size regime at the clients ensures descent for the server objective.
The second algorithm shows that the average of the objective values of the sampled clients is a practical and effective substitute for the server objective value required to compute the scaling factor.
arXiv Detail & Related papers (2024-08-30T09:35:36Z)
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}\!\left(\ln(T) / T^{1 - \frac{1}{\alpha}}\right)$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
- On Principled Local Optimization Methods for Federated Learning [2.628859872479184]
This dissertation aims to advance the theoretical foundation of local methods in the following three directions.
First, we establish sharp bounds for FedAvg, the most popular algorithm in Federated Learning.
Second, we propose Federated Accelerated Descent (FedAc), which provably improves the convergence rate and communication efficiency.
arXiv Detail & Related papers (2024-01-24T03:57:45Z)
- Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
- FedDA: Faster Framework of Local Adaptive Gradient Methods via Restarted Dual Averaging [104.41634756395545]
Federated learning (FL) is an emerging learning paradigm to tackle massively distributed data.
We propose FedDA, a novel framework for local adaptive gradient methods.
We show that FedDA-MVR is the first adaptive FL algorithm that achieves this rate.
arXiv Detail & Related papers (2023-02-13T05:10:30Z)
- Faster Adaptive Federated Learning [84.38913517122619]
Federated learning has attracted increasing attention with the emergence of distributed data.
In this paper, we propose an efficient adaptive algorithm (i.e., FAFED) based on a momentum-based variance-reduction technique in the cross-silo FL setting.
arXiv Detail & Related papers (2022-12-02T05:07:50Z)
- Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
arXiv Detail & Related papers (2021-12-04T18:07:47Z)
- Second-Order Guarantees in Federated Learning [49.17137296715029]
Federated learning is a useful framework for centralized learning from distributed data.
This paper focuses on algorithms with second-order optimality guarantees in both centralized and decentralized settings.
arXiv Detail & Related papers (2020-12-02T19:30:08Z)
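Referenced from the zeroth-order entry above: a minimal sketch of a two-point zeroth-order gradient estimate. The default directions are isotropic Gaussian; the optional `cov_sqrt` argument is a hypothetical hook showing how a non-isotropic sampling distribution could be plugged in, and is not the cited paper's exact method.
```python
import numpy as np

def zo_gradient_estimate(loss_fn, theta, mu=1e-3, num_dirs=20, cov_sqrt=None, rng=None):
    """Two-point zeroth-order gradient estimate (illustrative sketch).

    loss_fn  : callable mapping a parameter vector to a scalar loss
    mu       : smoothing radius for the finite differences
    cov_sqrt : optional square root of a sampling covariance; None keeps the
               directions isotropic, a non-None value shapes them (assumption)
    """
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(theta, dtype=float)
    for _ in range(num_dirs):
        u = rng.standard_normal(theta.size)
        if cov_sqrt is not None:
            u = cov_sqrt @ u                          # non-isotropic direction
        delta = loss_fn(theta + mu * u) - loss_fn(theta - mu * u)
        grad += (delta / (2.0 * mu)) * u              # directional finite difference
    return grad / num_dirs

# Example: estimate the gradient of a simple quadratic (true gradient is theta itself)
quad = lambda x: 0.5 * np.dot(x, x)
print(zo_gradient_estimate(quad, np.ones(5)))
```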
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.