A Newton-type algorithm for federated learning based on incremental
Hessian eigenvector sharing
- URL: http://arxiv.org/abs/2202.05800v1
- Date: Fri, 11 Feb 2022 17:52:56 GMT
- Title: A Newton-type algorithm for federated learning based on incremental
Hessian eigenvector sharing
- Authors: Nicolò Dal Fabbro, Subhrakanti Dey, Michele Rossi, Luca Schenato
- Abstract summary: We present an original communication-constrained Newton-type (NT) algorithm designed to accelerate Federated Learning (FL).
The proposed solution is thoroughly validated on real datasets.
- Score: 5.404315085380945
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is a growing interest in the decentralized optimization framework that
goes under the name of Federated Learning (FL). In particular, much attention
is being devoted to FL scenarios where the network is strongly heterogeneous in
terms of communication resources (e.g., bandwidth) and data distribution. In
these cases, communication between local machines (agents) and the central
server (Master) is a primary concern. In this work, we present an original
communication-constrained Newton-type (NT) algorithm designed to accelerate FL
in such heterogeneous scenarios. The algorithm is by design robust to
non-i.i.d. data distributions, handles heterogeneity of agents' communication
resources (CRs), only requires sporadic Hessian computations, and achieves
super-linear convergence. This is possible thanks to an incremental strategy,
based on a singular value decomposition (SVD) of the local Hessian matrices,
which exploits (possibly) outdated second-order information. The proposed
solution is thoroughly validated on real datasets by assessing (i) the number
of communication rounds required for convergence, (ii) the overall amount of
data transmitted and (iii) the number of local Hessian computations required.
For all these metrics, the proposed approach shows superior performance against
state-of-the-art techniques such as GIANT and FedNL.
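As a rough illustration of the core idea (a sketch under stated assumptions, not the authors' exact algorithm), each agent can compress its local Hessian to its top-k eigenpairs plus a single tail scalar, and the master can rebuild approximate Hessians from these payloads to take a Newton-type step. The quadratic toy loss and all function names below are hypothetical; the paper's incremental strategy, which spreads eigenvector transmissions over rounds and reuses (possibly) outdated ones, is not shown.

```python
import numpy as np

def local_eig_payload(X, y, w, lam, k):
    """One agent: Hessian of a ridge-regularized least-squares loss,
    compressed to its top-k eigenpairs plus a single tail scalar."""
    n, d = X.shape
    H = X.T @ X / n + lam * np.eye(d)        # local Hessian (quadratic loss)
    g = X.T @ (X @ w - y) / n + lam * w      # local gradient
    vals, vecs = np.linalg.eigh(H)
    order = np.argsort(vals)[::-1]           # sort eigenpairs, largest first
    vals, vecs = vals[order], vecs[:, order]
    rho = vals[k] if k < d else 0.0          # tail eigenvalue, one scalar to send
    return vals[:k], vecs[:, :k], rho, g

def master_newton_step(payloads, w):
    """Master: rebuild each Hessian as V (L - rho I) V^T + rho I,
    average across agents, and take a Newton-type step."""
    d = w.size
    H_hat, g_avg = np.zeros((d, d)), np.zeros(d)
    for vals, vecs, rho, g in payloads:
        H_hat += vecs @ np.diag(vals - rho) @ vecs.T + rho * np.eye(d)
        g_avg += g
    H_hat /= len(payloads)
    g_avg /= len(payloads)
    return w - np.linalg.solve(H_hat, g_avg)

# Toy run: 4 agents with shifted (non-i.i.d.) feature distributions.
rng = np.random.default_rng(0)
d, w = 10, np.zeros(10)
agents = [(rng.normal(i, 1.0, (50, d)), rng.normal(size=50)) for i in range(4)]
for _ in range(5):
    payloads = [local_eig_payload(X, y, w, lam=0.1, k=3) for X, y in agents]
    w = master_newton_step(payloads, w)
```

The V (L - rho I) V^T + rho I reconstruction keeps the estimate full rank and positive definite even though only k eigenpairs are transmitted, which is what keeps the Newton-type step well posed.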
Related papers
- Asynchronous Local Computations in Distributed Bayesian Learning [8.516532665507835]
We propose gossip-based communication to leverage fast computations and reduce communication overhead simultaneously.
We observe faster initial convergence and improved accuracy, especially in the low-data regime.
We achieve on average 78% and over 90% classification accuracy on the Gamma Telescope and mHealth datasets, respectively, from the UCI ML repository.
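For context, a minimal sketch of the gossip-style exchange this summary refers to; the asynchronous local Bayesian computations themselves are omitted, and all names are illustrative.

```python
import numpy as np

def gossip_round(params, neighbors, rng):
    """One asynchronous gossip step: a random agent averages its
    parameters with one random neighbor (no central server involved)."""
    i = rng.integers(len(params))
    j = rng.choice(neighbors[i])
    avg = 0.5 * (params[i] + params[j])
    params[i] = params[j] = avg

# Toy ring of 5 agents: repeated gossip drives all parameters to consensus.
rng = np.random.default_rng(1)
params = [rng.normal(size=3) for _ in range(5)]
neighbors = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
for _ in range(200):
    gossip_round(params, neighbors, rng)
print(np.ptp(np.stack(params), axis=0))  # per-coordinate spread shrinks toward zero
```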
arXiv Detail & Related papers (2023-11-06T20:11:41Z) - Communication-Efficient Decentralized Federated Learning via One-Bit
Compressive Sensing [52.402550431781805]
Decentralized federated learning (DFL) has gained popularity due to its practicality across various applications.
Compared to the centralized version, training a shared model among a large number of nodes in DFL is more challenging.
We develop a novel algorithm based on the framework of the inexact alternating direction method (iADM).
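The summary gives no details of the one-bit scheme, so the sketch below only illustrates the generic idea of sign-based (one-bit) compression with a single scaling scalar; it is not the paper's actual compressive-sensing construction.

```python
import numpy as np

def one_bit_compress(v):
    """Transmit only the signs of v plus one scalar (mean absolute value),
    i.e. roughly 1 bit per coordinate instead of 32."""
    return np.sign(v), np.abs(v).mean()

def one_bit_decompress(signs, scale):
    """Receiver side: rescale the signs to recover a usable update."""
    return scale * signs

rng = np.random.default_rng(2)
g = rng.normal(size=1000)
signs, scale = one_bit_compress(g)
g_hat = one_bit_decompress(signs, scale)
print(np.dot(g, g_hat) > 0)  # the reconstruction stays a descent direction
```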
arXiv Detail & Related papers (2023-08-31T12:22:40Z) - Analysis and Optimization of Wireless Federated Learning with Data
Heterogeneity [72.85248553787538]
This paper focuses on performance analysis and optimization for wireless FL, considering data heterogeneity, combined with wireless resource allocation.
We formulate the loss function minimization problem under constraints on long-term energy consumption and latency, and jointly optimize the client scheduling, resource allocation, and number of local training epochs (collectively, CRE).
Experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of learning accuracy and energy consumption.
arXiv Detail & Related papers (2023-08-04T04:18:01Z) - Q-SHED: Distributed Optimization at the Edge via Hessian Eigenvectors
Quantization [5.404315085380945]
Newton-type (NT) methods have been advocated as enablers of robust convergence rates in distributed optimization (DO) problems.
We propose Q-SHED, an original NT algorithm for DO featuring a novel bit-allocation scheme based on incremental Hessian eigenvector quantization.
We show that Q-SHED can reduce by up to 60% the number of communication rounds required for convergence.
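As a hedged illustration of bit allocation over Hessian eigenvectors: the sketch below quantizes each eigenvector with a resolution that grows with its eigenvalue, under a total bit budget. The allocation rule is hypothetical, not necessarily Q-SHED's.

```python
import numpy as np

def quantize(v, bits):
    """Uniform scalar quantization of vector v to the given bit depth."""
    if bits <= 0:
        return np.zeros_like(v)
    levels = 2 ** bits - 1
    lo, hi = v.min(), v.max()
    q = np.round((v - lo) / (hi - lo + 1e-12) * levels)
    return lo + q * (hi - lo + 1e-12) / levels

def allocate_bits(eigvals, total_bits, max_bits=8):
    """Hypothetical rule: split the bit budget across eigenvectors in
    proportion to their eigenvalues (more important directions get
    finer resolution), capped at max_bits each."""
    weights = eigvals / eigvals.sum()
    bits = np.minimum(np.round(weights * total_bits), max_bits)
    return bits.astype(int)

rng = np.random.default_rng(3)
A = rng.normal(size=(40, 6))
H = A.T @ A / 40                                      # toy Hessian
eigvals, eigvecs = np.linalg.eigh(H)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]    # descending order
bits = allocate_bits(eigvals, total_bits=24)
sent = [quantize(eigvecs[:, i], b) for i, b in enumerate(bits)]
```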
arXiv Detail & Related papers (2023-05-18T10:15:03Z) - Resource-constrained Federated Edge Learning with Heterogeneous Data:
Formulation and Analysis [8.863089484787835]
We propose a distributed approximate Newton-type training scheme, namely FedOVA, to address the statistical challenge posed by heterogeneous data.
FedOVA decomposes a multi-class classification problem into more straightforward binary classification problems and then combines their respective outputs using ensemble learning.
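A minimal, centralized sketch of the one-vs-all decomposition plus ensemble combination described above; the federated training loop and the Newton-type local solver are omitted, and all names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_ova(X, y, classes):
    """One-vs-all decomposition: one binary classifier per class."""
    return {c: LogisticRegression(max_iter=1000).fit(X, (y == c).astype(int))
            for c in classes}

def predict_ova(models, X):
    """Ensemble step: pick the class whose binary model is most confident."""
    scores = np.column_stack([m.predict_proba(X)[:, 1] for m in models.values()])
    classes = np.array(list(models.keys()))
    return classes[scores.argmax(axis=1)]

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int) + (X[:, 2] > 0).astype(int)  # 3 classes
models = train_ova(X, y, classes=np.unique(y))
print((predict_ova(models, X) == y).mean())  # training accuracy of the ensemble
```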
arXiv Detail & Related papers (2021-10-14T17:35:24Z) - Clustered Federated Learning via Generalized Total Variation
Minimization [83.26141667853057]
We study optimization methods to train local (or personalized) models for local datasets with a decentralized network structure.
Our main conceptual contribution is to formulate federated learning as generalized total variation (GTV) minimization.
Our main algorithmic contribution is a fully decentralized federated learning algorithm.
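A generic form of the GTV objective referred to above (notation mine; the paper's exact formulation may differ):

```latex
\min_{\{w_i\}} \; \sum_{i \in \mathcal{V}} L_i(w_i)
\; + \; \lambda \sum_{\{i,j\} \in \mathcal{E}} A_{ij}\,\lVert w_i - w_j \rVert_2
```

Here L_i is node i's local loss, A_{ij} weights the network edges, and lambda trades local fit against model similarity; the non-smooth coupling term pushes well-connected nodes toward an identical model, which is where the clustering behavior comes from.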
arXiv Detail & Related papers (2021-05-26T18:07:19Z) - Dif-MAML: Decentralized Multi-Agent Meta-Learning [54.39661018886268]
We propose a cooperative multi-agent meta-learning algorithm, referred to as Diffusion-based MAML, or Dif-MAML.
We show that the proposed strategy allows a collection of agents to attain agreement at a linear rate and to converge to a stationary point of the aggregate MAML objective.
Simulation results illustrate the theoretical findings and the superior performance relative to the traditional non-cooperative setting.
arXiv Detail & Related papers (2020-10-06T16:51:09Z) - Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge
Computing [113.52575069030192]
Big data, including data from applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
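For reference, the standard consensus-ADMM iteration that mini-batch variants build on (the coding and stochastic elements of this particular paper are not shown):

```latex
\begin{aligned}
x_i^{k+1} &= \arg\min_{x_i} \; f_i(x_i)
           + \tfrac{\rho}{2}\,\lVert x_i - z^k + u_i^k \rVert_2^2, \\
z^{k+1}   &= \tfrac{1}{N} \sum_{i=1}^{N} \left( x_i^{k+1} + u_i^k \right), \\
u_i^{k+1} &= u_i^k + x_i^{k+1} - z^{k+1}.
\end{aligned}
\]
```

Each agent i solves a regularized local problem over its own data, the consensus variable z averages the results, and the dual variables u_i accumulate disagreement, so no raw data ever leaves a node.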
arXiv Detail & Related papers (2020-10-02T10:41:59Z) - FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity
to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
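The CTA pattern is easy to see in a minimal FedAvg sketch (synthetic least-squares objectives; all names are illustrative):

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """Computation step: a few local gradient-descent epochs on-device."""
    w = w.copy()
    for _ in range(epochs):
        w -= lr * X.T @ (X @ w - y) / len(y)   # least-squares gradient
    return w

rng = np.random.default_rng(5)
d, w_global = 8, np.zeros(8)
devices = [(rng.normal(size=(40, d)), rng.normal(size=40)) for _ in range(10)]
for _ in range(20):
    local_models = [local_update(w_global, X, y) for X, y in devices]  # computation
    w_global = np.mean(local_models, axis=0)                           # aggregation
```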
arXiv Detail & Related papers (2020-05-22T23:07:42Z)