DONE: Distributed Approximate Newton-type Method for Federated Edge
Learning
- URL: http://arxiv.org/abs/2012.05625v3
- Date: Fri, 30 Apr 2021 02:12:57 GMT
- Title: DONE: Distributed Approximate Newton-type Method for Federated Edge
Learning
- Authors: Canh T. Dinh, Nguyen H. Tran, Tuan Dung Nguyen, Wei Bao, Amir Rezaei
Balef, Albert Y. Zomaya
- Abstract summary: DONE is a distributed approximate Newton-type algorithm with fast convergence rate.
We show that DONE attains performance comparable to Newton's method.
- Score: 41.20946186966816
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: There is growing interest in applying distributed machine learning to edge
computing, forming federated edge learning. Federated edge learning faces
non-i.i.d. and heterogeneous data, and the communication between edge workers,
possibly through distant locations and with unstable wireless networks, is more
costly than their local computational overhead. In this work, we propose DONE,
a distributed approximate Newton-type algorithm with fast convergence rate for
communication-efficient federated edge learning. First, with strongly convex
and smooth loss functions, DONE approximates the Newton direction in a
distributed manner using the classical Richardson iteration on each edge
worker. Second, we prove that DONE has linear-quadratic convergence and analyze
its communication complexities. Finally, the experimental results with
non-i.i.d. and heterogeneous data show that DONE attains performance comparable
to Newton's method. Notably, DONE requires fewer communication iterations than
distributed gradient descent and outperforms the state-of-the-art approaches DANE
and FEDL in the case of non-quadratic loss functions.
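The core step described in the abstract, approximating the Newton direction on each edge worker with the classical Richardson iteration, can be illustrated with a minimal NumPy sketch. This is only an illustrative sketch under assumptions: the fixed step size `alpha`, the hypothetical worker interface exposing `gradient(w)` and `hessian(w)`, and the plain averaging of local directions are not the paper's exact protocol or hyperparameters.

```python
import numpy as np

def richardson_newton_direction(H_local, g_global, alpha=0.1, num_iters=50):
    """Approximate the Newton direction d solving H_local @ d = -g_global
    with the classical Richardson iteration:
        d <- d - alpha * (H_local @ d + g_global).
    Requires H_local positive definite (strongly convex, smooth local loss)
    and 0 < alpha < 2 / lambda_max(H_local) for convergence."""
    d = np.zeros_like(g_global)
    for _ in range(num_iters):
        d = d - alpha * (H_local @ d + g_global)
    return d

def server_round(workers, w, alpha=0.1, num_iters=50):
    """One illustrative communication round: broadcast w, aggregate local
    gradients, let each worker run Richardson iterations against its local
    Hessian, then average the resulting directions (assumed aggregation)."""
    g_global = np.mean([wk.gradient(w) for wk in workers], axis=0)
    directions = [
        richardson_newton_direction(wk.hessian(w), g_global, alpha, num_iters)
        for wk in workers
    ]
    return w + np.mean(directions, axis=0)
```

Each Richardson pass needs only Hessian-vector products with the local Hessian, which is what keeps the per-round communication to gradient-sized vectors rather than full Hessians.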
Related papers
- Symmetric Rank-One Quasi-Newton Methods for Deep Learning Using Cubic Regularization [0.5120567378386615]
First-order methods such as gradient descent, and variants such as Adam and AdaGrad, are commonly used in deep learning.
However, these methods do not exploit curvature information.
Quasi-Newton methods re-use previously computed low-rank Hessian approximations.
arXiv Detail & Related papers (2025-02-17T20:20:11Z) - Non-Convex Optimization in Federated Learning via Variance Reduction and Adaptive Learning [13.83895180419626]
This paper proposes a novel algorithm that leverages momentum-based variance reduction with adaptive learning to address non-convex settings across heterogeneous data.
We aim to overcome challenges related to gradient variance, which hinders efficiency, and to the slow convergence caused by learning-rate adjustments on heterogeneous data.
arXiv Detail & Related papers (2024-12-16T11:02:38Z) - FedNS: A Fast Sketching Newton-Type Algorithm for Federated Learning [27.957498393822338]
We introduce a novel approach to tackle this issue while still achieving fast convergence rates.
Our proposed method, named Federated Newton Sketch (FedNS), approximates the centralized Newton's method by communicating the sketched square-root Hessian instead of the exact Hessian (an illustrative example of this sketching step appears after the related-papers list).
We provide convergence analysis based on statistical learning for the federated Newton sketch approaches.
arXiv Detail & Related papers (2024-01-05T10:06:41Z) - Composite federated learning with heterogeneous data [11.40641907024708]
We propose a novel algorithm for solving the composite Federated Learning (FL) problem.
This algorithm manages non-smooth regularization by strategically decoupling the proximal operator and communication, and addresses client drift without any assumptions about data similarity.
We prove that our algorithm converges linearly to a neighborhood of the optimal solution and demonstrate the superiority of our algorithm over state-of-the-art methods in numerical experiments.
arXiv Detail & Related papers (2023-09-04T20:22:57Z) - Adaptive pruning-based Newton's method for distributed learning [14.885388389215587]
This paper presents a novel and efficient algorithm named Distributed Adaptive Newton Learning (DANL).
DANL attains a linear convergence rate while adapting to available resources and maintaining high efficiency.
Experiments demonstrate that DANL achieves linear convergence with efficient communication and strong performance across different datasets.
arXiv Detail & Related papers (2023-08-20T04:01:30Z) - FedNew: A Communication-Efficient and Privacy-Preserving Newton-Type
Method for Federated Learning [75.46959684676371]
We introduce a novel framework called FedNew in which there is no need to transmit Hessian information from clients to the parameter server (PS).
FedNew hides the gradient information and results in a privacy-preserving approach compared to the existing state-of-the-art.
arXiv Detail & Related papers (2022-06-17T15:21:39Z) - Simple Stochastic and Online Gradient Descent Algorithms for Pairwise
Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handling streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
arXiv Detail & Related papers (2021-11-23T18:10:48Z) - Learning while Respecting Privacy and Robustness to Distributional
Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z) - FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity
to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic algorithms for large-scale AUC maximization with a deep neural network.
Our method requires a much smaller number of communication rounds in theory.
Our experiments on several datasets show the effectiveness of our method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
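For the FedNS entry above, the sketched square-root Hessian idea can be made concrete with a small, hedged example: for a logistic-regression client the local Hessian factors as A^T A with A = D^(1/2) X, and a random sketch of A is communicated instead of the full Hessian. The Gaussian sketch, the ridge term, and the plain averaging at the server are illustrative assumptions, not the exact FedNS construction.

```python
import numpy as np

def client_sketch(X, w, sketch_dim, rng):
    """Logistic-regression client: the local Hessian is X.T @ D @ X with
    D = diag(p * (1 - p)), so a square-root Hessian is D**0.5 @ X.
    A Gaussian sketch compresses this n x d factor to sketch_dim x d
    before it is sent to the server."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    sqrt_H = np.sqrt(p * (1.0 - p))[:, None] * X            # n x d square-root Hessian
    S = rng.standard_normal((sketch_dim, X.shape[0])) / np.sqrt(sketch_dim)
    return S @ sqrt_H                                        # sketch_dim x d message

def server_newton_step(sketches, g_global, ridge=1e-6):
    """Server side (assumed aggregation): rebuild an approximate Hessian
    from the client sketches and solve for an approximate Newton direction."""
    d = g_global.shape[0]
    H_approx = sum(Sk.T @ Sk for Sk in sketches) / len(sketches) + ridge * np.eye(d)
    return -np.linalg.solve(H_approx, g_global)
```

The payload per client is sketch_dim x d numbers instead of the d x d Hessian, which is where the communication saving comes from when sketch_dim is small.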
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.