Escaping Saddle Points in Heterogeneous Federated Learning via
Distributed SGD with Communication Compression
- URL: http://arxiv.org/abs/2310.19059v1
- Date: Sun, 29 Oct 2023 16:24:53 GMT
- Title: Escaping Saddle Points in Heterogeneous Federated Learning via
Distributed SGD with Communication Compression
- Authors: Sijin Chen, Zhize Li, Yuejie Chi
- Abstract summary: We propose a novel algorithm, Power-EF, that communicates only compressed information via a new error-feedback scheme.
Power-EF is the first distributed and compressed SGD algorithm that provably escapes saddle points in heterogeneous FL without any data homogeneity assumptions.
Our theory improves upon or recovers previous results, while extending to much more permissive assumptions on the local data.
- Score: 42.85797250438604
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of finding second-order stationary points of
heterogeneous federated learning (FL). Previous works in FL mostly focus on
first-order convergence guarantees, which do not rule out the scenario of
unstable saddle points. Meanwhile, it is a key bottleneck of FL to achieve
communication efficiency without compromising learning accuracy, especially
when local data are highly heterogeneous across different clients. Given this,
we propose a novel algorithm, Power-EF, that communicates only compressed
information via a new error-feedback scheme. To our knowledge, Power-EF is
the first distributed and compressed SGD algorithm that provably escapes saddle
points in heterogeneous FL without any data homogeneity assumptions. In
particular, Power-EF improves to second-order stationary points after visiting
first-order (possibly saddle) points, using additional gradient queries and
communication rounds of essentially the same order as those required for
first-order convergence, and its convergence rate exhibits a linear speedup in
the number of workers. Our theory improves upon or recovers previous results,
while extending to much more permissive assumptions on the local data. Numerical
experiments are provided to complement the theory.
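To make the error-feedback idea in the abstract concrete, the sketch below shows a generic compressed distributed SGD loop with per-worker error feedback: each worker compresses its outgoing message, stores the part that was dropped, and re-injects it before the next compression, so nothing is discarded permanently. In the standard formulation, escaping saddle points means reaching a second-order stationary point, where the gradient is small and the Hessian has no strongly negative eigenvalue, rather than merely a first-order one. The code is a minimal illustration with assumed names (topk_compress, ef_compressed_sgd) and an assumed top-k compressor chosen purely for concreteness; it is not the exact Power-EF update, whose compressor, error-feedback recursion, and saddle-escaping analysis are given in the paper.

```python
import numpy as np

def topk_compress(v, k):
    """Biased top-k compressor: keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef_compressed_sgd(grad_fns, x0, lr=0.1, k=10, num_steps=1000):
    """Generic compressed distributed SGD with per-worker error feedback (illustrative only).

    grad_fns : one stochastic-gradient oracle per worker (heterogeneous local data allowed).
    Each worker transmits only the compressed message; the compression residual is
    kept in a local memory term and added back at the next round.
    """
    x = x0.astype(float)
    errors = [np.zeros_like(x) for _ in grad_fns]     # per-worker error-feedback memory
    for _ in range(num_steps):
        msgs = []
        for i, grad_fn in enumerate(grad_fns):
            g = grad_fn(x)                            # local stochastic gradient
            corrected = g + errors[i]                 # add back what was dropped before
            c = topk_compress(corrected, k)           # send only the compressed part
            errors[i] = corrected - c                 # remember the dropped remainder
            msgs.append(c)
        x = x - lr * np.mean(msgs, axis=0)            # server averages compressed messages
    return x
```

Because the residuals accumulate everything the compressor drops and feed it back later, even a biased compressor such as top-k never permanently loses gradient information; this is the standard intuition behind error-feedback schemes in communication-compressed SGD.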
Related papers
- Switch and Conquer: Efficient Algorithms By Switching Stochastic
Gradient Oracles For Decentralized Saddle Point Problems [1.2277343096128712]
We develop an inexact primal-dual hybrid gradient (inexact PDHG) procedure that allows generic gradient oracles to update the primal and dual variables.
We show that utilizing the best convergence phases of GSG and SVRG oracles makes C-DPSSG well suited for obtaining solutions of low/medium accuracy faster.
arXiv Detail & Related papers (2023-09-02T17:48:42Z)
- Faster Adaptive Federated Learning [84.38913517122619]
Federated learning has attracted increasing attention with the emergence of distributed data.
In this paper, we propose an efficient adaptive algorithm (i.e., FAFED) based on a momentum-based variance reduction technique in cross-silo FL.
arXiv Detail & Related papers (2022-12-02T05:07:50Z)
- NET-FLEET: Achieving Linear Convergence Speedup for Fully Decentralized
Federated Learning with Heterogeneous Data [12.701031075169887]
Federated learning (FL) has received a surge of interest in recent years thanks to its benefits in data privacy protection, efficient communication, and parallel data processing.
Most existing works on FL are limited to systems with i.i.d. data and centralized parameter servers.
We propose a new algorithm, called NET-FLEET, for fully decentralized FL systems with data heterogeneity.
arXiv Detail & Related papers (2022-08-17T19:17:23Z)
- Local Learning Matters: Rethinking Data Heterogeneity in Federated
Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
- Achieving Statistical Optimality of Federated Learning: Beyond
Stationary Points [19.891597817559038]
Federated Learning (FL) is a promising framework that has great potential in privacy preservation and in lowering the computation load at the cloud.
Recent work raised two concerns about methods such as FedAvg and FedProx: (1) their fixed points do not correspond to the stationary points of the original optimization problem, and (2) the common model found might not generalize well locally.
We show, in the general kernel regression setting, that both FedAvg and FedProx converge to the minimax-optimal error rates.
arXiv Detail & Related papers (2021-06-29T09:59:43Z)
- Second-Order Guarantees in Federated Learning [49.17137296715029]
Federated learning is a useful framework for centralized learning from distributed data.
This paper focuses on algorithms with second-order optimality guarantees in both centralized and decentralized settings.
arXiv Detail & Related papers (2020-12-02T19:30:08Z)
- Federated Composite Optimization [28.11253930828807]
Federated Learning (FL) is a distributed learning paradigm that scales on-device learning collaboratively and privately.
Standard FL algorithms such as FedAvg are primarily geared towards smooth unconstrained settings.
We propose a new primal-dual algorithm, Federated Dual Averaging (FedDualAvg), which by employing a novel server dual averaging procedure circumvents the curse of primal averaging.
arXiv Detail & Related papers (2020-11-17T06:54:06Z)
- A Unified Linear Speedup Analysis of Federated Averaging and Nesterov
FedAvg [49.76940694847521]
Federated learning (FL) learns a model jointly from a set of participating devices without sharing each other's privately held data.
In this paper, we focus on Federated Averaging (FedAvg), one of the most popular and effective FL algorithms in use today.
We show that FedAvg enjoys linear speedup in each case, although with different convergence rates and communication efficiencies.
arXiv Detail & Related papers (2020-07-11T05:59:08Z)
- Detached Error Feedback for Distributed SGD with Random Sparsification [98.98236187442258]
The communication bottleneck has been a critical problem in large-scale deep learning.
We propose a new detached error feedback (DEF) algorithm, which shows better convergence than error feedback for non-convex distributed problems.
We also propose DEFA to accelerate the generalization of DEF, which shows better bounds than DEF.
arXiv Detail & Related papers (2020-04-11T03:50:59Z)
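The last entry above compresses with random sparsification rather than top-k. A standard unbiased rand-k sparsifier (a generic construction with an assumed name, randk_compress, not necessarily the exact operator analyzed in the DEF paper) keeps k uniformly chosen coordinates and rescales them by d/k; it can be dropped into the loop sketched after the abstract in place of topk_compress.

```python
import numpy as np

def randk_compress(v, k, rng=None):
    """Unbiased rand-k sparsifier: keep k uniformly chosen coordinates, rescale by d/k.

    Each coordinate survives with probability k/d and is scaled by d/k, so the
    output equals v in expectation, in contrast to the biased top-k operator.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = v.size
    out = np.zeros_like(v, dtype=float)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = v[idx] * (d / k)
    return out
```

The rescaling by d/k is what makes the compressor unbiased; the price is higher variance per round, which is the usual trade-off between random and greedy (top-k) sparsification.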