Faster Rates for Compressed Federated Learning with Client-Variance
Reduction
- URL: http://arxiv.org/abs/2112.13097v3
- Date: Sun, 24 Sep 2023 12:40:18 GMT
- Title: Faster Rates for Compressed Federated Learning with Client-Variance
Reduction
- Authors: Haoyu Zhao, Konstantin Burlachenko, Zhize Li, Peter Richtárik
- Abstract summary: We show that FRECON converges within $O(\frac{(1+\omega)\sqrt{N}}{S\epsilon^2})$ communication rounds in the nonconvex setting, while COFIG needs $O(\frac{(1+\omega)^{3/2}\sqrt{N}}{S\epsilon^2}+\frac{(1+\omega)N^{2/3}}{S\epsilon^2})$ rounds.
In the convex setting, COFIG converges within $O(\frac{(1+\omega)\sqrt{N}}{S\epsilon})$ communication rounds, which, to the best of our knowledge, is the first such result for compression schemes that do not communicate with all clients in each round.
- Score: 23.169998435268504
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the communication bottleneck in distributed and federated learning
applications, algorithms using communication compression have attracted
significant attention and are widely used in practice. Moreover, the huge
number, high heterogeneity and limited availability of clients result in high
client-variance. This paper addresses these two issues together by proposing
compressed and client-variance reduced methods COFIG and FRECON. We prove an
$O(\frac{(1+\omega)^{3/2}\sqrt{N}}{S\epsilon^2}+\frac{(1+\omega)N^{2/3}}{S\epsilon^2})$
bound on the number of communication rounds of COFIG in the nonconvex setting,
where $N$ is the total number of clients, $S$ is the number of clients
participating in each round, $\epsilon$ is the convergence error, and $\omega$
is the variance parameter associated with the compression operator. In case of
FRECON, we prove an $O(\frac{(1+\omega)\sqrt{N}}{S\epsilon^2})$ bound on the
number of communication rounds. In the convex setting, COFIG converges within
$O(\frac{(1+\omega)\sqrt{N}}{S\epsilon})$ communication rounds, which, to the
best of our knowledge, is also the first convergence result for compression
schemes that do not communicate with all the clients in each round. We stress
that neither COFIG nor FRECON needs to communicate with all the clients, and
they enjoy the first or faster convergence results for convex and nonconvex
federated learning in the regimes considered. Experimental results point to an
empirical superiority of COFIG and FRECON over existing baselines.
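As a rough illustration of the setting the abstract describes, and not of the authors' COFIG or FRECON algorithms themselves, the sketch below implements one communication round with $N$ clients, $S$ sampled participants, and an unbiased rand-$k$ compression operator, for which the variance parameter is $\omega = d/k - 1$. The helper names (`rand_k`, `local_gradient`, `one_round`) and the least-squares stand-in loss are assumptions made purely for the example.

```python
import numpy as np

rng = np.random.default_rng(0)


def rand_k(x: np.ndarray, k: int) -> np.ndarray:
    """Unbiased rand-k sparsification: keep k random coordinates, scale by d/k.
    Satisfies E[C(x)] = x and E||C(x) - x||^2 <= (d/k - 1) * ||x||^2,
    i.e. a compression operator with variance parameter omega = d/k - 1."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x)
    out[idx] = x[idx] * (d / k)
    return out


def local_gradient(w: np.ndarray, client_data) -> np.ndarray:
    """Hypothetical client objective: plain least-squares gradient as a stand-in."""
    X, y = client_data
    return X.T @ (X @ w - y) / len(y)


def one_round(w: np.ndarray, clients, S: int, k: int, lr: float = 0.05) -> np.ndarray:
    """One communication round: the server samples S of the N clients, each
    sampled client sends a compressed gradient, and the server averages the
    messages and takes a step. This shows only the compressed, partially
    participating pipeline; COFIG/FRECON add client-variance reduction."""
    sampled = rng.choice(len(clients), size=S, replace=False)
    msgs = [rand_k(local_gradient(w, clients[i]), k) for i in sampled]
    return w - lr * np.mean(msgs, axis=0)


# Toy run: N = 20 clients, S = 5 per round, dimension d = 50, k = 10 (omega = 4).
d, N, S, k = 50, 20, 5, 10
clients = [(rng.normal(size=(30, d)), rng.normal(size=30)) for _ in range(N)]
w = np.zeros(d)
for _ in range(300):
    w = one_round(w, clients, S, k)
```

COFIG and FRECON apply client-variance reduction on top of such a compressed, partially participating scheme; the bounds quoted above quantify how $\omega$, $N$, and $S$ enter the resulting communication-round complexity.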
Related papers
- Federated Learning under Periodic Client Participation and Heterogeneous Data: A New Communication-Efficient Algorithm and Analysis [14.98493572536424]
In federated learning, it is common to assume that clients are always available to participate in training, which may not be feasible with user devices in practice.
Recent works analyze federated learning under more realistic participation patterns such as cyclic client availability or arbitrary participation.
arXiv Detail & Related papers (2024-10-30T15:41:35Z) - Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback [75.29048190099523]
Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions.
In this paper, we design a fully adaptive OGD algorithm, AdaOGD, that does not require a priori knowledge of these parameters.
arXiv Detail & Related papers (2023-10-21T18:38:13Z) - Variance-reduced accelerated methods for decentralized stochastic
double-regularized nonconvex strongly-concave minimax problems [7.5573375809946395]
We consider a network of $m$ computing agents that collaborate via peer-to-peer communications.
Our algorithmic framework introduces a Lagrangian multiplier to eliminate the consensus constraint on the dual variable.
To the best of our knowledge, this is the first work that provides convergence guarantees for NCSC minimax problems with general nonsmooth regularizers applied to both the primal and dual variables.
arXiv Detail & Related papers (2023-07-14T01:32:16Z) - Federated Learning in the Presence of Adversarial Client Unavailability [16.201377650598516]
Federated learning is a decentralized machine learning framework that enables collaborative model training without revealing raw data.
Due to diverse hardware and software limitations, a client may not always be available for the computation requests from the server.
In harsh environments like battlefields, adversaries can selectively silence specific clients.
arXiv Detail & Related papers (2023-05-31T15:57:07Z) - Cooperative Multi-Agent Reinforcement Learning: Asynchronous
Communication and Linear Function Approximation [77.09836892653176]
We study multi-agent reinforcement learning in the setting of episodic Markov decision processes.
We propose a provably efficient algorithm based on value iteration that enables asynchronous communication.
We show that a minimal $\Omega(dM)$ communication complexity is required to improve the performance through collaboration.
arXiv Detail & Related papers (2023-05-10T20:29:29Z) - Sharper Convergence Guarantees for Asynchronous SGD for Distributed and
Federated Learning [77.22019100456595]
We analyze asynchronous SGD for distributed computation workers with varying communication frequency.
In this work, we obtain a tighter convergence rate that depends on the gradient noise $\sigma^2$ and the average delay $\tau_{avg}$.
We also show that the heterogeneity term in the rate is affected by the average delay within each worker.
arXiv Detail & Related papers (2022-06-16T17:10:57Z) - BEER: Fast $O(1/T)$ Rate for Decentralized Nonconvex Optimization with
Communication Compression [37.20712215269538]
Communication efficiency has been widely recognized as the bottleneck for large-scale decentralized machine learning applications.
This paper proposes BEER, which adopts communication compression with gradient tracking, and shows that it converges at a faster $O(1/T)$ rate.
arXiv Detail & Related papers (2022-01-31T16:14:09Z) - Permutation Compressors for Provably Faster Distributed Nonconvex
Optimization [68.8204255655161]
We show that the MARINA method of Gorbunov et al. (2021) can be considered as a state-of-the-art method in terms of theoretical communication complexity.
We extend the theory of MARINA to support potentially correlated compressors, taking the method beyond the classical independent compressors setting.
arXiv Detail & Related papers (2021-10-07T09:38:15Z) - FedPAGE: A Fast Local Stochastic Gradient Method for
Communication-Efficient Federated Learning [21.055643409860743]
Federated Averaging (FedAvg, also known as Local-SGD) is a classical federated learning algorithm in which clients run multiple local SGD steps before communicating their updates to an orchestrating server.
We propose a new federated learning algorithm, FedPAGE, that reduces the communication complexity by utilizing the recent optimal PAGE method.
arXiv Detail & Related papers (2021-08-10T15:41:27Z) - Faster Non-Convex Federated Learning via Global and Local Momentum [57.52663209739171]
FedGLOMO is the first (first-order) FL algorithm that is provably optimal even with compressed communication between the clients and the server.
arXiv Detail & Related papers (2020-12-07T21:05:31Z) - Variance Reduced EXTRA and DIGing and Their Optimal Acceleration for
Strongly Convex Decentralized Optimization [69.49313819343992]
We extend the widely used EXTRA and DIGing methods with variance reduction (VR), and propose two methods: VR-EXTRA and VR-DIGing.
The proposed VR-EXTRA requires $O((\kappa_s+n)\log\frac{1}{\epsilon})$ gradient evaluations and $O((\kappa_b+\kappa_c)\log\frac{1}{\epsilon})$ communication rounds.
The proposed VR-DIGing has a slightly higher communication cost.
arXiv Detail & Related papers (2020-09-09T15:48:44Z)