$z$-SignFedAvg: A Unified Stochastic Sign-based Compression for
Federated Learning
- URL: http://arxiv.org/abs/2302.02589v1
- Date: Mon, 6 Feb 2023 06:54:49 GMT
- Title: $z$-SignFedAvg: A Unified Stochastic Sign-based Compression for
Federated Learning
- Authors: Zhiwei Tang, Yanmeng Wang, Tsung-Hui Chang
- Abstract summary: Federated Learning (FL) is a promising privacy-preserving distributed learning paradigm.
FL suffers from high communication cost when training large-scale machine learning models.
We propose a novel noisy perturbation scheme with a general symmetric noise distribution for sign-based compression.
- Score: 14.363110221372274
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Federated Learning (FL) is a promising privacy-preserving distributed
learning paradigm but suffers from high communication cost when training
large-scale machine learning models. Sign-based methods, such as SignSGD
\cite{bernstein2018signsgd}, have been proposed as a biased gradient
compression technique for reducing the communication cost. However, sign-based
algorithms could diverge under heterogeneous data, which thus motivated the
development of advanced techniques, such as the error-feedback method and
stochastic sign-based compression, to fix this issue. Nevertheless, these
methods still suffer from slower convergence rates. Besides, none of them
allows multiple local SGD updates like FedAvg \cite{mcmahan2017communication}.
In this paper, we propose a novel noisy perturbation scheme with a general
symmetric noise distribution for sign-based compression, which not only allows
one to flexibly control the tradeoff between gradient bias and convergence
performance, but also provides a unified viewpoint to existing stochastic
sign-based methods. More importantly, the unified noisy perturbation scheme
enables the development of the very first sign-based FedAvg algorithm
($z$-SignFedAvg) to accelerate the convergence. Theoretically, we show that
$z$-SignFedAvg achieves a faster convergence rate than existing sign-based
methods and, under uniformly distributed noise, can enjoy the same convergence
rate as its uncompressed counterpart. Extensive experiments demonstrate that
$z$-SignFedAvg achieves competitive empirical performance on real datasets and
outperforms existing schemes.
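Since the abstract describes the scheme only at a high level, here is a minimal sketch of the two ingredients it mentions: perturbing each coordinate of a local update with zero-mean symmetric noise before taking its sign, and running FedAvg-style multiple local SGD steps with the server averaging the 1-bit messages. The noise distribution, the step sizes, and the `client_grads_fn` callback are illustrative assumptions, not the authors' exact $z$-SignFedAvg.

```python
import numpy as np

def noisy_sign_compress(update, noise_scale, rng, dist="uniform"):
    """1-bit compression: add zero-mean symmetric noise, then keep only the
    sign of each coordinate. The distribution and scale are illustrative;
    any symmetric distribution fits the scheme described in the abstract."""
    if dist == "uniform":
        z = rng.uniform(-noise_scale, noise_scale, size=update.shape)
    else:
        z = rng.normal(0.0, noise_scale, size=update.shape)
    return np.sign(update + z)

def sign_fedavg_round(global_w, clients, client_grads_fn, local_steps,
                      lr_local, lr_server, noise_scale, rng):
    """One communication round of a FedAvg-style loop where each client sends
    the sign of its noise-perturbed local model difference. A sketch only,
    not the paper's exact algorithm; client_grads_fn(c, w) is a hypothetical
    callback returning a stochastic gradient for client c at weights w."""
    messages = []
    for c in clients:
        w = global_w.copy()
        for _ in range(local_steps):   # multiple local SGD updates, as in FedAvg
            w -= lr_local * client_grads_fn(c, w)
        messages.append(noisy_sign_compress(w - global_w, noise_scale, rng))
    # Server averages the 1-bit messages and uses the result as a pseudo-gradient.
    return global_w + lr_server * np.mean(messages, axis=0)
```

Intuitively, a larger noise scale makes the expected value of each transmitted sign closer to a scaled copy of the true update (less bias) at the price of more variance, which is the tradeoff the abstract refers to.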
Related papers
- Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit gradient descent (FO-SGD), relies on two simple algorithmic ideas.
arXiv Detail & Related papers (2024-05-17T21:17:27Z)
- FedNMUT -- Federated Noisy Model Update Tracking Convergence Analysis [3.665841843512992]
A novel Decentralized Noisy Model Update Tracking Federated Learning algorithm (FedNMUT) is proposed.
It is tailored to function efficiently in the presence of noisy communication channels.
FedNMUT incorporates noise into its parameters to mimic the conditions of noisy communication channels.
arXiv Detail & Related papers (2024-03-20T02:17:47Z)
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}(\ln(T) / T^{1 - \frac{1}{\alpha}})$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
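The entry above only names the adapted optimizers (AdaGrad, Adam) and the resulting rate, so the sketch below shows a generic server-side federated AdaGrad step, with additive Gaussian noise standing in for the analog over-the-air aggregation. The channel model, learning rate, and `eps` are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def fed_adagrad_round(w, client_deltas, accum, lr=0.1, eps=1e-8,
                      channel_noise_std=0.0, rng=None):
    """Server-side AdaGrad-style aggregation of client model differences.
    Additive noise on the aggregate mimics an over-the-air (analog) channel;
    this channel model is an illustrative assumption."""
    agg = np.mean(client_deltas, axis=0)        # ideal aggregation of client updates
    if channel_noise_std > 0.0 and rng is not None:
        agg = agg + rng.normal(0.0, channel_noise_std, size=agg.shape)
    accum = accum + agg ** 2                    # AdaGrad: accumulate squared updates
    w = w + lr * agg / (np.sqrt(accum) + eps)   # coordinate-wise adaptive step
    return w, accum

# Usage (hypothetical shapes): carry `accum` across communication rounds, e.g.
# w, accum = fed_adagrad_round(w, deltas, accum, channel_noise_std=0.01,
#                              rng=np.random.default_rng(0))
```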
- Over-the-Air Federated Learning and Optimization [52.5188988624998]
We focus on federated learning (FL) via over-the-air computation (AirComp).
We describe the convergence of AirComp-based FedAvg (AirFedAvg) algorithms under both convex and non-convex settings.
For different types of local updates that can be transmitted by edge devices (i.e., model, gradient, model difference), we reveal that transmitting these updates in AirFedAvg may cause an aggregation error.
In addition, we consider more practical signal processing schemes to improve the communication efficiency and extend the convergence analysis to different forms of model aggregation error caused by these signal processing schemes.
arXiv Detail & Related papers (2023-10-16T05:49:28Z)
- Observation-Guided Diffusion Probabilistic Models [41.749374023639156]
We propose a novel diffusion-based image generation method called the observation-guided diffusion probabilistic model (OGDM).
Our approach reestablishes the training objective by integrating the guidance of the observation process with the Markov chain.
We demonstrate the effectiveness of our training algorithm using diverse inference techniques on strong diffusion model baselines.
arXiv Detail & Related papers (2023-10-06T06:29:06Z)
- Magnitude Matters: Fixing SIGNSGD Through Magnitude-Aware Sparsification
in the Presence of Data Heterogeneity [60.791736094073]
Communication overhead has become one of the major bottlenecks in the distributed training of deep neural networks.
We propose a magnitude-driven sparsification scheme, which addresses the non-convergence issue of SIGNSGD.
The proposed scheme is validated through experiments on Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets.
arXiv Detail & Related papers (2023-02-19T17:42:35Z)
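The summary above states only that a magnitude-driven sparsification is combined with SIGNSGD to avoid divergence under data heterogeneity. One plausible instantiation, keeping the signs of the k largest-magnitude coordinates, is sketched below; the top-k rule and the (index, sign) encoding are assumptions, not necessarily the authors' exact scheme.

```python
import numpy as np

def magnitude_aware_sign(grad, k):
    """Keep only the signs of the k largest-magnitude coordinates of a 1-D
    gradient and zero the rest; sending k (index, sign) pairs is much cheaper
    than full precision when k << d. The top-k rule is an assumed reading of
    'magnitude-driven sparsification', not necessarily the paper's rule."""
    out = np.zeros_like(grad)
    idx = np.argpartition(np.abs(grad), grad.size - k)[-k:]  # k largest |grad_i|
    out[idx] = np.sign(grad[idx])
    return out
```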
- Latent Class-Conditional Noise Model [54.56899309997246]
We introduce a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework.
We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us to efficiently infer the latent true labels.
Our approach safeguards the stable update of the noise transition, avoiding the arbitrary tuning from a mini-batch of samples used in previous methods.
arXiv Detail & Related papers (2023-02-19T15:24:37Z)
- BROADCAST: Reducing Both Stochastic and Compression Noise to Robustify
Communication-Efficient Federated Learning [24.016538592246377]
Communication between workers and the master node to collect local gradients is a key bottleneck in a large-scale learning system.
In this work, we investigate the problem of Byzantine-robust federated learning with compression, where the attacks from Byzantine workers can be arbitrarily malicious.
In light of this observation, we propose to jointly reduce the stochastic noise and compression noise so as to improve the Byzantine-robustness.
arXiv Detail & Related papers (2021-04-14T08:16:03Z)
- A Distributed Training Algorithm of Generative Adversarial Networks with
Quantized Gradients [8.202072658184166]
We propose a distributed GANs training algorithm with quantized gradient, dubbed DQGAN, which is the first distributed training method with quantized gradient for GANs.
The new method trains GANs based on a specific single-machine algorithm called Optimistic Mirror Descent (OMD), and is applicable to any gradient compression method that satisfies a general $\delta$-approximate compressor condition.
Theoretically, we establish the non-asymptotic convergence of the DQGAN algorithm to a first-order stationary point, which shows that the proposed algorithm can achieve a linear speedup in the number of workers.
arXiv Detail & Related papers (2020-10-26T06:06:43Z)
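The DQGAN entry above relies on the standard notion of a $\delta$-approximate compressor, i.e. an operator $\mathcal{C}$ with $\|\mathcal{C}(x)-x\|^2 \le (1-\delta)\|x\|^2$ for all $x$. The snippet below implements the classical scaled-sign compressor, which satisfies this with $\delta = 1/d$, and checks the inequality numerically; it is a generic illustration of the assumption, not code from the paper.

```python
import numpy as np

def scaled_sign(x):
    """Scaled-sign compressor: transmit sign(x) plus one scalar (the mean
    absolute value). A classical delta-approximate compressor with
    delta = 1/d, i.e. ||C(x) - x||^2 <= (1 - 1/d) * ||x||^2."""
    return np.abs(x).mean() * np.sign(x)

# Numerical check of the delta-approximate property for delta = 1/d.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
err = np.linalg.norm(scaled_sign(x) - x) ** 2
bound = (1.0 - 1.0 / x.size) * np.linalg.norm(x) ** 2
assert err <= bound
```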
- Stochastic-Sign SGD for Federated Learning with Theoretical Guarantees [49.91477656517431]
Quantization-based solvers have been widely adopted in Federated Learning (FL).
However, no existing method enjoys all of the desired properties.
We propose an intuitively-simple yet theoretically-sound method based on SIGNSGD to bridge the gap.
arXiv Detail & Related papers (2020-02-25T15:12:15Z)
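For the SIGNSGD-based method in the last entry, a standard stochastic sign quantizer maps each coordinate $x_i \in [-B, B]$ to $+1$ with probability $(B + x_i)/(2B)$ and to $-1$ otherwise, so that its expectation is $x_i / B$. The sketch below illustrates this operator; the clipping bound $B$ and the small sanity check are illustrative choices, not taken from the paper.

```python
import numpy as np

def stochastic_sign(x, B, rng):
    """Stochastic 1-bit quantizer: for x_i in [-B, B], output +1 with
    probability (B + x_i) / (2B) and -1 otherwise, so E[output_i] = x_i / B.
    The clipping bound B is a tuning choice assumed here for illustration."""
    xc = np.clip(x, -B, B)                 # keep probabilities inside [0, 1]
    p_plus = (B + xc) / (2.0 * B)
    return np.where(rng.random(x.shape) < p_plus, 1.0, -1.0)

# Sanity check: averaging many quantizations recovers roughly x / B.
rng = np.random.default_rng(0)
x = np.array([0.3, -0.7, 0.0])
q = np.mean([stochastic_sign(x, B=1.0, rng=rng) for _ in range(20000)], axis=0)
# q is approximately [0.3, -0.7, 0.0]
```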