$z$-SignFedAvg: A Unified Stochastic Sign-based Compression for
Federated Learning
- URL: http://arxiv.org/abs/2302.02589v1
- Date: Mon, 6 Feb 2023 06:54:49 GMT
- Title: $z$-SignFedAvg: A Unified Stochastic Sign-based Compression for
Federated Learning
- Authors: Zhiwei Tang, Yanmeng Wang, Tsung-Hui Chang
- Abstract summary: Federated Learning (FL) is a promising privacy-preserving distributed learning paradigm.
FL suffers from high communication cost when training large-scale machine learning models.
We propose a novel noisy perturbation scheme with a general symmetric noise distribution for sign-based compression.
- Score: 14.363110221372274
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Federated Learning (FL) is a promising privacy-preserving distributed
learning paradigm but suffers from high communication cost when training
large-scale machine learning models. Sign-based methods, such as SignSGD
\cite{bernstein2018signsgd}, have been proposed as a biased gradient
compression technique for reducing the communication cost. However, sign-based
algorithms could diverge under heterogeneous data, which thus motivated the
development of advanced techniques, such as the error-feedback method and
stochastic sign-based compression, to fix this issue. Nevertheless, these
methods still suffer from slower convergence rates. Besides, none of them
allows multiple local SGD updates like FedAvg \cite{mcmahan2017communication}.
In this paper, we propose a novel noisy perturbation scheme with a general
symmetric noise distribution for sign-based compression, which not only allows
one to flexibly control the tradeoff between gradient bias and convergence
performance, but also provides a unified viewpoint to existing stochastic
sign-based methods. More importantly, the unified noisy perturbation scheme
enables the development of the very first sign-based FedAvg algorithm
($z$-SignFedAvg) to accelerate the convergence. Theoretically, we show that
$z$-SignFedAvg achieves a faster convergence rate than existing sign-based
methods and, under uniformly distributed noise, can enjoy the same convergence
rate as its uncompressed counterpart. Extensive experiments demonstrate that
$z$-SignFedAvg achieves competitive empirical performance on real datasets and
outperforms existing schemes.
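Since the abstract describes the scheme only at a high level, here is a minimal sketch of the two ingredients it mentions: perturbing each coordinate of a local update with zero-mean symmetric noise before taking its sign, and running FedAvg-style multiple local SGD steps with the server averaging the 1-bit messages. The noise distribution, the step sizes, and the `client_grads_fn` callback are illustrative assumptions, not the authors' exact $z$-SignFedAvg.

```python
import numpy as np

def noisy_sign_compress(update, noise_scale, rng, dist="uniform"):
    """1-bit compression: add zero-mean symmetric noise, then keep only the
    sign of each coordinate. The distribution and scale are illustrative;
    any symmetric distribution fits the scheme described in the abstract."""
    if dist == "uniform":
        z = rng.uniform(-noise_scale, noise_scale, size=update.shape)
    else:
        z = rng.normal(0.0, noise_scale, size=update.shape)
    return np.sign(update + z)

def sign_fedavg_round(global_w, clients, client_grads_fn, local_steps,
                      lr_local, lr_server, noise_scale, rng):
    """One communication round of a FedAvg-style loop where each client sends
    the sign of its noise-perturbed local model difference. A sketch only,
    not the paper's exact algorithm; client_grads_fn(c, w) is a hypothetical
    callback returning a stochastic gradient for client c at weights w."""
    messages = []
    for c in clients:
        w = global_w.copy()
        for _ in range(local_steps):   # multiple local SGD updates, as in FedAvg
            w -= lr_local * client_grads_fn(c, w)
        messages.append(noisy_sign_compress(w - global_w, noise_scale, rng))
    # Server averages the 1-bit messages and uses the result as a pseudo-gradient.
    return global_w + lr_server * np.mean(messages, axis=0)
```

Intuitively, a larger noise scale makes the expected value of each transmitted sign closer to a scaled copy of the true update (less bias) at the price of more variance, which is the tradeoff the abstract refers to.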
Related papers
- Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit gradient descent (FO-SGD), relies on two simple algorithmic ideas.
arXiv Detail & Related papers (2024-05-17T21:17:27Z)
- FedNMUT -- Federated Noisy Model Update Tracking Convergence Analysis [3.665841843512992]
A novel Decentralized Noisy Model Update Tracking Federated Learning algorithm (FedNMUT) is proposed.
It is tailored to function efficiently in the presence of noisy communication channels.
FedNMUT incorporates noise into its parameters to mimic the conditions of noisy communication channels.
arXiv Detail & Related papers (2024-03-20T02:17:47Z)
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}(\ln(T) / T^{1 - \frac{1}{\alpha}})$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
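The entry above only names the adapted optimizers (AdaGrad, Adam) and the resulting rate, so the sketch below shows a generic server-side federated AdaGrad step, with additive Gaussian noise standing in for the analog over-the-air aggregation. The channel model, learning rate, and `eps` are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def fed_adagrad_round(w, client_deltas, accum, lr=0.1, eps=1e-8,
                      channel_noise_std=0.0, rng=None):
    """Server-side AdaGrad-style aggregation of client model differences.
    Additive noise on the aggregate mimics an over-the-air (analog) channel;
    this channel model is an illustrative assumption."""
    agg = np.mean(client_deltas, axis=0)        # ideal aggregation of client updates
    if channel_noise_std > 0.0 and rng is not None:
        agg = agg + rng.normal(0.0, channel_noise_std, size=agg.shape)
    accum = accum + agg ** 2                    # AdaGrad: accumulate squared updates
    w = w + lr * agg / (np.sqrt(accum) + eps)   # coordinate-wise adaptive step
    return w, accum

# Usage (hypothetical shapes): carry `accum` across communication rounds, e.g.
# w, accum = fed_adagrad_round(w, deltas, accum, channel_noise_std=0.01,
#                              rng=np.random.default_rng(0))
```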
- Over-the-Air Federated Learning and Optimization [52.5188988624998]
We focus on federated learning (FL) via over-the-air computation (AirComp).
We describe the convergence of AirComp-based FedAvg (AirFedAvg) algorithms under both convex and non-convex settings.
For different types of local updates that can be transmitted by edge devices (i.e., model, gradient, model difference), we reveal that transmitting these updates in AirFedAvg may cause an aggregation error.
In addition, we consider more practical signal processing schemes to improve the communication efficiency and extend the convergence analysis to different forms of model aggregation error caused by these signal processing schemes.
arXiv Detail & Related papers (2023-10-16T05:49:28Z)
- Observation-Guided Diffusion Probabilistic Models [41.749374023639156]
We propose a novel diffusion-based image generation method called the observation-guided diffusion probabilistic model (OGDM).
Our approach reestablishes the training objective by integrating the guidance of the observation process with the Markov chain.
We demonstrate the effectiveness of our training algorithm using diverse inference techniques on strong diffusion model baselines.
arXiv Detail & Related papers (2023-10-06T06:29:06Z)
- Magnitude Matters: Fixing SIGNSGD Through Magnitude-Aware Sparsification
in the Presence of Data Heterogeneity [60.791736094073]
Communication overhead has become one of the major bottlenecks in the distributed training of deep neural networks.
We propose a magnitude-driven sparsification scheme, which addresses the non-convergence issue of SIGNSGD.
The proposed scheme is validated through experiments on Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets.
arXiv Detail & Related papers (2023-02-19T17:42:35Z)
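The summary above states only that a magnitude-driven sparsification is combined with SIGNSGD to avoid divergence under data heterogeneity. One plausible instantiation, keeping the signs of the k largest-magnitude coordinates, is sketched below; the top-k rule and the (index, sign) encoding are assumptions, not necessarily the authors' exact scheme.

```python
import numpy as np

def magnitude_aware_sign(grad, k):
    """Keep only the signs of the k largest-magnitude coordinates of a 1-D
    gradient and zero the rest; sending k (index, sign) pairs is much cheaper
    than full precision when k << d. The top-k rule is an assumed reading of
    'magnitude-driven sparsification', not necessarily the paper's rule."""
    out = np.zeros_like(grad)
    idx = np.argpartition(np.abs(grad), grad.size - k)[-k:]  # k largest |grad_i|
    out[idx] = np.sign(grad[idx])
    return out
```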
- Latent Class-Conditional Noise Model [54.56899309997246]
We introduce a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework.
We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us to efficiently infer the latent true labels.
Our approach safeguards the stable update of the noise transition, avoiding the arbitrary tuning from a mini-batch of samples used in previous methods.
arXiv Detail & Related papers (2023-02-19T15:24:37Z)
- BROADCAST: Reducing Both Stochastic and Compression Noise to Robustify
Communication-Efficient Federated Learning [24.016538592246377]
Communication between workers and the master node to collect local gradients is a key bottleneck in a large-scale learning system.
In this work, we investigate the problem of Byzantine-robust federated learning with compression, where the attacks from Byzantine workers can be arbitrarily malicious.
In light of this observation, we propose to jointly reduce the stochastic noise and compression noise so as to improve the Byzantine-robustness.
arXiv Detail & Related papers (2021-04-14T08:16:03Z)
- A Distributed Training Algorithm of Generative Adversarial Networks with
Quantized Gradients [8.202072658184166]
We propose a distributed GANs training algorithm with quantized gradient, dubbed DQGAN, which is the first distributed training method with quantized gradient for GANs.
The new method trains GANs based on a specific single-machine algorithm called Optimistic Mirror Descent (OMD), and is applicable to any gradient compression method that satisfies a general $\delta$-approximate compressor condition.
Theoretically, we establish the non-asymptotic convergence of the DQGAN algorithm to a first-order stationary point, which shows that the proposed algorithm can achieve a linear speedup in the number of workers.
arXiv Detail & Related papers (2020-10-26T06:06:43Z)
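The DQGAN entry above relies on the standard notion of a $\delta$-approximate compressor, i.e. an operator $\mathcal{C}$ with $\|\mathcal{C}(x)-x\|^2 \le (1-\delta)\|x\|^2$ for all $x$. The snippet below implements the classical scaled-sign compressor, which satisfies this with $\delta = 1/d$, and checks the inequality numerically; it is a generic illustration of the assumption, not code from the paper.

```python
import numpy as np

def scaled_sign(x):
    """Scaled-sign compressor: transmit sign(x) plus one scalar (the mean
    absolute value). A classical delta-approximate compressor with
    delta = 1/d, i.e. ||C(x) - x||^2 <= (1 - 1/d) * ||x||^2."""
    return np.abs(x).mean() * np.sign(x)

# Numerical check of the delta-approximate property for delta = 1/d.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
err = np.linalg.norm(scaled_sign(x) - x) ** 2
bound = (1.0 - 1.0 / x.size) * np.linalg.norm(x) ** 2
assert err <= bound
```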
- Stochastic-Sign SGD for Federated Learning with Theoretical Guarantees [49.91477656517431]
Quantization-based solvers have been widely adopted in Federated Learning (FL).
However, no existing method enjoys all of the desired properties.
We propose an intuitively-simple yet theoretically-sound method based on SIGNSGD to bridge the gap.
arXiv Detail & Related papers (2020-02-25T15:12:15Z)
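For the SIGNSGD-based method in the last entry, a standard stochastic sign quantizer maps each coordinate $x_i \in [-B, B]$ to $+1$ with probability $(B + x_i)/(2B)$ and to $-1$ otherwise, so that its expectation is $x_i / B$. The sketch below illustrates this operator; the clipping bound $B$ and the small sanity check are illustrative choices, not taken from the paper.

```python
import numpy as np

def stochastic_sign(x, B, rng):
    """Stochastic 1-bit quantizer: for x_i in [-B, B], output +1 with
    probability (B + x_i) / (2B) and -1 otherwise, so E[output_i] = x_i / B.
    The clipping bound B is a tuning choice assumed here for illustration."""
    xc = np.clip(x, -B, B)                 # keep probabilities inside [0, 1]
    p_plus = (B + xc) / (2.0 * B)
    return np.where(rng.random(x.shape) < p_plus, 1.0, -1.0)

# Sanity check: averaging many quantizations recovers roughly x / B.
rng = np.random.default_rng(0)
x = np.array([0.3, -0.7, 0.0])
q = np.mean([stochastic_sign(x, B=1.0, rng=rng) for _ in range(20000)], axis=0)
# q is approximately [0.3, -0.7, 0.0]
```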