Related papers: Communication-Efficient Distributed SGD with Compressed Sensing

URL: http://arxiv.org/abs/2112.07836v1
Date: Wed, 15 Dec 2021 02:10:45 GMT
Title: Communication-Efficient Distributed SGD with Compressed Sensing
Authors: Yujie Tang, Vikram Ramanathan, Junshan Zhang, Na Li
Abstract summary: We consider large scale distributed optimization over a set of edge devices connected to a central server. Inspired by recent advances in federated learning, we propose a distributed stochastic gradient descent (SGD) type algorithm that exploits the sparsity of the gradient, when possible, to reduce communication burden. We conduct theoretical analysis on the convergence of our algorithm in the presence of noise perturbation incurred by the communication channels, and also conduct numerical experiments to corroborate its effectiveness.
Score: 24.33697801661053
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider large scale distributed optimization over a set of edge devices connected to a central server, where the limited communication bandwidth between the server and edge devices imposes a significant bottleneck for the optimization procedure. Inspired by recent advances in federated learning, we propose a distributed stochastic gradient descent (SGD) type algorithm that exploits the sparsity of the gradient, when possible, to reduce communication burden. At the heart of the algorithm is to use compressed sensing techniques for the compression of the local stochastic gradients at the device side; and at the server side, a sparse approximation of the global stochastic gradient is recovered from the noisy aggregated compressed local gradients. We conduct theoretical analysis on the convergence of our algorithm in the presence of noise perturbation incurred by the communication channels, and also conduct numerical experiments to corroborate its effectiveness.
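
The pipeline described in the abstract (devices compress sparse local gradients, the server recovers a sparse estimate from the noisy aggregate) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the measurement matrix or the recovery method, so the shared Gaussian matrix `A`, the orthogonal matching pursuit routine `omp_recover`, and all sizes and noise levels below are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, k, n_devices = 200, 80, 3, 4  # hypothetical: dimension, measurements, sparsity, devices

# Assumed shared measurement matrix: i.i.d. Gaussian, columns of roughly unit norm.
A = rng.standard_normal((m, d)) / np.sqrt(m)


def compress(grad, A):
    """Device side: project a d-dimensional gradient down to m measurements."""
    return A @ grad


def omp_recover(y, A, k):
    """Server side: orthogonal matching pursuit (an assumed recovery method).

    Greedily picks the column most correlated with the residual, then refits
    the selected columns by least squares, for k rounds.
    """
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x


# Synthetic local gradients that share one sparse support (illustrative data only).
support = rng.choice(d, size=k, replace=False)
g_true = np.zeros(d)
g_true[support] = [3.0, -2.0, 1.5]
local_grads = []
for _ in range(n_devices):
    g = g_true.copy()
    g[support] += 0.1 * rng.standard_normal(k)
    local_grads.append(g)

# Compression is linear, so averaging compressed gradients equals compressing the average.
y = np.mean([compress(g, A) for g in local_grads], axis=0)
y_noisy = y + 0.01 * rng.standard_normal(m)  # assumed additive channel noise

g_hat = omp_recover(y_noisy, A, k)
g_avg = np.mean(local_grads, axis=0)
print("relative recovery error:", np.linalg.norm(g_hat - g_avg) / np.linalg.norm(g_avg))
```

In this sketch each device sends 80 numbers instead of 200. The server never sees individual gradients, only the noisy average of their compressed versions, which is enough here because the averaged gradient is itself sparse.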

Related papers

Decentralized Optimization on Compact Submanifolds by Quantized Riemannian Gradient Tracking [45.147301546565316]
This paper considers the problem of decentralized optimization on compact submanifolds. We propose an algorithm where agents update variables using quantized variables. To the best of our knowledge, this is the first algorithm to achieve an $\mathcal{O}(1/K)$ convergence rate in the presence of quantization.
arXiv Detail & Related papers (2025-06-09T01:57:25Z)
Unified Analysis of Decentralized Gradient Descent: a Contraction Mapping Framework [33.417831716314495]
Decentralized gradient descent (DGD) and diffusion are workhorses in decentralized machine learning. We propose a principled framework for the analysis of DGD and diffusion for strongly convex, smooth objectives, and arbitrary undirected topologies. The use of these tools yields tight convergence bounds, both in the noise-free and noisy regimes.
arXiv Detail & Related papers (2025-03-18T15:36:36Z)
Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework. Our gradient compression technique, named flattened one-bit stochastic gradient descent (FO-SGD), relies on two simple algorithmic ideas.
arXiv Detail & Related papers (2024-05-17T21:17:27Z)
Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training. Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}( \ln(T) / T^{1 - \frac{1}{\alpha}} )$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
Convergence and Privacy of Decentralized Nonconvex Optimization with Gradient Clipping and Communication Compression [31.161598424963934]
This paper takes a first step to understand the role of gradient clipping, a popular strategy, in decentralized nonconvex optimization with communication compression. We propose two variants of gradient clipping added before or after taking a mini-batch perturbation.
arXiv Detail & Related papers (2023-05-17T02:13:18Z)
On Arbitrary Compression for Decentralized Consensus and Stochastic Optimization over Directed Networks [0.6526824510982799]
We propose an iterative algorithm that compresses messages according to a desired compression ratio. Contrary to existing literature, we allow arbitrary compression ratios. We show explicit convergence rates for decentralized optimization problems on smooth functions.
arXiv Detail & Related papers (2022-04-18T04:41:56Z)
Communication-Efficient Federated Learning via Quantized Compressed Sensing [82.10695943017907]
The presented framework consists of gradient compression for wireless devices and gradient reconstruction for a parameter server. Thanks to gradient sparsification and quantization, our strategy can achieve a higher compression ratio than one-bit gradient compression. We demonstrate that the framework achieves performance almost identical to the case with no compression.
arXiv Detail & Related papers (2021-11-30T02:13:54Z)
A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator. We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators. We prove that our method can solve the problems without any increase in the number of communications compared to the baseline.
arXiv Detail & Related papers (2020-11-03T13:35:53Z)
A Distributed Training Algorithm of Generative Adversarial Networks with Quantized Gradients [8.202072658184166]
We propose a distributed GANs training algorithm with quantized gradient, dubbed DQGAN, which is the first distributed training method with quantized gradient for GANs. The new method trains GANs based on a specific single machine algorithm called Optimistic Mirror Descent (OMD) algorithm, and is applicable to any gradient compression method that satisfies a general $\delta$-approximate compressor. Theoretically, we establish the non-asymptotic convergence of DQGAN algorithm to first-order stationary point, which shows that the proposed algorithm can achieve a linear speedup in the
arXiv Detail & Related papers (2020-10-26T06:06:43Z)
Sparse Communication for Training Deep Networks [56.441077560085475]
Synchronous stochastic gradient descent (SGD) is the most common method used for distributed training of deep learning models. In this algorithm, each worker shares its local gradients with others and updates the parameters using the average gradients of all workers. We study several compression schemes and identify how three key parameters affect the performance.
arXiv Detail & Related papers (2020-09-19T17:28:11Z)
A Compressive Sensing Approach for Federated Learning over Massive MIMO Communication Systems [82.2513703281725]
Federated learning is a privacy-preserving approach to train a global model at a central server by collaborating with wireless devices. We present a compressive sensing approach for federated learning over massive multiple-input multiple-output communication systems.
arXiv Detail & Related papers (2020-03-18T05:56:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.