DNN gradient lossless compression: Can GenNorm be the answer?
- URL: http://arxiv.org/abs/2111.07599v1
- Date: Mon, 15 Nov 2021 08:33:10 GMT
- Title: DNN gradient lossless compression: Can GenNorm be the answer?
- Authors: Zhong-Jing Chen, Eduin E. Hernandez, Yu-Chih Huang, Stefano Rini
- Abstract summary: Gradient compression is relevant in many distributed Deep Neural Network (DNN) training scenarios.
For some networks of practical interest, the gradient entries can be well modelled as having a generalized normal (GenNorm) distribution.
- Score: 17.37160669785566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, the problem of optimal gradient lossless compression in Deep
Neural Network (DNN) training is considered. Gradient compression is relevant
in many distributed DNN training scenarios, including the recently popular
federated learning (FL) scenario in which each remote user is connected to
the parameter server (PS) through a noiseless but rate-limited channel. In
distributed DNN training, if the underlying gradient distribution is available,
classical lossless compression approaches can be used to reduce the number of
bits required for communicating the gradient entries. Mean field analysis has
suggested that gradient updates can be considered as independent random
variables, while the Laplace approximation can be used to argue that the gradient has a
distribution approximating the normal (Norm) distribution in some regimes. In
this paper we argue that, for some networks of practical interest, the gradient
entries can be well modelled as having a generalized normal (GenNorm)
distribution. We provide numerical evaluations to validate the hypothesis that
GenNorm modelling provides a more accurate prediction of the DNN gradient tail
distribution. Additionally, this modeling choice provides concrete improvement
in terms of lossless compression of the gradients when applying classical
fixed-to-variable lossless coding algorithms, such as Huffman coding, to the
quantized gradient updates. This latter result provides an effective
compression strategy with low memory and computational complexity that has
great practical relevance in distributed DNN training scenarios.
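As a rough illustration of the pipeline the abstract describes (fit a parametric model to the gradient entries, quantize them, then apply a fixed-to-variable code), the sketch below fits GenNorm and Norm models with SciPy and reports the Huffman code length of the uniformly quantized entries. This is a minimal sketch, not the authors' implementation: the gradient tensor is simulated, and the shape parameter, scale, and quantization step are arbitrary placeholder values.

```python
# Minimal sketch, not the authors' code: compare GenNorm vs. Norm fits on
# (simulated) gradient entries, then Huffman-code a uniformly quantized copy.
import heapq
from collections import Counter

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Stand-in for a flattened DNN gradient; a real gradient tensor would go here.
grad = stats.gennorm.rvs(beta=1.2, scale=1e-2, size=100_000, random_state=rng)

# Fit both candidate models (location fixed at zero) and compare likelihoods;
# a higher GenNorm log-likelihood suggests it captures the tails better.
beta_hat, loc_g, scale_g = stats.gennorm.fit(grad, floc=0.0)
loc_n, scale_n = stats.norm.fit(grad)
print("GenNorm shape beta:", round(beta_hat, 3))
print("log-lik GenNorm:", stats.gennorm.logpdf(grad, beta_hat, loc_g, scale_g).sum())
print("log-lik Norm   :", stats.norm.logpdf(grad, loc_n, scale_n).sum())

# Uniform scalar quantization of the gradient entries (step is illustrative).
step = 1e-3
symbols = np.round(grad / step).astype(int).tolist()

def huffman_code_lengths(counts):
    """Return {symbol: Huffman code length in bits} for the given counts."""
    heap = [(c, i, {s: 0}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]

counts = Counter(symbols)
lengths = huffman_code_lengths(counts)
bits = sum(counts[s] * lengths[s] for s in counts)
print("Huffman rate:", round(bits / len(symbols), 3), "bits per quantized entry")
```

Replacing the simulated draw with an actual flattened gradient from a training step would give the empirical comparison the paper is about; the resulting Huffman rate can then be contrasted with the entropy of the quantized symbols under the fitted GenNorm and Norm models.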
Related papers
- Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit gradient descent (FO-SGD), relies on two simple algorithmic ideas.
arXiv Detail & Related papers (2024-05-17T21:17:27Z)
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
- A Low-Complexity Approach to Rate-Distortion Optimized Variable Bit-Rate Compression for Split DNN Computing [5.3221129103999125]
Split computing has emerged as a recent paradigm for implementation of DNN-based AI workloads.
We present an approach that addresses the challenge of optimizing the rate-accuracy-complexity trade-off.
Our approach is remarkably lightweight, both during training and inference, highly effective, and achieves excellent rate-distortion performance.
arXiv Detail & Related papers (2022-08-24T15:02:11Z)
- Communication-Efficient Federated Learning via Quantized Compressed Sensing [82.10695943017907]
The presented framework consists of gradient compression for wireless devices and gradient reconstruction for a parameter server.
Thanks to gradient sparsification and quantization, our strategy can achieve a higher compression ratio than one-bit gradient compression.
We demonstrate that the framework achieves almost identical performance to the case with no compression.
arXiv Detail & Related papers (2021-11-30T02:13:54Z)
- Wyner-Ziv Gradient Compression for Federated Learning [4.619828919345114]
Gradient compression is an effective method to reduce communication load by transmitting compressed gradients.
This paper proposes a practical gradient compression scheme for federated learning, which uses historical gradients to compress gradients.
We also implement our gradient quantization method on a real dataset, and its performance is better than that of previous schemes.
arXiv Detail & Related papers (2021-11-16T07:55:43Z)
- A Biased Graph Neural Network Sampler with Near-Optimal Regret [57.70126763759996]
Graph neural networks (GNN) have emerged as a vehicle for applying deep network architectures to graph and relational data.
In this paper, we build upon existing work and treat GNN neighbor sampling as a multi-armed bandit problem.
We introduce a newly-designed reward function that introduces some degree of bias designed to reduce variance and avoid unstable, possibly-unbounded payouts.
arXiv Detail & Related papers (2021-03-01T15:55:58Z)
- Efficient Distributed Auto-Differentiation [22.192220404846267]
Gradient-based algorithms for training large deep neural networks (DNNs) are communication-heavy.
We introduce a surprisingly simple statistic for training distributed DNNs that is more communication-friendly than the gradient.
The process provides the flexibility of averaging gradients during backpropagation, enabling novel flexible training schemas.
arXiv Detail & Related papers (2021-02-18T21:46:27Z)
- An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems [77.88178159830905]
Sparsity-Inducing Distribution-based Compression (SIDCo) is a threshold-based sparsification scheme that enjoys similar threshold-estimation quality to deep gradient compression (DGC).
Our evaluation shows SIDCo speeds up training by up to 41.7%, 7.6%, and 1.9% compared to the no-compression baseline, Topk, and DGC compressors, respectively.
arXiv Detail & Related papers (2021-01-26T13:06:00Z)
- Quantizing data for distributed learning [24.46948464551684]
We consider machine learning applications that train a model by leveraging data over a network, where communication constraints can create a performance bottleneck.
A number of recent approaches propose to overcome this bottleneck through compression of updates, but as models become larger, so does the size of the dataset.
In this paper, we propose an approach that quantizes data instead of gradient updates and can support such learning applications.
arXiv Detail & Related papers (2020-12-14T19:54:41Z)
- Sparse Communication for Training Deep Networks [56.441077560085475]
Synchronous stochastic gradient descent (SGD) is the most common method used for distributed training of deep learning models.
In this algorithm, each worker shares its local gradients with others and updates the parameters using the average gradients of all workers.
We study several compression schemes and identify how three key parameters affect the performance.
arXiv Detail & Related papers (2020-09-19T17:28:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.