Quantizing data for distributed learning
- URL: http://arxiv.org/abs/2012.07913v2
- Date: Wed, 24 Mar 2021 20:20:04 GMT
- Title: Quantizing data for distributed learning
- Authors: Osama A. Hanna, Yahya H. Ezzeldin, Christina Fragouli, Suhas Diggavi
- Abstract summary: We consider machine learning applications that train a model by leveraging data over a network, where communication constraints can create a performance bottleneck.
A number of recent approaches propose to overcome this bottleneck through compression of gradient updates, but as models become larger, so does the size of the gradient updates.
In this paper, we propose an approach that quantizes data instead of gradients, and can support learning applications where the size of gradient updates is prohibitive.
- Score: 24.46948464551684
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We consider machine learning applications that train a model by leveraging
data distributed over a network, where communication constraints can create a
performance bottleneck. A number of recent approaches propose to overcome this
bottleneck through compression of gradient updates. However, as models become
larger, so does the size of the gradient updates. In this paper, we propose an
alternate approach, that quantizes data instead of gradients, and can support
learning over applications where the size of gradient updates is prohibitive.
Our approach combines aspects of: (1) sample selection; (2) dataset
quantization; and (3) gradient compensation. We analyze the convergence of the
proposed approach for smooth convex and non-convex objective functions and show
that we can achieve order optimal convergence rates with communication that
mostly depends on the data rather than the model (gradient) dimension. We use
our proposed algorithm to train ResNet models on the CIFAR-10 and ImageNet
datasets, and show that we can achieve an order of magnitude savings over
gradient compression methods.
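To make the contrast with gradient compression concrete, the following is a minimal sketch (not the authors' implementation) of the data-quantization idea: each worker transmits a quantized, subsampled copy of its local data, and the learner then trains on that payload, so communication scales with the data payload rather than the gradient dimension. The function names (select_samples, quantize, worker_payload, learner_train), the uniform quantizer, and the logistic model are illustrative assumptions; the paper's sample-selection rule and gradient-compensation step are not reproduced here.

```python
# Minimal sketch (illustrative, not the authors' code) of "quantize data, not gradients".
import numpy as np

def select_samples(X, y, k, rng):
    # Sample selection: keep k examples (uniform here; the paper's rule may differ).
    idx = rng.choice(len(X), size=k, replace=False)
    return X[idx], y[idx]

def quantize(X, num_bits=4):
    # Dataset quantization: uniform scalar quantizer, num_bits per feature.
    lo, hi = X.min(), X.max()
    step = (hi - lo) / (2 ** num_bits - 1)
    return lo + step * np.round((X - lo) / step)

def worker_payload(X, y, k, num_bits, rng):
    # What a worker sends: a quantized subset of its local data (one-shot).
    Xs, ys = select_samples(X, y, k, rng)
    return quantize(Xs, num_bits), ys

def learner_train(payloads, dim, lr=0.1, epochs=50):
    # Learner side: plain gradient descent on the received quantized data.
    Xq = np.vstack([p[0] for p in payloads])
    yq = np.concatenate([p[1] for p in payloads])
    w = np.zeros(dim)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xq @ w))      # logistic model as a stand-in objective
        w -= lr * Xq.T @ (p - yq) / len(yq)    # gradient step on quantized data
    return w

# Toy usage with synthetic data split across 4 workers (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 20)); w_true = rng.normal(size=20)
y = (X @ w_true + 0.1 * rng.normal(size=4000) > 0).astype(float)
payloads = [worker_payload(X[i::4], y[i::4], k=200, num_bits=4, rng=rng) for i in range(4)]
w_hat = learner_train(payloads, dim=20)
```

The point of the sketch is the communication pattern: the data payload is sent once (or once per selection round), whereas gradient-compression baselines transmit a compressed gradient of the model's dimension at every iteration.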
Related papers
- FLOPS: Forward Learning with OPtimal Sampling [1.694989793927645]
Gradient computation methods that rely only on forward passes, also referred to as queries, have recently gained attention.
Conventional forward learning consumes enormous queries on each data point for accurate gradient estimation through Monte Carlo sampling.
We propose to allocate the optimal number of queries to each data point in a batch during training to achieve a good balance between estimation accuracy and computational efficiency.
arXiv Detail & Related papers (2024-10-08T12:16:12Z) - Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [66.1595537904019]
Large language models (LLMs) can act as gradient priors in a zero-shot setting.
We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding.
arXiv Detail & Related papers (2024-09-26T13:38:33Z) - Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z) - ScoreMix: A Scalable Augmentation Strategy for Training GANs with Limited Data [93.06336507035486]
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available.
We present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks.
arXiv Detail & Related papers (2022-10-27T02:55:15Z) - Condensing Graphs via One-Step Gradient Matching [50.07587238142548]
We propose a one-step gradient matching scheme, which performs gradient matching for only one single step without training the network weights.
Our theoretical analysis shows this strategy can generate synthetic graphs that lead to lower classification loss on real graphs.
In particular, we are able to reduce the dataset size by 90% while approximating up to 98% of the original performance.
arXiv Detail & Related papers (2022-06-15T18:20:01Z) - Scaling Knowledge Graph Embedding Models [12.757685697180946]
We propose a new method for scaling training of knowledge graph embedding models for link prediction.
Our scaling solution for GNN-based knowledge graph embedding models achieves a 16x speed up on benchmark datasets.
arXiv Detail & Related papers (2022-01-08T08:34:52Z) - Wyner-Ziv Gradient Compression for Federated Learning [4.619828919345114]
Gradient compression is an effective method to reduce communication load by transmitting compressed gradients.
This paper proposes a practical gradient compression scheme for federated learning, which uses historical gradients to compress gradients.
We also evaluate our gradient quantization method on real datasets, where it outperforms previous schemes.
arXiv Detail & Related papers (2021-11-16T07:55:43Z) - Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization [21.81192774458227]
One of the major bottlenecks is the large communication cost between the central server and the local workers.
Our proposed distributed learning framework features an effective gradient compression strategy.
arXiv Detail & Related papers (2021-11-01T04:54:55Z) - Sparse Communication for Training Deep Networks [56.441077560085475]
Synchronous stochastic gradient descent (SGD) is the most common method used for distributed training of deep learning models.
In this algorithm, each worker shares its local gradients with others and updates the parameters using the average gradients of all workers.
We study several compression schemes and identify how three key parameters affect the performance.
arXiv Detail & Related papers (2020-09-19T17:28:11Z) - Optimal Gradient Quantization Condition for Communication-Efficient Distributed Training [99.42912552638168]
Communication of gradients is costly for training deep neural networks with multiple devices in computer vision applications.
In this work, we deduce the optimal condition of both binary and multi-level gradient quantization for any gradient distribution.
Based on the optimal condition, we develop two novel quantization schemes: biased BinGrad and unbiased ORQ for binary and multi-level gradient quantization, respectively (a generic multi-level quantizer of this kind is sketched after this list).
arXiv Detail & Related papers (2020-02-25T18:28:39Z)
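Several of the entries above (Wyner-Ziv gradient compression, communication-compressed adaptive gradients, sparse communication, optimal gradient quantization) operate on the gradient vector itself, which is also the baseline the main paper compares against. Below is a minimal sketch of two generic gradient compressors, an unbiased multi-level stochastic quantizer and top-k sparsification; these are textbook-style illustrations under assumed parameters, not the specific BinGrad, ORQ, or Wyner-Ziv schemes from the papers listed.

```python
# Generic gradient-compression primitives (illustrative only).
import numpy as np

def stochastic_quantize(g, levels=4, rng=None):
    # Unbiased multi-level quantization: map |g_i|/||g|| onto `levels` uniform
    # levels, rounding up or down at random so that E[q] = g.
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(g)
    if norm == 0:
        return np.zeros_like(g)
    scaled = np.abs(g) / norm * levels
    lower = np.floor(scaled)
    prob_up = scaled - lower
    q = lower + (rng.random(g.shape) < prob_up)
    return np.sign(g) * norm * q / levels

def topk_sparsify(g, k):
    # Keep only the k largest-magnitude coordinates; the rest are zeroed
    # (in practice the dropped mass is usually fed back as residual error).
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out[idx] = g[idx]
    return out

# Example: compress a random "gradient" of dimension 10^6.
rng = np.random.default_rng(0)
g = rng.normal(size=1_000_000)
gq = stochastic_quantize(g, levels=4, rng=rng)   # a few bits per coordinate
gs = topk_sparsify(g, k=10_000)                  # 1% of coordinates transmitted
```

Both compressors still send a payload proportional to the gradient dimension (or the chosen k) every iteration, which is the per-round cost the data-quantization approach above is designed to avoid when the model is large.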
This list is automatically generated from the titles and abstracts of the papers on this site.