Smoothness-Aware Quantization Techniques
- URL: http://arxiv.org/abs/2106.03524v1
- Date: Mon, 7 Jun 2021 11:30:05 GMT
- Title: Smoothness-Aware Quantization Techniques
- Authors: Bokun Wang, Mher Safaryan, Peter Richtárik
- Abstract summary: We show that block quantization with $n$ blocks outperforms single block quantization.
We also show that our smoothness-aware quantization strategies outperform existing quantization schemes.
- Score: 0.2578242050187029
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distributed machine learning has become an indispensable tool for training
large supervised machine learning models. To address the high communication
costs of distributed training, which is further exacerbated by the fact that
modern highly performing models are typically overparameterized, a large body
of work has been devoted in recent years to the design of various compression
strategies, such as sparsification and quantization, and optimization
algorithms capable of using them. Recently, Safaryan et al. (2021) pioneered a
dramatically different compression design approach: they first use the local
training data to form local {\em smoothness matrices}, and then propose to
design a compressor capable of exploiting the smoothness information contained
therein. While this novel approach leads to substantial savings in
communication, it is limited to sparsification as it crucially depends on the
linearity of the compression operator. In this work, we resolve this problem by
extending their smoothness-aware compression strategy to arbitrary unbiased
compression operators, which also include sparsification. Specializing our
results to quantization, we observe significant savings in communication
complexity compared to standard quantization. In particular, we show
theoretically that block quantization with $n$ blocks outperforms single block
quantization, leading to a reduction in communication complexity by an
$\mathcal{O}(n)$ factor, where $n$ is the number of nodes in the distributed
system. Finally, we provide extensive numerical evidence that our
smoothness-aware quantization strategies outperform existing quantization
schemes as well as the aforementioned smoothness-aware sparsification strategies
with respect to all relevant success measures: the number of iterations, the
total amount of bits communicated, and wall-clock time.
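To make the compression mechanism discussed in the abstract concrete, below is a minimal sketch of standard unbiased random-dithering quantization applied block-wise, i.e. the kind of block quantization the abstract compares against single-block quantization. This is not the paper's smoothness-aware compressor; the function names, the number of levels, and the toy check are illustrative assumptions.

```python
import numpy as np

def random_dithering(x, levels=1, rng=None):
    # Standard unbiased random-dithering quantizer (QSGD-style): E[Q(x)] = x.
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(x)
    if norm == 0.0:
        return np.zeros_like(x)
    scaled = np.abs(x) / norm * levels          # values in [0, levels]
    lower = np.floor(scaled)
    # Randomized rounding keeps the quantizer unbiased.
    rounded = lower + (rng.random(x.shape) < (scaled - lower))
    return norm * np.sign(x) * rounded / levels

def block_quantize(x, num_blocks, levels=1, rng=None):
    # Block quantization: split x into contiguous blocks and quantize each
    # block independently with its own norm.  The paper's smoothness-aware
    # variant additionally exploits local smoothness matrices, which this
    # sketch does not model.
    rng = np.random.default_rng() if rng is None else rng
    blocks = np.array_split(x, num_blocks)
    return np.concatenate([random_dithering(b, levels, rng) for b in blocks])

# Toy check of unbiasedness: averaging many quantizations recovers x.
rng = np.random.default_rng(0)
x = rng.standard_normal(8)
estimate = np.mean([block_quantize(x, num_blocks=2, rng=rng)
                    for _ in range(20000)], axis=0)
print(np.max(np.abs(estimate - x)))   # close to zero
```

The toy check verifies the unbiasedness property E[Q(x)] = x, which is exactly the requirement on the "arbitrary unbiased compression operators" the abstract says the smoothness-aware strategy is extended to.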
Related papers
- Differential error feedback for communication-efficient decentralized learning [48.924131251745266]
We propose a new decentralized communication-efficient learning approach that blends differential quantization with error feedback.
We show that the resulting communication-efficient strategy is stable both in terms of mean-square error and average bit rate.
The results establish that, in the small step-size regime and with a finite number of bits, it is possible to attain the performance achievable in the absence of compression.
arXiv Detail & Related papers (2024-06-26T15:11:26Z) - LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit [55.73370804397226]
Quantization, a key compression technique, can effectively mitigate these demands by compressing and accelerating large language models.
We present LLMC, a plug-and-play compression toolkit, to fairly and systematically explore the impact of quantization.
Powered by this versatile toolkit, our benchmark covers three key aspects: calibration data, algorithms (three strategies), and data formats.
arXiv Detail & Related papers (2024-05-09T11:49:05Z) - Towards Optimal Compression: Joint Pruning and Quantization [1.191194620421783]
This paper introduces FITCompress, a novel method integrating layer-wise mixed-precision quantization and unstructured pruning.
Experiments on computer vision and natural language processing benchmarks demonstrate that our proposed approach achieves a superior compression-performance trade-off.
arXiv Detail & Related papers (2023-02-15T12:02:30Z) - Optimal Brain Compression: A Framework for Accurate Post-Training
Quantization and Pruning [29.284147465251685]
We introduce a new compression framework which covers both weight pruning and quantization in a unified setting.
We show that it can improve significantly upon the compression-accuracy trade-offs of existing post-training methods.
arXiv Detail & Related papers (2022-08-24T14:33:35Z) - Permutation Compressors for Provably Faster Distributed Nonconvex
Optimization [68.8204255655161]
We show that the MARINA method of Gorbunov et al. (2021) can be considered a state-of-the-art method in terms of theoretical communication complexity.
We extend the theory of MARINA to support potentially correlated compressors, going beyond the classical setting of independent compressors.
arXiv Detail & Related papers (2021-10-07T09:38:15Z) - Cluster-Promoting Quantization with Bit-Drop for Minimizing Network
Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z) - Smoothness Matrices Beat Smoothness Constants: Better Communication
Compression Techniques for Distributed Optimization [10.592277756185046]
Large scale distributed optimization has become the default tool for the training of supervised machine learning models.
We propose a novel communication sparsification strategy that can take full advantage of the smoothness matrices associated with local losses.
arXiv Detail & Related papers (2021-02-14T20:55:02Z) - End-to-end Learning of Compressible Features [35.40108701875527]
Pre-trained convolutional neural networks (CNNs) are powerful off-the-shelf feature generators and have been shown to perform very well on a variety of tasks.
Unfortunately, the generated features are high dimensional and expensive to store.
We propose a learned method that jointly optimizes for compressibility along with the task objective.
arXiv Detail & Related papers (2020-07-23T05:17:33Z) - Structured Sparsification with Joint Optimization of Group Convolution
and Channel Shuffle [117.95823660228537]
We propose a novel structured sparsification method for efficient network compression.
The proposed method automatically induces structured sparsity on the convolutional weights.
We also address the problem of inter-group communication with a learnable channel shuffle mechanism.
arXiv Detail & Related papers (2020-02-19T12:03:10Z) - Gradient $\ell_1$ Regularization for Quantization Robustness [70.39776106458858]
We derive a simple regularization scheme that improves robustness against post-training quantization.
By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on demand to different bit-widths (a minimal sketch of such a gradient penalty follows after this list).
arXiv Detail & Related papers (2020-02-18T12:31:34Z)
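As referenced in the "Gradient $\ell_1$ Regularization for Quantization Robustness" entry above, the idea is to penalize the $\ell_1$ norm of the loss gradient with respect to the weights so that small quantization perturbations change the loss little. Below is a minimal PyTorch sketch of such a penalty under assumed names; the model, data, and penalty weight `lam` are illustrative placeholders, not the authors' exact setup.

```python
import torch

def l1_grad_penalty(loss, params):
    # L1 norm of the gradient of the loss w.r.t. the parameters.
    # create_graph=True so the penalty itself can be backpropagated through.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    return sum(g.abs().sum() for g in grads)

# Hypothetical model, data, and penalty weight, purely for illustration.
model = torch.nn.Linear(16, 4)
x = torch.randn(32, 16)
y = torch.randint(0, 4, (32,))
lam = 1e-3

loss = torch.nn.functional.cross_entropy(model(x), y)
total = loss + lam * l1_grad_penalty(loss, list(model.parameters()))
total.backward()   # parameter gradients now include the regularization term
```

The double backward (enabled by `create_graph=True`) is what allows the gradient-norm penalty to be trained like any other regularizer.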
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.