DynaQuant: Compressing Deep Learning Training Checkpoints via Dynamic
Quantization
- URL: http://arxiv.org/abs/2306.11800v2
- Date: Sat, 2 Sep 2023 04:08:31 GMT
- Title: DynaQuant: Compressing Deep Learning Training Checkpoints via Dynamic
Quantization
- Authors: Amey Agrawal, Sameer Reddy, Satwik Bhattamishra, Venkata Prabhakara
Sarath Nookala, Vidushi Vashishth, Kexin Rong, Alexey Tumanov
- Abstract summary: State-of-the-art approaches involve lossy model compression mechanisms, which induce a tradeoff between model quality (accuracy) and compression ratio.
We make a key enabling observation that the sensitivity of model weights to compression varies during training, and different weights benefit from different quantization levels.
We propose a non-uniform quantization scheme, an efficient search mechanism that dynamically finds the best quantization configurations, and a quantization-aware delta compression mechanism.
- Score: 5.931507399723096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increase in the scale of Deep Learning (DL) training workloads in
terms of compute resources and time consumption, the likelihood of encountering
in-training failures rises substantially, leading to lost work and resource
wastage. Such failures are typically offset by a checkpointing mechanism, which
comes at the cost of storage and network bandwidth overhead. State-of-the-art
approaches involve lossy model compression mechanisms, which induce a tradeoff
between the resulting model quality (accuracy) and compression ratio. Delta
compression is then used to further reduce the overhead by only storing the
difference between consecutive checkpoints. We make a key enabling observation
that the sensitivity of model weights to compression varies during training,
and different weights benefit from different quantization levels (ranging from
retaining full precision to pruning). We propose (1) a non-uniform quantization
scheme that leverages this variation, (2) an efficient search mechanism that
dynamically finds the best quantization configurations, and (3) a
quantization-aware delta compression mechanism that rearranges weights to
minimize checkpoint differences, thereby maximizing compression. We instantiate
these contributions in DynaQuant - a framework for DL workload checkpoint
compression. Our experiments show that DynaQuant consistently achieves a better
tradeoff between accuracy and compression ratios compared to prior works,
enabling a compression ratio up to 39x and withstanding up to 10 restores with
negligible accuracy impact for fault-tolerant training. DynaQuant achieves at
least an order of magnitude reduction in checkpoint storage overhead for
training failure recovery as well as transfer learning use cases without any
loss of accuracy.
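To make the contributions concrete, the following is a minimal Python sketch of the idea behind contribution (1): routing each weight to full precision, b-bit quantization, or pruning according to a sensitivity score. The thresholds, the magnitude-based sensitivity proxy, and all names here are illustrative assumptions rather than the paper's implementation; per contribution (2), DynaQuant searches such configurations dynamically during training instead of fixing them.

    import numpy as np

    def quantize_nonuniform(weights, sensitivity, keep_frac=0.05, prune_frac=0.2, bits=4):
        # Illustrative sketch only: thresholds and the sensitivity proxy are assumptions.
        flat = weights.ravel()
        order = np.argsort(sensitivity.ravel())  # least to most sensitive
        n = flat.size
        n_keep = max(1, int(keep_frac * n))
        n_prune = int(prune_frac * n)

        # Middle band: uniform b-bit quantization.
        lo, hi = flat.min(), flat.max()
        scale = (hi - lo) / (2 ** bits - 1) or 1.0
        out = np.round((flat - lo) / scale) * scale + lo

        out[order[:n_prune]] = 0.0                           # least sensitive: pruned
        out[order[n - n_keep:]] = flat[order[n - n_keep:]]   # most sensitive: full precision
        return out.reshape(weights.shape)

    rng = np.random.default_rng(0)
    w = rng.normal(size=(256, 256)).astype(np.float32)
    w_q = quantize_nonuniform(w, sensitivity=np.abs(w))  # |w| as a crude sensitivity proxy

Contribution (3) then builds on the fact that quantized weights take values from a small set of levels, so consecutive checkpoints can be stored as deltas over those levels; per the abstract, the rearrangement step aligns weights so that those deltas are minimized.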
Related papers
- Activations and Gradients Compression for Model-Parallel Training [85.99744701008802] (2024-01-15)
We study how simultaneous compression of activations and gradients in model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
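For reference, below is a generic TopK sparsification sketch in Python (an assumption about the common formulation, not this paper's code), keeping a larger fraction of gradient entries than activation entries in line with the finding above.

    import torch

    def topk_compress(t, keep_frac):
        # Keep only the keep_frac largest-magnitude entries; zero out the rest.
        flat = t.flatten()
        k = max(1, int(keep_frac * flat.numel()))
        idx = flat.abs().topk(k).indices
        out = torch.zeros_like(flat)
        out[idx] = flat[idx]
        return out.view_as(t)

    grads = topk_compress(torch.randn(1024), keep_frac=0.3)  # milder rate for gradients
    acts = topk_compress(torch.randn(1024), keep_frac=0.05)  # stronger rate for activations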
- Retraining-free Model Quantization via One-Shot Weight-Coupling Learning [41.299675080384] (2024-01-03)
Mixed-precision quantization (MPQ) compresses models effectively by allocating heterogeneous bit-widths across layers.
MPQ is typically organized as a two-stage process: search, then retraining.
In this paper, we devise a one-shot training-searching paradigm for mixed-precision model compression.
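A toy illustration of heterogeneous bit-width allocation follows; the variance-based sensitivity proxy and the allocation rule are assumptions for exposition, and the paper's one-shot weight-coupling method is more sophisticated.

    import numpy as np

    def assign_bitwidths(layers, candidate_bits=(2, 4, 8)):
        # Rank layers by a crude sensitivity proxy (weight variance) and spread
        # the candidate bit-widths evenly across that ranking.
        order = np.argsort([np.var(w) for w in layers])
        bits = [0] * len(layers)
        for rank, layer_idx in enumerate(order):
            bits[layer_idx] = candidate_bits[rank * len(candidate_bits) // len(layers)]
        return bits

    layers = [np.random.randn(64, 64) * s for s in (0.1, 1.0, 3.0)]
    print(assign_bitwidths(layers))  # most sensitive layer gets the most bits: [2, 4, 8]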
- Quick Dense Retrievers Consume KALE: Post Training Kullback Leibler Alignment of Embeddings for Asymmetrical dual encoders [89.29256833403169] (2023-03-31)
We introduce Kullback Leibler Alignment of Embeddings (KALE), an efficient and accurate method for increasing the inference efficiency of dense retrieval methods.
KALE extends traditional Knowledge Distillation after bi-encoder training, allowing for effective query encoder compression without full retraining or index generation.
Using KALE and asymmetric training, we can generate models that exceed the performance of DistilBERT while delivering 3x faster inference.
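A generic sketch of KL-based embedding alignment is shown below (an assumption about the general form; KALE's exact objective may differ): the compressed query encoder is trained to match the original encoder's query-document score distribution.

    import torch
    import torch.nn.functional as F

    def kl_align_loss(student_q, teacher_q, docs, tau=1.0):
        # KL between softmax-normalized query-document similarity distributions.
        s = F.log_softmax(student_q @ docs.T / tau, dim=-1)
        t = F.softmax(teacher_q @ docs.T / tau, dim=-1)
        return F.kl_div(s, t, reduction="batchmean")

    student_q, teacher_q = torch.randn(4, 128), torch.randn(4, 128)
    docs = torch.randn(16, 128)
    loss = kl_align_loss(student_q, teacher_q, docs)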
- Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger [106.10954454667757] (2023-02-28)
We present a novel backdoor attack with multiple triggers against learned image compression models.
Motivated by the widely used discrete cosine transform (DCT) in existing compression systems and standards, we propose a frequency-based trigger injection model.
- Crowd Counting on Heavily Compressed Images with Curriculum Pre-Training [90.76576712433595] (2022-08-15)
Applying lossy compression on images processed by deep neural networks can lead to significant accuracy degradation.
Inspired by the curriculum learning paradigm, we present a novel training approach called curriculum pre-training (CPT) for crowd counting on compressed images.
- Sign Bit is Enough: A Learning Synchronization Framework for Multi-hop All-reduce with Ultimate Compression [17.692238652162203] (2022-04-14)
We implement a sign-bit compression-based learning synchronization framework, Marsit.
It reduces training time by up to 35% while preserving the same accuracy as training without compression.
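The core of sign-bit compression, in the spirit of signSGD, is sketched below; Marsit's multi-hop aggregation logic is not reproduced, and the names are illustrative.

    import numpy as np

    def compress_sign(grad):
        # 1 bit per element (the sign) plus a single per-tensor scale.
        return np.signbit(grad), np.abs(grad).mean()

    def decompress_sign(signs, scale):
        return np.where(signs, -scale, scale).astype(np.float32)

    g = np.random.randn(1000).astype(np.float32)
    g_hat = decompress_sign(*compress_sign(g))  # ~32x smaller payload than float32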
- Optimal Rate Adaption in Federated Learning with Compressed Communications [28.16239232265479] (2021-12-13)
Federated Learning incurs high communication overhead, which can be greatly alleviated by compression for model updates.
However, the tradeoff between compression and model accuracy in the networked environment remains unclear.
We present a framework to maximize the final model accuracy by strategically adjusting the compression in each iteration.
- Towards Compact CNNs via Collaborative Compression [166.86915086497433] (2021-05-24)
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models.
We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
- Reliable Model Compression via Label-Preservation-Aware Loss Functions [14.368823297066276] (2020-12-03)
We present a framework that uses a teacher-student learning paradigm to better preserve labels.
We obtain a significant reduction of up to 4.1X in the number of mismatches between the compressed and reference models.
- Training with Quantization Noise for Extreme Model Compression [57.51832088938618] (2020-04-15)
We tackle the problem of producing compact models, maximizing their accuracy for a given model size.
A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator.
In this paper, we extend this approach to work beyond int8 fixed-point quantization with extreme compression methods.
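A minimal PyTorch sketch of the Straight-Through Estimator mentioned above: weights are fake-quantized in the forward pass while the backward pass treats quantization as the identity.

    import torch

    class STEQuantize(torch.autograd.Function):
        @staticmethod
        def forward(ctx, w, scale):
            return torch.round(w / scale) * scale  # fake-quantize to a uniform grid

        @staticmethod
        def backward(ctx, grad_out):
            return grad_out, None  # straight-through: gradient passes unchanged

    w = torch.randn(8, requires_grad=True)
    STEQuantize.apply(w, 0.1).sum().backward()
    print(w.grad)  # all ones: quantization treated as identity in the backward pass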
This list is automatically generated from the titles and abstracts of the papers on this site.