Reliable Model Compression via Label-Preservation-Aware Loss Functions
- URL: http://arxiv.org/abs/2012.01604v1
- Date: Thu, 3 Dec 2020 00:00:41 GMT
- Title: Reliable Model Compression via Label-Preservation-Aware Loss Functions
- Authors: Vinu Joseph, Shoaib Ahmed Siddiqui, Aditya Bhaskara, Ganesh
Gopalakrishnan, Saurav Muralidharan, Michael Garland, Sheraz Ahmed, Andreas
Dengel
- Abstract summary: We present a framework that uses a teacher-student learning paradigm to better preserve labels.
We obtain a significant reduction of up to 4.1X in the number of mismatches between the compressed and reference models.
- Score: 14.368823297066276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model compression is a ubiquitous tool that brings the power of modern deep
learning to edge devices with power and latency constraints. The goal of model
compression is to take a large reference neural network and output a smaller
and less expensive compressed network that is functionally equivalent to the
reference. Compression typically involves pruning and/or quantization, followed
by re-training to maintain the reference accuracy. However, it has been
observed that compression can lead to a considerable mismatch in the labels
produced by the reference and the compressed models, resulting in bias and
unreliability. To combat this, we present a framework that uses a
teacher-student learning paradigm to better preserve labels. We investigate the
role of additional terms to the loss function and show how to automatically
tune the associated parameters. We demonstrate the effectiveness of our
approach both quantitatively and qualitatively on multiple compression schemes
and accuracy recovery algorithms using a set of 8 different real-world network
architectures. We obtain a significant reduction of up to 4.1X in the number of
mismatches between the compressed and reference models, and up to 5.7X in cases
where the reference model makes the correct prediction.
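To make the loss-function idea concrete, here is a minimal PyTorch-style sketch of a teacher-student objective with an extra label-preservation term. The function name, the weights alpha and beta, and the temperature T are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def label_preservation_loss(student_logits, teacher_logits, targets,
                            alpha=0.5, beta=0.5, T=4.0):
    """Hypothetical objective: supervised cross-entropy, a distillation
    term, and an extra penalty for disagreeing with the teacher's label."""
    # (i) standard supervised loss against the ground-truth labels
    ce = F.cross_entropy(student_logits, targets)
    # (ii) classic distillation: match the softened output distributions
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    # (iii) label preservation: cross-entropy against the teacher's hard
    # predictions, pushing the student's argmax to match the teacher's
    teacher_labels = teacher_logits.argmax(dim=1)
    match = F.cross_entropy(student_logits, teacher_labels)
    return ce + alpha * kd + beta * match
```

Training the compressed (student) network against the reference (teacher) network's hard predictions, in addition to the usual supervised and distillation terms, is what explicitly discourages label mismatches.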
Related papers
- Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
arXiv Detail & Related papers (2024-01-15T15:54:54Z)
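For the TopK compression referenced in the entry above, a minimal sketch of magnitude-based sparsification (as typically applied to gradients or activations) is shown below; the function names and the 1% keep-ratio are illustrative assumptions, not details from the paper.

```python
import torch

def topk_compress(tensor, ratio=0.01):
    """Keep only the largest-magnitude fraction `ratio` of entries.
    Returns the kept values, their flat indices, and the original shape."""
    flat = tensor.flatten()
    k = max(1, int(flat.numel() * ratio))
    _, indices = torch.topk(flat.abs(), k)
    return flat[indices], indices, tensor.shape

def topk_decompress(values, indices, shape):
    """Scatter the kept entries back into a dense tensor of zeros."""
    out = torch.zeros(shape, dtype=values.dtype)
    out.view(-1)[indices] = values
    return out
```

The entry's observation that TopK-trained models need compression at inference time as well would correspond to running the same compress/decompress pair on activations during the test-time forward pass.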
- Inshrinkerator: Compressing Deep Learning Training Checkpoints via Dynamic Quantization [5.648270790530862]
State-of-the-art approaches involve lossy model compression mechanisms, which induce a tradeoff between the resulting model quality (accuracy) and compression ratio.
We make a key enabling observation that the sensitivity of model weights to compression varies during training, and different weights benefit from different quantization levels.
We propose a non-uniform quantization scheme that leverages this variation, an efficient search mechanism that dynamically finds the best quantization configurations, and a quantization-aware delta compression mechanism that rearranges weights to minimize checkpoint differences.
arXiv Detail & Related papers (2023-06-20T18:00:31Z)
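As a rough illustration of quantization-based checkpoint delta compression, the hedged sketch below uniformly quantizes the difference between two consecutive checkpoints; the paper's actual contribution is a non-uniform scheme with dynamically chosen bit-widths, which is reduced here to a single `bits` parameter.

```python
import torch

def compress_checkpoint_delta(prev_ckpt, curr_ckpt, bits=8):
    """Uniformly quantize the per-tensor difference between two checkpoints.
    Returns {name: (int8 codes, scale)}.  Choosing a good bit-width per
    tensor is where a dynamic scheme would plug in (bits <= 8 assumed)."""
    levels = 2 ** (bits - 1) - 1
    out = {}
    for name, curr in curr_ckpt.items():
        delta = curr - prev_ckpt[name]
        scale = delta.abs().max().clamp(min=1e-12) / levels
        codes = torch.clamp(torch.round(delta / scale), -levels, levels).to(torch.int8)
        out[name] = (codes, scale)
    return out

def restore_checkpoint(prev_ckpt, compressed):
    """Reconstruct a (lossy) checkpoint from the previous one plus the deltas."""
    return {name: prev_ckpt[name] + codes.float() * scale
            for name, (codes, scale) in compressed.items()}
```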
- A Theoretical Understanding of Neural Network Compression from Sparse Linear Approximation [37.525277809849776]
The goal of model compression is to reduce the size of a large neural network while retaining a comparable performance.
We use sparsity-sensitive $\ell_q$-norm to characterize compressibility and provide a relationship between soft sparsity of the weights in the network and the degree of compression.
We also develop adaptive algorithms for pruning each neuron in the network informed by our theory.
arXiv Detail & Related papers (2022-06-11T20:10:35Z)
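As a hedged illustration of using a sparsity-sensitive $\ell_q$ quasi-norm as a per-layer compressibility score, the sketch below computes one common proxy, the ratio $\|w\|_q / \|w\|_2$; the paper's exact definition and the resulting pruning rule may differ.

```python
import torch

def soft_sparsity(weight, q=0.5):
    """Scale-invariant l_q-based soft-sparsity score (0 < q < 1).
    Smaller values indicate weight mass concentrated on few entries,
    i.e. a layer that should tolerate more aggressive pruning."""
    w = weight.flatten().abs()
    lq = (w ** q).sum() ** (1.0 / q)   # l_q quasi-norm
    l2 = torch.linalg.norm(w)          # l_2 norm for normalization
    return (lq / l2).item()
```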
- Estimating the Resize Parameter in End-to-end Learned Image Compression [50.20567320015102]
We describe a search-free resizing framework that can further improve the rate-distortion tradeoff of recent learned image compression models.
Our results show that our new resizing parameter estimation framework can provide Bjontegaard-Delta rate (BD-rate) improvement of about 10% against leading perceptual quality engines.
arXiv Detail & Related papers (2022-04-26T01:35:02Z)
- Optimal Rate Adaption in Federated Learning with Compressed Communications [28.16239232265479]
Federated Learning incurs high communication overhead, which can be greatly alleviated by compressing model updates.
However, the tradeoff between compression and model accuracy in the networked environment remains unclear.
We present a framework to maximize the final model accuracy by strategically adjusting the compression in each iteration.
arXiv Detail & Related papers (2021-12-13T14:26:15Z)
- An Information Theory-inspired Strategy for Automatic Network Pruning [88.51235160841377]
Deep convolutional neural networks typically need to be compressed before deployment on devices with resource constraints.
Most existing network pruning methods require laborious human effort and prohibitive computational resources.
We propose an information theory-inspired strategy for automatic model compression.
arXiv Detail & Related papers (2021-08-19T07:03:22Z)
- Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models.
We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
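To illustrate the tensor-decomposition half of such a joint scheme, the sketch below factorizes a weight matrix with a truncated SVD; the channel-pruning step and the joint optimization that the paper contributes are omitted, and the rank is an assumed input.

```python
import torch

def factorize_linear(weight, rank):
    """Replace an out_dim x in_dim weight matrix with two low-rank factors
    via truncated SVD.  The original layer can then be implemented as two
    smaller matrix multiplications (weight ~= A @ B)."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # out_dim x rank, absorbs the singular values
    B = Vh[:rank, :]             # rank x in_dim
    return A, B
```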
- Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification [12.517161466778655]
Distributed model training suffers from communication bottlenecks due to frequent model updates transmitted across compute nodes.
To alleviate these bottlenecks, practitioners use gradient compression techniques like sparsification, quantization, or low-rank updates.
In this work, we show that the performance degradation caused by choosing a high compression ratio is not fundamental.
An adaptive compression strategy can reduce communication while maintaining final test accuracy.
arXiv Detail & Related papers (2020-10-29T16:41:44Z)
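A hedged sketch of the adaptive idea: detect critical training regimes from the change in gradient norm and fall back to milder compression there. The threshold and the two keep-ratios below are illustrative assumptions, not the paper's exact settings.

```python
def choose_compression_ratio(prev_grad_norm, curr_grad_norm,
                             mild_ratio=0.1, aggressive_ratio=0.01, eta=0.5):
    """Keep more entries (mild compression) while the gradient norm is
    changing rapidly -- a critical regime -- and compress aggressively
    once training has stabilized."""
    if prev_grad_norm is None:
        return mild_ratio  # be conservative at the start of training
    rel_change = abs(curr_grad_norm - prev_grad_norm) / (prev_grad_norm + 1e-12)
    return mild_ratio if rel_change > eta else aggressive_ratio
```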
- PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning [62.440827696638664]
We introduce a simple algorithm that directly compresses the model differences between neighboring workers.
Inspired by PowerSGD for centralized deep learning, this algorithm uses power iteration steps to maximize the information transferred per bit.
arXiv Detail & Related papers (2020-08-04T09:14:52Z)
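A minimal sketch of one power-iteration step that produces a rank-1 approximation of a model-difference matrix, the kind of compression step the entry describes; variable names are assumptions, and the full method also warm-starts from the previous iteration's vectors, which is omitted here.

```python
import torch

def power_compress(diff_matrix, v_prev=None):
    """One power-iteration step yielding a rank-1 approximation p @ q.T of the
    model-difference matrix.  Only the two vectors p and q need to be
    exchanged between neighboring workers."""
    rows, cols = diff_matrix.shape
    v = torch.randn(cols) if v_prev is None else v_prev
    p = diff_matrix @ v
    p = p / (p.norm() + 1e-12)   # normalize the left factor
    q = diff_matrix.t() @ p      # right factor carries the scale
    return p, q                  # reconstruction: torch.outer(p, q)
```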
- Training with Quantization Noise for Extreme Model Compression [57.51832088938618]
We tackle the problem of producing compact models, maximizing their accuracy for a given model size.
A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator.
In this paper, we extend this approach to work beyond int8 fixed-point quantization with extreme compression methods.
arXiv Detail & Related papers (2020-04-15T20:10:53Z)
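To illustrate training with quantization noise, the hedged sketch below quantizes a random fraction of the weights in each forward pass and uses a straight-through estimator so gradients update the full-precision weights; the int8-style quantizer and the noise rate p are illustrative assumptions, not the paper's exact recipe.

```python
import torch

def quant_noise_forward(weight, p=0.5, bits=8):
    """Quantize a random fraction p of the weights during training; the
    straight-through estimator lets gradients flow to the full-precision
    weights as if the quantization were the identity."""
    levels = 2 ** (bits - 1) - 1
    scale = weight.abs().max() / levels + 1e-12
    quantized = torch.clamp(torch.round(weight / scale), -levels, levels) * scale
    mask = (torch.rand_like(weight) < p).to(weight.dtype)
    # mix quantized and full-precision entries, then detach the difference so
    # that backward sees only the full-precision weights (straight-through)
    noisy = weight + mask * (quantized - weight)
    return weight + (noisy - weight).detach()
```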