Single-path Bit Sharing for Automatic Loss-aware Model Compression
- URL: http://arxiv.org/abs/2101.04935v4
- Date: Thu, 4 May 2023 05:19:41 GMT
- Title: Single-path Bit Sharing for Automatic Loss-aware Model Compression
- Authors: Jing Liu, Bohan Zhuang, Peng Chen, Chunhua Shen, Jianfei Cai, Mingkui
Tan
- Abstract summary: Single-path Bit Sharing (SBS) is able to significantly reduce computational cost while achieving promising performance.
Our SBS compressed MobileNetV2 achieves 22.6x Bit-Operation (BOP) reduction with only 0.1% drop in the Top-1 accuracy.
- Score: 126.98903867768732
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Network pruning and quantization are proven to be effective ways for deep
model compression. To obtain a highly compact model, most methods first perform
network pruning and then conduct network quantization based on the pruned
model. However, this strategy overlooks the fact that pruning and quantization
affect each other, so performing them separately can lead to sub-optimal performance. To address
this, performing pruning and quantization jointly is essential. Nevertheless,
how to make a trade-off between pruning and quantization is non-trivial.
Moreover, existing compression methods often rely on some pre-defined
compression configurations. Some attempts have been made to search for optimal
configurations, which, however, may incur prohibitive optimization cost. To address
the above issues, we devise a simple yet effective method named Single-path Bit
Sharing (SBS). Specifically, we first consider network pruning as a special
case of quantization, which provides a unified view for pruning and
quantization. We then introduce a single-path model to encode all candidate
compression configurations. In this way, the configuration search problem is
transformed into a subset selection problem, which significantly reduces the
number of parameters, computational cost and optimization difficulty. Relying
on the single-path model, we further introduce learnable binary gates to encode
the choice of bitwidth. By jointly training the binary gates in conjunction
with network parameters, the compression configurations of each layer can be
automatically determined. Extensive experiments on both CIFAR-100 and ImageNet
show that SBS is able to significantly reduce computational cost while
achieving promising performance. For example, our SBS compressed MobileNetV2
achieves 22.6x Bit-Operation (BOP) reduction with only 0.1% drop in the Top-1
accuracy.
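To make the mechanism concrete, the following is a minimal, hypothetical PyTorch sketch of a bit-sharing weight with learnable binary gates, assuming a nested decomposition in which each higher bitwidth adds a gated quantization residual on top of the lower one, with pruning treated as the 0-bit case; the class name `BitSharingWeight`, the candidate bitwidths {2, 4, 8}, and the sigmoid relaxation of the gates are illustrative choices, not details taken from the paper.

```python
# Illustrative sketch of single-path bit sharing (not the authors' code).
# Idea from the abstract: pruning is the 0-bit case, and higher bitwidths
# reuse ("share") the lower-bitwidth value plus a quantization residual
# that is switched on or off by a learnable binary gate.
import torch
import torch.nn as nn


def quantize(w, bits):
    """Uniform symmetric quantization of w to the given number of bits."""
    if bits == 0:                      # 0 bits = pruned weights
        return torch.zeros_like(w)
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale


class BitSharingWeight(nn.Module):
    """Single-path weight: one latent tensor, gated residuals between bitwidths."""

    def __init__(self, shape, bitwidths=(2, 4, 8)):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(shape) * 0.05)
        self.bitwidths = bitwidths
        # One learnable gate logit per residual (relaxed to a sigmoid here).
        self.gate_logits = nn.Parameter(torch.zeros(len(bitwidths)))

    def forward(self):
        w_q = torch.zeros_like(self.weight)
        prev_bits = 0
        for logit, bits in zip(self.gate_logits, self.bitwidths):
            gate = torch.sigmoid(logit)          # soft binary gate
            residual = quantize(self.weight, bits) - quantize(self.weight, prev_bits)
            w_q = w_q + gate * residual          # add the residual only if the gate is "on"
            prev_bits = bits
        return w_q


layer_w = BitSharingWeight((16, 16))
print(layer_w().shape)  # torch.Size([16, 16])
```

In a full implementation the gates would be binarized (for example with a straight-through estimator) and trained jointly with the network weights, which is how the abstract describes the per-layer configurations being determined automatically.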
Related papers
- Neural Network Compression using Binarization and Few Full-Precision
Weights [7.206962876422061]
Automatic Prune Binarization (APB) is a novel compression technique combining quantization with pruning.
APB enhances the representational capability of binary networks using a few full-precision weights.
APB delivers better accuracy/memory trade-off compared to state-of-the-art methods.
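As a rough, hypothetical illustration of the idea summarized above (binary weights enhanced by a few full-precision weights), the sketch below keeps a small fraction of large-magnitude weights in full precision and binarizes the rest with a shared scale; the threshold rule and the function name `binarize_with_outliers` are assumptions, not APB's actual procedure.

```python
# Hypothetical sketch of "binary + few full-precision weights" (not the APB code).
# Weights whose magnitude is in the top fraction stay full precision;
# the remainder are replaced by sign(w) times a shared scale.
import numpy as np


def binarize_with_outliers(w, full_precision_frac=0.01):
    w = np.asarray(w, dtype=np.float32)
    k = max(1, int(full_precision_frac * w.size))
    threshold = np.sort(np.abs(w).ravel())[-k]      # k-th largest magnitude
    keep_mask = np.abs(w) >= threshold              # keep these in full precision
    alpha = np.abs(w[~keep_mask]).mean()            # shared scale for the binarized part
    w_bin = alpha * np.sign(w)
    return np.where(keep_mask, w, w_bin), keep_mask


w = np.random.randn(256, 256)
w_apb, mask = binarize_with_outliers(w, full_precision_frac=0.02)
print(mask.mean())  # roughly 0.02 of the weights stay in full precision
```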
arXiv Detail & Related papers (2023-06-15T08:52:00Z)
- Towards Hardware-Specific Automatic Compression of Neural Networks [0.0]
Pruning and quantization are the major approaches for compressing neural networks today.
Effective compression policies consider the influence of the specific hardware architecture on the used compression methods.
We propose an algorithmic framework, called Galen, that searches for such policies with reinforcement learning, combining pruning and quantization.
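A toy sketch of the policy-search framing described above, with random search standing in for the reinforcement-learning agent; the layer list, the latency and accuracy proxies, and all numeric choices are made up for illustration and are not Galen's actual components.

```python
# Toy sketch of hardware-aware compression policy search (random search stands
# in for the RL agent; the layer sizes and cost models are invented).
import random

layers = [{"name": "conv1", "macs": 1e8}, {"name": "conv2", "macs": 4e8}]


def sample_policy():
    # Per-layer compression decision: pruning ratio and weight bitwidth.
    return [{"prune": random.choice([0.0, 0.25, 0.5]),
             "bits": random.choice([2, 4, 8])} for _ in layers]


def proxy_latency(policy):
    # Stand-in for measured hardware latency: bit-operations after compression.
    return sum(l["macs"] * (1 - p["prune"]) * p["bits"] / 8
               for l, p in zip(layers, policy))


def proxy_accuracy(policy):
    # Stand-in for validation accuracy (aggressive compression hurts more).
    return 1.0 - sum(p["prune"] * 0.05 + (8 - p["bits"]) * 0.01 for p in policy)


best = max((sample_policy() for _ in range(200)),
           key=lambda pol: proxy_accuracy(pol) - 1e-9 * proxy_latency(pol))
print(best)
```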
arXiv Detail & Related papers (2022-12-15T13:34:02Z)
- CrAM: A Compression-Aware Minimizer [103.29159003723815]
We propose a new compression-aware minimizer dubbed CrAM that modifies the optimization step in a principled way.
CrAM produces dense models that can be more accurate than standard SGD/Adam-based baselines while remaining stable under weight pruning.
CrAM can produce sparse models which perform well for transfer learning, and it also works for semi-structured 2:4 pruning patterns supported by GPU hardware.
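The following is a minimal sketch of a compression-aware optimization step in the spirit of the summary, assuming magnitude pruning as the compression operator and evaluating the gradient at a compressed copy of the weights; it illustrates the idea only and is not the exact CrAM update rule.

```python
# Illustrative compression-aware SGD step (a sketch of the idea, not the exact
# CrAM update). The gradient is evaluated at a pruned copy of the weights so
# that the dense model stays accurate after pruning.
import torch


def magnitude_prune(w, sparsity=0.5):
    """Zero out the smallest-magnitude entries (the assumed compression operator)."""
    k = int(sparsity * w.numel())
    if k == 0:
        return w.clone()
    threshold = w.abs().flatten().kthvalue(k).values
    return torch.where(w.abs() > threshold, w, torch.zeros_like(w))


def compression_aware_step(w, loss_fn, lr=0.1, sparsity=0.5):
    w_c = magnitude_prune(w.detach(), sparsity).requires_grad_(True)
    loss = loss_fn(w_c)              # evaluate the loss at the compressed point
    loss.backward()
    with torch.no_grad():
        w -= lr * w_c.grad           # apply that gradient to the dense weights
    return loss.item()


# Toy quadratic objective standing in for a training loss.
w = torch.randn(100, requires_grad=True)
for _ in range(5):
    print(compression_aware_step(w, lambda v: (v ** 2).sum()))
```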
arXiv Detail & Related papers (2022-07-28T16:13:28Z)
- OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization [32.60139548889592]
We propose a novel One-shot Pruning-Quantization (OPQ) in this paper.
OPQ analytically solves the compression allocation with pre-trained weight parameters only.
We propose a unified channel-wise quantization method that enforces all channels of each layer to share a common codebook.
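Below is a hypothetical sketch of channel-wise quantization in which all channels of a layer share one codebook, as the summary describes; the uniform codebook construction and the function name `shared_codebook_quantize` are assumptions rather than OPQ's analytical allocation.

```python
# Hypothetical sketch of per-layer quantization with a codebook shared by all
# channels (illustrating the idea summarized above, not the OPQ algorithm).
import numpy as np


def shared_codebook_quantize(weight, n_levels=16):
    """weight: (out_channels, in_features); every channel uses the same codebook."""
    # Build one codebook for the whole layer (uniform levels here; OPQ's actual
    # construction may differ).
    lo, hi = weight.min(), weight.max()
    codebook = np.linspace(lo, hi, n_levels)
    # Assign each weight to its nearest codeword.
    idx = np.abs(weight[..., None] - codebook).argmin(axis=-1)
    return codebook[idx], codebook


w = np.random.randn(64, 128).astype(np.float32)
w_q, cb = shared_codebook_quantize(w)
print(np.unique(w_q).size <= 16)  # True: all channels share the 16 codewords
```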
arXiv Detail & Related papers (2022-05-23T09:05:25Z)
- An Information Theory-inspired Strategy for Automatic Network Pruning [88.51235160841377]
Deep convolutional neural networks typically must be compressed for deployment on resource-constrained devices.
Most existing network pruning methods require laborious human effort and prohibitive computational resources.
We propose an information theory-inspired strategy for automatic model compression.
arXiv Detail & Related papers (2021-08-19T07:03:22Z)
- Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models.
We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
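As a hypothetical sketch of combining channel pruning with tensor decomposition, the code below prunes low-norm output channels and then factorizes the remaining weight matrix with a truncated SVD; the norm criterion, the rank, and the function name are illustrative and do not reproduce the Collaborative Compression algorithm.

```python
# Hypothetical sketch: channel pruning followed by a low-rank (SVD)
# decomposition of the remaining weight matrix (not the CC algorithm itself).
import numpy as np


def prune_then_decompose(weight, keep_ratio=0.5, rank=8):
    """weight: (out_channels, in_features). Returns two low-rank factors."""
    # 1) Channel pruning: keep the output channels with the largest L2 norm.
    norms = np.linalg.norm(weight, axis=1)
    keep = np.argsort(norms)[-int(keep_ratio * weight.shape[0]):]
    w_pruned = weight[keep]
    # 2) Tensor (here: matrix) decomposition of the pruned weight.
    u, s, vt = np.linalg.svd(w_pruned, full_matrices=False)
    a = u[:, :rank] * s[:rank]          # (kept_channels, rank)
    b = vt[:rank]                       # (rank, in_features)
    return a, b, keep


w = np.random.randn(64, 256)
a, b, kept = prune_then_decompose(w)
print(a.shape, b.shape, np.linalg.norm(w[kept] - a @ b) / np.linalg.norm(w[kept]))
```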
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
- Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z)
- Automated Model Compression by Jointly Applied Pruning and Quantization [14.824593320721407]
In the traditional deep compression framework, iteratively performing network pruning and quantization can reduce the model size and computation cost.
We tackle this issue by integrating network pruning and quantization as a unified joint compression problem and then use AutoML to automatically solve it.
We propose automated model compression by jointly applied pruning and quantization (AJPQ).
arXiv Detail & Related papers (2020-11-12T07:06:29Z)
- Differentiable Joint Pruning and Quantization for Hardware Efficiency [16.11027058505213]
DJPQ incorporates variational information bottleneck based structured pruning and mixed-bit precision quantization into a single differentiable loss function.
We show that DJPQ significantly reduces the number of Bit-Operations (BOPs) for several networks while maintaining the top-1 accuracy of original floating-point models.
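The sketch below illustrates the idea of a single differentiable objective that trades task loss against a Bit-Operation (BOP) estimate built from soft pruning gates and soft bitwidths; the variational-information-bottleneck pruning used by DJPQ is not reproduced, and all names and constants here are assumptions.

```python
# Hypothetical sketch of a single differentiable loss that penalizes a
# Bit-Operation (BOP) proxy (in the spirit of the summary above, not the
# actual DJPQ objective, which uses a variational information bottleneck).
import torch


def soft_bops(channel_gates, weight_bits, act_bits, macs_per_channel):
    """Differentiable BOP proxy: expected kept channels x MACs x weight bits x activation bits."""
    active = torch.sigmoid(channel_gates).sum()       # expected number of kept channels
    return active * macs_per_channel * weight_bits * act_bits


def joint_loss(task_loss, channel_gates, weight_bits, act_bits,
               macs_per_channel=1e6, bop_weight=1e-9):
    return task_loss + bop_weight * soft_bops(channel_gates, weight_bits,
                                              act_bits, macs_per_channel)


gates = torch.zeros(64, requires_grad=True)           # per-channel pruning gates
w_bits = torch.tensor(6.0, requires_grad=True)        # learnable (soft) bitwidths
a_bits = torch.tensor(8.0, requires_grad=True)
loss = joint_loss(torch.tensor(2.3), gates, w_bits, a_bits)
loss.backward()
print(gates.grad.shape, w_bits.grad)
```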
arXiv Detail & Related papers (2020-07-20T20:45:47Z)
- A "Network Pruning Network" Approach to Deep Model Compression [62.68120664998911]
We present a filter pruning approach for deep model compression using a multitask network.
Our approach is based on learning a pruner network to prune a pre-trained target network.
The compressed model produced by our approach is generic and does not need any special hardware/software support.
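As a loose, hypothetical illustration of the "pruner network" idea summarized above, the sketch below uses a small network that maps per-filter statistics of a pre-trained target layer to soft keep/drop masks; the statistics, architecture, and names are assumptions and not the authors' multitask design.

```python
# Hypothetical sketch of a pruner network that predicts soft keep/drop masks
# for the filters of a pre-trained target layer (an illustration of the idea,
# not the authors' architecture).
import torch
import torch.nn as nn


class Pruner(nn.Module):
    """Maps per-filter statistics of a target layer to keep probabilities."""

    def __init__(self, n_stats=2, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_stats, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, filter_stats):
        return torch.sigmoid(self.net(filter_stats)).squeeze(-1)  # (n_filters,)


target = nn.Conv2d(16, 32, 3)                      # pre-trained target layer (random here)
w = target.weight.detach()
stats = torch.stack([w.abs().mean(dim=(1, 2, 3)),  # simple per-filter statistics
                     w.std(dim=(1, 2, 3))], dim=1)
masks = Pruner()(stats)                            # soft mask per output filter
pruned_w = w * masks.view(-1, 1, 1, 1)             # apply masks to the target weights
print(masks.shape, pruned_w.shape)
```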
arXiv Detail & Related papers (2020-01-15T20:38:23Z)