Conditional Automated Channel Pruning for Deep Neural Networks
- URL: http://arxiv.org/abs/2009.09724v2
- Date: Sun, 27 Sep 2020 03:29:23 GMT
- Title: Conditional Automated Channel Pruning for Deep Neural Networks
- Authors: Yixin Liu, Yong Guo, Zichang Liu, Haohua Liu, Jingjie Zhang, Zejun
Chen, Jing Liu, Jian Chen
- Abstract summary: We develop a conditional model that takes an arbitrary compression rate as input and outputs the corresponding compressed model.
In the experiments, the resultant models with different compression rates consistently outperform the models compressed by existing methods.
- Score: 22.709646484723876
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model compression aims to reduce the redundancy of deep networks to obtain
compact models. Recently, channel pruning has become one of the predominant
compression methods to deploy deep models on resource-constrained devices. Most
channel pruning methods use a fixed compression rate for all the layers
of the model, which, however, may not be optimal. To address this issue, given
a target compression rate for the whole model, one can search for the optimal
compression rate for each layer. Nevertheless, these methods perform channel
pruning for a specific target compression rate. When we consider multiple
compression rates, they have to repeat the channel pruning process multiple
times, which is very inefficient and unnecessary. To address this issue, we
propose a Conditional Automated Channel Pruning (CACP) method to obtain
compressed models with different compression rates through a single channel
pruning process. To this end, we develop a conditional model that takes an
arbitrary compression rate as input and outputs the corresponding compressed
model. In the experiments, the resultant models with different compression
rates consistently outperform the models compressed by existing methods, which
require a separate channel pruning process for each target compression rate.
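To make the conditional pruning idea concrete, below is a minimal PyTorch sketch, not the authors' implementation: a small hypothetical allocator (`RateAllocator`) maps a target compression rate to per-layer keep ratios, after which channels are dropped by L1-norm magnitude. All names and the allocator architecture are illustrative assumptions; CACP's actual conditional model and training procedure are described in the paper.

```python
# Hypothetical sketch only; not the CACP reference implementation.
import torch
import torch.nn as nn


class RateAllocator(nn.Module):
    """Conditional model: a target compression rate in, per-layer keep ratios out."""

    def __init__(self, num_layers: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, 32), nn.ReLU(),
            nn.Linear(32, num_layers), nn.Sigmoid(),  # keep ratios in (0, 1)
        )

    def forward(self, target_rate: float) -> torch.Tensor:
        cond = torch.tensor([[target_rate]], dtype=torch.float32)
        return self.net(cond).squeeze(0)


def prune_conv_by_l1(conv: nn.Conv2d, keep_ratio: float) -> nn.Conv2d:
    """Keep the output channels (filters) with the largest L1 weight norms."""
    n_keep = max(1, int(round(conv.out_channels * keep_ratio)))
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # one score per filter
    keep_idx = torch.topk(scores, n_keep).indices.sort().values
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep_idx].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep_idx].clone()
    return pruned


# Usage: one (trained) allocator can serve any target rate without re-running a
# per-rate pruning search; here the allocator is untrained, so the ratios are
# arbitrary and only illustrate the interface.
convs = [nn.Conv2d(3, 16, 3, padding=1), nn.Conv2d(16, 32, 3, padding=1)]
allocator = RateAllocator(num_layers=len(convs))
for target in (0.3, 0.5, 0.7):
    keep_ratios = allocator(target)
    pruned = [prune_conv_by_l1(c, r.item()) for c, r in zip(convs, keep_ratios)]
    print(target, [p.out_channels for p in pruned])
```

For brevity the sketch prunes each layer independently; a real pipeline would also remove the matching input channels of downstream layers, train the conditional model against the target overall budget, and fine-tune the pruned network.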
Related papers
- Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
arXiv Detail & Related papers (2024-01-15T15:54:54Z)
- Ultra Dual-Path Compression For Joint Echo Cancellation And Noise Suppression [38.09558772881095]
Under fixed compression ratios, dual-path compression combining both the time and frequency methods will give further performance improvement.
The proposed models show competitive performance compared with fast FullSubNet and DeepFilterNet.
arXiv Detail & Related papers (2023-08-21T21:36:56Z)
- Lossy and Lossless (L$^2$) Post-training Model Size Compression [12.926354646945397]
We propose a post-training model size compression method that combines lossy and lossless compression in a unified way.
Our method can achieve a stable $10\times$ compression ratio without sacrificing accuracy and a $20\times$ compression ratio with minor accuracy loss in a short time.
arXiv Detail & Related papers (2023-08-08T14:10:16Z) - DiffRate : Differentiable Compression Rate for Efficient Vision
Transformers [98.33906104846386]
Token compression aims to speed up large-scale vision transformers (e.g. ViTs) by pruning (dropping) or merging tokens.
DiffRate is a novel token compression method with several appealing properties that prior arts lack.
arXiv Detail & Related papers (2023-05-29T10:15:19Z)
- OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization [32.60139548889592]
We propose a novel One-shot Pruning-Quantization (OPQ) method in this paper.
OPQ analytically solves the compression allocation with pre-trained weight parameters only.
We propose a unified channel-wise quantization method that enforces all channels of each layer to share a common codebook.
arXiv Detail & Related papers (2022-05-23T09:05:25Z)
- Estimating the Resize Parameter in End-to-end Learned Image Compression [50.20567320015102]
We describe a search-free resizing framework that can further improve the rate-distortion tradeoff of recent learned image compression models.
Our results show that our new resizing parameter estimation framework can provide Bjontegaard-Delta rate (BD-rate) improvement of about 10% against leading perceptual quality engines.
arXiv Detail & Related papers (2022-04-26T01:35:02Z)
- Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition [62.41259783906452]
We present a novel global compression framework for deep neural networks.
It automatically analyzes each layer to identify the optimal per-layer compression ratio.
Our results open up new avenues for future research into the global performance-size trade-offs of modern neural networks.
arXiv Detail & Related papers (2021-07-23T20:01:30Z)
- Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models.
We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
- Substitutional Neural Image Compression [48.20906717052056]
Substitutional Neural Image Compression (SNIC) is a general approach for enhancing any neural image compression model.
It boosts compression performance toward a flexible distortion metric and enables bit-rate control using a single model instance.
arXiv Detail & Related papers (2021-05-16T20:53:31Z)
- Automated Model Compression by Jointly Applied Pruning and Quantization [14.824593320721407]
In the traditional deep compression framework, iteratively performing network pruning and quantization can reduce the model size and computation cost.
We tackle this issue by integrating network pruning and quantization as a unified joint compression problem and then use AutoML to automatically solve it.
We propose the automated model compression by jointly applied pruning and quantization (AJPQ).
arXiv Detail & Related papers (2020-11-12T07:06:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.