Conditional Automated Channel Pruning for Deep Neural Networks
- URL: http://arxiv.org/abs/2009.09724v2
- Date: Sun, 27 Sep 2020 03:29:23 GMT
- Title: Conditional Automated Channel Pruning for Deep Neural Networks
- Authors: Yixin Liu, Yong Guo, Zichang Liu, Haohua Liu, Jingjie Zhang, Zejun
Chen, Jing Liu, Jian Chen
- Abstract summary: We develop a conditional model that takes an arbitrary compression rate as input and outputs the corresponding compressed model.
In the experiments, the resultant models with different compression rates consistently outperform the models compressed by existing methods.
- Score: 22.709646484723876
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model compression aims to reduce the redundancy of deep networks to obtain
compact models. Recently, channel pruning has become one of the predominant
compression methods to deploy deep models on resource-constrained devices. Most
channel pruning methods use a fixed compression rate for all the layers
of the model, which, however, may not be optimal. To address this issue, given
a target compression rate for the whole model, one can search for the optimal
compression rate for each layer. Nevertheless, these methods perform channel
pruning for a specific target compression rate. When we consider multiple
compression rates, they have to repeat the channel pruning process multiple
times, which is very inefficient and unnecessary. To address this issue, we
propose a Conditional Automated Channel Pruning (CACP) method to obtain
compressed models with different compression rates through a single channel
pruning process. To this end, we develop a conditional model that takes an
arbitrary compression rate as input and outputs the corresponding compressed
model. In the experiments, the resultant models with different compression
rates consistently outperform the models compressed by existing methods, which
require a separate channel pruning process for each target compression rate.
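To make the conditional pruning idea concrete, below is a minimal PyTorch sketch, not the authors' implementation: a small hypothetical allocator (`RateAllocator`) maps a target compression rate to per-layer keep ratios, after which channels are dropped by L1-norm magnitude. All names and the allocator architecture are illustrative assumptions; CACP's actual conditional model and training procedure are described in the paper.

```python
# Hypothetical sketch only; not the CACP reference implementation.
import torch
import torch.nn as nn


class RateAllocator(nn.Module):
    """Conditional model: a target compression rate in, per-layer keep ratios out."""

    def __init__(self, num_layers: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, 32), nn.ReLU(),
            nn.Linear(32, num_layers), nn.Sigmoid(),  # keep ratios in (0, 1)
        )

    def forward(self, target_rate: float) -> torch.Tensor:
        cond = torch.tensor([[target_rate]], dtype=torch.float32)
        return self.net(cond).squeeze(0)


def prune_conv_by_l1(conv: nn.Conv2d, keep_ratio: float) -> nn.Conv2d:
    """Keep the output channels (filters) with the largest L1 weight norms."""
    n_keep = max(1, int(round(conv.out_channels * keep_ratio)))
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # one score per filter
    keep_idx = torch.topk(scores, n_keep).indices.sort().values
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep_idx].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep_idx].clone()
    return pruned


# Usage: one (trained) allocator can serve any target rate without re-running a
# per-rate pruning search; here the allocator is untrained, so the ratios are
# arbitrary and only illustrate the interface.
convs = [nn.Conv2d(3, 16, 3, padding=1), nn.Conv2d(16, 32, 3, padding=1)]
allocator = RateAllocator(num_layers=len(convs))
for target in (0.3, 0.5, 0.7):
    keep_ratios = allocator(target)
    pruned = [prune_conv_by_l1(c, r.item()) for c, r in zip(convs, keep_ratios)]
    print(target, [p.out_channels for p in pruned])
```

For brevity the sketch prunes each layer independently; a real pipeline would also remove the matching input channels of downstream layers, train the conditional model against the target overall budget, and fine-tune the pruned network.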
Related papers
- Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
arXiv Detail & Related papers (2024-01-15T15:54:54Z)
- Ultra Dual-Path Compression For Joint Echo Cancellation And Noise Suppression [38.09558772881095]
Under fixed compression ratios, dual-path compression combining both the time and frequency methods will give further performance improvement.
The proposed models show competitive performance compared with fast FullSubNet and DeepFilterNet.
arXiv Detail & Related papers (2023-08-21T21:36:56Z)
- Lossy and Lossless (L$^2$) Post-training Model Size Compression [12.926354646945397]
We propose a post-training model size compression method that combines lossy and lossless compression in a unified way.
Our method can achieve a stable $10\times$ compression ratio without sacrificing accuracy and a $20\times$ compression ratio with minor accuracy loss in a short time.
arXiv Detail & Related papers (2023-08-08T14:10:16Z) - DiffRate : Differentiable Compression Rate for Efficient Vision
Transformers [98.33906104846386]
Token compression aims to speed up large-scale vision transformers (e.g. ViTs) by pruning (dropping) or merging tokens.
DiffRate is a novel token compression method with several appealing properties that prior arts lack.
arXiv Detail & Related papers (2023-05-29T10:15:19Z)
- OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization [32.60139548889592]
We propose a novel One-shot Pruning-Quantization (OPQ) method in this paper.
OPQ analytically solves the compression allocation with pre-trained weight parameters only.
We propose a unified channel-wise quantization method that enforces all channels of each layer to share a common codebook.
arXiv Detail & Related papers (2022-05-23T09:05:25Z)
- Estimating the Resize Parameter in End-to-end Learned Image Compression [50.20567320015102]
We describe a search-free resizing framework that can further improve the rate-distortion tradeoff of recent learned image compression models.
Our results show that our new resizing parameter estimation framework can provide Bjontegaard-Delta rate (BD-rate) improvement of about 10% against leading perceptual quality engines.
arXiv Detail & Related papers (2022-04-26T01:35:02Z)
- Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition [62.41259783906452]
We present a novel global compression framework for deep neural networks.
It automatically analyzes each layer to identify the optimal per-layer compression ratio.
Our results open up new avenues for future research into the global performance-size trade-offs of modern neural networks.
arXiv Detail & Related papers (2021-07-23T20:01:30Z)
- Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models.
We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
- Substitutional Neural Image Compression [48.20906717052056]
Substitutional Neural Image Compression (SNIC) is a general approach for enhancing any neural image compression model.
It boosts compression performance toward a flexible distortion metric and enables bit-rate control using a single model instance.
arXiv Detail & Related papers (2021-05-16T20:53:31Z)
- Automated Model Compression by Jointly Applied Pruning and Quantization [14.824593320721407]
In the traditional deep compression framework, iteratively performing network pruning and quantization can reduce the model size and computation cost.
We tackle this issue by integrating network pruning and quantization as a unified joint compression problem and then use AutoML to automatically solve it.
We propose the automated model compression by jointly applied pruning and quantization (AJPQ).
arXiv Detail & Related papers (2020-11-12T07:06:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.