Joint Channel and Weight Pruning for Model Acceleration on Mobile
Devices
- URL: http://arxiv.org/abs/2110.08013v1
- Date: Fri, 15 Oct 2021 11:18:42 GMT
- Title: Joint Channel and Weight Pruning for Model Acceleration on Mobile
Devices
- Authors: Tianli Zhao, Xi Sheryl Zhang, Wentao Zhu, Jiaxing Wang, Ji Liu, Jian
Cheng
- Abstract summary: Pruning is a widely adopted practice to balance the computational resource consumption and the accuracy.
We present a unified framework with Joint Channel pruning and Weight pruning (JCW).
We develop a tailored multi-objective evolutionary algorithm in the JCW framework, which enables one single search to obtain the optimal candidate architectures.
- Score: 37.51092726022731
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For practical deep neural network design on mobile devices, it is essential
to consider the constraints incurred by the computational resources and the
inference latency in various applications. Among deep network acceleration
related approaches, pruning is a widely adopted practice to balance the
computational resource consumption and the accuracy, where unimportant
connections can be removed either channel-wise or randomly with a minimal
impact on model accuracy. Channel pruning instantly results in a
significant latency reduction, while random weight pruning is more flexible
in balancing latency and accuracy. In this paper, we present a unified
framework with Joint Channel pruning and Weight pruning (JCW), which achieves a
better Pareto frontier between latency and accuracy than previous model
compression approaches. To fully optimize the trade-off between the latency and
accuracy, we develop a tailored multi-objective evolutionary algorithm in the
JCW framework, which enables one single search to obtain the optimal candidate
architectures for various deployment requirements. Extensive experiments
demonstrate that JCW achieves a better trade-off between latency and
accuracy than various state-of-the-art pruning methods on the ImageNet
classification dataset. Our codes are available at
https://github.com/jcw-anonymous/JCW.
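The two pruning styles contrasted in the abstract, and the idea of keeping only Pareto-optimal candidates, can be illustrated with a small sketch. This is not the authors' JCW implementation: the magnitude-based pruning criteria, the function names (`prune_channels`, `prune_weights`, `pareto_front`), and the sample numbers are all assumptions chosen for illustration, and the exhaustive dominance check stands in for the paper's evolutionary search.

```python
from typing import List, Tuple

Matrix = List[List[float]]            # rows = output channels, columns = weights
Candidate = Tuple[str, float, float]  # (name, latency_ms, accuracy), hypothetical


def prune_channels(w: Matrix, keep_ratio: float) -> Matrix:
    """Structured pruning: keep the output channels (rows) with the largest L1 norm.

    The L1-norm criterion is an assumption, not taken from the paper.
    """
    n_keep = max(1, round(len(w) * keep_ratio))
    ranked = sorted(range(len(w)),
                    key=lambda i: sum(abs(x) for x in w[i]),
                    reverse=True)
    kept = sorted(ranked[:n_keep])  # preserve the original channel order
    return [list(w[i]) for i in kept]


def prune_weights(w: Matrix, sparsity: float) -> Matrix:
    """Unstructured pruning: zero the smallest-magnitude fraction of weights.

    Weights tied with the threshold are also zeroed, so the achieved
    sparsity can slightly exceed the requested one.
    """
    flat = sorted(abs(x) for row in w for x in row)
    n_zero = int(len(flat) * sparsity)
    if n_zero == 0:
        return [list(row) for row in w]
    threshold = flat[n_zero - 1]
    return [[0.0 if abs(x) <= threshold else x for x in row] for row in w]


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep candidates that no other candidate dominates.

    A candidate is dominated if another one is no worse in both latency
    (lower is better) and accuracy (higher is better) and strictly
    better in at least one of them.
    """
    front = []
    for name, lat, acc in cands:
        dominated = any(
            l2 <= lat and a2 >= acc and (l2 < lat or a2 > acc)
            for _, l2, a2 in cands
        )
        if not dominated:
            front.append((name, lat, acc))
    return sorted(front, key=lambda c: c[1])


if __name__ == "__main__":
    weights = [[0.9, -0.1, 0.4],
               [0.05, 0.02, -0.03],
               [-0.7, 0.6, 0.2],
               [0.3, -0.2, 0.1]]
    # Joint pruning: first drop whole channels, then sparsify what remains.
    joint = prune_weights(prune_channels(weights, keep_ratio=0.5), sparsity=0.5)
    print(joint)

    # Made-up (latency, accuracy) measurements for four pruned models.
    models = [("a", 10.0, 0.70), ("b", 12.0, 0.69),
              ("c", 15.0, 0.75), ("d", 15.0, 0.74)]
    print(pareto_front(models))
```

Channel pruning shrinks the weight matrix itself, which is why it translates directly into lower latency, while weight pruning only introduces zeros and needs a sparse kernel to pay off. The Pareto filter then leaves one candidate per useful latency and accuracy trade-off, so a deployment can pick the entry that fits its latency budget.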
Related papers
- Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks [10.229120811024162]
Deep neural networks (DNNs) pose significant challenges to their deployment on edge devices.
Common approaches to address this issue are pruning and mixed-precision quantization.
We propose a novel methodology to apply them jointly via a lightweight gradient-based search.
arXiv Detail & Related papers (2024-07-01T08:07:02Z)
- Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity [0.0]
Exploiting sparsity in the network's feature maps is one of the ways to reduce its inference latency.
We propose a solution to induce semi-structured activation sparsity exploitable through minor runtime modifications.
Our approach yields a speed improvement of $1.25\times$ with a minimal accuracy drop of $1.1\%$ for the ResNet18 model on the ImageNet dataset.
arXiv Detail & Related papers (2023-09-12T22:28:53Z)
- Rethinking Lightweight Salient Object Detection via Network Depth-Width Tradeoff [26.566339984225756]
Existing salient object detection methods often adopt deeper and wider networks for better performance.
We propose a novel trilateral decoder framework by decoupling the U-shape structure into three complementary branches.
We show that our method achieves better efficiency-accuracy balance across five benchmarks.
arXiv Detail & Related papers (2023-01-17T03:43:25Z)
- Design and Prototyping Distributed CNN Inference Acceleration in Edge Computing [85.74517957717363]
HALP accelerates inference by designing a seamless collaboration among edge devices (EDs) in Edge Computing.
Experiments show that the distributed inference HALP achieves 1.7x inference acceleration for VGG-16.
It is shown that the model selection with distributed inference HALP can significantly improve service reliability.
arXiv Detail & Related papers (2022-11-24T19:48:30Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency and accuracy aware reward design, such a computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration [71.80326738527734]
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework.
arXiv Detail & Related papers (2021-11-22T23:53:14Z)
- Multi-scale Iterative Residuals for Fast and Scalable Stereo Matching [13.76996108304056]
This paper presents an iterative multi-scale coarse-to-fine refinement (iCFR) framework to bridge this gap.
We use multi-scale warped features to estimate disparity residuals and push the disparity search range in the cost volume to a minimum limit.
Finally, we apply a refinement network to recover the loss of precision which is inherent in multi-scale approaches.
arXiv Detail & Related papers (2021-10-25T09:54:17Z)
- CATRO: Channel Pruning via Class-Aware Trace Ratio Optimization [61.71504948770445]
We propose a novel channel pruning method via Class-Aware Trace Ratio Optimization (CATRO) to reduce the computational burden and accelerate the model inference.
We show that CATRO achieves higher accuracy with similar cost or lower cost with similar accuracy than other state-of-the-art channel pruning algorithms.
Because of its class-aware property, CATRO is suitable to prune efficient networks adaptively for various classification subtasks, enhancing handy deployment and usage of deep networks in real-world applications.
arXiv Detail & Related papers (2021-10-21T06:26:31Z)
- Architecture Aware Latency Constrained Sparse Neural Networks [35.50683537052815]
In this paper, we design an architecture aware latency constrained sparse framework to prune and accelerate CNN models.
We also propose a novel sparse convolution algorithm for efficient computation.
Our system-algorithm co-design framework can achieve a much better frontier between network accuracy and latency on resource-constrained mobile devices.
arXiv Detail & Related papers (2021-09-01T03:41:31Z)
- An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices [58.62801151916888]
We introduce a new sparsity dimension, namely pattern-based sparsity, which comprises pattern and connectivity sparsity and is both highly accurate and hardware-friendly.
Our approach on the new pattern-based sparsity naturally fits into compiler optimization for highly efficient DNN execution on mobile platforms.
arXiv Detail & Related papers (2020-01-20T16:17:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.