Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time
Mobile Acceleration
- URL: http://arxiv.org/abs/2111.11581v1
- Date: Mon, 22 Nov 2021 23:53:14 GMT
- Title: Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time
Mobile Acceleration
- Authors: Yifan Gong, Geng Yuan, Zheng Zhan, Wei Niu, Zhengang Li, Pu Zhao,
Yuxuan Cai, Sijia Liu, Bin Ren, Xue Lin, Xulong Tang, Yanzhi Wang
- Abstract summary: We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework.
- Score: 71.80326738527734
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Weight pruning is an effective model compression technique to tackle the
challenges of achieving real-time deep neural network (DNN) inference on mobile
devices. However, prior pruning schemes have limited application scenarios due
to accuracy degradation, difficulty in leveraging hardware acceleration, and/or
restriction on certain types of DNN layers. In this paper, we propose a
general, fine-grained structured pruning scheme and corresponding compiler
optimizations that are applicable to any type of DNN layer while achieving high
accuracy and hardware inference performance. With the flexibility of applying
different pruning schemes to different layers enabled by our compiler
optimizations, we further probe into the new problem of determining the
best-suited pruning scheme considering the different acceleration and accuracy
performance of various pruning schemes. Two pruning scheme mapping methods, one
is search-based and the other is rule-based, are proposed to automatically
derive the best-suited pruning regularity and block size for each layer of any
given DNN. Experimental results demonstrate that our pruning scheme mapping
methods, together with the general fine-grained structured pruning scheme,
outperform the state-of-the-art DNN optimization framework with up to
2.48$\times$ and 1.73$\times$ DNN inference acceleration on the CIFAR-10 and ImageNet datasets without accuracy loss.
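The abstract leaves the concrete pruning granularity to the paper body; as an illustration of the kind of block-based, fine-grained structured pruning it refers to, the following is a minimal NumPy sketch, assuming plain block-wise magnitude pruning of a 2D weight matrix with a per-layer block size. The function name, padding logic, and block scoring are illustrative assumptions, not the paper's actual scheme or compiler optimizations.

```python
# Illustrative sketch only: block-wise magnitude pruning of a 2D weight
# matrix with a per-layer block size. Names and scoring are hypothetical,
# not the paper's API.
import numpy as np

def block_prune(weight, block_size=(4, 2), sparsity=0.5):
    """Zero out the lowest-norm blocks of `weight` until `sparsity` is reached."""
    bh, bw = block_size
    rows, cols = weight.shape
    # Pad so the matrix divides evenly into blocks.
    pad_r, pad_c = (-rows) % bh, (-cols) % bw
    padded = np.pad(weight, ((0, pad_r), (0, pad_c)))
    R, C = padded.shape
    # View the matrix as a grid of (bh x bw) blocks and score each block by its L2 norm.
    blocks = padded.reshape(R // bh, bh, C // bw, bw)
    scores = np.sqrt((blocks ** 2).sum(axis=(1, 3)))
    # Keep the highest-scoring blocks, zero out the rest.
    n_prune = int(round(sparsity * scores.size))
    if n_prune > 0:
        threshold = np.sort(scores, axis=None)[n_prune - 1]
        mask = (scores > threshold).astype(weight.dtype)
    else:
        mask = np.ones_like(scores, dtype=weight.dtype)
    pruned = blocks * mask[:, None, :, None]
    return pruned.reshape(R, C)[:rows, :cols]

# Different layers can receive different block sizes (pruning regularities);
# choosing them per layer is the mapping problem the paper automates with its
# search-based and rule-based methods.
layer = np.random.randn(64, 32).astype(np.float32)
pruned = block_prune(layer, block_size=(8, 1), sparsity=0.75)
print("achieved sparsity:", 1.0 - np.count_nonzero(pruned) / pruned.size)
```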
Related papers
- Accelerating Deep Neural Networks via Semi-Structured Activation
Sparsity [0.0]
Exploiting sparsity in the network's feature maps is one of the ways to reduce its inference latency.
We propose a solution to induce semi-structured activation sparsity exploitable through minor runtime modifications.
Our approach yields a speed improvement of $1.25\times$ with a minimal accuracy drop of $1.1\%$ for the ResNet18 model on the ImageNet dataset.
arXiv Detail & Related papers (2023-09-12T22:28:53Z)
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights by a small amount proportional to the magnitude scale on-the-fly (a minimal shrinkage sketch follows this entry).
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
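The shrinkage step in the ISS-P entry above is only loosely described by the summary; the following is a minimal sketch, assuming a simple magnitude criterion in which weights below a percentile are softly pushed toward zero (rather than hard-pruned) by an amount proportional to the layer's magnitude scale. The function name and the `shrink_factor` parameter are illustrative, not the paper's algorithm.

```python
# Illustrative soft-shrinkage step (not the paper's exact ISS-P method):
# instead of zeroing low-magnitude weights outright, nudge them toward
# zero by a small amount proportional to the layer's magnitude scale,
# so "pruned" weights can still recover during later iterations.
import numpy as np

def soft_shrink_step(weight, prune_percent=80.0, shrink_factor=0.1):
    w = weight.copy()
    threshold = np.percentile(np.abs(w), prune_percent)  # split important / unimportant
    scale = np.abs(w).mean()                             # layer-wide magnitude scale
    unimportant = np.abs(w) < threshold
    w[unimportant] -= np.sign(w[unimportant]) * np.minimum(
        shrink_factor * scale, np.abs(w[unimportant])
    )
    return w

w = np.random.randn(256, 256).astype(np.float32)
for _ in range(10):            # in practice interleaved with training steps
    w = soft_shrink_step(w)
```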
- Adaptive Anomaly Detection for Internet of Things in Hierarchical Edge Computing: A Contextual-Bandit Approach [81.5261621619557]
We propose an adaptive anomaly detection scheme with hierarchical edge computing (HEC).
We first construct multiple anomaly detection DNN models with increasing complexity, and associate each of them to a corresponding HEC layer.
Then, we design an adaptive model selection scheme that is formulated as a contextual-bandit problem and solved with a reinforcement learning policy network (a toy bandit sketch follows this entry).
arXiv Detail & Related papers (2021-08-09T08:45:47Z)
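The bandit formulation in the entry above is not spelled out by the summary; below is a toy sketch, assuming an epsilon-greedy contextual bandit with a linear value estimate per arm that picks one of several detection models of increasing complexity from a context vector. The class, the reward definition, and the features are hypothetical; the paper itself uses a reinforcement learning policy network rather than this simple estimator.

```python
# Toy contextual-bandit model selector (not the paper's policy network):
# given a context vector, choose which of K anomaly-detection models to
# run, using an epsilon-greedy linear reward estimate per arm.
import numpy as np

class EpsilonGreedyBandit:
    def __init__(self, n_arms, context_dim, epsilon=0.1, lr=0.01):
        self.weights = np.zeros((n_arms, context_dim))  # one linear estimator per model
        self.epsilon, self.lr = epsilon, lr

    def select(self, context):
        if np.random.rand() < self.epsilon:
            return np.random.randint(len(self.weights))  # explore
        return int(np.argmax(self.weights @ context))    # exploit

    def update(self, arm, context, reward):
        # One SGD step toward the observed reward for the chosen model.
        error = reward - self.weights[arm] @ context
        self.weights[arm] += self.lr * error * context

# Hypothetical example: 3 models of increasing complexity; the reward
# stands in for detection quality minus a latency/offloading cost.
bandit = EpsilonGreedyBandit(n_arms=3, context_dim=4)
for _ in range(1000):
    ctx = np.random.rand(4)                # simple features of the input sample
    arm = bandit.select(ctx)
    reward = np.random.rand() - 0.1 * arm  # purely illustrative reward signal
    bandit.update(arm, ctx, reward)
```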
- A Unified DNN Weight Compression Framework Using Reweighted Optimization Methods [31.869228048294445]
We propose a unified DNN weight pruning framework with dynamically updated regularization terms bounded by the designated constraint (a generic reweighting sketch follows this entry).
We also extend our method to an integrated framework for the combination of different DNN compression tasks.
arXiv Detail & Related papers (2020-04-12T02:59:06Z)
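The "dynamically updated regularization terms" in the entry above are not defined by this summary; the sketch below assumes the classic reweighting idea in which each row group's penalty coefficient is periodically reset to the inverse of its current norm, so that already-small groups are pushed harder toward exact zero. The proximal-step form and all names are illustrative, not the paper's formulation.

```python
# Generic reweighted group-regularization sketch (not the paper's exact
# method): a proximal step shrinks each row's norm by lr * lam * coeff,
# clamping at zero, and the per-row coefficients are periodically reset
# to 1 / (||row|| + eps) so small rows are penalized more strongly.
import numpy as np

def group_prox(weight, coeffs, lam, lr):
    """Proximal step for lam * sum_i coeffs[i] * ||W_i||_2 over row groups."""
    norms = np.linalg.norm(weight, axis=1, keepdims=True) + 1e-12
    shrink = np.maximum(0.0, 1.0 - lr * lam * coeffs[:, None] / norms)
    return weight * shrink

W = np.random.randn(64, 128) * 0.1
coeffs = np.ones(W.shape[0])
lam, lr = 0.05, 0.1
for step in range(300):
    # In practice this is interleaved with task-loss gradient steps,
    # which keep the genuinely important rows away from zero.
    W = group_prox(W, coeffs, lam, lr)
    if (step + 1) % 50 == 0:
        coeffs = 1.0 / (np.linalg.norm(W, axis=1) + 1e-3)  # reweighting update
print("rows driven exactly to zero:", int((np.linalg.norm(W, axis=1) == 0.0).sum()))
```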
- A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration Framework [56.57225686288006]
Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices.
Previous pruning methods mainly focus on reducing the model size and/or improving performance without considering the privacy of user data.
We propose a privacy-preserving-oriented pruning and mobile acceleration framework that does not require the private training dataset.
arXiv Detail & Related papers (2020-03-13T23:52:03Z)
- BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted Regularization Method [69.49386965992464]
We propose a new block-based pruning framework that comprises a general and flexible structured pruning dimension as well as a powerful and efficient reweighted regularization method.
Our framework is universal: it can be applied to both CNNs and RNNs, implying complete support for the two major kinds of computation-intensive layers.
It is the first time that a weight pruning framework achieves universal coverage for both CNNs and RNNs with real-time mobile acceleration and no accuracy compromise.
arXiv Detail & Related papers (2020-01-23T03:30:56Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency (a toy pattern-pruning sketch follows this entry).
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
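As a companion illustration for the PatDNN entry above, here is a toy sketch of pattern-based kernel pruning, assuming a small hypothetical library of 3x3 masks that each keep four weights; each kernel is assigned the mask that preserves the most magnitude. PatDNN's actual pattern set, connectivity pruning, and compiler code generation are not reproduced here.

```python
# Toy pattern-based pruning (not PatDNN's actual pattern set or compiler
# pass): every 3x3 kernel keeps only the weights selected by the best-
# matching mask from a small candidate library.
import numpy as np

# Hypothetical pattern library: boolean 3x3 masks that each keep 4 weights.
PATTERNS = [
    np.array([[0, 1, 0], [1, 1, 1], [0, 0, 0]], dtype=bool),
    np.array([[0, 0, 0], [1, 1, 1], [0, 1, 0]], dtype=bool),
    np.array([[0, 1, 0], [1, 1, 0], [0, 1, 0]], dtype=bool),
    np.array([[0, 1, 0], [0, 1, 1], [0, 1, 0]], dtype=bool),
]

def pattern_prune(conv_weight):
    """conv_weight: (out_ch, in_ch, 3, 3); keep the highest-magnitude pattern per kernel."""
    out = conv_weight.copy()
    for o in range(conv_weight.shape[0]):
        for i in range(conv_weight.shape[1]):
            kernel = conv_weight[o, i]
            kept = [np.abs(kernel[p]).sum() for p in PATTERNS]  # magnitude retained by each mask
            out[o, i] = kernel * PATTERNS[int(np.argmax(kept))]
    return out

w = np.random.randn(16, 8, 3, 3).astype(np.float32)
w_pruned = pattern_prune(w)
print("kept weight fraction:", np.count_nonzero(w_pruned) / w_pruned.size)  # about 4/9
```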