SparseByteNN: A Novel Mobile Inference Acceleration Framework Based on
Fine-Grained Group Sparsity
- URL: http://arxiv.org/abs/2310.19509v1
- Date: Mon, 30 Oct 2023 13:08:48 GMT
- Title: SparseByteNN: A Novel Mobile Inference Acceleration Framework Based on
Fine-Grained Group Sparsity
- Authors: Haitao Xu, Songwei Liu, Yuyang Xu, Shuai Wang, Jiashi Li, Chenqian
Yan, Liangqiang Li, Lean Fu, Xin Pan, Fangmin Chen
- Abstract summary: We present a novel mobile inference acceleration framework SparseByteNN.
We show that for 30% sparse MobileNet-v1, SparseByteNN achieves 1.27x speedup over the dense version and 1.29x speedup over the state-of-the-art sparse inference engine MNN with a slight accuracy drop of 0.224%.
- Score: 10.89385369643021
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To address the challenge of increasing network size, researchers have
developed sparse models through network pruning. However, maintaining model
accuracy while achieving significant speedups on general computing devices
remains an open problem. In this paper, we present SparseByteNN, a novel mobile
inference acceleration framework that leverages fine-grained kernel sparsity to
achieve real-time execution as well as high accuracy. Our framework
consists of two parts: (a) A fine-grained kernel sparsity schema with a
sparsity granularity between structured pruning and unstructured pruning. It
designs multiple sparse patterns for different operators. Combined with our
proposed whole network rearrangement strategy, the schema achieves a high
compression rate and high precision at the same time. (b) Inference engine
co-optimized with the sparse pattern. The conventional wisdom is that this
reduction in theoretical FLOPs does not translate into real-world efficiency
gains. We aim to correct this misconception by introducing a family of
efficient sparse kernels for ARM and WebAssembly. Equipped with our efficient
implementation of sparse primitives, we show that sparse versions of
MobileNet-v1 outperform strong dense baselines on the efficiency-accuracy
curve. Experimental results on Qualcomm 855 show that for 30% sparse
MobileNet-v1, SparseByteNN achieves 1.27x speedup over the dense version and
1.29x speedup over the state-of-the-art sparse inference engine MNN with a
slight accuracy drop of 0.224%. The source code of SparseByteNN will be
available at https://github.com/lswzjuer/SparseByteNN
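The paper's fine-grained kernel group sparsity sits between channel pruning and unstructured pruning. A minimal sketch of that idea follows, assuming groups of four consecutive output channels share one mask entry per position, with an L2 group-norm pruning criterion; both are illustrative assumptions, not the published schema:

```python
import numpy as np

def group_prune(weight, group_size=4, sparsity=0.3):
    """Fine-grained group pruning sketch (hypothetical schema).

    weight: conv weight of shape (C_out, C_in, kH, kW). Groups of
    `group_size` consecutive output channels share one mask entry per
    (input-channel, spatial) position, so surviving weights stay in
    SIMD-friendly contiguous blocks.
    """
    c_out, c_in, kh, kw = weight.shape
    assert c_out % group_size == 0
    # Score each group by the L2 norm across its output channels.
    w = weight.reshape(c_out // group_size, group_size, c_in * kh * kw)
    scores = np.linalg.norm(w, axis=1)                 # (groups, positions)
    # Keep the top (1 - sparsity) fraction of groups globally.
    k = int(scores.size * (1.0 - sparsity))
    threshold = np.partition(scores.ravel(), -k)[-k]
    mask = (scores >= threshold).astype(weight.dtype)
    return (w * mask[:, None, :]).reshape(weight.shape)

# Example: prune a MobileNet-style 3x3 conv to ~30% group sparsity.
w = np.random.randn(64, 32, 3, 3).astype(np.float32)
w_sparse = group_prune(w, group_size=4, sparsity=0.3)
print(1.0 - np.count_nonzero(w_sparse) / w_sparse.size)  # ~0.3
```

Because whole groups of channels are zeroed at the same position, a SIMD kernel can skip them in contiguous blocks, which is how FLOP reductions at this granularity can become wall-clock speedups on ARM.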
Related papers
- FSCNN: A Fast Sparse Convolution Neural Network Inference System [31.474696818171953]
Convolutional neural networks (CNNs) have achieved remarkable success, but typically come with high computation costs and numerous redundant weight parameters.
To reduce FLOPs, structured pruning is a popular approach that removes entire hidden structures by introducing coarse-grained sparsity.
We present an efficient convolutional neural network inference system that accelerates the forward pass by utilizing the fine-grained sparsity of compressed CNNs.
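To make the speedup mechanism concrete, here is a toy 1x1 convolution executed as a CSR sparse matrix multiply with SciPy. FSCNN's actual kernels are hand-optimized native code; this is only a conceptual sketch of why skipping zeros helps:

```python
import numpy as np
from scipy.sparse import csr_matrix

def sparse_conv2d_1x1(weight, x):
    """Toy 1x1 convolution as a CSR sparse matmul (illustrative only).

    weight: (C_out, C_in) with many zeros after pruning.
    x:      (C_in, H, W) input feature map.
    """
    c_in, h, w = x.shape
    w_csr = csr_matrix(weight)            # compress once, offline
    out = w_csr @ x.reshape(c_in, h * w)  # only nonzeros are multiplied
    return out.reshape(weight.shape[0], h, w)

# A ~70%-sparse 1x1 conv on a small feature map.
rng = np.random.default_rng(0)
weight = rng.standard_normal((64, 32)) * (rng.random((64, 32)) > 0.7)
x = rng.standard_normal((32, 8, 8))
print(sparse_conv2d_1x1(weight, x).shape)  # (64, 8, 8)
```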
arXiv Detail & Related papers (2022-12-17T06:44:58Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) algorithm, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on a latency- and accuracy-aware reward design, this computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
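The entry mentions a latency- and accuracy-aware reward; a hypothetical scalarization (function name, budget, and coefficients are ours, not the paper's) might look like:

```python
def reward(accuracy, latency_ms, budget_ms=50.0, alpha=1.0, beta=0.1):
    """Hypothetical latency- and accuracy-aware reward for the DRL agent:
    reward accuracy, penalize only the latency that exceeds the budget."""
    overshoot = max(0.0, latency_ms - budget_ms)
    return alpha * accuracy - beta * overshoot

print(reward(accuracy=0.92, latency_ms=63.0))  # 0.92 - 0.1 * 13.0 = -0.38
```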
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- GRIM: A General, Real-Time Deep Learning Inference Framework for Mobile Devices based on Fine-Grained Structured Weight Sparsity [46.75304109970339]
This paper designs a novel mobile inference acceleration framework, GRIM, that is General to both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We propose a new fine-grained structured sparsity scheme through Block-based Column-Row (BCR) pruning.
Based on this new fine-grained structured sparsity, our GRIM framework consists of two parts: (a) the compiler optimization and code generation for real-time mobile inference.
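A simplified sketch of BCR pruning, restricted to removing whole columns inside fixed-size blocks (the block size and L2 scoring rule are assumptions for illustration):

```python
import numpy as np

def bcr_prune(weight, block=(16, 16), col_sparsity=0.5):
    """Block-based Column-Row pruning sketch (columns only, simplified).

    The matrix is tiled into blocks; inside each block, the weakest
    columns are zeroed, so each block stays regular enough for
    compiler-generated kernels while the global pattern is fine-grained.
    """
    out = weight.copy()
    bh, bw = block
    for r in range(0, out.shape[0], bh):
        for c in range(0, out.shape[1], bw):
            blk = out[r:r + bh, c:c + bw]        # view into `out`
            norms = np.linalg.norm(blk, axis=0)  # per-column scores
            k = int(norms.size * col_sparsity)   # columns to drop
            blk[:, np.argsort(norms)[:k]] = 0.0
    return out

w = np.random.randn(64, 64).astype(np.float32)
print(np.count_nonzero(bcr_prune(w)) / w.size)  # ~0.5
```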
arXiv Detail & Related papers (2021-08-25T03:50:46Z)
- Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate models in resource-constrained environments.
In this paper, we are the first to study training from scratch an N:M fine-grained structured sparse network.
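N:M sparsity keeps at most N nonzero weights in every group of M consecutive weights. The magnitude-based projection below is the standard formulation (2:4 is the variant accelerated by NVIDIA sparse tensor cores):

```python
import numpy as np

def nm_prune(weight, n=2, m=4):
    """Project a weight tensor onto the N:M sparsity pattern: in every
    group of m consecutive weights (last axis), zero the m - n entries
    with the smallest magnitude."""
    flat = weight.reshape(-1, m)
    drop = np.argsort(np.abs(flat), axis=1)[:, : m - n]
    mask = np.ones_like(flat)
    np.put_along_axis(mask, drop, 0.0, axis=1)
    return (flat * mask).reshape(weight.shape)

w = np.random.randn(8, 16).astype(np.float32)
w24 = nm_prune(w, n=2, m=4)
# Every group of 4 consecutive weights now has at most 2 nonzeros.
assert np.count_nonzero(w24.reshape(-1, 4), axis=1).max() <= 2
```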
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
- SparseDNN: Fast Sparse Deep Learning Inference on CPUs [1.6244541005112747]
We present SparseDNN, a sparse deep learning inference engine targeting CPUs.
We show that our sparse code generator can achieve significant speedups over state-of-the-art sparse and dense libraries.
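A toy version of what a sparse code generator does: specialize a kernel to one weight vector's nonzero pattern, so the emitted code carries no index arithmetic or branches. SparseDNN emits optimized native code rather than Python; this only illustrates the idea:

```python
import numpy as np

def codegen_sparse_dot(weights):
    """Emit a Python function with the nonzero weights and their indices
    baked in; the generated kernel touches only nonzero positions."""
    nz = [(i, w) for i, w in enumerate(weights) if w != 0.0]
    body = " + ".join(f"{w!r} * x[{i}]" for i, w in nz) or "0.0"
    src = f"def sparse_dot(x):\n    return {body}\n"
    scope = {}
    exec(src, scope)  # "compile" the specialized kernel
    return scope["sparse_dot"], src

w = [0.0, 1.5, 0.0, -2.0, 0.0, 0.25]
kernel, src = codegen_sparse_dot(w)
x = np.arange(6, dtype=np.float64)
assert np.isclose(kernel(x), np.dot(w, x))
print(src)  # def sparse_dot(x): return 1.5 * x[1] + -2.0 * x[3] + 0.25 * x[5]
```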
arXiv Detail & Related papers (2021-01-20T03:27:35Z)
- FATNN: Fast and Accurate Ternary Neural Networks [89.07796377047619]
Ternary Neural Networks (TNNs) have received much attention due to being potentially orders of magnitude faster in inference, as well as more power efficient, than full-precision counterparts.
In this work, we show that, under some mild constraints, computational complexity of the ternary inner product can be reduced by a factor of 2.
We carefully design an implementation-dependent ternary quantization algorithm to mitigate the performance gap.
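For context, the conventional two-bitmask ternary inner product is shown below; FATNN's contribution is an encoding that roughly halves this work, which the sketch does not reproduce:

```python
def ternary_dot(xs, ys):
    """Baseline ternary inner product via bitmasks and popcount
    (requires Python 3.10+ for int.bit_count)."""
    def encode(vals):
        pos = neg = 0
        for i, v in enumerate(vals):  # one bit per element and sign
            if v > 0:
                pos |= 1 << i
            elif v < 0:
                neg |= 1 << i
        return pos, neg

    xp, xn = encode(xs)
    yp, yn = encode(ys)
    same = (xp & yp) | (xn & yn)  # matching signs contribute +1
    diff = (xp & yn) | (xn & yp)  # opposite signs contribute -1
    return same.bit_count() - diff.bit_count()

xs = [1, 0, -1, 1, -1, 0, 1, -1]
ys = [1, 1, -1, -1, 0, 0, 1, 1]
assert ternary_dot(xs, ys) == sum(a * b for a, b in zip(xs, ys))
```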
arXiv Detail & Related papers (2020-08-12T04:26:18Z)
- RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices [57.877112704841366]
This paper proposes RT3D, a model compression and mobile acceleration framework for 3D CNNs.
For the first time, real-time execution of 3D CNNs is achieved on off-the-shelf mobiles.
arXiv Detail & Related papers (2020-07-20T02:05:32Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in the design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
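A minimal sketch of pattern-based kernel pruning: every 3x3 kernel is projected onto the best-fitting member of a small pattern library so the compiler can specialize code per pattern (the 4-entry pattern set below is hypothetical, not PatDNN's published set):

```python
import numpy as np

# Hypothetical library of 4-entry patterns for 3x3 kernels,
# given as sets of flattened indices into the 9 kernel positions.
PATTERNS = [
    {0, 1, 3, 4}, {1, 2, 4, 5}, {3, 4, 6, 7}, {4, 5, 7, 8},
]

def pattern_prune(kernel):
    """Keep only the pattern that covers the most weight magnitude."""
    flat = np.abs(kernel).ravel()
    best = max(PATTERNS, key=lambda p: flat[list(p)].sum())
    mask = np.zeros(9)
    mask[list(best)] = 1.0
    return kernel * mask.reshape(3, 3)

k = np.random.randn(3, 3)
print(pattern_prune(k))  # exactly 4 surviving weights, one fixed pattern
```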
arXiv Detail & Related papers (2020-01-01T04:52:07Z)