Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning
and Compiler Optimization
- URL: http://arxiv.org/abs/2004.11250v1
- Date: Wed, 22 Apr 2020 03:18:23 GMT
- Title: Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning
and Compiler Optimization
- Authors: Wei Niu, Pu Zhao, Zheng Zhan, Xue Lin, Yanzhi Wang, Bin Ren
- Abstract summary: High-end mobile platforms serve as primary computing devices for a wide range of Deep Neural Network (DNN) applications.
Constrained computation and storage resources on these devices pose significant challenges for real-time inference executions.
We propose a set of hardware-friendly structured model pruning and compiler optimization techniques to accelerate DNN executions on mobile devices.
- Score: 56.3111706960878
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-end mobile platforms rapidly serve as primary computing devices for a
wide range of Deep Neural Network (DNN) applications. However, the constrained
computation and storage resources on these devices still pose significant
challenges for real-time DNN inference executions. To address this problem, we
propose a set of hardware-friendly structured model pruning and compiler
optimization techniques to accelerate DNN executions on mobile devices. This
demo shows that these optimizations can enable real-time mobile execution of
multiple DNN applications, including style transfer, DNN coloring and super
resolution.
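The abstract does not say which structured pruning criterion is used. As a rough illustration of what "structured" pruning means (removing whole filters rather than scattered individual weights, so the remaining layer stays dense and hardware friendly), here is a minimal Python sketch. The L1-norm ranking, the function names `l1_norm` and `prune_filters`, and the toy weights are all assumptions made for this example and are not taken from the paper.

```python
# Hypothetical sketch of structured (filter-level) pruning.
# Assumption: filters are ranked by L1 norm; the paper may use another criterion.

def l1_norm(filt):
    """Sum of absolute weights of one filter (a flat list of floats)."""
    return sum(abs(w) for w in filt)

def prune_filters(filters, keep_ratio):
    """Keep the filters with the largest L1 norms, preserving their order.

    filters: list of filters, each a flat list of weights.
    keep_ratio: fraction of filters to keep, in (0, 1].
    Returns (kept_filters, kept_indices).
    """
    if not 0 < keep_ratio <= 1:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(len(filters) * keep_ratio))
    # Rank filter indices from largest to smallest norm.
    ranked = sorted(range(len(filters)),
                    key=lambda i: l1_norm(filters[i]),
                    reverse=True)
    kept = sorted(ranked[:n_keep])  # restore original layer order
    return [filters[i] for i in kept], kept

# Toy layer with four filters of three weights each.
layer = [
    [0.9, -0.8, 0.7],
    [0.01, 0.02, -0.01],
    [-0.5, 0.4, 0.6],
    [0.05, -0.03, 0.02],
]
pruned, kept_idx = prune_filters(layer, keep_ratio=0.5)
print(kept_idx)  # [0, 2]
```

Because entire filters are dropped, the pruned layer is simply a smaller dense layer, which is the property that makes this family of pruning schemes easier for a compiler to optimize than unstructured sparsity.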
Related papers
- SwapNet: Efficient Swapping for DNN Inference on Edge AI Devices Beyond
the Memory Budget [18.63754969602021]
Deep neural networks (DNNs) on edge artificial intelligence (AI) devices enable various autonomous mobile computing applications.
Existing solutions, such as model compression or cloud offloading, reduce the memory footprint of DNN inference.
We develop SwapNet, an efficient block swapping ecosystem for edge AI devices.
arXiv Detail & Related papers (2024-01-30T05:29:49Z)
- Adaptive DNN Surgery for Selfish Inference Acceleration with On-demand
Edge Resource [25.274288063300844]
Deep Neural Networks (DNNs) have significantly improved the accuracy of intelligent applications on mobile devices.
DNN surgery can enable real-time inference despite the computational limitations of mobile devices.
This paper introduces a novel Decentralized DNN Surgery (DDS) framework.
arXiv Detail & Related papers (2023-06-21T11:32:28Z)
- Dynamic Split Computing for Efficient Deep Edge Intelligence [78.4233915447056]
We introduce dynamic split computing, where the optimal split location is dynamically selected based on the state of the communication channel.
We show that dynamic split computing achieves faster inference in edge computing environments where the data rate and server load vary over time.
arXiv Detail & Related papers (2022-05-23T12:35:18Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical computation restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- GRIM: A General, Real-Time Deep Learning Inference Framework for Mobile
Devices based on Fine-Grained Structured Weight Sparsity [46.75304109970339]
This paper designs GRIM, a novel mobile inference acceleration framework that is general to both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We propose a new fine-grained structured sparsity scheme through the Block-based Column-Row (BCR) pruning.
Based on this new fine-grained structured sparsity, our GRIM framework consists of two parts: (a) the compiler optimization and code generation for real-time mobile inference.
arXiv Detail & Related papers (2021-08-25T03:50:46Z)
- DynO: Dynamic Onloading of Deep Neural Networks from Cloud to Device [17.43467167013752]
We present DynO, a distributed inference framework that combines the best of both worlds to address several challenges.
We show that DynO outperforms the current state-of-the-art, improving throughput by over an order of magnitude over device-only execution.
arXiv Detail & Related papers (2021-04-20T13:20:15Z)
- An Image Enhancing Pattern-based Sparsity for Real-time Inference on
Mobile Devices [58.62801151916888]
We introduce a new sparsity dimension, namely pattern-based sparsity, which comprises pattern and connectivity sparsity and is both highly accurate and hardware friendly.
Our approach on the new pattern-based sparsity naturally fits into compiler optimization for highly efficient DNN execution on mobile platforms.
arXiv Detail & Related papers (2020-01-20T16:17:36Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with
Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
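The dynamic split computing entry in the list above describes choosing the split location from the current state of the communication channel. A minimal Python sketch of that idea follows. It assumes a simple additive latency model (device compute, plus transfer of the intermediate tensor, plus server compute), and the function name `best_split` and all per-layer numbers are hypothetical, not taken from that paper.

```python
# Hypothetical sketch: pick the split point that minimizes end-to-end latency.
# Assumption: latency = device compute + transfer time + server compute.

def best_split(device_ms, server_ms, out_bytes, input_bytes, rate_bytes_per_ms):
    """Return (split, latency_ms).

    Layers [0, split) run on the device and layers [split, n) on the server.
    split == 0 sends the raw input; split == n runs fully on the device
    and transfers nothing.
    """
    n = len(device_ms)
    best = None
    for k in range(n + 1):
        if k == n:
            transfer = 0.0  # fully on-device, nothing to send
        else:
            payload = input_bytes if k == 0 else out_bytes[k - 1]
            transfer = payload / rate_bytes_per_ms
        total = sum(device_ms[:k]) + transfer + sum(server_ms[k:])
        if best is None or total < best[1]:
            best = (k, total)
    return best

# Toy three-layer network (all values are made up for illustration).
device_ms = [4, 6, 10]          # per-layer latency on the device
server_ms = [1, 1, 2]           # per-layer latency on the server
out_bytes = [8000, 1000, 100]   # size of each layer's output
input_bytes = 12000

# Re-evaluate the split as the measured data rate changes.
for rate in (10, 1000, 100000):  # bytes per millisecond
    split, latency = best_split(device_ms, server_ms, out_bytes, input_bytes, rate)
    print(rate, split)
```

With these toy numbers, a slow link favors running everything on the device, a moderate link favors splitting after the second layer, and a very fast link favors offloading from the input onward.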
This list is automatically generated from the titles and abstracts of the papers in this site.