Related papers: Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning and Compiler Optimization

URL: http://arxiv.org/abs/2004.11250v1
Date: Wed, 22 Apr 2020 03:18:23 GMT
Title: Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning and Compiler Optimization
Authors: Wei Niu, Pu Zhao, Zheng Zhan, Xue Lin, Yanzhi Wang, Bin Ren
Abstract summary: High-end mobile platforms serve as primary computing devices for a wide range of Deep Neural Network (DNN) applications. Constrained computation and storage resources on these devices pose significant challenges for real-time inference executions. We propose a set of hardware-friendly structured model pruning and compiler optimization techniques to accelerate DNN executions on mobile devices.
Score: 56.3111706960878
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: High-end mobile platforms rapidly serve as primary computing devices for a wide range of Deep Neural Network (DNN) applications. However, the constrained computation and storage resources on these devices still pose significant challenges for real-time DNN inference executions. To address this problem, we propose a set of hardware-friendly structured model pruning and compiler optimization techniques to accelerate DNN executions on mobile devices. This demo shows that these optimizations can enable real-time mobile execution of multiple DNN applications, including style transfer, DNN coloring and super resolution.

Related papers

Optimizing Multi-DNN Inference on Mobile Devices through Heterogeneous Processor Co-Execution [39.033040759452504]
Deep Neural Networks (DNNs) are increasingly deployed across diverse industries, driving demand for mobile device support. Existing mobile inference frameworks often rely on a single processor per model, limiting hardware utilization and causing suboptimal performance and energy efficiency. We propose an Advanced Multi-DNN Model Scheduling (ADMS) strategy for optimizing multi-DNN inference on mobile heterogeneous processors.
arXiv Detail & Related papers (2025-03-27T03:03:09Z)
SwapNet: Efficient Swapping for DNN Inference on Edge AI Devices Beyond the Memory Budget [18.63754969602021]
Deep neural networks (DNNs) on edge artificial intelligence (AI) devices enable various autonomous mobile computing applications. Existing solutions, such as model compression or cloud offloading, reduce the memory footprint of DNN inference. We develop SwapNet, an efficient block swapping ecosystem for edge AI devices.
arXiv Detail & Related papers (2024-01-30T05:29:49Z)
Adaptive DNN Surgery for Selfish Inference Acceleration with On-demand Edge Resource [25.274288063300844]
Deep Neural Networks (DNNs) have significantly improved the accuracy of intelligent applications on mobile devices. DNN surgery can enable real-time inference despite the computational limitations of mobile devices. This paper introduces a novel Decentralized DNN Surgery (DDS) framework.
arXiv Detail & Related papers (2023-06-21T11:32:28Z)
Dynamic Split Computing for Efficient Deep Edge Intelligence [78.4233915447056]
We introduce dynamic split computing, where the optimal split location is dynamically selected based on the state of the communication channel. We show that dynamic split computing achieves faster inference in edge computing environments where the data rate and server load vary over time.
arXiv Detail & Related papers (2022-05-23T12:35:18Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations. Based on the latency and accuracy aware reward design, such a computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
GRIM: A General, Real-Time Deep Learning Inference Framework for Mobile Devices based on Fine-Grained Structured Weight Sparsity [46.75304109970339]
This paper designs a novel mobile inference acceleration framework GRIM that is General to both convolutional neural networks (CNNs) and recurrent neural networks (RNNs). We propose a new fine-grained structured sparsity scheme through the Block-based Column-Row (BCR) pruning. Based on this new fine-grained structured sparsity, our GRIM framework consists of two parts: (a) the compiler optimization and code generation for real-time mobile inference.
arXiv Detail & Related papers (2021-08-25T03:50:46Z)
DynO: Dynamic Onloading of Deep Neural Networks from Cloud to Device [17.43467167013752]
We present DynO, a distributed inference framework that combines the best of both worlds to address several challenges. We show that DynO outperforms the current state-of-the-art, improving throughput by over an order of magnitude over device-only execution.
arXiv Detail & Related papers (2021-04-20T13:20:15Z)
An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices [58.62801151916888]
We introduce a new sparsity dimension, namely pattern-based sparsity, which comprises pattern and connectivity sparsity and is both highly accurate and hardware friendly. Our approach on the new pattern-based sparsity naturally fits into compiler optimization for highly efficient DNN execution on mobile platforms.
arXiv Detail & Related papers (2020-01-20T16:17:36Z)
PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space. With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.