NPAS: A Compiler-aware Framework of Unified Network Pruning and
Architecture Search for Beyond Real-Time Mobile Acceleration
- URL: http://arxiv.org/abs/2012.00596v2
- Date: Sun, 25 Apr 2021 04:18:34 GMT
- Title: NPAS: A Compiler-aware Framework of Unified Network Pruning and
Architecture Search for Beyond Real-Time Mobile Acceleration
- Authors: Zhengang Li, Geng Yuan, Wei Niu, Pu Zhao, Yanyu Li, Yuxuan Cai, Xuan
Shen, Zheng Zhan, Zhenglun Kong, Qing Jin, Zhiyu Chen, Sijia Liu, Kaiyuan
Yang, Bin Ren, Yanzhi Wang, Xue Lin
- Abstract summary: We propose a compiler-based automatic code generation framework supporting different DNNs and different pruning schemes.
We also propose NPAS, a compiler-aware unified network pruning and architecture search framework.
Our framework achieves 6.7 ms, 5.9 ms, and 3.9 ms ImageNet inference times with 78.2%, 75% (MobileNet-V3 level), and 71% (MobileNet-V2 level) Top-1 accuracy, respectively, on an off-the-shelf mobile phone.
- Score: 48.25487285358816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing demand to efficiently deploy DNNs on mobile edge devices,
it becomes much more important to reduce unnecessary computation and increase
the execution speed. Prior methods towards this goal, including model
compression and network architecture search (NAS), have largely been performed
independently and do not fully consider compiler-level optimizations, which are
essential for mobile acceleration. In this work, we first propose (i) a general
category of fine-grained structured pruning applicable to various DNN layers,
and (ii) a comprehensive compiler-based automatic code generation framework
supporting different DNNs and different pruning schemes, which bridge the gap
between model compression and NAS. We further propose NPAS, a compiler-aware
unified network pruning and architecture search framework. To deal with the
large search space, we propose a meta-modeling procedure based on reinforcement
learning with fast evaluation and Bayesian optimization, keeping the total
number of training epochs comparable to that of representative NAS frameworks.
Our framework achieves 6.7 ms, 5.9 ms, and 3.9 ms ImageNet inference times with
78.2%, 75% (MobileNet-V3 level), and 71% (MobileNet-V2 level) Top-1 accuracy,
respectively, on an off-the-shelf mobile phone, consistently outperforming
prior work.
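To make the pruning dimension concrete, below is a minimal NumPy sketch of one instance of fine-grained structured pruning, block-based pruning: the weight matrix is split into small row blocks, and whole columns are zeroed independently inside each block. This is an illustrative sketch under assumed block size and pruning ratio, not the authors' implementation; zeros stay regular within a block, which is what compiler-generated code can exploit.

```python
import numpy as np

def block_based_prune(weight, block_rows=16, prune_ratio=0.5):
    """Illustrative block-based structured pruning: split the weight
    matrix into groups of consecutive rows, then zero the columns with
    the smallest L2 norm independently inside each group. The zeros are
    regular within a block, so generated code can skip them as a unit."""
    out_ch, in_ch = weight.shape
    pruned = weight.copy()
    n_prune = int(in_ch * prune_ratio)
    for start in range(0, out_ch, block_rows):
        block = pruned[start:start + block_rows]       # view into `pruned`
        col_norms = np.linalg.norm(block, axis=0)      # importance per column
        weakest = np.argsort(col_norms)[:n_prune]      # least important columns
        block[:, weakest] = 0.0
    return pruned

w = np.random.randn(64, 128).astype(np.float32)
print(f"sparsity: {np.mean(block_based_prune(w) == 0.0):.2f}")  # ~0.50
```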
Related papers
- RNC: Efficient RRAM-aware NAS and Compilation for DNNs on Resource-Constrained Edge Devices [0.30458577208819987]
We aim to develop edge-friendly deep neural networks (DNNs) for accelerators based on resistive random-access memory (RRAM).
We propose an edge compilation and resource-constrained RRAM-aware neural architecture search (NAS) framework to search for optimized neural networks that meet specific hardware constraints.
The resulting speed-optimized model from NAS achieved a 5x-30x speedup.
arXiv Detail & Related papers (2024-09-27T15:35:36Z)
- A Pairwise Comparison Relation-assisted Multi-objective Evolutionary Neural Architecture Search Method with Multi-population Mechanism [58.855741970337675]
Neural architecture search (NAS) enables researchers to automatically explore vast search spaces and find efficient neural networks.
NAS suffers from a key bottleneck, i.e., numerous architectures need to be evaluated during the search process.
We propose SMEM-NAS, a pairwise comparison relation-assisted multi-objective evolutionary algorithm based on a multi-population mechanism.
arXiv Detail & Related papers (2024-07-22T12:46:22Z)
- Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs.
We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z)
- Lightweight Neural Architecture Search for Temporal Convolutional Networks at the Edge [21.72253397805102]
This work focuses in particular on Temporal Convolutional Networks (TCNs), a convolutional model for time-series processing.
We propose the first NAS tool that explicitly targets the optimization of the most distinctive architectural parameters of TCNs (see the receptive-field sketch after this entry).
We test the proposed NAS on four real-world, edge-relevant tasks, involving audio and bio-signals.
arXiv Detail & Related papers (2023-01-24T19:47:40Z)
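For context on what makes these TCN parameters distinctive: in a stack of causal dilated 1D convolutions, the receptive field is set jointly by per-layer kernel sizes and dilations, which is exactly the kind of quantity a TCN-aware NAS must trade against model cost. A minimal sketch using standard TCN arithmetic and a hypothetical layer configuration:

```python
def tcn_receptive_field(kernel_sizes, dilations):
    """Receptive field of stacked causal dilated 1D convolutions:
    1 + sum over layers of (kernel_size - 1) * dilation."""
    assert len(kernel_sizes) == len(dilations)
    return 1 + sum((k - 1) * d for k, d in zip(kernel_sizes, dilations))

# Hypothetical 4-layer TCN with exponentially increasing dilation.
print(tcn_receptive_field(kernel_sizes=[3, 3, 3, 3], dilations=[1, 2, 4, 8]))  # 31
```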
- Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration [71.80326738527734]
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework.
arXiv Detail & Related papers (2021-11-22T23:53:14Z)
- FLASH: Fast Neural Architecture Search with Hardware Optimization [7.263481020106725]
Neural architecture search (NAS) is a promising technique to design efficient and high-performance deep neural networks (DNNs).
This paper proposes FLASH, a very fast NAS methodology that co-optimizes the DNN accuracy and performance on a real hardware platform.
arXiv Detail & Related papers (2021-08-01T23:46:48Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in the design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to regain and guarantee high hardware efficiency; a minimal sketch of the pattern idea follows this entry.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
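A minimal sketch of the pattern idea above (the pattern set and selection rule here are hypothetical stand-ins, not PatDNN's actual design): each 3x3 kernel keeps only the positions of whichever predefined pattern preserves the most of its magnitude, so sparsity is fine-grained yet regular enough for a compiler to exploit.

```python
import numpy as np

# Hypothetical 3x3 patterns, each keeping 4 of 9 positions (1 = keep).
PATTERNS = [
    np.array([[0, 1, 0], [1, 1, 1], [0, 0, 0]]),
    np.array([[0, 0, 0], [1, 1, 1], [0, 1, 0]]),
    np.array([[0, 1, 0], [1, 1, 0], [0, 1, 0]]),
    np.array([[0, 1, 0], [0, 1, 1], [0, 1, 0]]),
]

def pattern_prune_kernel(kernel):
    """Keep whichever pattern preserves the most L2 magnitude of the kernel."""
    scores = [np.linalg.norm(kernel * p) for p in PATTERNS]
    best = PATTERNS[int(np.argmax(scores))]
    return kernel * best  # kernels sharing a pattern id can be grouped by the compiler

k = np.random.randn(3, 3)
print(pattern_prune_kernel(k))
```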
- DDPNAS: Efficient Neural Architecture Search via Dynamic Distribution Pruning [135.27931587381596]
We propose an efficient and unified NAS framework termed DDPNAS via dynamic distribution pruning.
In particular, we first sample architectures from a joint categorical distribution. Then the search space is dynamically pruned and its distribution is updated every few epochs.
With the proposed efficient network generation method, we directly obtain the optimal neural architectures under given constraints; a minimal sketch of this sample-update-prune loop follows this entry.
arXiv Detail & Related papers (2019-05-28T06:35:52Z)
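The DDPNAS entry above describes a concrete loop: sample architectures from a per-layer categorical distribution, update the distribution from observed rewards, and periodically prune the lowest-probability options out of the search space. A minimal sketch of that loop; the search space, reward function, and update rule here are hypothetical stand-ins, not the paper's exact formulation.

```python
import random

# Hypothetical search space: one categorical choice per layer.
space = {f"layer{i}": ["conv3x3", "conv5x5", "skip"] for i in range(4)}
probs = {k: {op: 1.0 / len(ops) for op in ops} for k, ops in space.items()}

def sample_arch(probs):
    """Draw one architecture from the current joint categorical distribution."""
    return {k: random.choices(list(p), weights=list(p.values()))[0]
            for k, p in probs.items()}

def evaluate(arch):
    """Stand-in for fast evaluation (e.g., a brief training-and-score step)."""
    return sum(len(op) for op in arch.values()) / 30.0

for epoch in range(1, 31):
    arch = sample_arch(probs)
    r = evaluate(arch)
    for layer, op in arch.items():
        # Reward-weighted update: shift probability mass toward the ops
        # that appeared in better-scoring architectures.
        p = probs[layer]
        p[op] += 0.1 * r
        total = sum(p.values())
        probs[layer] = {o: v / total for o, v in p.items()}
    if epoch % 10 == 0:
        # Dynamic distribution pruning: drop the weakest remaining option.
        for layer in probs:
            p = probs[layer]
            if len(p) > 1:
                del p[min(p, key=p.get)]
                total = sum(p.values())
                probs[layer] = {o: v / total for o, v in p.items()}

# Final architecture: the most probable op per layer.
print({layer: max(p, key=p.get) for layer, p in probs.items()})
```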