HQNAS: Auto CNN deployment framework for joint quantization and architecture search
- URL: http://arxiv.org/abs/2210.08485v1
- Date: Sun, 16 Oct 2022 08:32:18 GMT
- Title: HQNAS: Auto CNN deployment framework for joint quantization and architecture search
- Authors: Hongjiang Chen, Yang Wang, Leibo Liu, Shaojun Wei, Shouyi Yin
- Abstract summary: We propose a novel neural network design framework called Hardware-aware Quantized Neural Architecture Search (HQNAS).
It takes only 4 GPU hours to discover an outstanding NN policy on CIFAR-10.
It also takes only 10% of the GPU time to generate a comparable model on ImageNet.
- Score: 30.45926484863791
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning applications are being transferred from the cloud to the edge with
the rapid development of embedded computing systems. To achieve higher
energy efficiency within a limited resource budget, neural networks (NNs) must
be carefully designed in two steps: the architecture design and the
quantization policy choice. Neural Architecture Search (NAS) and quantization
have so far been applied separately when deploying NNs onto embedded devices.
However, taking the two steps individually is time-consuming and leads to a
sub-optimal final deployment. To this end, we propose a novel neural network
design framework, Hardware-aware Quantized Neural Architecture
Search (HQNAS), which combines NAS and quantization in a
highly efficient manner using weight-sharing and bit-sharing. It takes only 4 GPU
hours to discover an outstanding NN policy on CIFAR-10, and only 10% of the
GPU time to generate a comparable model on ImageNet compared to the traditional
NAS method, with a 1.8x latency reduction and a negligible accuracy loss of only
0.7%. Moreover, our method can be adapted to lifelong scenarios where the
neural network must occasionally evolve due to changes in local data,
environment, and user preferences.
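The abstract's claim that taking the two steps individually leads to a sub-optimal deployment can be illustrated with a small toy example. Every op, accuracy number, and latency number below is hypothetical; this is only a sketch of why a joint architecture/bit-width search space can dominate a sequential pipeline, not the HQNAS algorithm itself.

```python
from itertools import product

# Candidate ops: (name, accuracy, latency at full precision). Toy numbers.
OPS = [("conv3x3", 0.90, 4.0), ("conv5x5", 0.93, 9.0), ("skip", 0.70, 0.5)]
# Bit-width -> (accuracy retention factor, latency scaling factor). Toy numbers.
BITS = {8: (1.00, 1.00), 4: (0.97, 0.50), 2: (0.90, 0.25)}

def score(acc, lat, budget=3.0):
    # Reward accuracy; penalize latency beyond a fixed budget.
    return acc - max(0.0, lat - budget)

def quantized_score(op, bw):
    _, acc, lat = op
    acc_f, lat_f = BITS[bw]
    return score(acc * acc_f, lat * lat_f)

def joint_search():
    # Explore op and bit-width together, as in a joint search space.
    return max(product(OPS, BITS), key=lambda c: quantized_score(*c))

def sequential_search():
    # Step 1: pick the best op at full precision; step 2: quantize only that op.
    op = max(OPS, key=lambda o: score(o[1], o[2]))
    bw = max(BITS, key=lambda b: quantized_score(op, b))
    return op, bw

joint_op, joint_bw = joint_search()
seq_op, seq_bw = sequential_search()
```

Under these toy numbers, the sequential pipeline settles for a weak low-latency op at full precision, while the joint search finds a stronger op that only fits the latency budget after quantization, which is the coupling effect the abstract attributes to real deployments.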
Related papers
- RNC: Efficient RRAM-aware NAS and Compilation for DNNs on Resource-Constrained Edge Devices [0.30458577208819987]
We aim to develop edge-friendly deep neural networks (DNNs) for accelerators based on resistive random-access memory (RRAM).
We propose an edge compilation and resource-constrained RRAM-aware neural architecture search (NAS) framework to search for optimized neural networks meeting specific hardware constraints.
The resulting NAS model optimized for speed achieved a 5x-30x speedup.
arXiv Detail & Related papers (2024-09-27T15:35:36Z)
- DCP-NAS: Discrepant Child-Parent Neural Architecture Search for 1-bit CNNs [53.82853297675979]
1-bit convolutional neural networks (CNNs) with binary weights and activations show their potential for resource-limited embedded devices.
One natural approach is to use 1-bit CNNs to reduce the computation and memory cost of NAS.
We introduce Discrepant Child-Parent Neural Architecture Search (DCP-NAS) to efficiently search 1-bit CNNs.
arXiv Detail & Related papers (2023-06-27T11:28:29Z)
- Lightweight Neural Architecture Search for Temporal Convolutional Networks at the Edge [21.72253397805102]
This work focuses in particular on Temporal Convolutional Networks (TCNs), a convolutional model for time-series processing.
We propose the first NAS tool that explicitly targets the optimization of the most peculiar architectural parameters of TCNs.
We test the proposed NAS on four real-world, edge-relevant tasks, involving audio and bio-signals.
arXiv Detail & Related papers (2023-01-24T19:47:40Z)
- Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs [74.83613252825754]
"smart ecosystems" are being formed where sensing happens concurrently rather than standalone.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit-aware scheduler that allows preemption at run time to account for the dynamics introduced by the arrival and early-exiting processes.
arXiv Detail & Related papers (2022-09-27T15:04:01Z)
- U-Boost NAS: Utilization-Boosted Differentiable Neural Architecture Search [50.33956216274694]
Optimizing resource utilization on target platforms is key to achieving high performance during DNN inference.
We propose a novel hardware-aware NAS framework that optimizes not only for task accuracy and inference latency but also for resource utilization.
We achieve a 2.8-4x speedup for DNN inference compared to prior hardware-aware NAS methods.
arXiv Detail & Related papers (2022-03-23T13:44:15Z)
- Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks [72.81092567651395]
Sub-bit Neural Networks (SNNs) are a new type of binary quantization design tailored to compress and accelerate BNNs.
SNNs are trained with a kernel-aware optimization framework, which exploits binary quantization in the fine-grained convolutional kernel space.
Experiments on visual recognition benchmarks and the hardware deployment on FPGA validate the great potentials of SNNs.
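As a point of reference for the binary quantization these works build on, below is a minimal sketch of a standard BNN binarization step (XNOR-Net-style sign-plus-scale). This is an illustrative assumption about the general technique, not the exact sub-bit, kernel-aware scheme of SNNs.

```python
def binarize_kernel(w):
    # Replace each weight by its sign, scaled by alpha = mean(|w|),
    # the L1-optimal per-kernel scaling factor used in XNOR-Net-style BNNs.
    alpha = sum(abs(x) for x in w) / len(w)
    return [alpha if x >= 0.0 else -alpha for x in w], alpha

# A toy 4-weight kernel collapses to a single sign pattern plus one scalar.
q, alpha = binarize_kernel([0.5, -0.25, 0.75, -1.0])
```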
arXiv Detail & Related papers (2021-10-18T11:30:29Z)
- FLASH: Fast Neural Architecture Search with Hardware Optimization [7.263481020106725]
Neural architecture search (NAS) is a promising technique for designing efficient, high-performance deep neural networks (DNNs).
This paper proposes FLASH, a very fast NAS methodology that co-optimizes the DNN accuracy and performance on a real hardware platform.
arXiv Detail & Related papers (2021-08-01T23:46:48Z)
- ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
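For context, the roofline model referenced here estimates a layer's execution time as the slower of its compute and its memory traffic. A minimal sketch, with made-up hardware numbers, is:

```python
def roofline_time(flops, bytes_moved, peak_flops, bandwidth):
    # A layer is either compute-bound or memory-bound; the roofline model
    # takes the larger of the two lower bounds as the time estimate.
    return max(flops / peak_flops, bytes_moved / bandwidth)

# Hypothetical 1 GFLOP layer moving 1 MB on a 1 TFLOP/s, 100 GB/s device:
# compute-bound, so the estimate is flops / peak_flops = 1 ms.
t = roofline_time(flops=1e9, bytes_moved=1e6, peak_flops=1e12, bandwidth=1e11)
```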
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
- Evolutionary Neural Architecture Search Supporting Approximate Multipliers [0.5414308305392761]
We propose a multi-objective NAS method based on Cartesian genetic programming for evolving convolutional neural networks (CNNs).
The most suitable approximate multipliers are automatically selected from a library.
Evolved CNNs are compared with common human-created CNNs of a similar complexity on the CIFAR-10 benchmark problem.
arXiv Detail & Related papers (2021-01-28T09:26:03Z)
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
- NASCaps: A Framework for Neural Architecture Search to Optimize the Accuracy and Hardware Efficiency of Convolutional Capsule Networks [10.946374356026679]
We propose NASCaps, an automated framework for the hardware-aware NAS of different types of Deep Neural Networks (DNNs).
We study the efficacy of deploying a multi-objective genetic algorithm (e.g., based on the NSGA-II algorithm).
Our framework is the first to model and support the specialized capsule layers and dynamic routing in the NAS flow.
arXiv Detail & Related papers (2020-08-19T14:29:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.