HAO: Hardware-aware neural Architecture Optimization for Efficient
Inference
- URL: http://arxiv.org/abs/2104.12766v1
- Date: Mon, 26 Apr 2021 17:59:29 GMT
- Title: HAO: Hardware-aware neural Architecture Optimization for Efficient
Inference
- Authors: Zhen Dong, Yizhao Gao, Qijing Huang, John Wawrzynek, Hayden K.H. So,
Kurt Keutzer
- Abstract summary: We develop an integer programming algorithm to prune the design space of a neural architecture search algorithm.
Our algorithm achieves 72.5% top-1 accuracy on ImageNet at a frame rate of 50 fps, which is 60% faster than MnasNet and 135% faster than FBNet with comparable accuracy.
- Score: 25.265181492143107
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic algorithm-hardware co-design for DNN has shown great success in
improving the performance of DNNs on FPGAs. However, this process remains
challenging due to the intractable search space of neural network architectures
and hardware accelerator implementation. Differing from existing hardware-aware
neural architecture search (NAS) algorithms that rely solely on the expensive
learning-based approaches, our work incorporates integer programming into the
search algorithm to prune the design space. Given a set of hardware resource
constraints, our integer programming formulation directly outputs the optimal
accelerator configuration for mapping a DNN subgraph that minimizes latency. We
use an accuracy predictor for different DNN subgraphs with different
quantization schemes and generate accuracy-latency Pareto frontiers. With low
computational cost, our algorithm can generate quantized networks that achieve
state-of-the-art accuracy and hardware performance on Xilinx Zynq (ZU3EG) FPGA
for image classification on the ImageNet dataset. The solution found by our
algorithm achieves 72.5% top-1 accuracy on ImageNet at a frame rate of 50 fps, which is
60% faster than MnasNet and 135% faster than FBNet with comparable accuracy.
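As a rough illustration of the integer-programming step described in the abstract (not the paper's actual formulation), the sketch below picks one accelerator configuration per DNN subgraph so that total latency is minimized under shared FPGA resource budgets. All subgraph names, candidate configurations, resource counts, and latency numbers are invented for illustration, and PuLP with its default CBC solver is just one convenient backend.

# Minimal sketch (not HAO's actual formulation): pick one accelerator
# configuration per DNN subgraph to minimize total latency under shared FPGA
# resource budgets (DSP / BRAM / LUT). All numbers below are made up.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, PULP_CBC_CMD, value

# Hypothetical candidates per subgraph: (latency_ms, dsp, bram, lut).
candidates = {
    "subgraph0": [(2.1, 120, 40, 15000), (1.4, 220, 60, 24000)],
    "subgraph1": [(3.0, 100, 30, 12000), (1.9, 260, 80, 30000)],
}
budget = {"dsp": 360, "bram": 120, "lut": 50000}

prob = LpProblem("accelerator_config", LpMinimize)

# x[s][i] == 1 iff configuration i is chosen for subgraph s.
x = {s: [LpVariable(f"x_{s}_{i}", cat=LpBinary) for i in range(len(cfgs))]
     for s, cfgs in candidates.items()}

# Objective: total latency, assuming subgraphs execute sequentially.
prob += lpSum(cfg[0] * x[s][i]
              for s, cfgs in candidates.items() for i, cfg in enumerate(cfgs))

# Exactly one configuration per subgraph.
for s in candidates:
    prob += lpSum(x[s]) == 1

# Shared resource budgets over all chosen configurations.
for res, idx in (("dsp", 1), ("bram", 2), ("lut", 3)):
    prob += lpSum(cfg[idx] * x[s][i]
                  for s, cfgs in candidates.items()
                  for i, cfg in enumerate(cfgs)) <= budget[res]

prob.solve(PULP_CBC_CMD(msg=False))
for s, cfgs in candidates.items():
    chosen = next(i for i in range(len(cfgs)) if value(x[s][i]) > 0.5)
    print(s, "-> config", chosen, "latency", cfgs[chosen][0], "ms")

In HAO, configurations selected this way are combined with an accuracy predictor over subgraphs and quantization schemes to trace the accuracy-latency Pareto frontiers mentioned in the abstract.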
Related papers
- HAPM -- Hardware Aware Pruning Method for CNN hardware accelerators in resource constrained devices [44.99833362998488]
The present work proposes a generic hardware architecture ready to be implemented on FPGA devices.
The inference speed of the design is evaluated over different resource constrained FPGA devices.
We demonstrate that our hardware-aware pruning algorithm achieves a remarkable 45% improvement in inference time compared to a network pruned using the standard algorithm.
arXiv Detail & Related papers (2024-08-26T07:27:12Z) - Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers [56.37495946212932]
Vision transformers (ViTs) have demonstrated their superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs).
This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs.
arXiv Detail & Related papers (2024-07-25T16:35:46Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - OMPQ: Orthogonal Mixed Precision Quantization [64.59700856607017]
Mixed precision quantization takes advantage of hardware's multiple bit-width arithmetic operations to unleash the full potential of network quantization.
We propose to optimize a proxy metric, the concept of network orthogonality, which is highly correlated with the loss of the integer programming.
This approach reduces the search time and required data amount by orders of magnitude, with little compromise on quantization accuracy.
arXiv Detail & Related papers (2021-09-16T10:59:33Z) - FLASH: Fast Neural Architecture Search with Hardware Optimization [7.263481020106725]
Neural architecture search (NAS) is a promising technique to design efficient and high-performance deep neural networks (DNNs).
This paper proposes FLASH, a very fast NAS methodology that co-optimizes the DNN accuracy and performance on a real hardware platform.
arXiv Detail & Related papers (2021-08-01T23:46:48Z) - Quantized Neural Networks via {-1, +1} Encoding Decomposition and
Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks (a minimal decomposition sketch follows this related-papers list).
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z) - MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z) - Automated Design Space Exploration for optimised Deployment of DNN on
Arm Cortex-A CPUs [13.628734116014819]
Deep learning on embedded devices has prompted the development of numerous methods to optimise the deployment of deep neural networks (DNNs).
There is a lack of research on cross-level optimisation as the space of approaches becomes too large to test and obtain a globally optimised solution.
We present a set of results for state-of-the-art DNNs on a range of Arm Cortex-A CPU platforms achieving up to 4x improvement in performance and over 2x reduction in memory.
arXiv Detail & Related papers (2020-06-09T11:00:06Z) - EDD: Efficient Differentiable DNN Architecture and Implementation
Co-search for Embedded AI Solutions [40.32848001349242]
We propose a fully simultaneous, efficient differentiable DNN architecture and implementation co-search (EDD) methodology.
We formulate the co-search problem by fusing search variables and hardware implementation variables into one solution space, and maximize both algorithm accuracy and hardware implementation quality.
arXiv Detail & Related papers (2020-05-06T02:37:48Z) - PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with
Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in the design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to regain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)