Algorithm and Hardware Co-design for Reconfigurable CNN Accelerator
- URL: http://arxiv.org/abs/2111.12787v1
- Date: Wed, 24 Nov 2021 20:37:50 GMT
- Title: Algorithm and Hardware Co-design for Reconfigurable CNN Accelerator
- Authors: Hongxiang Fan, Martin Ferianc, Zhiqiang Que, He Li, Shuanglong Liu,
Xinyu Niu, Wayne Luk
- Abstract summary: Recent advances in algorithm-hardware co-design for deep neural networks (DNNs) have demonstrated their potential in automatically designing neural architectures and hardware designs.
However, it is still a challenging optimization problem due to the expensive training cost and the time-consuming hardware implementation.
We propose a novel three-phase co-design framework with several new features.
Our discovered network and hardware configuration can achieve 2% ~ 6% higher accuracy, 2x ~ 26x lower latency, and 8.5x higher energy efficiency.
- Score: 3.1431240233552007
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in algorithm-hardware co-design for deep neural networks
(DNNs) have demonstrated their potential in automatically designing neural
architectures and hardware designs. Nevertheless, it is still a challenging
optimization problem due to the expensive training cost and the time-consuming
hardware implementation, which makes exploration of the vast design space
of neural architecture and hardware design intractable. In this paper, we
demonstrate that our proposed approach is capable of locating designs on the
Pareto frontier. This capability is enabled by a novel three-phase co-design
framework, with the following new features: (a) decoupling DNN training from
the design space exploration of hardware architecture and neural architecture,
(b) providing a hardware-friendly neural architecture space by considering
hardware characteristics in constructing the search cells, (c) adopting
Gaussian process to predict accuracy, latency and power consumption to avoid
time-consuming synthesis and place-and-route processes. In comparison with the
manually-designed ResNet101, InceptionV2 and MobileNetV2, we can achieve up to
5% higher accuracy with up to 3x speedup on the ImageNet dataset. Compared
with other state-of-the-art co-design frameworks, the network and hardware
configurations found by our framework achieve 2% ~ 6% higher accuracy, 2x ~
26x lower latency and 8.5x higher energy efficiency.
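A minimal sketch of the surrogate-modeling idea in feature (c), under assumed placeholder encodings: fit one Gaussian-process regressor per objective on a handful of already-evaluated designs, predict accuracy, latency and power for unseen (neural architecture, hardware) configurations, and keep the predicted Pareto front. The features, training data and cost trends below are invented for illustration, not the authors' actual framework.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Placeholder encodings of (neural architecture, hardware design) pairs;
# features could be depth, width multiplier, PE-array size, buffer size, ...
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(32, 4))  # 32 already-evaluated designs
y_acc = 0.70 + 0.20 * X_train[:, 0] - 0.05 * X_train[:, 2] + rng.normal(0, 0.01, 32)
y_lat = 5.0 + 10.0 * X_train[:, 1] * X_train[:, 2] + rng.normal(0, 0.10, 32)
y_pow = 1.0 + 3.0 * X_train[:, 3] + rng.normal(0, 0.05, 32)

# One GP surrogate per objective stands in for synthesis and place-and-route.
surrogates = {}
for name, y in (("accuracy", y_acc), ("latency", y_lat), ("power", y_pow)):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    surrogates[name] = gp.fit(X_train, y)

# Cheaply score a large batch of unseen candidate designs.
X_cand = rng.uniform(0.0, 1.0, size=(500, 4))
pred = {name: gp.predict(X_cand) for name, gp in surrogates.items()}

def pareto_indices(acc, lat, pwr):
    """Candidates not dominated under (maximize acc, minimize lat, minimize pwr)."""
    keep = []
    for i in range(len(acc)):
        dominated = any(
            acc[j] >= acc[i] and lat[j] <= lat[i] and pwr[j] <= pwr[i]
            and (acc[j] > acc[i] or lat[j] < lat[i] or pwr[j] < pwr[i])
            for j in range(len(acc))
        )
        if not dominated:
            keep.append(i)
    return keep

front = pareto_indices(pred["accuracy"], pred["latency"], pred["power"])
print(f"{len(front)} predicted Pareto-optimal designs out of {len(X_cand)}")
```

In the paper's setting each training point would come from an actual hardware implementation run, which is exactly the cost the surrogate amortizes.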
Related papers
- Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers [56.37495946212932]
Vision transformers (ViTs) have demonstrated superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs).
This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs.
arXiv Detail & Related papers (2024-07-25T16:35:46Z)
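Quasar-ViT's concrete search procedure is not reproduced above; as a generic, hypothetical illustration of quantization-aware candidate evaluation, the sketch below fake-quantizes weights (uniform symmetric quantize-dequantize) so that scoring a candidate reflects its low-bit behavior. The bit-widths and the evaluation loop are assumptions, not the paper's method.

```python
import numpy as np

def fake_quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Uniform symmetric quantize-dequantize, so downstream scoring of a
    search candidate sees the weights as a low-bit accelerator would."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

# Hypothetical use inside a search loop: compare per-bit-width degradation.
w = np.random.default_rng(0).standard_normal((64, 64))
for bits in (8, 6, 4):
    mse = np.mean((w - fake_quantize(w, bits)) ** 2)
    print(f"{bits}-bit fake-quant MSE: {mse:.6f}")
```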
- Fast GraspNeXt: A Fast Self-Attention Neural Network Architecture for Multi-task Learning in Computer Vision Tasks for Robotic Grasping on the Edge [80.88063189896718]
High architectural and computational complexity can result in poor suitability for deployment on embedded devices.
Fast GraspNeXt is a fast self-attention neural network architecture tailored for embedded multi-task learning in computer vision tasks for robotic grasping.
arXiv Detail & Related papers (2023-04-21T18:07:14Z)
- Deep Learning for Real Time Satellite Pose Estimation on Low Power Edge TPU [58.720142291102135]
In this paper, we propose pose estimation software exploiting neural network architectures.
We show how low-power machine learning accelerators could enable the use of artificial intelligence in space.
arXiv Detail & Related papers (2022-04-07T08:53:18Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
Compared to its full-precision software counterpart, it reduces classification time by three orders of magnitude with a small 4.5% impact on accuracy.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- ISyNet: Convolutional Neural Networks design for AI accelerator [0.0]
Current state-of-the-art architectures are found with neural architecture search (NAS), taking model complexity into account.
We propose a measure of the hardware efficiency of a neural architecture search space, the matrix efficiency measure (MEM); a search space comprising hardware-efficient operations; and a latency-aware scaling method.
We show the advantage of the designed architectures for the NPU devices on ImageNet and the generalization ability for the downstream classification and detection tasks.
arXiv Detail & Related papers (2021-09-04T20:57:05Z)
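The MEM metric and ISyNet's exact scaling rule are not given above; the following is a hypothetical latency-aware scaling loop in that spirit: greedily grow depth and width multipliers while a placeholder cost model predicts the latency budget is still met. The cost model is invented for illustration.

```python
def predicted_latency_ms(depth_mult: float, width_mult: float) -> float:
    # Placeholder cost model: latency grows roughly linearly with depth and
    # quadratically with width (per-layer MACs scale with width squared).
    return 4.0 * depth_mult * width_mult ** 2

def scale_to_budget(budget_ms: float, step: float = 0.05):
    """Greedily enlarge the model while the predicted latency fits the budget."""
    depth = width = 1.0
    while True:
        if predicted_latency_ms(depth, width + step) <= budget_ms:
            width += step
        elif predicted_latency_ms(depth + step, width) <= budget_ms:
            depth += step
        else:
            return depth, width

print(scale_to_budget(budget_ms=10.0))  # -> (depth_mult, width_mult)
```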
- Does Form Follow Function? An Empirical Exploration of the Impact of Deep Neural Network Architecture Design on Hardware-Specific Acceleration [76.35307867016336]
This study investigates the impact of deep neural network architecture design on the degree of inference speedup.
We show that while leveraging hardware-specific acceleration achieved an average inference speed-up of 380%, the degree of inference speed-up varied drastically depending on the macro-architecture design pattern.
arXiv Detail & Related papers (2021-07-08T23:05:39Z)
- RHNAS: Realizable Hardware and Neural Architecture Search [3.5694949627557846]
RHNAS is a method that combines reinforcement learning for hardware optimization with differentiable neural architecture search.
RHNAS discovers realizable NN-HW designs with 1.84x lower latency and 1.86x lower energy-delay product (EDP) on ImageNet, and 2.81x lower latency and 3.30x lower EDP on CIFAR-10.
arXiv Detail & Related papers (2021-06-17T00:15:42Z)
- HAO: Hardware-aware neural Architecture Optimization for Efficient Inference [25.265181492143107]
We develop an integer programming algorithm to prune the design space of a neural network search algorithm.
Our algorithm achieves 72.5% top-1 accuracy on ImageNet at a frame rate of 50, which is 60% faster than MnasNet and 135% faster than FBNet with comparable accuracy.
arXiv Detail & Related papers (2021-04-26T17:59:29Z)
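HAO's integer-programming formulation is not spelled out above, so the toy instance below only captures the flavor: binary per-layer choice variables, a proxy accuracy score to maximize, and a latency budget as the constraint. It is solved by enumeration because the instance is tiny; a real instance would go to an ILP solver. All scores and latencies are invented.

```python
from itertools import product

# Per layer, exactly one option may be chosen (the 0/1 variables of the ILP).
# Each option carries an invented (proxy accuracy score, latency ms) pair.
LAYERS = [
    [("conv3x3", 1.0, 3.0), ("conv5x5", 1.3, 5.0)],
    [("narrow", 0.8, 2.0), ("wide", 1.4, 6.0)],
    [("stride2", 0.9, 1.5), ("stride1", 1.2, 4.0)],
]
BUDGET_MS = 10.0

best_score, best_choice = float("-inf"), None
for choice in product(*LAYERS):  # enumerate all one-option-per-layer picks
    latency = sum(lat for _, _, lat in choice)
    score = sum(s for _, s, _ in choice)
    if latency <= BUDGET_MS and score > best_score:
        best_score, best_choice = score, choice

print("picked:", [name for name, _, _ in best_choice],
      "score:", best_score,
      "latency:", sum(lat for _, _, lat in best_choice))
```

Configurations surviving such a step are the only ones the architecture search then has to explore, which is how the integer program shrinks the design space.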
- HSCoNAS: Hardware-Software Co-Design of Efficient DNNs via Neural Architecture Search [6.522258468923919]
We present a novel hardware-aware neural architecture search (NAS) framework, namely HSCoNAS, to automate the design of deep neural networks (DNNs).
To accomplish this goal, we first propose an effective hardware performance modeling method to approximate the runtime latency of DNNs on target hardware.
We also propose two novel techniques: dynamic channel scaling, which maximizes accuracy under a specified latency constraint, and progressive space shrinking, which refines the search space towards the target hardware.
arXiv Detail & Related papers (2021-03-11T12:21:21Z)
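One common way to realize HSCoNAS-style latency modeling (details here are assumed, not taken from the paper) is a per-operator lookup table: profile each candidate operator once on the target device, then estimate a whole network's latency as the sum of its layers' table entries.

```python
# Hypothetical profiled latencies, measured once per (operator, channels)
# pair on the target device.
OP_LATENCY_MS = {
    ("conv3x3", 32): 1.2,
    ("conv3x3", 64): 2.9,
    ("conv5x5", 32): 2.4,
    ("mbconv", 64): 1.7,
    ("pool", 64): 0.3,
}

def estimate_latency_ms(arch):
    """arch is a list of (op_name, channels) layer descriptors."""
    return sum(OP_LATENCY_MS[layer] for layer in arch)

candidate = [("conv3x3", 32), ("mbconv", 64), ("conv3x3", 64), ("pool", 64)]
print(f"predicted latency: {estimate_latency_ms(candidate):.1f} ms")
```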
- Rethinking Co-design of Neural Architectures and Hardware Accelerators [31.342964958282092]
We systematically study the importance and strategies of co-designing neural architectures and hardware accelerators.
Our experiments show that the joint search method consistently outperforms previous platform-aware neural architecture search.
Our method can reduce energy consumption of an edge accelerator by up to 2x under the same accuracy constraint.
arXiv Detail & Related papers (2021-02-17T07:55:58Z)
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach to reduce the search cost.
We achieve state-of-the-art results in terms of the accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
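A generic illustration of the one-shot, weight-sharing idea behind such reduced-cost searches (not MS-RANAS's exact scheme, which is not detailed above): train a single supernet at maximum width, then evaluate narrower candidates by slicing the shared weights instead of retraining each one.

```python
import numpy as np

# A shared "supernet" linear layer at maximum width; sub-networks reuse
# slices of it, so candidate evaluation needs no retraining.
rng = np.random.default_rng(1)
W_super = rng.standard_normal((128, 64))  # max width 128, input dim 64
x = rng.standard_normal(64)

for width in (32, 64, 128):  # candidate sub-network widths
    y = W_super[:width] @ x  # slicing the shared weights == sampling a subnet
    print(f"width {width}: output norm {np.linalg.norm(y):.2f}")
```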