Rethinking Co-design of Neural Architectures and Hardware Accelerators
- URL: http://arxiv.org/abs/2102.08619v1
- Date: Wed, 17 Feb 2021 07:55:58 GMT
- Title: Rethinking Co-design of Neural Architectures and Hardware Accelerators
- Authors: Yanqi Zhou, Xuanyi Dong, Berkin Akin, Mingxing Tan, Daiyi Peng,
Tianjian Meng, Amir Yazdanbakhsh, Da Huang, Ravi Narayanaswami, James Laudon
- Abstract summary: We systematically study the importance and strategies of co-designing neural architectures and hardware accelerators.
Our experiments show that the joint search method consistently outperforms previous platform-aware neural architecture search.
Our method can reduce energy consumption of an edge accelerator by up to 2x under the same accuracy constraint.
- Score: 31.342964958282092
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural architectures and hardware accelerators have been two driving forces
for the progress in deep learning. Previous works typically attempt to optimize
hardware given a fixed model architecture or model architecture given fixed
hardware. Moreover, FPGAs have been the dominant hardware architecture explored in
this prior work. In our work, we target the optimization of hardware and software
configurations on an industry-standard edge accelerator. We systematically
study the importance and strategies of co-designing neural architectures and
hardware accelerators. We make three observations: 1) the software search space
has to be customized to fully leverage the targeted hardware architecture, 2)
the search for the model architecture and hardware architecture should be done
jointly to achieve the best of both worlds, and 3) different use cases lead to
very different search outcomes. Our experiments show that the joint search
method consistently outperforms previous platform-aware neural architecture
search, manually crafted models, and the state-of-the-art EfficientNet on all
latency targets by around 1% on ImageNet top-1 accuracy. Our method can reduce
energy consumption of an edge accelerator by up to 2x under the same accuracy
constraint, when co-adapting the model architecture and hardware accelerator
configurations.
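The paper describes jointly searching over model and accelerator configurations under a latency or energy constraint. As a rough illustration of what such a loop involves, here is a minimal random-search sketch over a hypothetical joint search space with a soft-constrained reward; the knobs, cost model, and reward form are illustrative assumptions, not the authors' actual search space or controller.

```python
import random

# Hypothetical joint search space: model knobs and accelerator knobs are
# sampled together, so the reward reflects their interaction.
MODEL_SPACE = {
    "depth_multiplier": [0.5, 0.75, 1.0, 1.25],
    "width_multiplier": [0.5, 0.75, 1.0, 1.25],
    "kernel_size": [3, 5, 7],
}
ACCEL_SPACE = {
    "pe_array": [(16, 16), (32, 32), (64, 16)],   # processing-element grid
    "sram_kb": [512, 1024, 2048],                  # on-chip buffer size
}

def sample(space):
    return {k: random.choice(v) for k, v in space.items()}

def evaluate(model_cfg, accel_cfg):
    """Placeholder for training/estimating accuracy plus a cost model for
    latency on the candidate accelerator configuration."""
    acc = random.uniform(0.70, 0.80)          # stand-in for measured accuracy
    latency_ms = random.uniform(1.0, 10.0)    # stand-in for simulated latency
    return acc, latency_ms

def reward(acc, latency_ms, target_ms=5.0, beta=-0.07):
    # Soft latency constraint in the style of platform-aware NAS rewards.
    return acc * (latency_ms / target_ms) ** beta

best = None
for _ in range(100):
    m, h = sample(MODEL_SPACE), sample(ACCEL_SPACE)
    acc, lat = evaluate(m, h)
    r = reward(acc, lat)
    if best is None or r > best[0]:
        best = (r, m, h)

print("best reward:", round(best[0], 4), "model:", best[1], "accel:", best[2])
```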
Related papers
- Multi-objective Differentiable Neural Architecture Search [58.67218773054753]
We propose a novel NAS algorithm that encodes user preferences for the trade-off between performance and hardware metrics.
Our method outperforms existing MOO NAS methods across a broad range of qualitatively different search spaces and datasets.
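A minimal sketch of the underlying idea of preference-encoded multi-objective selection, assuming a fixed candidate pool and a simple scalarization; the actual method is differentiable and searches the space directly, so everything below is illustrative only.

```python
# A simplified, non-differentiable sketch of preference-conditioned selection:
# a user preference weight trades off accuracy against latency over a fixed
# candidate pool. The numbers are invented for illustration.
candidates = [
    {"name": "small",  "accuracy": 0.74, "latency_ms": 2.1},
    {"name": "medium", "accuracy": 0.77, "latency_ms": 4.3},
    {"name": "large",  "accuracy": 0.79, "latency_ms": 8.9},
]

def scalarize(c, pref):
    # pref in [0, 1]: 0 = only care about latency, 1 = only care about accuracy.
    acc_term = c["accuracy"]
    lat_term = 1.0 / c["latency_ms"]          # higher is better
    return pref * acc_term + (1.0 - pref) * lat_term

for pref in (0.1, 0.5, 0.9):
    best = max(candidates, key=lambda c: scalarize(c, pref))
    print(f"preference={pref}: pick {best['name']}")
```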
arXiv Detail & Related papers (2024-02-28T10:09:04Z)
- Hardware Aware Evolutionary Neural Architecture Search using Representation Similarity Metric [12.52012450501367]
Hardware-aware Neural Architecture Search (HW-NAS) is a technique used to automatically design the architecture of a neural network for a specific task and target hardware.
Evaluating the performance of candidate architectures is a key challenge in HW-NAS, as it requires significant computational resources.
We propose an efficient hardware-aware evolution-based NAS approach called HW-EvRSNAS.
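A toy sketch of an evolution-based HW-NAS loop that ranks candidates with a cheap proxy score instead of full training; the proxy here is a random stand-in for the paper's representation similarity metric, and the genome and hardware penalty are invented for illustration.

```python
import random

def random_genome():
    return [random.choice([3, 5, 7]) for _ in range(6)]   # kernel size per block

def mutate(genome):
    child = list(genome)
    i = random.randrange(len(child))
    child[i] = random.choice([3, 5, 7])
    return child

def proxy_score(genome):
    # Stand-in for a representation-similarity score against a reference model,
    # combined with a crude stand-in for hardware cost.
    similarity = random.uniform(0.0, 1.0)
    hw_cost = sum(genome) / (7 * len(genome))
    return similarity - 0.3 * hw_cost

population = [random_genome() for _ in range(16)]
for generation in range(20):
    population.sort(key=proxy_score, reverse=True)
    parents = population[:4]                      # keep the best candidates
    population = parents + [mutate(random.choice(parents)) for _ in range(12)]

print("best genome:", max(population, key=proxy_score))
```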
arXiv Detail & Related papers (2023-11-07T11:58:40Z)
- Network Graph Based Neural Architecture Search [57.78724765340237]
We search for neural networks by rewiring the corresponding graph and predict architecture performance from graph properties.
Because we do not perform machine learning over the entire graph space, the searching process is remarkably efficient.
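A small sketch of this idea, assuming networkx for the graph properties and a made-up linear predictor standing in for one fit on measured accuracies.

```python
import networkx as nx

# Treat a candidate architecture as a graph, compute cheap graph properties,
# and feed them to a stand-in performance predictor. A real predictor would be
# fit on measured accuracies; the coefficients here are invented.
def random_architecture_graph(num_nodes=12, edge_prob=0.25, seed=None):
    return nx.gnp_random_graph(num_nodes, edge_prob, seed=seed)

def graph_features(g):
    return {
        "density": nx.density(g),
        "avg_clustering": nx.average_clustering(g),
        "avg_degree": sum(d for _, d in g.degree()) / g.number_of_nodes(),
    }

def predicted_accuracy(feats):
    # Illustrative linear model over graph properties.
    return 0.60 + 0.15 * feats["density"] + 0.10 * feats["avg_clustering"]

best = None
for seed in range(50):                      # "search" by sampling/rewiring graphs
    g = random_architecture_graph(seed=seed)
    score = predicted_accuracy(graph_features(g))
    if best is None or score > best[0]:
        best = (score, seed)

print("best predicted accuracy:", round(best[0], 4), "graph seed:", best[1])
```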
arXiv Detail & Related papers (2021-12-15T00:12:03Z)
- Algorithm and Hardware Co-design for Reconfigurable CNN Accelerator [3.1431240233552007]
Recent advances in algorithm-hardware co-design for deep neural networks (DNNs) have demonstrated their potential in automatically designing neural architectures and hardware designs.
However, it is still a challenging optimization problem due to the expensive training cost and the time-consuming hardware implementation.
We propose a novel three-phase co-design framework, with the following new features.
The discovered network and hardware configuration achieve 2%-6% higher accuracy, 2x-26x lower latency, and 8.5x higher energy efficiency.
arXiv Detail & Related papers (2021-11-24T20:37:50Z)
- ISyNet: Convolutional Neural Networks design for AI accelerator [0.0]
Current state-of-the-art architectures are found with neural architecture search (NAS) taking model complexity into account.
We propose a measure of the hardware efficiency of a neural architecture search space, the matrix efficiency measure (MEM); a search space comprising hardware-efficient operations; and a latency-aware scaling method.
We show the advantage of the designed architectures for the NPU devices on ImageNet and the generalization ability for the downstream classification and detection tasks.
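The summary does not define MEM, so the following is only a hypothetical hardware-efficiency proxy in a similar spirit: arithmetic intensity (multiply-accumulates per byte moved) of a convolution, which roughly tracks how well an NPU's matrix units can be kept busy.

```python
# Hypothetical hardware-efficiency proxy (NOT the paper's MEM): arithmetic
# intensity of a convolution layer, i.e. MACs per byte of data moved.
def conv_arithmetic_intensity(h, w, c_in, c_out, k, bytes_per_elem=2):
    macs = h * w * c_in * c_out * k * k
    input_bytes = h * w * c_in * bytes_per_elem
    weight_bytes = k * k * c_in * c_out * bytes_per_elem
    output_bytes = h * w * c_out * bytes_per_elem
    return macs / (input_bytes + weight_bytes + output_bytes)

# Compare a 3x3 convolution against a 1x1 (pointwise) convolution.
print("3x3 conv:", round(conv_arithmetic_intensity(56, 56, 64, 64, 3), 1))
print("1x1 conv:", round(conv_arithmetic_intensity(56, 56, 64, 64, 1), 1))
```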
arXiv Detail & Related papers (2021-09-04T20:57:05Z)
- Does Form Follow Function? An Empirical Exploration of the Impact of Deep Neural Network Architecture Design on Hardware-Specific Acceleration [76.35307867016336]
This study investigates the impact of deep neural network architecture design on the degree of inference speedup.
We show that while leveraging hardware-specific acceleration achieved an average inference speed-up of 380%, the degree of inference speed-up varied drastically depending on the macro-architecture design pattern.
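A hedged illustration of how such inference speed-ups might be measured in principle: timing the same computation through a naive path and an optimized kernel. This is only a stand-in micro-benchmark, not the study's methodology, and the numbers depend entirely on the machine.

```python
import time
import numpy as np

# Time the same matrix multiply through a naive Python loop versus a
# BLAS-backed kernel to illustrate measuring an acceleration speed-up.
def naive_matmul(a, b):
    n, k, m = a.shape[0], a.shape[1], b.shape[1]
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            out[i][j] = sum(a[i][t] * b[t][j] for t in range(k))
    return out

a = np.random.rand(64, 64)
b = np.random.rand(64, 64)

t0 = time.perf_counter(); naive_matmul(a, b); t1 = time.perf_counter()
t2 = time.perf_counter(); a @ b;              t3 = time.perf_counter()

print(f"speed-up from the optimized kernel: {(t1 - t0) / (t3 - t2):.0f}x")
```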
arXiv Detail & Related papers (2021-07-08T23:05:39Z)
- Pareto-Frontier-aware Neural Architecture Generation for Diverse Budgets [93.79297053429447]
Existing methods often perform an independent architecture search for each target budget.
We propose a general architecture generator that automatically produces effective architectures for an arbitrary budget merely via model inference.
Extensive experiments on three platforms (i.e., mobile, CPU, and GPU) show the superiority of the proposed method over existing NAS methods.
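A toy sketch of a budget-conditioned generator: one function maps an arbitrary latency budget to an architecture configuration at inference time, instead of launching a new search per budget. The linear mapping below is invented for illustration; the paper learns this generator.

```python
# Map an arbitrary latency budget to a (depth, width) configuration.
def generate_architecture(latency_budget_ms, min_ms=1.0, max_ms=20.0):
    # Normalize the budget into [0, 1] and spend it on depth and width.
    t = max(0.0, min(1.0, (latency_budget_ms - min_ms) / (max_ms - min_ms)))
    depth = int(round(8 + t * 16))              # 8..24 layers
    width_multiplier = round(0.5 + t * 1.0, 2)  # 0.5x..1.5x channels
    return {"depth": depth, "width_multiplier": width_multiplier}

for budget in (2.0, 5.0, 10.0, 20.0):
    print(f"budget {budget:>4} ms ->", generate_architecture(budget))
```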
arXiv Detail & Related papers (2021-02-27T13:59:17Z)
- Hardware-Centric AutoML for Mixed-Precision Quantization [34.39845532939529]
Conventional quantization algorithms ignore differences in hardware architectures and quantize all layers in a uniform way.
In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework, which leverages reinforcement learning to automatically determine the quantization policy.
Our framework effectively reduced the latency by 1.4-1.95x and the energy consumption by 1.9x with negligible loss of accuracy compared with the fixed bitwidth (8 bits) quantization.
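A minimal sketch of layer-wise mixed-precision quantization, the mechanism HAQ's policy controls; the bitwidth assignment below is hand-written rather than produced by the reinforcement-learning agent.

```python
import numpy as np

# Each layer's weights are quantized at its own bitwidth. In HAQ the bitwidths
# come from a learned policy; here they are a hand-written dictionary.
def quantize(weights, bits):
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(weights)) / qmax
    return np.round(weights / scale) * scale      # symmetric uniform quantization

layers = {
    "conv1": np.random.randn(3, 3, 3, 32),
    "conv2": np.random.randn(3, 3, 32, 64),
    "fc": np.random.randn(64, 10),
}
bitwidth_policy = {"conv1": 8, "conv2": 4, "fc": 6}   # stand-in for the policy

for name, w in layers.items():
    wq = quantize(w, bitwidth_policy[name])
    err = np.mean((w - wq) ** 2)
    print(f"{name}: {bitwidth_policy[name]}-bit, mean squared error {err:.5f}")
```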
arXiv Detail & Related papers (2020-08-11T17:30:22Z)
- A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures.
A graph convolutional neural network is introduced to predict the performance of architectures.
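A much-simplified sketch of a learned architecture assessor, with a bag-of-operations encoding and least-squares regression standing in for the paper's auto-encoder and graph convolutional predictor; the labeled accuracies are made up.

```python
import numpy as np

# Embed architectures as fixed-length vectors and fit a regressor that
# predicts accuracy from a few labeled examples.
OPS = ["conv3x3", "conv5x5", "maxpool", "skip"]

def encode(architecture):
    # Bag-of-operations embedding (a stand-in for the learned auto-encoder).
    vec = np.zeros(len(OPS))
    for op in architecture:
        vec[OPS.index(op)] += 1
    return vec

labeled = [
    (["conv3x3", "conv3x3", "skip"], 0.72),
    (["conv5x5", "maxpool", "conv3x3"], 0.70),
    (["conv5x5", "conv5x5", "skip", "conv3x3"], 0.75),
    (["maxpool", "skip", "skip"], 0.64),
]
X = np.stack([encode(a) for a, _ in labeled])
y = np.array([acc for _, acc in labeled])
w, *_ = np.linalg.lstsq(X, y, rcond=None)     # fit the accuracy predictor

candidate = ["conv3x3", "conv5x5", "conv3x3", "skip"]
print("predicted accuracy:", round(float(encode(candidate) @ w), 3))
```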
arXiv Detail & Related papers (2020-05-14T09:02:33Z)
- Stage-Wise Neural Architecture Search [65.03109178056937]
Modern convolutional networks such as ResNet and NASNet have achieved state-of-the-art results in many computer vision applications.
These networks consist of stages, which are sets of layers that operate on representations in the same resolution.
It has been demonstrated that increasing the number of layers in each stage improves the prediction ability of the network.
However, the resulting architecture becomes computationally expensive in terms of floating point operations, memory requirements and inference time.
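A short sketch of the stage-wise FLOP accounting behind this trade-off, using illustrative resolutions and channel counts: adding layers to an early, high-resolution stage is far more expensive than adding them to a late stage.

```python
# Layers within a stage share a spatial resolution and channel count, so the
# cost of extra depth depends strongly on which stage receives it.
stages = [
    # (resolution, channels, layers) -- illustrative, not from the paper
    (56, 64, 3),
    (28, 128, 4),
    (14, 256, 6),
    (7, 512, 3),
]

def stage_flops(resolution, channels, layers, kernel=3):
    # Approximate cost of `layers` 3x3 convolutions at this resolution.
    per_layer = 2 * resolution * resolution * channels * channels * kernel * kernel
    return layers * per_layer

total = 0
for res, ch, n in stages:
    f = stage_flops(res, ch, n)
    total += f
    print(f"stage {res}x{res}, {ch} ch, {n} layers: {f / 1e9:.2f} GFLOPs")
print(f"total: {total / 1e9:.2f} GFLOPs")
```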
arXiv Detail & Related papers (2020-04-23T14:16:39Z)