DNA: Differentiable Network-Accelerator Co-Search
- URL: http://arxiv.org/abs/2010.14778v1
- Date: Wed, 28 Oct 2020 05:57:16 GMT
- Title: DNA: Differentiable Network-Accelerator Co-Search
- Authors: Yongan Zhang, Yonggan Fu, Weiwen Jiang, Chaojian Li, Haoran You, Meng
Li, Vikas Chandra, Yingyan Lin
- Abstract summary: We propose DNA, a Differentiable Network-Accelerator co-search framework for automatically searching for matched networks and accelerators.
Specifically, DNA integrates two enablers: (1) a generic design space for DNN accelerators that is compatible with DNN frameworks such as PyTorch to enable algorithmic exploration; and (2) a joint network and accelerator co-search algorithm.
Experiments and ablation studies show that the matched networks and accelerators generated by DNA consistently outperform state-of-the-art (SOTA) DNNs and accelerators.
- Score: 36.68587348474986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Powerful yet complex deep neural networks (DNNs) have fueled a booming demand
for efficient DNN solutions to bring DNN-powered intelligence into numerous
applications. Jointly optimizing the networks and their accelerators is
promising for providing optimal performance. However, the great potential of
such solutions has yet to be unleashed due to the challenge of simultaneously
exploring the vast, entangled, yet different design spaces of the networks
and their accelerators. To this end, we propose DNA, a Differentiable
Network-Accelerator co-search framework for automatically searching for matched
networks and accelerators to maximize both the task accuracy and acceleration
efficiency. Specifically, DNA integrates two enablers: (1) a generic design
space for DNN accelerators that is applicable to both FPGA- and ASIC-based DNN
accelerators and compatible with DNN frameworks such as PyTorch to enable
algorithmic exploration for more efficient DNNs and their accelerators; and (2)
a joint DNN network and accelerator co-search algorithm that enables
simultaneously searching for optimal DNN structures and their accelerators'
micro-architectures and mapping methods to maximize both the task accuracy and
acceleration efficiency. Experiments and ablation studies based on FPGA
measurements and ASIC synthesis show that the matched networks and accelerators
generated by DNA consistently outperform state-of-the-art (SOTA) DNNs and DNN
accelerators (e.g., 3.04x better FPS with a 5.46% higher accuracy on ImageNet),
while requiring notably reduced search time (up to 1234.3x) over SOTA
co-exploration methods, when evaluated over ten SOTA baselines on three
datasets. All codes will be released upon acceptance.
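To make the co-search idea concrete, here is a minimal PyTorch-style sketch of one differentiable co-search layer, assuming a Gumbel-softmax relaxation over candidate network operators and accelerator configurations and a hypothetical per-pairing cost table (`OPS` and `HW_COST` are illustrative); DNA's actual design space, cost model, and search algorithm are richer than this.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Candidate network operators for one layer (illustrative choices).
OPS = [lambda c: nn.Conv2d(c, c, 3, padding=1),
       lambda c: nn.Conv2d(c, c, 5, padding=2)]
# Hypothetical cost (e.g., latency in ms) for each (op, accelerator-config) pair.
HW_COST = torch.tensor([[1.0, 1.6],
                        [2.3, 3.1]])  # rows: ops, cols: accelerator configs

class CoSearchLayer(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList(op(channels) for op in OPS)
        self.alpha = nn.Parameter(torch.zeros(len(OPS)))         # network params
        self.beta = nn.Parameter(torch.zeros(HW_COST.size(1)))   # accelerator params

    def forward(self, x, tau=1.0):
        w_op = F.gumbel_softmax(self.alpha, tau=tau)   # soft operator selection
        w_hw = F.gumbel_softmax(self.beta, tau=tau)    # soft accel-config selection
        out = sum(w * op(x) for w, op in zip(w_op, self.ops))
        # Expected hardware cost under the two soft selections.
        cost = (w_op.unsqueeze(1) * HW_COST * w_hw.unsqueeze(0)).sum()
        return out, cost

layer = CoSearchLayer(8)
x = torch.randn(2, 8, 16, 16)
out, cost = layer(x)
loss = out.square().mean() + 0.1 * cost  # placeholder task loss + weighted HW cost
loss.backward()  # gradients reach both alpha (network) and beta (accelerator)
```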
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
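As a concrete, if generic, illustration of gradient compression in such systems, the sketch below implements plain top-k sparsification; the function names and the 1% ratio are assumptions for illustration, not FusionLLM's actual adaptive scheme.

```python
import math
import torch

def topk_compress(grad: torch.Tensor, ratio: float = 0.01):
    """Keep only the largest-magnitude `ratio` fraction of gradient entries.

    Generic top-k sparsification of the kind decentralized training systems
    use to cut communication; not necessarily FusionLLM's scheme.
    """
    flat = grad.flatten()
    k = max(1, int(flat.numel() * ratio))
    _, indices = torch.topk(flat.abs(), k)
    return indices, flat[indices]          # transmit indices + values only

def topk_decompress(indices, values, shape):
    flat = torch.zeros(math.prod(shape))
    flat[indices] = values
    return flat.view(shape)

g = torch.randn(1024, 1024)
idx, vals = topk_compress(g, ratio=0.01)
g_hat = topk_decompress(idx, vals, g.shape)
print(f"kept {idx.numel()} of {g.numel()} entries")
```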
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- Precision-aware Latency and Energy Balancing on Multi-Accelerator Platforms for DNN Inference [22.9834921448069]
We propose ODiMO, a hardware-aware tool that performs a fine-grained mapping across different accelerators on-chip.
We show that ODiMO reduces energy/latency by up to 33%/31% with limited accuracy drop (-0.53%/-0.32%) compared to manual mappings.
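A toy version of the mapping problem: given per-layer latency/energy costs on each on-chip accelerator (the numbers and names below are made up), pick an assignment that balances the two objectives. ODiMO's real search is finer-grained and precision-aware.

```python
from itertools import product

# Hypothetical per-layer (latency_ms, energy_mJ) costs on two on-chip
# accelerators; a real tool would profile or model these numbers.
COSTS = {
    "accel_A": [(1.2, 0.8), (2.0, 1.5), (0.9, 0.6)],  # one tuple per layer
    "accel_B": [(0.7, 1.4), (1.1, 2.2), (1.5, 0.9)],
}
N_LAYERS = 3

def best_mapping(latency_weight=0.5):
    """Exhaustively choose which accelerator runs each layer, trading
    latency against energy. Illustrative only."""
    best, best_score = None, float("inf")
    for assign in product(COSTS, repeat=N_LAYERS):
        lat = sum(COSTS[a][i][0] for i, a in enumerate(assign))
        eng = sum(COSTS[a][i][1] for i, a in enumerate(assign))
        score = latency_weight * lat + (1 - latency_weight) * eng
        if score < best_score:
            best, best_score = assign, score
    return best, best_score

mapping, score = best_mapping(latency_weight=0.7)
print(mapping, round(score, 2))
```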
arXiv Detail & Related papers (2023-06-08T09:23:46Z)
- NASA: Neural Architecture Search and Acceleration for Hardware Inspired Hybrid Networks [24.95135135092478]
We propose a Neural Architecture Search and Acceleration framework dubbed NASA.
It enables automated multiplication-reduced DNN development and integrates a dedicated multiplication-reduced accelerator.
Experiments consistently validate the advantages of NASA's algorithm-hardware co-design framework in terms of achievable accuracy and efficiency tradeoffs.
arXiv Detail & Related papers (2022-10-24T16:03:42Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
It reduces classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration [71.80326738527734]
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning-scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform state-of-the-art DNN optimization frameworks.
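For intuition, a minimal sketch of fine-grained structured pruning: magnitude-prune a weight matrix at the granularity of small tiles. The block size, keep ratio, and scoring below are assumptions for illustration; the paper's schemes and compiler mapping are more elaborate.

```python
import torch

def prune_blocks(weight: torch.Tensor, block: int = 4, keep_ratio: float = 0.5):
    """Zero out whole `block x block` tiles with the smallest L1 norms."""
    rows, cols = weight.shape
    assert rows % block == 0 and cols % block == 0
    tiles = weight.reshape(rows // block, block, cols // block, block)
    norms = tiles.abs().sum(dim=(1, 3))            # L1 norm per tile
    k = max(1, int(norms.numel() * keep_ratio))    # number of tiles to keep
    thresh = norms.flatten().kthvalue(norms.numel() - k + 1).values
    mask = (norms >= thresh).float()[:, None, :, None]
    return (tiles * mask).reshape(rows, cols)

w = torch.randn(8, 8)
w_pruned = prune_blocks(w, block=4, keep_ratio=0.5)
```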
arXiv Detail & Related papers (2021-11-22T23:53:14Z)
- Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks [72.81092567651395]
Sub-bit Neural Networks (SNNs) are a new type of binary quantization design tailored to compress and accelerate binary neural networks (BNNs).
SNNs are trained with a kernel-aware optimization framework, which exploits binary quantization in the fine-grained convolutional kernel space.
Experiments on visual recognition benchmarks and hardware deployment on FPGA validate the great potential of SNNs.
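For background, a minimal sketch of the standard BNN-style binarization that SNNs build on: a sign function with a straight-through estimator. SNNs go further and compress the binary kernel space itself to below one bit per weight.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Binarize weights to {-1, +1} with a straight-through estimator."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Pass gradients through only where |w| <= 1 (hard-tanh STE).
        return grad_out * (w.abs() <= 1).float()

w = torch.randn(16, requires_grad=True)
wb = BinarizeSTE.apply(w)
wb.sum().backward()
```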
arXiv Detail & Related papers (2021-10-18T11:30:29Z)
- G-CoS: GNN-Accelerator Co-Search Towards Both Better Accuracy and Efficiency [28.379932311374624]
Graph Neural Networks (GNNs) have emerged as the state-of-the-art (SOTA) method for graph-based learning tasks.
We propose G-CoS, a GNN and accelerator co-search framework that can automatically search for matched GNN structures and accelerators.
Experiments and ablation studies show that the GNNs generated by G-CoS consistently outperform SOTA GNNs and GNN accelerators in terms of both task accuracy and hardware efficiency.
arXiv Detail & Related papers (2021-09-18T18:36:04Z)
- Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators [29.72502711426566]
We propose a framework dubbed Auto-NBA to enable jointly searching for the Networks, Bitwidths, and Accelerators.
Our framework efficiently localizes the optimal design within the huge joint design space for each target dataset and acceleration specification.
Auto-NBA generates networks and accelerators that consistently outperform state-of-the-art designs.
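A sketch of just the bitwidth dimension of that joint space, assuming a Gumbel-softmax relaxation over candidate precisions (the fake-quantizer and the 4/8-bit candidates are illustrative assumptions); Auto-NBA additionally searches network operators and accelerator parameters.

```python
import torch
import torch.nn.functional as F

def fake_quantize(x, bits):
    """Uniform symmetric fake-quantization to `bits` bits (illustrative)."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.round(x / scale).clamp(-qmax, qmax) * scale

BITS = [4, 8]
gamma = torch.zeros(len(BITS), requires_grad=True)  # bitwidth arch params

x = torch.randn(32)
w_bits = F.gumbel_softmax(gamma, tau=1.0)
# Soft mixture over quantized versions, so gradients reach `gamma`.
x_q = sum(w * fake_quantize(x, b) for w, b in zip(w_bits, BITS))
loss = (x_q - x).square().mean()
loss.backward()
```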
arXiv Detail & Related papers (2021-06-11T18:54:29Z)
- NetAdaptV2: Efficient Neural Architecture Search with Fast Super-Network Training and Architecture Optimization [15.63765190153914]
We present NetAdaptV2 with three innovations to better balance the time spent on each step while supporting non-differentiable search metrics.
First, we propose channel-level bypass connections that merge network depth and layer width into a single search dimension.
Second, ordered dropout is proposed to train multiple DNNs in a single forward-backward pass to decrease the time for training a super-network.
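A minimal sketch of the ordered-dropout idea from the second innovation: channels are dropped in a fixed order, so each kept prefix is a nested sub-network and one forward-backward pass trains several widths at once. Placement and sampling details here are assumptions, not NetAdaptV2's exact recipe.

```python
import torch

def ordered_dropout(x: torch.Tensor, widths=(0.25, 0.5, 1.0)):
    """Per sample, keep a random prefix of channels and zero the rest."""
    n, c = x.shape[0], x.shape[1]
    mask = torch.zeros_like(x)
    for i in range(n):
        # Sample a width, then keep the first `keep` channels in fixed order.
        keep = int(c * widths[torch.randint(len(widths), (1,)).item()])
        mask[i, :keep] = 1.0
    return x * mask

feat = torch.randn(4, 8, 16, 16)  # batch of feature maps
out = ordered_dropout(feat)
```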
arXiv Detail & Related papers (2021-03-31T18:03:46Z)
- SmartExchange: Trading Higher-cost Memory Storage/Access for Lower-cost Computation [97.78417228445883]
We present SmartExchange, an algorithm-hardware co-design framework for energy-efficient inference of deep neural networks (DNNs).
We develop a novel algorithm to enforce a specially favorable DNN weight structure, where each layerwise weight matrix can be stored as the product of a small basis matrix and a large sparse coefficient matrix whose non-zero elements are all power-of-2.
We further design a dedicated accelerator to fully utilize the SmartExchange-enforced weights to improve both energy efficiency and latency performance.
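A sketch of the weight structure described above, with made-up shapes and a crude thresholding in place of the paper's learned enforcement: the layer weight is stored as a sparse coefficient matrix with power-of-2 nonzeros times a small dense basis, so reconstruction needs only shifts and adds.

```python
import torch

def nearest_power_of_two(x: torch.Tensor) -> torch.Tensor:
    """Round values to the nearest signed power of two (zeros are masked
    back to zero below)."""
    sign = torch.sign(x)
    exp = torch.round(torch.log2(x.abs().clamp(min=1e-8)))
    return sign * torch.pow(2.0, exp)

# Illustrative shapes only: a 64x64 layer weight stored as the product of a
# sparse coefficient matrix Ce (64x8, power-of-2 nonzeros) and a small dense
# basis matrix B (8x64). SmartExchange learns this structure during training.
B = torch.randn(8, 64)
C = torch.randn(64, 8)
C = torch.where(C.abs() > 1.0, C, torch.zeros_like(C))   # make C sparse
Ce = nearest_power_of_two(C) * (C != 0).float()          # power-of-2 nonzeros
W = Ce @ B                                               # reconstructed weight
print(f"coefficient density: {(Ce != 0).float().mean().item():.2%}")
```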
arXiv Detail & Related papers (2020-05-07T12:12:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.