DNA: Differentiable Network-Accelerator Co-Search
- URL: http://arxiv.org/abs/2010.14778v1
- Date: Wed, 28 Oct 2020 05:57:16 GMT
- Title: DNA: Differentiable Network-Accelerator Co-Search
- Authors: Yongan Zhang, Yonggan Fu, Weiwen Jiang, Chaojian Li, Haoran You, Meng
Li, Vikas Chandra, Yingyan Lin
- Abstract summary: We propose DNA, a Differentiable Network-Accelerator co-search framework for automatically searching for matched networks and accelerators.
Specifically, DNA integrates two enablers: (1) a generic design space for DNN accelerators that is compatible with DNN frameworks such as PyTorch to enable algorithmic exploration; and (2) a joint network and accelerator co-search algorithm.
Experiments and ablation studies show that the matched networks and accelerators generated by DNA consistently outperform state-of-the-art (SOTA) DNNs and accelerators.
- Score: 36.68587348474986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Powerful yet complex deep neural networks (DNNs) have fueled a booming demand
for efficient DNN solutions to bring DNN-powered intelligence into numerous
applications. Jointly optimizing the networks and their accelerators is
promising for providing optimal performance. However, the great potential of
such solutions has yet to be unleashed due to the challenge of simultaneously
exploring the vast, entangled, yet different design spaces of the networks
and their accelerators. To this end, we propose DNA, a Differentiable
Network-Accelerator co-search framework for automatically searching for matched
networks and accelerators to maximize both the task accuracy and acceleration
efficiency. Specifically, DNA integrates two enablers: (1) a generic design
space for DNN accelerators that is applicable to both FPGA- and ASIC-based DNN
accelerators and compatible with DNN frameworks such as PyTorch to enable
algorithmic exploration for more efficient DNNs and their accelerators; and (2)
a joint DNN network and accelerator co-search algorithm that enables
simultaneously searching for optimal DNN structures and their accelerators'
micro-architectures and mapping methods to maximize both the task accuracy and
acceleration efficiency. Experiments and ablation studies based on FPGA
measurements and ASIC synthesis show that the matched networks and accelerators
generated by DNA consistently outperform state-of-the-art (SOTA) DNNs and DNN
accelerators (e.g., 3.04x better FPS with a 5.46% higher accuracy on ImageNet),
while requiring notably reduced search time (up to 1234.3x) over SOTA
co-exploration methods, when evaluated over ten SOTA baselines on three
datasets. All codes will be released upon acceptance.
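To make the co-search idea concrete, here is a minimal PyTorch-style sketch of one differentiable co-search layer, assuming a Gumbel-softmax relaxation over candidate network operators and accelerator configurations and a hypothetical per-pairing cost table (`OPS` and `HW_COST` are illustrative); DNA's actual design space, cost model, and search algorithm are richer than this.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Candidate network operators for one layer (illustrative choices).
OPS = [lambda c: nn.Conv2d(c, c, 3, padding=1),
       lambda c: nn.Conv2d(c, c, 5, padding=2)]
# Hypothetical cost (e.g., latency in ms) for each (op, accelerator-config) pair.
HW_COST = torch.tensor([[1.0, 1.6],
                        [2.3, 3.1]])  # rows: ops, cols: accelerator configs

class CoSearchLayer(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList(op(channels) for op in OPS)
        self.alpha = nn.Parameter(torch.zeros(len(OPS)))         # network params
        self.beta = nn.Parameter(torch.zeros(HW_COST.size(1)))   # accelerator params

    def forward(self, x, tau=1.0):
        w_op = F.gumbel_softmax(self.alpha, tau=tau)   # soft operator selection
        w_hw = F.gumbel_softmax(self.beta, tau=tau)    # soft accel-config selection
        out = sum(w * op(x) for w, op in zip(w_op, self.ops))
        # Expected hardware cost under the two soft selections.
        cost = (w_op.unsqueeze(1) * HW_COST * w_hw.unsqueeze(0)).sum()
        return out, cost

layer = CoSearchLayer(8)
x = torch.randn(2, 8, 16, 16)
out, cost = layer(x)
loss = out.square().mean() + 0.1 * cost  # placeholder task loss + weighted HW cost
loss.backward()  # gradients reach both alpha (network) and beta (accelerator)
```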
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
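As a concrete, if generic, illustration of gradient compression in such systems, the sketch below implements plain top-k sparsification; the function names and the 1% ratio are assumptions for illustration, not FusionLLM's actual adaptive scheme.

```python
import math
import torch

def topk_compress(grad: torch.Tensor, ratio: float = 0.01):
    """Keep only the largest-magnitude `ratio` fraction of gradient entries.

    Generic top-k sparsification of the kind decentralized training systems
    use to cut communication; not necessarily FusionLLM's scheme.
    """
    flat = grad.flatten()
    k = max(1, int(flat.numel() * ratio))
    _, indices = torch.topk(flat.abs(), k)
    return indices, flat[indices]          # transmit indices + values only

def topk_decompress(indices, values, shape):
    flat = torch.zeros(math.prod(shape))
    flat[indices] = values
    return flat.view(shape)

g = torch.randn(1024, 1024)
idx, vals = topk_compress(g, ratio=0.01)
g_hat = topk_decompress(idx, vals, g.shape)
print(f"kept {idx.numel()} of {g.numel()} entries")
```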
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- Precision-aware Latency and Energy Balancing on Multi-Accelerator Platforms for DNN Inference [22.9834921448069]
We propose ODiMO, a hardware-aware tool that performs a fine-grained mapping across different accelerators on-chip.
We show that ODiMO reduces energy/latency by up to 33%/31% with limited accuracy drop (-0.53%/-0.32%) compared to manual mappings.
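A toy version of the mapping problem: given per-layer latency/energy costs on each on-chip accelerator (the numbers and names below are made up), pick an assignment that balances the two objectives. ODiMO's real search is finer-grained and precision-aware.

```python
from itertools import product

# Hypothetical per-layer (latency_ms, energy_mJ) costs on two on-chip
# accelerators; a real tool would profile or model these numbers.
COSTS = {
    "accel_A": [(1.2, 0.8), (2.0, 1.5), (0.9, 0.6)],  # one tuple per layer
    "accel_B": [(0.7, 1.4), (1.1, 2.2), (1.5, 0.9)],
}
N_LAYERS = 3

def best_mapping(latency_weight=0.5):
    """Exhaustively choose which accelerator runs each layer, trading
    latency against energy. Illustrative only."""
    best, best_score = None, float("inf")
    for assign in product(COSTS, repeat=N_LAYERS):
        lat = sum(COSTS[a][i][0] for i, a in enumerate(assign))
        eng = sum(COSTS[a][i][1] for i, a in enumerate(assign))
        score = latency_weight * lat + (1 - latency_weight) * eng
        if score < best_score:
            best, best_score = assign, score
    return best, best_score

mapping, score = best_mapping(latency_weight=0.7)
print(mapping, round(score, 2))
```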
arXiv Detail & Related papers (2023-06-08T09:23:46Z)
- NASA: Neural Architecture Search and Acceleration for Hardware Inspired Hybrid Networks [24.95135135092478]
We propose a Neural Architecture Search and Acceleration framework dubbed NASA.
It enables automated multiplication-reduced DNN development and integrates a dedicated multiplication-reduced accelerator.
Experiments consistently validate the advantages of NASA's algorithm-hardware co-design framework in terms of achievable accuracy and efficiency tradeoffs.
arXiv Detail & Related papers (2022-10-24T16:03:42Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
It reduces classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration [71.80326738527734]
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning-scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform state-of-the-art DNN optimization frameworks.
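For intuition, a minimal sketch of fine-grained structured pruning: magnitude-prune a weight matrix at the granularity of small tiles. The block size, keep ratio, and scoring below are assumptions for illustration; the paper's schemes and compiler mapping are more elaborate.

```python
import torch

def prune_blocks(weight: torch.Tensor, block: int = 4, keep_ratio: float = 0.5):
    """Zero out whole `block x block` tiles with the smallest L1 norms."""
    rows, cols = weight.shape
    assert rows % block == 0 and cols % block == 0
    tiles = weight.reshape(rows // block, block, cols // block, block)
    norms = tiles.abs().sum(dim=(1, 3))            # L1 norm per tile
    k = max(1, int(norms.numel() * keep_ratio))    # number of tiles to keep
    thresh = norms.flatten().kthvalue(norms.numel() - k + 1).values
    mask = (norms >= thresh).float()[:, None, :, None]
    return (tiles * mask).reshape(rows, cols)

w = torch.randn(8, 8)
w_pruned = prune_blocks(w, block=4, keep_ratio=0.5)
```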
arXiv Detail & Related papers (2021-11-22T23:53:14Z)
- Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks [72.81092567651395]
Sub-bit Neural Networks (SNNs) are a new type of binary quantization design tailored to compress and accelerate binary neural networks (BNNs).
SNNs are trained with a kernel-aware optimization framework, which exploits binary quantization in the fine-grained convolutional kernel space.
Experiments on visual recognition benchmarks and hardware deployment on FPGA validate the great potential of SNNs.
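For background, a minimal sketch of the standard BNN-style binarization that SNNs build on: a sign function with a straight-through estimator. SNNs go further and compress the binary kernel space itself to below one bit per weight.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Binarize weights to {-1, +1} with a straight-through estimator."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Pass gradients through only where |w| <= 1 (hard-tanh STE).
        return grad_out * (w.abs() <= 1).float()

w = torch.randn(16, requires_grad=True)
wb = BinarizeSTE.apply(w)
wb.sum().backward()
```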
arXiv Detail & Related papers (2021-10-18T11:30:29Z)
- G-CoS: GNN-Accelerator Co-Search Towards Both Better Accuracy and Efficiency [28.379932311374624]
Graph Neural Networks (GNNs) have emerged as the state-of-the-art (SOTA) method for graph-based learning tasks.
We propose G-CoS, a GNN and accelerator co-search framework that can automatically search for matched GNN structures and accelerators.
Experiments and ablation studies show that the GNNs generated by G-CoS consistently outperform SOTA GNNs and GNN accelerators in terms of both task accuracy and hardware efficiency.
arXiv Detail & Related papers (2021-09-18T18:36:04Z)
- Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators [29.72502711426566]
We propose a framework dubbed Auto-NBA to enable jointly searching for the Networks, Bitwidths, and Accelerators.
Our framework efficiently localizes the optimal design within the huge joint design space for each target dataset and acceleration specification.
Auto-NBA generates networks and accelerators that consistently outperform state-of-the-art designs.
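A sketch of just the bitwidth dimension of that joint space, assuming a Gumbel-softmax relaxation over candidate precisions (the fake-quantizer and the 4/8-bit candidates are illustrative assumptions); Auto-NBA additionally searches network operators and accelerator parameters.

```python
import torch
import torch.nn.functional as F

def fake_quantize(x, bits):
    """Uniform symmetric fake-quantization to `bits` bits (illustrative)."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.round(x / scale).clamp(-qmax, qmax) * scale

BITS = [4, 8]
gamma = torch.zeros(len(BITS), requires_grad=True)  # bitwidth arch params

x = torch.randn(32)
w_bits = F.gumbel_softmax(gamma, tau=1.0)
# Soft mixture over quantized versions, so gradients reach `gamma`.
x_q = sum(w * fake_quantize(x, b) for w, b in zip(w_bits, BITS))
loss = (x_q - x).square().mean()
loss.backward()
```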
arXiv Detail & Related papers (2021-06-11T18:54:29Z)
- NetAdaptV2: Efficient Neural Architecture Search with Fast Super-Network Training and Architecture Optimization [15.63765190153914]
We present NetAdaptV2 with three innovations to better balance the time spent on each step while supporting non-differentiable search metrics.
First, we propose channel-level bypass connections that merge network depth and layer width into a single search dimension.
Second, ordered dropout is proposed to train multiple DNNs in a single forward-backward pass to decrease the time for training a super-network.
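A minimal sketch of the ordered-dropout idea from the second innovation: channels are dropped in a fixed order, so each kept prefix is a nested sub-network and one forward-backward pass trains several widths at once. Placement and sampling details here are assumptions, not NetAdaptV2's exact recipe.

```python
import torch

def ordered_dropout(x: torch.Tensor, widths=(0.25, 0.5, 1.0)):
    """Per sample, keep a random prefix of channels and zero the rest."""
    n, c = x.shape[0], x.shape[1]
    mask = torch.zeros_like(x)
    for i in range(n):
        # Sample a width, then keep the first `keep` channels in fixed order.
        keep = int(c * widths[torch.randint(len(widths), (1,)).item()])
        mask[i, :keep] = 1.0
    return x * mask

feat = torch.randn(4, 8, 16, 16)  # batch of feature maps
out = ordered_dropout(feat)
```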
arXiv Detail & Related papers (2021-03-31T18:03:46Z)
- SmartExchange: Trading Higher-cost Memory Storage/Access for Lower-cost Computation [97.78417228445883]
We present SmartExchange, an algorithm-hardware co-design framework for energy-efficient inference of deep neural networks (DNNs).
We develop a novel algorithm to enforce a specially favorable DNN weight structure, where each layerwise weight matrix can be stored as the product of a small basis matrix and a large sparse coefficient matrix whose non-zero elements are all power-of-2.
We further design a dedicated accelerator to fully utilize the SmartExchange-enforced weights to improve both energy efficiency and latency performance.
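A sketch of the weight structure described above, with made-up shapes and a crude thresholding in place of the paper's learned enforcement: the layer weight is stored as a sparse coefficient matrix with power-of-2 nonzeros times a small dense basis, so reconstruction needs only shifts and adds.

```python
import torch

def nearest_power_of_two(x: torch.Tensor) -> torch.Tensor:
    """Round values to the nearest signed power of two (zeros are masked
    back to zero below)."""
    sign = torch.sign(x)
    exp = torch.round(torch.log2(x.abs().clamp(min=1e-8)))
    return sign * torch.pow(2.0, exp)

# Illustrative shapes only: a 64x64 layer weight stored as the product of a
# sparse coefficient matrix Ce (64x8, power-of-2 nonzeros) and a small dense
# basis matrix B (8x64). SmartExchange learns this structure during training.
B = torch.randn(8, 64)
C = torch.randn(64, 8)
C = torch.where(C.abs() > 1.0, C, torch.zeros_like(C))   # make C sparse
Ce = nearest_power_of_two(C) * (C != 0).float()          # power-of-2 nonzeros
W = Ce @ B                                               # reconstructed weight
print(f"coefficient density: {(Ce != 0).float().mean().item():.2%}")
```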
arXiv Detail & Related papers (2020-05-07T12:12:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.