Auto-NBA: Efficient and Effective Search Over the Joint Space of
Networks, Bitwidths, and Accelerators
- URL: http://arxiv.org/abs/2106.06575v2
- Date: Mon, 24 Apr 2023 05:21:19 GMT
- Authors: Yonggan Fu, Yongan Zhang, Yang Zhang, David Cox, Yingyan Lin
- Abstract summary: We propose a framework dubbed Auto-NBA to enable jointly searching for the Networks, Bitwidths, and Accelerators.
Our framework efficiently localizes the optimal design within the huge joint design space for each target dataset and acceleration specification.
Auto-NBA generates networks and accelerators that consistently outperform state-of-the-art designs.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While maximizing deep neural networks' (DNNs') acceleration efficiency
requires a joint search/design of three different yet highly coupled aspects,
including the networks, bitwidths, and accelerators, the challenges associated
with such a joint search have not yet been fully understood and addressed. The
key challenges include (1) the dilemma of either exploding the memory
consumption due to the huge joint space or settling for sub-optimal designs, (2) the
discrete nature of the accelerator design space that is coupled yet different
from that of the networks and bitwidths, and (3) the chicken-and-egg problem
in network-accelerator co-search: co-search requires operation-wise hardware
costs, yet these are unavailable during search because the optimal
accelerator, which depends on the whole network, is not yet known. To
tackle these daunting challenges towards optimal and fast development of DNN
accelerators, we propose a framework dubbed Auto-NBA to enable jointly
searching for the Networks, Bitwidths, and Accelerators, by efficiently
localizing the optimal design within the huge joint design space for each
target dataset and acceleration specification. Our Auto-NBA integrates a
heterogeneous sampling strategy to achieve unbiased search with constant memory
consumption, and a novel joint-search pipeline equipped with a generic
differentiable accelerator search engine. Extensive experiments and ablation
studies validate that both Auto-NBA-generated networks and accelerators
consistently outperform state-of-the-art designs (including
co-search/exploration techniques, hardware-aware NAS methods, and DNN
accelerators), in terms of search time, task accuracy, and accelerator
efficiency. Our codes are available at: https://github.com/RICE-EIC/Auto-NBA.
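To make the constant-memory idea concrete, here is a minimal sketch of differentiable joint search in the same spirit: hard Gumbel-softmax activates a single (operator, bitwidth) candidate per forward pass, and a lookup table keeps the hardware-cost term differentiable. The module names, the cost table, and the fake-quantization helper are illustrative assumptions, not Auto-NBA's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quant(t, bits):
    # Straight-through uniform fake quantization (illustrative only).
    scale = t.detach().abs().max() / (2 ** (bits - 1) - 1) + 1e-8
    q = torch.round(t / scale).clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return t + (q * scale - t).detach()

class JointSearchBlock(nn.Module):
    """One searchable block: jointly picks an operator and a bitwidth.

    hard=True Gumbel-softmax runs exactly one candidate per forward pass,
    so memory stays constant no matter how many candidates exist.
    """

    def __init__(self, ops, bitwidths, cost_table):
        super().__init__()
        self.ops = nn.ModuleList(ops)            # candidate operators
        self.bitwidths = bitwidths               # e.g. [4, 8, 16]
        self.op_logits = nn.Parameter(torch.zeros(len(ops)))
        self.bw_logits = nn.Parameter(torch.zeros(len(bitwidths)))
        # cost_table[i, j]: hypothetical hardware cost of op i at bitwidth j,
        # which a differentiable accelerator search engine would supply.
        self.register_buffer("cost_table", cost_table)

    def forward(self, x, tau=1.0):
        op_w = F.gumbel_softmax(self.op_logits, tau=tau, hard=True)
        bw_w = F.gumbel_softmax(self.bw_logits, tau=tau, hard=True)
        i = int(op_w.argmax())
        bits = self.bitwidths[int(bw_w.argmax())]
        # Multiplying by the (one-hot) weights keeps gradients flowing
        # back to both logit vectors via the straight-through estimator.
        out = op_w[i] * bw_w[int(bw_w.argmax())] * fake_quant(self.ops[i](x), bits)
        cost = op_w @ self.cost_table @ bw_w     # differentiable expected cost
        return out, cost
```

A training step would then minimize task_loss + lambda * sum of per-block costs; after convergence, the derived network and bitwidth assignment are read off as the argmax of each logit vector.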
Related papers
- OTOv3: Automatic Architecture-Agnostic Neural Network Training and Compression from Structured Pruning to Erasing Operators
The work spans techniques from structured pruning to neural architecture search, covering both the pruning and the operator-erasing perspectives.
We introduce the third-generation Only-Train-Once framework (OTOv3), the first to automatically train and compress a general DNN through both pruning and erasing operations.
Our empirical results demonstrate the efficacy of OTOv3 across various benchmarks in structured pruning and neural architecture search.
arXiv Detail & Related papers (2023-12-15T00:22:55Z)
- MARS: Exploiting Multi-Level Parallelism for DNN Workloads on Adaptive Multi-Accelerator Systems
We propose a novel mapping framework that can perform computation-aware accelerator selection and apply communication-aware sharding strategies to maximize parallelism.
We show that MARS can achieve 32.2% latency reduction on average for typical DNN workloads compared to the baseline, and 59.4% latency reduction on heterogeneous models compared to the corresponding state-of-the-art method.
arXiv Detail & Related papers (2023-07-23T05:50:37Z)
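The selection problem MARS addresses can be pictured with a toy dynamic program: choose an accelerator per layer so that compute time plus inter-accelerator communication is minimized. The cost arrays and the single fixed switch penalty below are simplifying assumptions, not the MARS mapping framework itself.

```python
def map_layers(compute_cost, switch_cost):
    """Toy computation/communication-aware mapping via dynamic programming.

    compute_cost[l][a]: latency of layer l on accelerator a.
    switch_cost: penalty whenever consecutive layers change accelerator.
    Returns (per-layer accelerator assignment, total latency).
    """
    n_acc = len(compute_cost[0])
    best = list(compute_cost[0])      # best latency ending on accelerator a
    choice = [list(range(n_acc))]     # backpointers per layer
    for layer in compute_cost[1:]:
        nxt, picks = [], []
        for a in range(n_acc):
            cands = [best[p] + (0 if p == a else switch_cost)
                     for p in range(n_acc)]
            p = min(range(n_acc), key=cands.__getitem__)
            nxt.append(cands[p] + layer[a])
            picks.append(p)
        best, choice = nxt, choice + [picks]
    a = min(range(n_acc), key=best.__getitem__)
    total = best[a]
    plan = [a]                        # backtrack the assignment
    for picks in reversed(choice[1:]):
        a = picks[a]
        plan.append(a)
    return plan[::-1], total
```

For example, map_layers([[3, 5], [4, 2], [4, 2]], switch_cost=1) returns ([0, 1, 1], 8): layer 0 runs on accelerator 0, then a single switch pays off for the remaining layers.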
- Exploring Complicated Search Spaces with Interleaving-Free Sampling
In this paper, we build the search algorithm upon a complicated search space with long-distance connections.
We present a simple yet effective algorithm named IF-NAS, where we perform periodic sampling to construct different sub-networks.
In the proposed search space, IF-NAS outperforms both random sampling and previous weight-sharing search algorithms by a significant margin.
arXiv Detail & Related papers (2021-12-05T06:42:48Z)
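A hedged sketch of what IF-NAS-style periodic sampling can look like: every edge rotates through its candidates on a fixed schedule, so each sub-network is drawn deterministically rather than interleaved at random. The schedule below is a guess at the general idea, not IF-NAS's exact procedure.

```python
import itertools

def periodic_subnetworks(edges, period):
    """Deterministic periodic sampling over a weight-sharing search space.

    edges: dict mapping edge name -> list of candidate operator names.
    Yields one sub-network (edge -> chosen candidate) per search step,
    cycling through candidates with the given period.
    """
    for step in itertools.count():
        phase = step % period
        yield {e: ops[phase % len(ops)] for e, ops in edges.items()}

space = {"e1": ["skip", "conv3x3", "conv5x5"], "e2": ["skip", "conv3x3"]}
for step, subnet in zip(range(3), periodic_subnetworks(space, period=3)):
    print(step, subnet)
# 0 {'e1': 'skip', 'e2': 'skip'}
# 1 {'e1': 'conv3x3', 'e2': 'conv3x3'}
# 2 {'e1': 'conv5x5', 'e2': 'skip'}
```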
- Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators
We present a HW-SW co-design ecosystem for spatial accelerators called Union.
Our framework allows exploring different algorithms and their mappings on several accelerator cost models.
We demonstrate the value of Union for the community with several case studies.
arXiv Detail & Related papers (2021-09-15T16:42:18Z)
- Understanding and Accelerating Neural Architecture Search with Training-Free and Theory-Grounded Metrics
NAS has been explosively studied to automate the discovery of top-performing neural networks, but it suffers from heavy resource consumption and often incurs search bias due to truncated training or approximations.
This work targets a principled and unified training-free framework for Neural Architecture Search (NAS).
We present a unified framework to understand and accelerate NAS by disentangling the "TEG" characteristics (trainability, expressivity, and generalization) of searched networks.
arXiv Detail & Related papers (2021-08-26T17:52:07Z)
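As a flavor of what a training-free metric computes, the sketch below scores an untrained model by its average gradient norm on random data, a common zero-cost proxy for trainability. This generic proxy merely stands in for, and is not, the paper's theory-grounded "TEG" measures.

```python
import torch
import torch.nn as nn

def grad_norm_score(model, input_shape, n_classes=10, n_batches=3):
    """Zero-cost proxy: mean parameter-gradient norm at initialization.

    No training happens; larger scores loosely suggest the architecture
    is easier to optimize, so candidates can be ranked before training.
    """
    loss_fn = nn.CrossEntropyLoss()
    total = 0.0
    for _ in range(n_batches):
        x = torch.randn(8, *input_shape)
        y = torch.randint(0, n_classes, (8,))
        model.zero_grad()
        loss_fn(model(x), y).backward()
        total += sum(p.grad.norm().item()
                     for p in model.parameters() if p.grad is not None)
    return total / n_batches

# Rank a pool of candidate architectures by score; train only the top few.
```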
- AutoSpace: Neural Architecture Search with Less Human Interference
Current neural architecture search (NAS) algorithms still require expert knowledge and effort to design a search space for network construction.
We propose a novel differentiable evolutionary framework named AutoSpace, which evolves the search space to an optimal one.
With the learned search space, the performance of recent NAS algorithms improves significantly over that achieved with previously hand-designed spaces.
arXiv Detail & Related papers (2021-03-22T13:28:56Z)
- DNA: Differentiable Network-Accelerator Co-Search
We propose DNA, a Differentiable Network-Accelerator co-search framework for automatically searching for matched networks and accelerators.
Specifically, DNA integrates a generic design space for DNN accelerators that is compatible with DNN frameworks such as PyTorch, enabling algorithmic exploration.
Experiments and ablation studies show that the matched networks and accelerators generated by DNA consistently outperform state-of-the-art (SOTA) DNNs and accelerators.
arXiv Detail & Related papers (2020-10-28T05:57:16Z)
- Effective Algorithm-Accelerator Co-design for AI Solutions on Edge Devices
High-quality AI solutions require joint optimization of AI algorithms, such as deep neural networks (DNNs), and their hardware accelerators.
To improve the overall solution quality as well as to boost the design productivity, efficient algorithm and accelerator co-design methodologies are indispensable.
This paper emphasizes the importance and efficacy of algorithm-accelerator co-design and calls for more research breakthroughs in this interesting and demanding area.
arXiv Detail & Related papers (2020-10-14T15:59:10Z)
- DANCE: Differentiable Accelerator/Network Co-Exploration
This work presents a differentiable approach towards the co-exploration of the hardware accelerator and network architecture design.
By modeling the hardware evaluation software with a neural network, the relation between the accelerator architecture and the hardware metrics becomes differentiable.
Compared to naive existing approaches, our method performs co-exploration in a significantly shorter time while achieving superior accuracy and hardware cost metrics.
arXiv Detail & Related papers (2020-09-14T07:43:27Z)
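DANCE's core trick, modeling the hardware evaluator with a neural network, can be sketched as follows: fit a small MLP to (accelerator config, measured metric) pairs, then gradient-descend a design point through the frozen surrogate. The dimensions, output heads, and rounding step are assumptions for illustration, not DANCE's actual setup.

```python
import torch
import torch.nn as nn

# Surrogate for a non-differentiable hardware evaluation tool: maps a
# 6-dim accelerator config (e.g. PE counts, buffer sizes) to [latency, energy].
surrogate = nn.Sequential(
    nn.Linear(6, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2),
)

def fit(configs, metrics, epochs=200):
    # configs: (N, 6) measured design points; metrics: (N, 2) tool outputs.
    opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.mse_loss(surrogate(configs), metrics).backward()
        opt.step()

def search(steps=100):
    # Freeze the surrogate and optimize a continuous design point against
    # predicted latency; round back to legal hardware values afterwards.
    for p in surrogate.parameters():
        p.requires_grad_(False)
    design = torch.zeros(1, 6, requires_grad=True)
    opt = torch.optim.Adam([design], lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        surrogate(design)[0, 0].backward()    # latency head
        opt.step()
    return design.detach().round()
```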
- CATCH: Context-based Meta Reinforcement Learning for Transferrable Architecture Search
CATCH is a novel Context-bAsed meTa reinforcement learning algorithm for transferrable arChitecture searcH.
The combination of meta-learning and RL allows CATCH to efficiently adapt to new tasks while being agnostic to search spaces.
It also handles cross-domain architecture search, identifying competitive networks on ImageNet, COCO, and Cityscapes.
arXiv Detail & Related papers (2020-07-18T09:35:53Z)
- Real-Time Semantic Segmentation via Auto Depth, Downsampling Joint Decision and Feature Aggregation
We propose a joint search framework, called AutoRTNet, to automate the design of segmentation strategies.
Specifically, we propose hyper-cells to jointly decide the network depth and downsampling strategy, and an aggregation cell to achieve automatic multi-scale feature aggregation.
arXiv Detail & Related papers (2020-03-31T14:02:25Z)
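One way to picture a hyper-cell's joint depth/downsampling decision: a cell learns, via Gumbel-softmax, whether to skip itself (shallower network), keep resolution, or downsample at its position, so depth and stride schedule emerge together when such cells are stacked. The cell below is a hedged guess at the mechanism, not AutoRTNet's definition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperCell(nn.Module):
    """Searchable cell: jointly decides depth (skip) and downsampling."""

    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),                                          # skip: shallower net
            nn.Conv2d(channels, channels, 3, stride=1, padding=1),  # keep resolution
            nn.Conv2d(channels, channels, 3, stride=2, padding=1),  # downsample
        ])
        self.logits = nn.Parameter(torch.zeros(3))

    def forward(self, x, tau=1.0):
        # hard=True runs exactly one branch; multiplying by the one-hot
        # weight keeps the choice differentiable (straight-through).
        w = F.gumbel_softmax(self.logits, tau=tau, hard=True)
        i = int(w.argmax())
        return w[i] * self.ops[i](x)
```

After search, the argmax of each cell's logits fixes the network's depth-and-downsampling schedule.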