FLASH: Fast Neural Architecture Search with Hardware Optimization
- URL: http://arxiv.org/abs/2108.00568v1
- Date: Sun, 1 Aug 2021 23:46:48 GMT
- Title: FLASH: Fast Neural Architecture Search with Hardware Optimization
- Authors: Guihong Li, Sumit K. Mandal, Umit Y. Ogras, Radu Marculescu
- Abstract summary: Neural architecture search (NAS) is a promising technique to design efficient and high-performance deep neural networks (DNNs).
This paper proposes FLASH, a very fast NAS methodology that co-optimizes the DNN accuracy and performance on a real hardware platform.
- Score: 7.263481020106725
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Neural architecture search (NAS) is a promising technique to design efficient
and high-performance deep neural networks (DNNs). As the performance
requirements of ML applications continue to grow, hardware accelerators are
playing an increasingly central role in DNN design. This trend makes NAS even more
complicated and time-consuming for most real applications. This paper proposes
FLASH, a very fast NAS methodology that co-optimizes the DNN accuracy and
performance on a real hardware platform. As the main theoretical contribution,
we first propose the NN-Degree, an analytical metric to quantify the
topological characteristics of DNNs with skip connections (e.g., DenseNets,
ResNets, Wide-ResNets, and MobileNets). The newly proposed NN-Degree allows us
to do training-free NAS within one second and build an accuracy predictor by
training on as few as 25 samples out of a vast search space with more than 63
billion configurations. Second, by performing inference on the target hardware,
we fine-tune and validate our analytical models to estimate the latency, area,
and energy consumption of various DNN architectures while executing standard ML
datasets. Third, we construct a hierarchical algorithm based on simplicial
homology global optimization (SHGO) to optimize the model-architecture
co-design process, while considering the area, latency, and energy consumption
of the target hardware. We demonstrate that, compared to the state-of-the-art
NAS approaches, our proposed hierarchical SHGO-based algorithm enables more
than four orders of magnitude speedup (specifically, the execution time of the
proposed algorithm is about 0.1 seconds). Finally, our experimental evaluations
show that FLASH is easily transferable to different hardware architectures,
thus enabling us to do NAS on a Raspberry Pi-3B processor in less than 3
seconds.
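The three-step flow in the abstract maps naturally onto a few lines of code. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: `degree_metric` is a generic average-degree proxy standing in for the exact NN-Degree formula, the 25 accuracy samples are synthetic, and `estimate_latency` is a placeholder for the analytical hardware models that FLASH calibrates with on-device measurements. Only `scipy.optimize.shgo` is the actual SHGO solver the hierarchical algorithm builds on.
```python
# Illustrative FLASH-style flow: (1) cheap topological metric,
# (2) tiny accuracy predictor from ~25 samples, (3) SHGO co-optimization.
import numpy as np
from scipy.optimize import shgo

def degree_metric(width, depth, skip_links):
    """Average-degree-style proxy for a cell's topological density
    (assumption: wider cells and more skip links raise the score)."""
    return depth * (width + skip_links / max(width, 1.0))

# Step 2: fit a log-linear accuracy predictor from ~25 (metric, accuracy) pairs.
rng = np.random.default_rng(0)
metrics = rng.uniform(50, 500, size=25)                  # synthetic samples
accs = 0.70 + 0.05 * np.log(metrics) + rng.normal(0, 0.005, 25)
coef = np.polyfit(np.log(metrics), accs, deg=1)

def predict_accuracy(width, depth, skip_links):
    return np.polyval(coef, np.log(degree_metric(width, depth, skip_links)))

def estimate_latency(width, depth):
    """Placeholder analytical latency model (ms); FLASH fine-tunes such
    models against measurements on the target hardware."""
    return 1e-4 * depth * width ** 2

# Step 3: hardware-aware objective; a 5 ms latency budget is folded in as a
# soft penalty so a single unconstrained SHGO call suffices for this sketch.
def objective(x):
    width, depth, skip_links = x
    penalty = max(0.0, estimate_latency(width, depth) - 5.0)
    return -predict_accuracy(width, depth, skip_links) + 10.0 * penalty

bounds = [(8, 64), (4, 32), (0, 16)]      # width, depth, skip links
result = shgo(objective, bounds)          # simplicial homology global opt.
print("best (width, depth, skips):", result.x, "objective:", result.fun)
```
Folding the latency budget in as a penalty keeps the sketch to one solver call; the paper's hierarchical algorithm treats the area, latency, and energy constraints more carefully.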
Related papers
- RNC: Efficient RRAM-aware NAS and Compilation for DNNs on Resource-Constrained Edge Devices [0.30458577208819987]
We aim to develop edge-friendly deep neural networks (DNNs) for accelerators based on resistive random-access memory (RRAM).
We propose an edge compilation and resource-constrained RRAM-aware neural architecture search (NAS) framework to search for optimized neural networks meeting specific hardware constraints.
The resulting NAS model optimized for speed achieved a 5x-30x speedup.
arXiv Detail & Related papers (2024-09-27T15:35:36Z)
- DiffusionNAG: Predictor-guided Neural Architecture Generation with Diffusion Models [56.584561770857306]
We propose a novel conditional Neural Architecture Generation (NAG) framework based on diffusion models, dubbed DiffusionNAG.
Specifically, we consider the neural architectures as directed graphs and propose a graph diffusion model for generating them.
We validate the effectiveness of DiffusionNAG through extensive experiments in two predictor-based NAS scenarios: Transferable NAS and Bayesian Optimization (BO)-based NAS.
When integrated into a BO-based algorithm, DiffusionNAG outperforms existing BO-based NAS approaches, particularly in the large MobileNetV3 search space on the ImageNet 1K dataset.
arXiv Detail & Related papers (2023-05-26T13:58:18Z)
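For readers unfamiliar with the graph view DiffusionNAG operates on, here is a minimal sketch of encoding an architecture as a directed acyclic graph and applying one generic discrete noising step; the edge-flip process is an illustrative stand-in, not the paper's actual diffusion model.
```python
# Directed-graph encoding of a neural architecture plus one generic
# forward-diffusion (noising) step over its edges.
import numpy as np

rng = np.random.default_rng(0)
NUM_NODES, OPS = 5, ["conv3x3", "conv1x1", "maxpool", "skip"]

# Upper-triangular adjacency keeps the graph a DAG (node i feeds node j, i < j).
adj = np.triu(rng.integers(0, 2, (NUM_NODES, NUM_NODES)), k=1)
ops = rng.integers(0, len(OPS), NUM_NODES)      # one categorical op per node

def noising_step(adj, flip_prob):
    """One forward step: flip each potential edge with probability flip_prob."""
    mask = np.triu(rng.random(adj.shape) < flip_prob, k=1)
    return np.where(mask, 1 - adj, adj)

noisy = noising_step(adj, flip_prob=0.1)
print(adj, noisy, [OPS[o] for o in ops], sep="\n")
```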
- Lightweight Neural Architecture Search for Temporal Convolutional Networks at the Edge [21.72253397805102]
This work focuses in particular on Temporal Convolutional Networks (TCNs), a convolutional model for time-series processing.
We propose the first NAS tool that explicitly targets the optimization of the architectural parameters specific to TCNs.
We test the proposed NAS on four real-world, edge-relevant tasks, involving audio and bio-signals.
arXiv Detail & Related papers (2023-01-24T19:47:40Z)
- MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge [87.41163540910854]
Deep neural network (DNN) latency characterization is a time-consuming process.
We propose MAPLE-X which extends MAPLE by incorporating explicit prior knowledge of hardware devices and DNN architecture latency.
arXiv Detail & Related papers (2022-05-25T11:08:20Z)
- U-Boost NAS: Utilization-Boosted Differentiable Neural Architecture Search [50.33956216274694]
Optimizing resource utilization on target platforms is key to achieving high performance during DNN inference.
We propose a novel hardware-aware NAS framework that optimizes not only for task accuracy and inference latency, but also for resource utilization.
We achieve a 2.8-4x speedup for DNN inference compared to prior hardware-aware NAS methods.
arXiv Detail & Related papers (2022-03-23T13:44:15Z)
- NASCaps: A Framework for Neural Architecture Search to Optimize the Accuracy and Hardware Efficiency of Convolutional Capsule Networks [10.946374356026679]
We propose NASCaps, an automated framework for the hardware-aware NAS of different types of Deep Neural Networks (DNNs).
We study the efficacy of deploying a multi-objective genetic algorithm (e.g., based on the NSGA-II algorithm).
Our framework is the first to model and support the specialized capsule layers and dynamic routing in the NAS flow.
arXiv Detail & Related papers (2020-08-19T14:29:36Z)
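A hand-rolled sketch of the Pareto-dominance test at the heart of NSGA-II-style multi-objective selection, with synthetic (error, latency) objectives; NASCaps' actual implementation is not reproduced here.
```python
# Pareto-dominance and front extraction, the core of NSGA-II-style
# multi-objective NAS (accuracy vs. hardware cost).
import random

def dominates(a, b):
    """a dominates b if it is no worse on every objective and strictly
    better on at least one. Objectives: (error, latency), both minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(population):
    return [p for p in population
            if not any(dominates(q, p) for q in population if q is not p)]

random.seed(0)
# Each candidate: (error, latency), e.g., from evaluating a sampled network.
population = [(random.uniform(0.05, 0.4), random.uniform(1, 50)) for _ in range(20)]
for error, latency in sorted(pareto_front(population)):
    print(f"error={error:.3f}  latency={latency:.1f} ms")
```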
- BRP-NAS: Prediction-based NAS using GCNs [21.765796576990137]
BRP-NAS is an efficient hardware-aware NAS enabled by an accurate performance predictor based on a graph convolutional network (GCN).
We show that our proposed method outperforms all prior methods on NAS-Bench-101 and NAS-Bench-201.
We also release LatBench -- a latency dataset of NAS-Bench-201 models running on a broad range of devices.
arXiv Detail & Related papers (2020-07-16T21:58:43Z)
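A minimal numpy sketch of the GCN message-passing step such a predictor rests on; the weights here are random stand-ins for parameters that BRP-NAS would train on measured latencies.
```python
# One GCN layer over an architecture DAG: node features (op encodings) are
# mixed along edges, pooled, and regressed to a scalar latency estimate.
import numpy as np

rng = np.random.default_rng(0)
N, F, H = 5, 4, 8                             # nodes, input features, hidden
A = np.triu(rng.integers(0, 2, (N, N)), k=1)  # DAG adjacency of the cell
X = np.eye(F)[rng.integers(0, F, N)]          # one-hot op encoding per node

def gcn_layer(A, X, W):
    A_hat = A + np.eye(len(A))                   # add self-loops
    d = A_hat.sum(1)
    A_norm = A_hat / np.sqrt(np.outer(d, d))     # symmetric normalization
    return np.maximum(A_norm @ X @ W, 0)         # ReLU

W1, w2 = rng.normal(0, 0.1, (F, H)), rng.normal(0, 0.1, H)
latency_pred = gcn_layer(A, X, W1).mean(0) @ w2  # pool nodes, regress
print(f"predicted latency (arbitrary units): {latency_pred:.4f}")
```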
- FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining [65.39532971991778]
We present an accuracy predictor that scores architecture and training recipes jointly, guiding both sample selection and ranking.
We run fast evolutionary searches in just CPU minutes to generate architecture-recipe pairs for a variety of resource constraints.
FBNetV3 comprises a family of state-of-the-art compact neural networks that outperform both automatically and manually designed competitors.
arXiv Detail & Related papers (2020-06-03T05:20:21Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
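A minimal sketch of the pattern idea: each 3x3 kernel is pruned to the best-fitting member of a small fixed pattern set, giving the compiler regular sparsity to exploit. The two patterns below are illustrative, not PatDNN's actual pattern library.
```python
# Pattern-based pruning: restrict every 3x3 kernel to one of a few fixed
# sparsity patterns, chosen to preserve the most weight magnitude.
import numpy as np

# Two example 4-entry patterns over a 3x3 kernel (1 = weight kept).
PATTERNS = [
    np.array([[0, 1, 0], [1, 1, 1], [0, 0, 0]]),
    np.array([[0, 0, 0], [1, 1, 1], [0, 1, 0]]),
]

def prune_kernel(kernel):
    """Keep the pattern that preserves the most weight magnitude."""
    best = max(PATTERNS, key=lambda p: np.abs(kernel * p).sum())
    return kernel * best

rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3))
print(prune_kernel(kernel))
```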
- DDPNAS: Efficient Neural Architecture Search via Dynamic Distribution Pruning [135.27931587381596]
We propose an efficient and unified NAS framework termed DDPNAS via dynamic distribution pruning.
In particular, we first sample architectures from a joint categorical distribution. Then the search space is dynamically pruned and its distribution is updated every few epochs.
With the proposed efficient network generation method, we directly obtain the optimal neural architectures on given constraints.
arXiv Detail & Related papers (2019-05-28T06:35:52Z)
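A minimal sketch of the sample-then-prune loop DDPNAS describes, with a synthetic reward standing in for validation accuracy and ad-hoc update constants; the paper's actual update rule is not reproduced.
```python
# Sample architectures from per-layer categorical distributions, score them,
# shift probability mass toward winners, and periodically prune the worst op.
import numpy as np

rng = np.random.default_rng(0)
LAYERS, CHOICES, EPOCHS = 4, 5, 30
probs = np.full((LAYERS, CHOICES), 1.0 / CHOICES)   # per-layer categoricals

def reward(arch):
    """Synthetic stand-in for validation accuracy: op index 2 is 'best'."""
    return -np.abs(np.array(arch) - 2).sum()

for epoch in range(EPOCHS):
    samples = [[int(rng.choice(CHOICES, p=p)) for p in probs] for _ in range(8)]
    scores = [reward(s) for s in samples]
    best = samples[int(np.argmax(scores))]
    for layer, op in enumerate(best):               # shift mass toward winner
        probs[layer, op] += 0.1
    if epoch % 10 == 9:                             # every few epochs: prune
        for layer in range(LAYERS):
            alive = np.flatnonzero(probs[layer] > 0)
            worst = alive[np.argmin(probs[layer, alive])]
            probs[layer, worst] = 0.0
    probs /= probs.sum(axis=1, keepdims=True)       # renormalize

print("selected op per layer:", probs.argmax(axis=1))
```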
This related-papers list is automatically generated from the titles and abstracts of the papers on this site.