Generalized Latency Performance Estimation for Once-For-All Neural
Architecture Search
- URL: http://arxiv.org/abs/2101.00732v1
- Date: Mon, 4 Jan 2021 00:48:09 GMT
- Title: Generalized Latency Performance Estimation for Once-For-All Neural
Architecture Search
- Authors: Muhtadyuzzaman Syed and Arvind Akpuram Srinivasan
- Abstract summary: We introduce two generalizability strategies: fine-tuning using a base model trained on a specific hardware platform and NAS search space, and GPU-generalization.
We provide a family of latency prediction models that achieve over 50% lower RMSE loss compared to ProxylessNAS.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural Architecture Search (NAS) has enabled automated machine
learning by streamlining the manual development of deep neural network
architectures into the definition of a search space, a search strategy, and a
performance estimation strategy. To address the need for multi-platform
deployment of Convolutional Neural Network (CNN) models, Once-For-All (OFA)
proposed decoupling training and search to deliver a one-shot model from which
sub-networks constrained to various accuracy-latency tradeoffs can be
extracted. We find that the performance estimation strategy in OFA's search
generalizes poorly across hardware deployment platforms because it relies on
per-device latency lookup tables, which require a significant amount of time
and manual effort to build beforehand. In this work, we demonstrate a
framework for building latency predictors for neural network architectures
that addresses the need for heterogeneous hardware support and removes the
overhead of lookup tables altogether. We introduce two generalizability
strategies: fine-tuning, which adapts a base model trained on a specific
hardware platform and NAS search space, and GPU-generalization, which trains a
model on GPU hardware parameters such as number of cores, RAM size, and memory
bandwidth. With this, we provide a family of latency prediction models that
achieve over 50% lower RMSE loss compared with ProxylessNAS. We also show that
these latency predictors match, and in certain cases exceed, the NAS
performance of the lookup-table baseline approach.
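To make the two strategies concrete, below is a minimal sketch in PyTorch of what such a latency predictor could look like. The class name, architecture-encoding format, hardware-feature layout, and hyperparameters are illustrative assumptions for exposition, not the authors' released implementation; the only elements taken from the abstract are the regression-to-latency setup, the RMSE objective, fine-tuning from a base model, and the use of GPU parameters (number of cores, RAM size, memory bandwidth) as inputs.

```python
# A minimal sketch of the two generalizability strategies described in the
# abstract. Names and hyperparameters are hypothetical, not the paper's code.
import copy

import torch
import torch.nn as nn


class LatencyPredictor(nn.Module):
    """MLP mapping an architecture encoding (e.g., a fixed-length vector of an
    OFA sub-network's depths, widths, and kernel sizes) plus optional GPU
    hardware parameters (number of cores, RAM size, memory bandwidth) to a
    latency estimate."""

    def __init__(self, arch_dim: int, hw_dim: int = 0, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(arch_dim + hw_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, arch_enc, hw_feats=None):
        x = arch_enc if hw_feats is None else torch.cat([arch_enc, hw_feats], dim=-1)
        return self.net(x).squeeze(-1)


def train(model, loader, epochs=100, lr=1e-3):
    """Minimize MSE against measured latencies; reported RMSE is its sqrt."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for arch, hw, latency in loader:
            opt.zero_grad()
            loss = nn.functional.mse_loss(model(arch, hw), latency)
            loss.backward()
            opt.step()


def fine_tune(base_model, target_loader, epochs=20, lr=1e-4):
    """Strategy 1: adapt a predictor trained on one device and search space to
    a new device using a small set of measured (architecture, latency) pairs."""
    model = copy.deepcopy(base_model)
    train(model, target_loader, epochs=epochs, lr=lr)
    return model


if __name__ == "__main__":
    # Toy usage with random tensors; a real dataset would pair sub-network
    # encodings with latencies measured on each GPU.
    arch = torch.rand(8, 128)  # 8 sub-network encodings
    hw = torch.rand(8, 3)      # normalized [cores, ram, bandwidth] per sample
    lat = torch.rand(8)        # measured latencies
    model = LatencyPredictor(arch_dim=128, hw_dim=3)
    train(model, [(arch, hw, lat)], epochs=5)
    print(model(arch, hw))
```

Under these assumptions, GPU-generalization corresponds to training a single model on measurements pooled from many GPUs with the hardware parameters appended to the input, so that an unseen GPU can be queried by supplying only its parameters, while fine-tuning reuses the base weights together with a small number of measurements from the new device.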
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses these challenges by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - Structural Pruning of Pre-trained Language Models via Neural Architecture Search [7.833790713816726]
Pre-trained language models (PLMs) mark the state of the art for natural language understanding tasks when fine-tuned on labeled data.
This paper explores neural architecture search (NAS) for structural pruning to find sub-parts of the fine-tuned network that optimally trade off efficiency and performance.
arXiv Detail & Related papers (2024-05-03T17:34:57Z) - A General-Purpose Transferable Predictor for Neural Architecture Search [22.883809911265445]
We propose a general-purpose neural predictor for Neural Architecture Search (NAS) that can transfer across search spaces.
Experimental results on NAS-Bench-101, 201 and 301 demonstrate the efficacy of our scheme.
arXiv Detail & Related papers (2023-02-21T17:28:05Z) - Neural Architecture Search for Improving Latency-Accuracy Trade-off in
Split Computing [5.516431145236317]
Split computing is an emerging machine-learning inference technique that addresses the privacy and latency challenges of deploying deep learning in IoT systems.
In split computing, neural network models are partitioned and cooperatively processed by IoT devices and edge servers over a network.
This paper proposes a neural architecture search (NAS) method for split computing.
arXiv Detail & Related papers (2022-08-30T03:15:43Z) - MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge [87.41163540910854]
Deep neural network (DNN) latency characterization is a time-consuming process.
We propose MAPLE-X which extends MAPLE by incorporating explicit prior knowledge of hardware devices and DNN architecture latency.
arXiv Detail & Related papers (2022-05-25T11:08:20Z) - U-Boost NAS: Utilization-Boosted Differentiable Neural Architecture
Search [50.33956216274694]
Optimizing resource utilization on target platforms is key to achieving high performance during DNN inference.
We propose a novel hardware-aware NAS framework that optimizes not only for task accuracy and inference latency but also for resource utilization.
We achieve a 2.8-4x speedup for DNN inference compared to prior hardware-aware NAS methods.
arXiv Detail & Related papers (2022-03-23T13:44:15Z) - Weak NAS Predictors Are All You Need [91.11570424233709]
Recent predictor-based NAS approaches attempt to solve the problem with two key steps: sampling some architecture-performance pairs and fitting a proxy accuracy predictor.
We shift the paradigm from finding a complicated predictor that covers the whole architecture space to a set of weaker predictors that progressively move towards the high-performance sub-space.
Our method costs fewer samples to find the top-performance architectures on NAS-Bench-101 and NAS-Bench-201, and it achieves the state-of-the-art ImageNet performance on the NASNet search space.
arXiv Detail & Related papers (2021-02-21T01:58:43Z) - MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach to reduce search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z) - BRP-NAS: Prediction-based NAS using GCNs [21.765796576990137]
BRP-NAS is an efficient hardware-aware NAS enabled by an accurate performance predictor based on a graph convolutional network (GCN).
We show that our proposed method outperforms all prior methods on NAS-Bench-101 and NAS-Bench-201.
We also release LatBench -- a latency dataset of NAS-Bench-201 models running on a broad range of devices.
arXiv Detail & Related papers (2020-07-16T21:58:43Z) - FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining [65.39532971991778]
We present an accuracy predictor that scores architecture and training recipes jointly, guiding both sample selection and ranking.
We run fast evolutionary searches in just CPU minutes to generate architecture-recipe pairs for a variety of resource constraints.
FBNetV3 comprises a family of state-of-the-art compact neural networks that outperform both automatically and manually designed competitors.
arXiv Detail & Related papers (2020-06-03T05:20:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all listed content) and is not responsible for any consequences arising from its use.