Searching for Efficient Neural Architectures for On-Device ML on Edge
TPUs
- URL: http://arxiv.org/abs/2204.14007v1
- Date: Sat, 9 Apr 2022 00:35:19 GMT
- Title: Searching for Efficient Neural Architectures for On-Device ML on Edge
TPUs
- Authors: Berkin Akin, Suyog Gupta, Yun Long, Anton Spiridonov, Zhuo Wang, Marie
White, Hao Xu, Ping Zhou, Yanqi Zhou
- Abstract summary: Neural architecture search (NAS) comes to the rescue for efficiently utilizing the high compute throughput offered by on-device ML accelerators.
Existing NAS frameworks have several practical limitations in scaling to multiple tasks and different target platforms.
We provide a two-pronged approach to this challenge: (i) a NAS-enabling infrastructure that decouples model cost evaluation, search space design, and the NAS algorithm to rapidly target various on-device ML tasks, and (ii) search spaces crafted from group convolution based inverted bottleneck (IBN) variants.
- Score: 10.680700357879601
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: On-device ML accelerators are becoming a standard in modern mobile
system-on-chips (SoC). Neural architecture search (NAS) comes to the rescue for
efficiently utilizing the high compute throughput offered by these
accelerators. However, existing NAS frameworks have several practical
limitations in scaling to multiple tasks and different target platforms. In
this work, we provide a two-pronged approach to this challenge: (i) a
NAS-enabling infrastructure that decouples model cost evaluation, search space
design, and the NAS algorithm to rapidly target various on-device ML tasks, and
(ii) search spaces crafted from group convolution based inverted bottleneck
(IBN) variants that provide flexible quality/performance trade-offs on ML
accelerators, complementing the existing full and depthwise convolution based
IBNs. Using this approach we target a state-of-the-art mobile platform, Google
Tensor SoC, and demonstrate neural architectures that improve the
quality-performance Pareto frontier for various computer vision
(classification, detection, segmentation) as well as natural language
processing tasks.
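To make the trade-off between full, depthwise, and group convolution based IBNs concrete, the following is a minimal sketch that counts convolution weights for each variant. The block layouts are assumptions, since the abstract does not spell them out: the depthwise IBN is taken as 1x1 expand, KxK depthwise, 1x1 project; the full-convolution IBN as KxK full-convolution expand followed by 1x1 project; and the group variant as the same layout with the KxK convolution split into groups. The function names `conv_params` and `ibn_params` are hypothetical.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution with the given groups (bias omitted)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * k * k


def ibn_params(channels, expansion, k, variant, groups=1):
    """Weight count of one inverted bottleneck block (assumed layouts, see lead-in)."""
    hidden = channels * expansion
    project = conv_params(hidden, channels, 1)  # 1x1 projection back to `channels`
    if variant == "depthwise":
        expand = conv_params(channels, hidden, 1)  # 1x1 expansion
        spatial = conv_params(hidden, hidden, k, groups=hidden)  # k x k depthwise
        return expand + spatial + project
    if variant == "full":
        # k x k full convolution performs expansion and spatial mixing together
        return conv_params(channels, hidden, k) + project
    if variant == "group":
        # same as "full", but the k x k convolution is split into `groups` groups
        return conv_params(channels, hidden, k, groups=groups) + project
    raise ValueError(f"unknown variant: {variant}")


if __name__ == "__main__":
    for variant, groups in [("depthwise", 1), ("group", 4), ("full", 1)]:
        n = ibn_params(64, 4, 3, variant, groups)
        print(f"{variant:9s} groups={groups}: {n} weights")
```

With 64 channels, expansion 4, and a 3x3 kernel, the group variant with 4 groups lands between the depthwise and full variants in weight count, which is the kind of intermediate cost point the abstract describes. Weight count is only a proxy here; actual latency on an accelerator depends on hardware utilization.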
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Multi-Objective Neural Architecture Search for In-Memory Computing [0.5892638927736115]
We employ neural architecture search (NAS) to enhance the efficiency of deploying diverse machine learning (ML) tasks on in-memory computing architectures.
Our evaluation of this NAS approach for IMC architecture deployment spans three distinct image classification datasets.
arXiv Detail & Related papers (2024-06-10T19:17:09Z)
- DNA Family: Boosting Weight-Sharing NAS with Block-Wise Supervisions [121.05720140641189]
We develop a family of models with the distilling neural architecture (DNA) techniques.
Our proposed DNA models can rate all architecture candidates, as opposed to previous works that can only access a sub-search space using algorithms.
Our models achieve state-of-the-art top-1 accuracy of 78.9% and 83.6% on ImageNet for a mobile convolutional network and a small vision transformer, respectively.
arXiv Detail & Related papers (2024-03-02T22:16:47Z)
- Multi-objective Differentiable Neural Architecture Search [58.67218773054753]
We propose a novel NAS algorithm that encodes user preferences for the trade-off between performance and hardware metrics.
Our method outperforms existing MOO NAS methods across a broad range of qualitatively different search spaces and datasets.
arXiv Detail & Related papers (2024-02-28T10:09:04Z)
- DONNAv2 -- Lightweight Neural Architecture Search for Vision tasks [6.628409795264665]
We present DONNAv2, the next-generation neural architecture design for computationally efficient neural architecture distillation.
DONNAv2 reduces the computational cost of DONNA by 10x for the larger datasets.
To improve the quality of NAS search space, DONNAv2 leverages a block knowledge distillation filter to remove blocks with high inference costs.
arXiv Detail & Related papers (2023-09-26T04:48:50Z)
- Search-time Efficient Device Constraints-Aware Neural Architecture Search [6.527454079441765]
Deep learning workloads such as computer vision and natural language processing can be computationally expensive and memory-intensive.
We automate the construction of task-specific deep learning architectures optimized for device constraints through Neural Architecture Search (NAS).
We present DCA-NAS, a principled method of fast neural network architecture search that incorporates edge-device constraints.
arXiv Detail & Related papers (2023-07-10T09:52:28Z)
- NAS-FCOS: Efficient Search for Object Detection Architectures [113.47766862146389]
We propose an efficient method to obtain better object detectors by searching for the feature pyramid network (FPN) and the prediction head of a simple anchor-free object detector.
With carefully designed search space, search algorithms, and strategies for evaluating network quality, we are able to find top-performing detection architectures within 4 days using 8 V100 GPUs.
arXiv Detail & Related papers (2021-10-24T12:20:04Z)
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
- NASCaps: A Framework for Neural Architecture Search to Optimize the Accuracy and Hardware Efficiency of Convolutional Capsule Networks [10.946374356026679]
We propose NASCaps, an automated framework for the hardware-aware NAS of different types of Deep Neural Networks (DNNs).
We study the efficacy of deploying a multi-objective Genetic Algorithm (e.g., based on the NSGA-II algorithm).
Our framework is the first to model and support the specialized capsule layers and dynamic routing in the NAS-flow.
arXiv Detail & Related papers (2020-08-19T14:29:36Z)
- DDPNAS: Efficient Neural Architecture Search via Dynamic Distribution Pruning [135.27931587381596]
We propose an efficient and unified NAS framework termed DDPNAS via dynamic distribution pruning.
In particular, we first sample architectures from a joint categorical distribution. Then the search space is dynamically pruned and its distribution is updated every few epochs.
With the proposed efficient network generation method, we directly obtain the optimal neural architectures on given constraints.
arXiv Detail & Related papers (2019-05-28T06:35:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.