SCAN-Edge: Finding MobileNet-speed Hybrid Networks for Diverse Edge Devices via Hardware-Aware Evolutionary Search
- URL: http://arxiv.org/abs/2408.15395v1
- Date: Tue, 27 Aug 2024 20:39:09 GMT
- Title: SCAN-Edge: Finding MobileNet-speed Hybrid Networks for Diverse Edge Devices via Hardware-Aware Evolutionary Search
- Authors: Hung-Yueh Chiang, Diana Marculescu
- Abstract summary: We propose a unified NAS framework that searches for self-attention, convolution, and activation to accommodate the wide variety of edge devices.
SCAN-Edge relies on a hardware-aware evolutionary algorithm that improves the quality of the search space to accelerate the sampling process.
- Score: 10.48978386515933
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Designing low-latency and high-efficiency hybrid networks for a variety of low-cost commodity edge devices is both costly and tedious, leading to the adoption of hardware-aware neural architecture search (NAS) for finding optimal architectures. However, unifying NAS for a wide range of edge devices presents challenges due to the variety of hardware designs, supported operations, and compilation optimizations. Existing methods often fix the search space of architecture choices (e.g., activation, convolution, or self-attention) and estimate latency using hardware-agnostic proxies (e.g., FLOPs), which fail to achieve the proclaimed latency across various edge devices. To address this issue, we propose SCAN-Edge, a unified NAS framework that jointly searches for self-attention, convolution, and activation to accommodate the wide variety of edge devices, including CPU-, GPU-, and hardware accelerator-based systems. To handle the large search space, SCAN-Edge relies on a hardware-aware evolutionary algorithm that improves the quality of the search space to accelerate the sampling process. Experiments on large-scale datasets demonstrate that our hybrid networks match the actual MobileNetV2 latency for 224x224 input resolution on various commodity edge devices.
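To make the mechanism concrete, here is a minimal sketch of what a latency-constrained evolutionary search over hybrid operator choices could look like. The stage encoding, cost numbers, and helpers (`measure_latency`, `accuracy_proxy`) are illustrative assumptions, not SCAN-Edge's actual implementation; in a real system, latency would be measured on the target device (or predicted by a device-specific model) and accuracy estimated via a trained supernet.

```python
import random

# Illustrative search space: a per-stage (operator, activation) choice.
# The real SCAN-Edge space and encoding may differ.
OPS = ["mbconv", "fused_mbconv", "self_attention"]
ACTS = ["relu", "hardswish", "gelu"]
NUM_STAGES = 5

def random_arch():
    """Sample one candidate: an (op, activation) pair per stage."""
    return [(random.choice(OPS), random.choice(ACTS)) for _ in range(NUM_STAGES)]

def mutate(arch, p=0.3):
    """Resample each stage's choices with probability p."""
    return [(random.choice(OPS), random.choice(ACTS)) if random.random() < p
            else stage for stage in arch]

def measure_latency(arch):
    """Stand-in for an on-device measurement; a toy cost table is used here."""
    cost = {"mbconv": 1.0, "fused_mbconv": 1.2, "self_attention": 2.5,
            "relu": 0.1, "hardswish": 0.15, "gelu": 0.3}
    return sum(cost[op] + cost[act] for op, act in arch)

def accuracy_proxy(arch):
    """Stand-in for supernet-based accuracy estimation."""
    return random.random()

def evolve(budget_ms, generations=20, pop_size=64, parents=16):
    # Seed only with candidates that satisfy the device latency budget, so
    # every accuracy evaluation is spent on a hardware-feasible network.
    pop = [a for a in (random_arch() for _ in range(10 * pop_size))
           if measure_latency(a) <= budget_ms][:pop_size]
    for _ in range(generations):
        pop.sort(key=accuracy_proxy, reverse=True)
        top = pop[:parents]
        children = []
        while len(children) < pop_size - parents:
            child = mutate(random.choice(top))
            if measure_latency(child) <= budget_ms:  # hardware-aware filter
                children.append(child)
        pop = top + children
    return max(pop, key=accuracy_proxy)

print(evolve(budget_ms=8.0))
```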
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the computational demands of real-time visual inference in IoVT systems by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers [56.37495946212932]
Vision transformers (ViTs) have demonstrated their superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs).
This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs.
arXiv Detail & Related papers (2024-07-25T16:35:46Z)
- Multi-objective Differentiable Neural Architecture Search [58.67218773054753]
We propose a novel NAS algorithm that encodes user preferences for the trade-off between performance and hardware metrics.
Our method outperforms existing MOO NAS methods across a broad range of qualitatively different search spaces and datasets.
arXiv Detail & Related papers (2024-02-28T10:09:04Z)
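A common way to encode the user preference described above is to scalarize the accuracy and hardware objectives with a user-chosen weight inside the differentiable search. The PyTorch sketch below illustrates that generic recipe with a DARTS-style softmax relaxation and a made-up per-op latency table; it is a simplified stand-in, not necessarily the paper's exact mechanism.

```python
import torch

# DARTS-style softmax relaxation over 3 candidate ops for a single edge.
alpha = torch.zeros(3, requires_grad=True)   # architecture parameters
op_latency = torch.tensor([0.8, 1.4, 2.9])   # assumed per-op latencies (ms)

def expected_latency():
    # Differentiable latency: expectation under the softmax op distribution.
    return torch.softmax(alpha, dim=0) @ op_latency

def scalarized_loss(task_loss, preference):
    # preference in [0, 1]: 1.0 -> accuracy only, 0.0 -> latency only.
    return preference * task_loss + (1.0 - preference) * expected_latency()

opt = torch.optim.Adam([alpha], lr=0.01)
task_loss = torch.tensor(1.0)  # stand-in for validation cross-entropy
loss = scalarized_loss(task_loss, preference=0.5)
opt.zero_grad()
loss.backward()
opt.step()
print(torch.softmax(alpha, dim=0))  # op mix shifts toward cheaper ops
```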
- MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge [87.41163540910854]
Deep neural network (DNN) latency characterization is a time-consuming process.
We propose MAPLE-X which extends MAPLE by incorporating explicit prior knowledge of hardware devices and DNN architecture latency.
arXiv Detail & Related papers (2022-05-25T11:08:20Z)
- MAPLE-Edge: A Runtime Latency Predictor for Edge Devices [80.01591186546793]
We propose MAPLE-Edge, an edge device-oriented extension of MAPLE, the state-of-the-art latency predictor for general purpose hardware.
Compared to MAPLE, MAPLE-Edge can describe the runtime and target device platform using a much smaller set of CPU performance counters.
We also demonstrate that unlike MAPLE which performs best when trained on a pool of devices sharing a common runtime, MAPLE-Edge can effectively generalize across runtimes.
arXiv Detail & Related papers (2022-04-27T14:00:48Z)
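The general recipe behind counter-based latency predictors like MAPLE-Edge can be sketched as a regression from a small feature vector (performance counters plus architecture descriptors) to measured latency. The specific counters, features, and model choice below are illustrative assumptions, with random data standing in for real device profiles.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 200
# Hypothetical per-sample features: a few CPU performance counters collected
# while profiling on the device, plus descriptors of the candidate network.
counters = rng.random((n, 4))    # e.g., instructions, cache misses, branches, cycles
arch_feats = rng.random((n, 6))  # e.g., depth, width, FLOPs per block
X = np.hstack([counters, arch_feats])
y = rng.random(n)                # measured latencies (ms), the regression targets

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:150], y[:150])
pred = model.predict(X[150:])
print("mean abs error (ms):", np.abs(pred - y[150:]).mean())
```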
- Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs [10.680700357879601]
Neural architecture search (NAS) comes to the rescue for efficiently utilizing the high compute throughput offered by on-device ML accelerators.
Existing NAS frameworks have several practical limitations in scaling to multiple tasks and different target platforms.
We provide a two-pronged approach to this challenge: (i) a neural architecture search framework that decouples model cost evaluation, search space design, and the search algorithm to rapidly target various on-device ML tasks, and (ii) search spaces crafted from group convolution based inverted bottleneck (IBN) variants.
arXiv Detail & Related papers (2022-04-09T00:35:19Z)
- EH-DNAS: End-to-End Hardware-aware Differentiable Neural Architecture Search [32.23992012207146]
We propose End-to-end Hardware-aware DNAS (EH-DNAS) to deliver hardware-efficient deep neural networks on various platforms.
EH-DNAS improves the hardware performance by an average of $1.4\times$ on customized accelerators and $1.6\times$ on existing hardware processors.
arXiv Detail & Related papers (2021-11-24T06:45:30Z)
- One Proxy Device Is Enough for Hardware-Aware Neural Architecture Search [21.50120377137633]
Convolutional neural networks (CNNs) are used in numerous real-world applications such as vision-based autonomous driving and video content analysis.
To run CNN inference on various target devices, hardware-aware neural architecture search (NAS) is crucial.
We propose an efficient proxy adaptation technique to significantly boost the latency monotonicity.
arXiv Detail & Related papers (2021-11-01T18:56:42Z)
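Latency monotonicity here means that candidate architectures rank (almost) the same by latency on the proxy device as on each target device, which is typically quantified with Spearman's rank correlation. A minimal sketch with made-up measurements:

```python
import numpy as np
from scipy.stats import spearmanr

# Made-up latencies (ms) for 8 candidate architectures on two devices.
proxy_latency  = np.array([3.1, 5.4, 2.2, 7.8, 4.0, 6.1, 2.9, 5.0])
target_latency = np.array([6.0, 9.9, 4.5, 15.2, 7.7, 11.8, 5.8, 9.4])

rho, _ = spearmanr(proxy_latency, target_latency)
print(f"Spearman rank correlation: {rho:.3f}")
# rho near 1.0 -> strong monotonicity, so an architecture searched on the
# proxy device is likely to transfer well to the target device.
```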
- Achieving on-Mobile Real-Time Super-Resolution with Neural Architecture and Pruning Search [64.80878113422824]
We propose an automatic search framework that derives sparse super-resolution (SR) models with high image quality while satisfying the real-time inference requirement.
With the proposed framework, we are the first to achieve real-time SR inference (with only tens of milliseconds per frame) at 720p resolution with competitive image quality.
arXiv Detail & Related papers (2021-08-18T06:47:31Z)
- HSCoNAS: Hardware-Software Co-Design of Efficient DNNs via Neural Architecture Search [6.522258468923919]
We present a novel hardware-aware neural architecture search (NAS) framework, namely HSCoNAS, to automate the design of deep neural networks (DNNs).
To accomplish this goal, we first propose an effective hardware performance modeling method to approximate the runtime latency of DNNs on target hardware.
We also propose two novel techniques, i.e., dynamic channel scaling to maximize the accuracy under the specified latency and progressive space shrinking to refine the search space towards target hardware.
arXiv Detail & Related papers (2021-03-11T12:21:21Z)
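As a toy illustration of HSCoNAS's progressive space-shrinking idea (not its actual procedure), the sketch below repeatedly drops the operator whose sampled sub-networks most often violate a target-device latency budget; all cost numbers are invented.

```python
import random

# Invented per-op costs (ms) for a 5-stage network.
op_cost = {"conv3x3": 1.0, "conv5x5": 1.8, "conv7x7": 2.9,
           "mbconv": 1.2, "attention": 3.5}
space = list(op_cost)
BUDGET, STAGES = 9.0, 5

for rnd in range(3):
    # Score each op by how often networks containing it exceed the budget.
    violations = dict.fromkeys(space, 0)
    for _ in range(500):
        net = [random.choice(space) for _ in range(STAGES)]
        if sum(op_cost[op] for op in net) > BUDGET:
            for op in set(net):
                violations[op] += 1
    worst = max(violations, key=violations.get)
    if len(space) > 2:
        space.remove(worst)  # shrink the space toward the latency target
    print(f"round {rnd}: dropped {worst}, remaining space = {space}")
```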