HADAS: Hardware-Aware Dynamic Neural Architecture Search for Edge
Performance Scaling
- URL: http://arxiv.org/abs/2212.03354v1
- Date: Tue, 6 Dec 2022 22:27:00 GMT
- Title: HADAS: Hardware-Aware Dynamic Neural Architecture Search for Edge
Performance Scaling
- Authors: Halima Bouzidi, Mohanad Odema, Hamza Ouarnoughi, Mohammad Abdullah Al
Faruque, Smail Niar
- Abstract summary: Dynamic neural networks (DyNNs) have become viable techniques to enable intelligence on resource-constrained edge devices.
In many cases, the implementation of DyNNs can be sub-optimal because their underlying backbone architecture is developed at the design stage.
We present HADAS, a novel Hardware-Aware Dynamic Neural Architecture Search framework that realizes DyNN architectures whose backbone, early exiting features, and DVFS settings have been jointly optimized.
- Score: 8.29394286023338
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dynamic neural networks (DyNNs) have become viable techniques to enable
intelligence on resource-constrained edge devices while maintaining
computational efficiency. In many cases, the implementation of DyNNs can be
sub-optimal because their underlying backbone architecture is developed at the
design stage independently of both: (i) the dynamic computing features, e.g.,
early exiting, and (ii) the resource-efficiency features of the underlying
hardware, e.g., dynamic voltage and frequency scaling (DVFS). Addressing this,
we present HADAS, a novel Hardware-Aware Dynamic Neural Architecture Search
framework that realizes DyNN architectures whose backbone, early exiting
features, and DVFS settings have been jointly optimized to maximize performance
and resource efficiency. In our experiments on the CIFAR-100 dataset across a
diverse set of edge computing platforms, HADAS dynamic models achieve up to 57%
energy-efficiency gains over conventional dynamic models while maintaining the
desired accuracy levels. Our code is available
at https://github.com/HalimaBouzidi/HADAS
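To make the joint search concrete, here is a minimal sketch of the kind of optimization HADAS performs: backbone hyperparameters, early-exit placement, and DVFS settings encoded as one candidate and scored together. All names, value ranges, and the two proxy evaluators below are illustrative assumptions, and plain random search stands in for the paper's actual optimizer.

```python
import random

# Hypothetical joint design space: backbone, early exits, and DVFS knobs
# live in one genome. Names and value ranges are illustrative only.
SEARCH_SPACE = {
    "depth":       [12, 16, 20],                 # backbone depth choices
    "width_mult":  [0.5, 0.75, 1.0],             # backbone width multiplier
    "exit_blocks": [(4,), (4, 8), (4, 8, 12)],   # where early exits attach
    "freq_mhz":    [600, 900, 1200],             # DVFS frequency levels
    "volt_mv":     [800, 900, 1000],             # DVFS voltage levels
}

def sample_candidate():
    """Draw one joint (backbone, early-exit, DVFS) configuration."""
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def evaluate_accuracy(cand):
    # Proxy stand-in: deeper/wider backbones score higher. Real use would
    # train the dynamic model or query an accuracy predictor.
    return 0.5 + 0.012 * cand["depth"] * cand["width_mult"]

def estimate_energy(cand):
    # Proxy stand-in using the standard V^2 * f DVFS power model scaled by
    # compute; real use would measure energy on the target edge device.
    compute = cand["depth"] * cand["width_mult"] / len(cand["exit_blocks"])
    return compute * (cand["volt_mv"] / 1000) ** 2 * (cand["freq_mhz"] / 1000)

def search(n_trials=1000, min_accuracy=0.70):
    """Keep the most energy-efficient candidate meeting the accuracy floor."""
    best = None
    for _ in range(n_trials):
        cand = sample_candidate()
        if evaluate_accuracy(cand) < min_accuracy:
            continue  # reject: below the desired accuracy level
        energy = estimate_energy(cand)
        if best is None or energy < best[0]:
            best = (energy, cand)
    return best
```

A real run would replace both proxies with trained predictors or on-device measurements; the point is that exit placement and DVFS settings enter the search space itself rather than being tuned after the backbone is fixed.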
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing shifts data analysis from the cloud to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework that jointly optimizes the neural network architecture and its edge deployment.
(arXiv: 2024-10-29)

- Combining Neural Architecture Search and Automatic Code Optimization: A Survey [0.8796261172196743]
Two notable techniques are Hardware-aware Neural Architecture Search (HW-NAS) and Automatic Code Optimization (ACO).
HW-NAS automatically designs accurate yet hardware-friendly neural networks, while ACO involves searching for the best compiler optimizations to apply on neural networks.
This survey explores recent works that combine these two techniques within a single framework.
(arXiv: 2024-08-07)

- DONNAv2 -- Lightweight Neural Architecture Search for Vision tasks [6.628409795264665]
We present DONNAv2, a next-generation design for computationally efficient neural architecture distillation.
DONNAv2 reduces the computational cost of DONNA by 10x on larger datasets.
To improve the quality of the NAS search space, DONNAv2 leverages a block knowledge-distillation filter to remove blocks with high inference costs.
(arXiv: 2023-09-26)

- Latency-aware Unified Dynamic Networks for Efficient Image Recognition [72.8951331472913]
LAUDNet is a framework that bridges the gap between the theoretical and practical efficiency of dynamic networks.
It integrates three primary dynamic paradigms: spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping (a layer-skipping sketch appears after this list).
It can reduce the latency of models such as ResNet by over 50% on platforms such as V100, 3090, and TX2 GPUs.
(arXiv: 2023-08-30)

- Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs [74.83613252825754]
"smart ecosystems" are being formed where sensing happens concurrently rather than standalone.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit scheduling that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
(arXiv: 2022-09-27)

- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach to reduce the search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
(arXiv: 2020-09-29)

- Fully-parallel Convolutional Neural Network Hardware [0.7829352305480285]
We propose a new power- and area-efficient architecture for implementing Artificial Neural Networks (ANNs) in hardware.
For the first time, a fully-parallel CNN, LeNet-5, is embedded and tested on a single FPGA.
(arXiv: 2020-06-22)

- Neural Architecture Optimization with Graph VAE [21.126140965779534]
We propose an efficient NAS approach that optimizes network architectures in a continuous space.
The framework jointly learns four components: the encoder, the performance predictor, the complexity predictor, and the decoder (a joint-loss sketch appears after this list).
(arXiv: 2020-06-18)

- Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference [22.267489467486467]
This paper proposes Nimble, a high-performance and flexible system to optimize, compile, and execute dynamic neural networks on multiple platforms.
Our evaluation demonstrates that Nimble outperforms state-of-the-art deep learning frameworks and runtime systems for dynamic neural networks by up to 20x across hardware platforms.
(arXiv: 2020-06-04)

- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside coarse-grained structures, revealing a previously unknown point in the design space.
With the higher accuracy enabled by fine-grained pruning patterns, the key insight is to use the compiler to regain and guarantee high hardware efficiency (a pattern-pruning sketch appears after this list).
(arXiv: 2020-01-01)

- DDPNAS: Efficient Neural Architecture Search via Dynamic Distribution Pruning [135.27931587381596]
We propose DDPNAS, an efficient and unified NAS framework based on dynamic distribution pruning.
In particular, we first sample architectures from a joint categorical distribution. The search space is then dynamically pruned and its distribution updated every few epochs (a toy version of this loop appears after this list).
With the proposed efficient network-generation method, we directly obtain optimal neural architectures under given constraints.
(arXiv: 2019-05-28)
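As referenced in the LAUDNet entry above, here is a minimal PyTorch sketch of one of its three paradigms, dynamic layer skipping: a lightweight gate decides at run time whether a residual block's body executes. The gate design and threshold are illustrative assumptions, not LAUDNet's actual modules.

```python
import torch
import torch.nn as nn

class SkippableBlock(nn.Module):
    """Residual block whose body can be skipped at run time
    (generic dynamic layer skipping, not LAUDNet's exact design)."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        # Lightweight gate: pool the input, predict a keep probability.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x, threshold=0.5):
        keep = self.gate(x)  # shape (batch, 1): keep probability per sample
        # At inference, skip the body only when every sample's gate is low;
        # contiguous skipping is what actually saves latency on hardware.
        if not self.training and float(keep.max()) < threshold:
            return x
        out = self.body(x)
        # During training, scale by the soft gate so gradients reach it.
        return x + keep.view(-1, 1, 1, 1) * out
```

The batch-level skip in `forward` highlights the practical side of the efficiency gap the paper targets: per-sample skipping saves FLOPs on paper, but only contiguous skipping reliably reduces measured GPU latency.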
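The Graph VAE entry's four jointly learned components can be captured in a few lines. The sketch below flattens architectures into fixed-size vectors rather than using the paper's graph encoder, and all layer sizes and loss weights are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArchVAE(nn.Module):
    """Encoder, decoder, performance predictor, and complexity predictor
    trained jointly on a continuous architecture embedding (a flat vector
    stands in for the paper's graph encoding)."""
    def __init__(self, arch_dim=32, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(arch_dim, 64), nn.ReLU(), nn.Linear(64, 2 * latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, arch_dim))
        self.perf_head = nn.Linear(latent_dim, 1)        # predicts accuracy
        self.complexity_head = nn.Linear(latent_dim, 1)  # predicts e.g. FLOPs

    def forward(self, arch):
        mu, logvar = self.encoder(arch).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return (self.decoder(z), self.perf_head(z),
                self.complexity_head(z), mu, logvar)

def joint_loss(model, arch, accuracy, flops):
    recon, perf, comp, mu, logvar = model(arch)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
    return (F.mse_loss(recon, arch)                   # decoder
            + F.mse_loss(perf.squeeze(-1), accuracy)  # performance predictor
            + F.mse_loss(comp.squeeze(-1), flops)     # complexity predictor
            + 0.1 * kl)                               # VAE regularizer
```

Training both predictors against the same latent space is what lets gradient-based search in that space trade off predicted accuracy and complexity simultaneously.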
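The PatDNN entry's "patterns inside coarse-grained structures" idea reduces to assigning each convolution kernel one mask from a small pattern library. The library below and the magnitude-based assignment are illustrative assumptions; PatDNN derives its pattern set empirically and keys compiler code generation on the resulting pattern ids.

```python
import numpy as np

# Illustrative library of 4-entry patterns for 3x3 kernels
# (1 = weight kept, 0 = pruned); PatDNN selects its set empirically.
PATTERNS = np.array([
    [[0, 1, 0], [1, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 1], [0, 1, 0]],
    [[0, 1, 0], [1, 1, 0], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 1], [0, 1, 0]],
], dtype=np.float32)

def apply_pattern_pruning(weights):
    """For each 3x3 kernel, keep the library pattern preserving the most
    weight magnitude and zero the rest.
    weights: (out_ch, in_ch, 3, 3) array; returns pruned weights plus the
    per-kernel pattern ids a compiler would use for code specialization."""
    out_ch, in_ch = weights.shape[:2]
    pruned = np.empty_like(weights)
    pattern_ids = np.empty((out_ch, in_ch), dtype=np.int64)
    for o in range(out_ch):
        for i in range(in_ch):
            kernel = weights[o, i]
            # Magnitude retained under each candidate pattern.
            scores = [(np.abs(kernel) * p).sum() for p in PATTERNS]
            best = int(np.argmax(scores))
            pattern_ids[o, i] = best
            pruned[o, i] = kernel * PATTERNS[best]
    return pruned, pattern_ids
```

Because every kernel then follows one of a handful of regular shapes, the compiler can specialize and reorder inner loops per pattern, which is how the fine-grained sparsity is turned back into hardware efficiency.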
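Finally, the DDPNAS sample-prune-update loop referenced above can be sketched as follows. The toy reward, update rule, and pruning schedule are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_DECISIONS, NUM_OPS = 5, 4  # toy space: 5 choices, 4 candidate ops each

# Joint categorical distribution over operator choices, plus a liveness mask.
probs = np.full((NUM_DECISIONS, NUM_OPS), 1.0 / NUM_OPS)
alive = np.ones((NUM_DECISIONS, NUM_OPS), dtype=bool)

def reward(arch):
    # Placeholder reward peaked at op 1 everywhere; real use would train
    # and validate the sampled network under the given constraints.
    return -np.sum((arch - 1.0) ** 2) + rng.normal(scale=0.1)

for epoch in range(30):
    # 1) Sample architectures from the joint categorical distribution.
    archs = [np.array([rng.choice(NUM_OPS, p=probs[d])
                       for d in range(NUM_DECISIONS)]) for _ in range(8)]
    rewards = np.array([reward(a) for a in archs])
    # 2) Shift probability mass toward ops used by above-average samples.
    for a, r in zip(archs, rewards):
        for d, op in enumerate(a):
            probs[d, op] += 0.05 * (r - rewards.mean())
    probs = np.clip(probs, 1e-6, None) * alive
    probs /= probs.sum(axis=1, keepdims=True)
    # 3) Every few epochs, prune the weakest surviving op per decision.
    if epoch % 10 == 9 and alive.sum(axis=1).min() > 1:
        for d in range(NUM_DECISIONS):
            live = np.flatnonzero(alive[d])
            alive[d, live[np.argmin(probs[d, live])]] = False
        probs = probs * alive
        probs /= probs.sum(axis=1, keepdims=True)

best_arch = probs.argmax(axis=1)  # architecture the pruned distribution converges to
```

Pruning shrinks the space the sampler must cover, which is where the method's efficiency comes from: later epochs spend their evaluation budget only on operators that survived earlier rounds.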