Dynamic-OFA: Runtime DNN Architecture Switching for Performance Scaling
on Heterogeneous Embedded Platforms
- URL: http://arxiv.org/abs/2105.03596v2
- Date: Tue, 11 May 2021 08:01:36 GMT
- Title: Dynamic-OFA: Runtime DNN Architecture Switching for Performance Scaling
on Heterogeneous Embedded Platforms
- Authors: Wei Lou, Lei Xun, Amin Sabet, Jia Bi, Jonathon Hare, Geoff V. Merrett
- Abstract summary: This paper proposes Dynamic-OFA, a novel dynamic DNN approach for state-of-the-art platform-aware NAS models (i.e., the Once-for-all (OFA) network).
Compared to the state-of-the-art, our experimental results using ImageNet on a Jetson Xavier NX show that the approach is up to 3.5x faster for similar ImageNet Top-1 accuracy.
- Score: 3.3197851873862385
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mobile and embedded platforms are increasingly required to efficiently
execute computationally demanding DNNs across heterogeneous processing
elements. At runtime, the hardware resources available to DNNs can vary
considerably due to other concurrently running applications, and the performance
requirements of those applications can also change across scenarios.
To achieve the desired performance, dynamic DNNs have been proposed in which
the number of channels/layers can be scaled in real time to meet different
requirements under varying resource constraints. However, the training process
of such dynamic DNNs can be costly, since platform-aware models of different
deployment scenarios must be retrained to become dynamic. This paper proposes
Dynamic-OFA, a novel dynamic DNN approach for state-of-the-art platform-aware
NAS models (i.e., the Once-for-all (OFA) network). Dynamic-OFA pre-samples a family
of sub-networks from a static OFA backbone model, and contains a runtime
manager to choose different sub-networks under different runtime environments.
As such, Dynamic-OFA does not need the traditional dynamic DNN training
pipeline. Compared to the state-of-the-art, our experimental results using
ImageNet on a Jetson Xavier NX show that the approach is up to 3.5x (CPU) and 2.4x
(GPU) faster for similar ImageNet Top-1 accuracy, or achieves 3.8% (CPU) and 5.1% (GPU)
higher accuracy at similar latency.
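Below is a minimal sketch of the runtime-switching idea from the abstract, assuming a hypothetical family of pre-profiled sub-networks; the names, latency values, and accuracy values are illustrative placeholders, not numbers from the paper.

```python
# Illustrative sketch (not the authors' code): each pre-sampled OFA sub-network
# is profiled offline, and a runtime manager picks the most accurate sub-network
# whose profiled latency still meets the current target on the processor that is
# currently available (CPU or GPU).
from dataclasses import dataclass
from typing import Optional

@dataclass
class SubNetwork:
    name: str             # identifier of a pre-sampled sub-network
    top1_accuracy: float  # profiled ImageNet Top-1 accuracy (placeholder values)
    latency_ms: dict      # profiled latency per processor, e.g. {"cpu": 210.0, "gpu": 34.0}

class RuntimeManager:
    def __init__(self, subnets: list):
        # Sort by accuracy so the first feasible candidate is also the best one.
        self.subnets = sorted(subnets, key=lambda s: s.top1_accuracy, reverse=True)

    def select(self, processor: str, latency_target_ms: float) -> Optional[SubNetwork]:
        """Return the most accurate sub-network meeting the latency target, if any."""
        for subnet in self.subnets:
            if subnet.latency_ms.get(processor, float("inf")) <= latency_target_ms:
                return subnet
        return None

# Hypothetical profiled family (all numbers made up for illustration).
family = [
    SubNetwork("ofa-sub-small",  74.1, {"cpu": 180.0, "gpu": 18.0}),
    SubNetwork("ofa-sub-medium", 77.3, {"cpu": 320.0, "gpu": 29.0}),
    SubNetwork("ofa-sub-large",  79.8, {"cpu": 610.0, "gpu": 52.0}),
]
manager = RuntimeManager(family)
print(manager.select("gpu", latency_target_ms=30.0).name)   # -> ofa-sub-medium
print(manager.select("cpu", latency_target_ms=200.0).name)  # -> ofa-sub-small
```

In the paper, the runtime manager also responds to changes in available hardware resources and accuracy requirements; the lookup above only illustrates the latency-constrained selection step.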
Related papers
- Dynamic DNNs and Runtime Management for Efficient Inference on Mobile/Embedded Devices [2.8851756275902476]
Deep neural network (DNN) inference is increasingly being executed on mobile and embedded platforms.
We co-designed novel Dynamic Super-Networks to maximise system-level performance and energy efficiency.
Compared with the SOTA, our experimental results using ImageNet on the GPU of a Jetson Xavier NX show that our model is 2.4x faster for similar ImageNet Top-1 accuracy, or achieves 5.1% higher accuracy at similar latency.
arXiv Detail & Related papers (2024-01-17T04:40:30Z)
- Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads [65.47816359465155]
Running multiple deep neural networks (DNNs) in parallel has become an emerging workload on both edge devices and data centers.
We propose Dysta, a novel scheduler that utilizes both static sparsity patterns and dynamic sparsity information for sparse multi-DNN scheduling.
Our proposed approach outperforms the state-of-the-art methods with up to 10% decrease in latency constraint violation rate and nearly 4X reduction in average normalized turnaround time.
arXiv Detail & Related papers (2023-10-17T09:25:17Z)
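As a rough illustration of the Sparse-DySta entry above, the sketch below combines a static, offline sparsity-aware latency profile with a dynamically observed speedup estimate and schedules the job with the least slack; the policy, job names, and numbers are hypothetical simplifications, not the Dysta scheduler itself.

```python
# Hypothetical least-slack-first scheduler that mixes static and dynamic
# sparsity information when estimating each job's remaining latency.
import time
from dataclasses import dataclass

@dataclass
class DnnJob:
    name: str
    deadline_s: float              # absolute deadline for this request
    static_layer_costs_s: list     # per-layer latency profile (static sparsity already applied)
    dynamic_speedup: float = 1.0   # running estimate from observed activation sparsity
    next_layer: int = 0

    def remaining_latency(self) -> float:
        # Static profile gives the base cost; dynamic sparsity refines it at run time.
        return sum(self.static_layer_costs_s[self.next_layer:]) / self.dynamic_speedup

def pick_next(jobs, now):
    """Run the job closest to violating its latency constraint (least slack first)."""
    return min(jobs, key=lambda j: (j.deadline_s - now) - j.remaining_latency())

now = time.monotonic()
jobs = [
    DnnJob("detector",   deadline_s=now + 0.050, static_layer_costs_s=[0.004] * 8),
    DnnJob("classifier", deadline_s=now + 0.020, static_layer_costs_s=[0.003] * 5,
           dynamic_speedup=1.4),   # high activation sparsity observed so far
]
print(pick_next(jobs, now).name)   # -> classifier (smallest slack)
```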
- Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs [74.83613252825754]
"smart ecosystems" are being formed where sensing happens concurrently rather than standalone.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit scheduling scheme that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
arXiv Detail & Related papers (2022-09-27T15:04:01Z)
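The sketch below illustrates the exit-aware preemption idea from the Fluid Batching entry above in a much-simplified form: a request can leave at an intermediate exit when its confidence is high enough, and a higher-priority arrival can preempt it at those same exit points. The stub blocks, exit heads, and priority rule are assumptions for illustration, not the paper's NPU scheduler.

```python
# Simplified exit-aware preemption: exits double as both early-exit points and
# safe preemption points for newly arrived, higher-priority requests.
from collections import deque

def run_with_exits(request, blocks, exit_heads, confidence_threshold, pending):
    x = request["input"]
    prediction = None
    for block, head in zip(blocks, exit_heads):
        x = block(x)                                   # run one backbone block
        confidence, prediction = head(x)               # query the attached exit head
        if confidence >= confidence_threshold:
            return ("early_exit", prediction)          # request leaves the system early
        if pending and pending[0]["priority"] > request["priority"]:
            request["resume_state"] = x                # checkpoint at the exit point
            return ("preempted", None)                 # scheduler switches requests here
    return ("final_exit", prediction)

# Toy usage with stub blocks and exit heads whose confidence grows with depth.
blocks = [lambda x: x + 1 for _ in range(4)]
heads  = [lambda x, d=d: (0.5 + 0.1 * d, "cat") for d in range(4)]
request = {"input": 0, "priority": 1}
print(run_with_exits(request, blocks, heads, 0.65, deque()))
# -> ('early_exit', 'cat') at the third exit point
```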
- An efficient and flexible inference system for serving heterogeneous ensembles of deep neural networks [0.0]
Ensembles of Deep Neural Networks (DNNs) achieve high-quality predictions, but they are compute- and memory-intensive.
We propose a new software layer to serve ensembles of DNNs flexibly and efficiently.
arXiv Detail & Related papers (2022-08-30T08:05:43Z)
- DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers [105.74546828182834]
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of the network parameters for inputs with diverse difficulty levels.
We present the dynamic slimmable network (DS-Net) and the dynamic sliceable network (DS-Net++), which input-dependently adjust the number of filters in CNNs and multiple dimensions in both CNNs and transformers.
arXiv Detail & Related papers (2021-09-21T09:57:21Z)
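A toy version of the weight-slicing idea from the DS-Net++ entry above: a small gate looks at the input and decides how many of a convolution's output filters to use, so only a slice of the weight tensor is executed. This is an illustrative simplification, not the DS-Net/DS-Net++ implementation.

```python
# Input-dependent weight slicing: run only the first k output filters of a
# convolution, with k chosen per input batch by a tiny gate network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlicedConv(nn.Module):
    def __init__(self, in_ch, max_out_ch, widths=(0.25, 0.5, 1.0)):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(max_out_ch, in_ch, 3, 3) * 0.01)
        self.widths = widths
        self.gate = nn.Linear(in_ch, len(widths))      # pooled input -> width choice

    def forward(self, x):
        logits = self.gate(x.mean(dim=(2, 3)))         # (N, num_widths)
        ratio = self.widths[int(logits.mean(dim=0).argmax())]
        k = max(1, int(self.weight.shape[0] * ratio))  # number of filters to keep
        return F.conv2d(x, self.weight[:k], padding=1) # slice the weight tensor

layer = SlicedConv(in_ch=16, max_out_ch=64)
out = layer(torch.randn(2, 16, 32, 32))
print(out.shape)   # torch.Size([2, k, 32, 32]) with k in {16, 32, 64}
```

In the actual DS-Net/DS-Net++ design, the slicing decision is learned end-to-end rather than taken as the hard argmax used above for illustration.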
- Incremental Training and Group Convolution Pruning for Runtime DNN Performance Scaling on Heterogeneous Embedded Platforms [23.00896228073755]
Inference for Deep Neural Networks is increasingly being executed locally on mobile and embedded platforms.
In this paper, we present a dynamic DNN using incremental training and group convolution pruning.
It achieves a 10.6x (energy) and 41.6x (time) wider dynamic range when combined with task mapping and DVFS.
arXiv Detail & Related papers (2021-05-08T05:38:01Z)
- A Progressive Sub-Network Searching Framework for Dynamic Inference [33.93841415140311]
We propose a progressive sub-network searching framework that incorporates several effective techniques, including trainable noise ranking, channel-group and fine-tuning threshold setting, and sub-network re-selection.
Our proposed method achieves up to 4.4% (2.3% on average) higher dynamic inference accuracy than the popular Universally Slimmable Network on ImageNet at the same model size.
arXiv Detail & Related papers (2020-09-11T22:56:02Z)
- Learning Dynamic Routing for Semantic Segmentation [86.56049245100084]
This paper studies a conceptually new method to alleviate the scale variance in semantic representation, named dynamic routing.
The proposed framework generates data-dependent routes, adapting to the scale distribution of each image.
To this end, a differentiable gating function, called soft conditional gate, is proposed to select scale transform paths on the fly.
arXiv Detail & Related papers (2020-03-23T17:22:14Z)
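The sketch below gives one possible shape of a soft, differentiable gate over scale-transform paths (keep / downsample / upsample), echoing the soft conditional gate mentioned in the Learning Dynamic Routing entry above; the concrete gate architecture and path set are assumptions, not the paper's design.

```python
# A soft gate weights several scale-transform paths; paths whose weight is ~0
# could be skipped entirely at inference time.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftScaleGate(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(channels, 3))   # one logit per path

    def forward(self, x):
        weights = torch.sigmoid(self.gate(x))                # (N, 3), each path in [0, 1]
        keep = x
        down = F.interpolate(x, scale_factor=0.5, mode="bilinear", align_corners=False)
        up   = F.interpolate(x, scale_factor=2.0, mode="bilinear", align_corners=False)
        # Resize every path back to the input resolution so they can be summed.
        paths = [keep,
                 F.interpolate(down, size=x.shape[2:], mode="bilinear", align_corners=False),
                 F.interpolate(up,   size=x.shape[2:], mode="bilinear", align_corners=False)]
        w = weights.view(-1, 3, 1, 1, 1)
        return sum(w[:, i] * p for i, p in enumerate(paths))

gate = SoftScaleGate(channels=32)
out = gate(torch.randn(1, 32, 64, 64))
print(out.shape)   # torch.Size([1, 32, 64, 64])
```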
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in the design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to regain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
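To make the "fine-grained patterns inside coarse-grained structures" idea from the PatDNN entry above concrete, the sketch below prunes every 3x3 kernel down to one mask from a small predefined pattern set, picking the pattern that preserves the most weight magnitude. The pattern set is an illustrative placeholder, not PatDNN's actual patterns.

```python
# Pattern-based kernel pruning: every 3x3 kernel keeps exactly the positions of
# one pattern from a small predefined set, chosen by preserved weight magnitude.
import torch

PATTERNS = torch.tensor([   # each entry: a 3x3 mask with 4 kept positions (illustrative)
    [[1, 1, 0], [1, 1, 0], [0, 0, 0]],
    [[0, 1, 1], [0, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 0], [1, 1, 0]],
    [[0, 0, 0], [0, 1, 1], [0, 1, 1]],
], dtype=torch.float32)     # shape (P, 3, 3)

def pattern_prune(weight: torch.Tensor) -> torch.Tensor:
    """weight: (out_ch, in_ch, 3, 3) -> same shape, pruned to one pattern per kernel."""
    mag = weight.abs()                                      # (O, I, 3, 3)
    # Score every (kernel, pattern) pair by the magnitude the pattern would keep.
    scores = torch.einsum("oihw,phw->oip", mag, PATTERNS)   # (O, I, P)
    best = scores.argmax(dim=-1)                            # (O, I) chosen pattern index
    masks = PATTERNS[best]                                  # (O, I, 3, 3)
    return weight * masks

w = torch.randn(8, 4, 3, 3)
pruned = pattern_prune(w)
print((pruned != 0).sum().item(), "of", w.numel(), "weights kept")  # 4 of 9 kept per kernel
```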
- DDPNAS: Efficient Neural Architecture Search via Dynamic Distribution Pruning [135.27931587381596]
We propose an efficient and unified NAS framework termed DDPNAS via dynamic distribution pruning.
In particular, we first sample architectures from a joint categorical distribution. Then the search space is dynamically pruned and its distribution is updated every few epochs.
With the proposed efficient network generation method, we directly obtain the optimal neural architectures under the given constraints.
arXiv Detail & Related papers (2019-05-28T06:35:52Z)
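A minimal sketch of the sample / update / prune loop described in the DDPNAS entry above: each layer keeps a categorical distribution over candidate operations, architectures are sampled from the joint distribution, the distribution is updated from a reward, and the weakest candidate per layer is pruned every few epochs. The update rule and the random reward stub are assumptions for illustration, not the paper's estimator.

```python
# Sample architectures from per-layer categorical distributions, reinforce the
# sampled operations by a reward, and prune the weakest candidate every few epochs.
import random

NUM_LAYERS, CANDIDATE_OPS = 4, ["conv3x3", "conv5x5", "skip", "maxpool"]

# probs[layer][op] -> probability of picking that op for that layer
probs = [{op: 1.0 / len(CANDIDATE_OPS) for op in CANDIDATE_OPS} for _ in range(NUM_LAYERS)]

def sample_architecture():
    return [random.choices(list(p.keys()), weights=list(p.values()))[0] for p in probs]

def update_distribution(arch, reward, lr=0.1):
    for layer, op in enumerate(arch):
        p = probs[layer]
        p[op] += lr * reward              # reinforce ops seen in rewarded architectures
        total = sum(p.values())
        for k in p:                       # renormalise to a valid distribution
            p[k] /= total

for epoch in range(1, 13):
    for _ in range(8):                    # a few samples per epoch
        arch = sample_architecture()
        reward = random.random()          # stub standing in for validation accuracy
        update_distribution(arch, reward)
    if epoch % 3 == 0:                    # dynamic pruning every few epochs
        for p in probs:
            if len(p) > 1:
                del p[min(p, key=p.get)]  # drop the weakest candidate op

print([max(p, key=p.get) for p in probs]) # surviving op per layer after pruning
```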
This list is automatically generated from the titles and abstracts of the papers on this site.