InstantNet: Automated Generation and Deployment of Instantaneously
Switchable-Precision Networks
- URL: http://arxiv.org/abs/2104.10853v1
- Date: Thu, 22 Apr 2021 04:07:43 GMT
- Authors: Yonggan Fu, Zhongzhi Yu, Yongan Zhang, Yifan Jiang, Chaojian Li,
Yongyuan Liang, Mingchao Jiang, Zhangyang Wang, Yingyan Lin
- Abstract summary: We propose InstantNet to automatically generate and deploy instantaneously switchable-precision networks which operate at variable bit-widths.
In experiments, the proposed InstantNet consistently outperforms state-of-the-art designs.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The promise of Deep Neural Network (DNN)-powered Internet of Things (IoT)
devices has motivated a tremendous demand for automated solutions to enable
fast development and deployment of efficient (1) DNNs equipped with
instantaneous accuracy-efficiency trade-off capability to accommodate the
time-varying resources at IoT devices and (2) dataflows to optimize DNNs'
execution efficiency on different devices. Therefore, we propose InstantNet to
automatically generate and deploy instantaneously switchable-precision networks
which operate at variable bit-widths. Extensive experiments show that the
proposed InstantNet consistently outperforms state-of-the-art designs.
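The core mechanism behind a switchable-precision network can be illustrated with a minimal sketch: a layer caches uniformly quantized copies of its weights at several bit-widths and simply picks one at inference time. This is an illustrative simplification, not InstantNet's actual implementation; all names here are hypothetical.

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization of a tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 127 for 8 bits
    scale = np.max(np.abs(x)) / qmax     # map the largest magnitude to qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale                     # dequantize back to float

class SwitchablePrecisionLayer:
    """A linear layer that caches quantized copies of its weights so the
    bit-width can be switched instantaneously at inference time."""

    def __init__(self, weight, bit_widths=(4, 8, 16)):
        self.weight = weight
        self.cache = {b: quantize(weight, b) for b in bit_widths}

    def forward(self, x, bits):
        # Switching precision is just a dictionary lookup.
        return x @ self.cache[bits]

rng = np.random.default_rng(0)
layer = SwitchablePrecisionLayer(rng.standard_normal((16, 8)))
x = rng.standard_normal((1, 16))
y8 = layer.forward(x, bits=8)  # closer to the full-precision output
y4 = layer.forward(x, bits=4)  # coarser, but cheaper on real hardware
```

Caching one copy per bit-width trades a little memory for zero-latency switching, which is what "instantaneous" accuracy-efficiency trade-off requires on a resource-varying device.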
Related papers
- Accelerating Deep Neural Networks via Semi-Structured Activation
Sparsity [0.0]
Exploiting sparsity in a network's feature maps is one way to reduce its inference latency.
We propose a solution to induce semi-structured activation sparsity exploitable through minor runtime modifications.
Our approach yields a speed improvement of $1.25\times$ with a minimal accuracy drop of $1.1\%$ for the ResNet18 model on the ImageNet dataset.
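The summary does not specify the exact sparsity pattern; a common semi-structured choice is 2:4 (two nonzeros kept per contiguous group of four), sketched below on a feature map. The function name and pattern are assumptions for illustration, not the paper's method.

```python
import numpy as np

def prune_activations_2_4(act):
    """Keep the 2 largest-magnitude values in every contiguous group of 4,
    zeroing the rest: a semi-structured (2:4) sparsity pattern that hardware
    can exploit with minor runtime modifications.
    Assumes the activation size is divisible by 4."""
    flat = act.reshape(-1, 4)                       # group into blocks of 4
    # Indices of the 2 smallest-magnitude entries per group.
    drop = np.argsort(np.abs(flat), axis=1)[:, :2]
    out = flat.copy()
    np.put_along_axis(out, drop, 0.0, axis=1)
    return out.reshape(act.shape)

feature_map = np.random.default_rng(1).standard_normal((2, 8))
sparse = prune_activations_2_4(feature_map)
# Exactly half of the activations are zeroed, in a predictable pattern.
```

The fixed zeros-per-group structure is what makes this "semi-structured": unlike unstructured sparsity, a kernel can rely on the pattern and skip work without per-element bookkeeping.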
arXiv Detail & Related papers (2023-09-12T22:28:53Z)
- Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs [74.83613252825754]
"Smart ecosystems" are being formed in which sensing happens concurrently rather than in standalone devices.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit scheduling scheme that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
arXiv Detail & Related papers (2022-09-27T15:04:01Z)
- Efficient Sparsely Activated Transformers [0.34410212782758054]
Transformer-based neural networks have achieved state-of-the-art task performance in a number of machine learning domains.
Recent work has explored the integration of dynamic behavior into these networks in the form of mixture-of-expert layers.
We introduce a novel system named PLANER that takes an existing Transformer-based network and a user-defined latency target.
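The mixture-of-expert mechanism referenced here can be sketched with top-1 routing: a router scores the experts for each token and only the best-scoring expert runs, so compute becomes dynamic per token. This is a generic illustration of MoE routing, not PLANER's actual system or optimization.

```python
import numpy as np

def moe_layer(x, experts, router_w):
    """Top-1 mixture-of-experts: route each token to the single expert
    the router scores highest, skipping all other experts."""
    scores = x @ router_w                   # (tokens, n_experts)
    out = np.empty_like(x)
    for t, e in enumerate(scores.argmax(axis=1)):
        out[t] = experts[e](x[t])           # only one expert runs per token
    return out

rng = np.random.default_rng(5)
# Three toy "experts", each a different linear map (default arg captures W).
experts = [lambda v, W=rng.standard_normal((4, 4)): v @ W for _ in range(3)]
x = rng.standard_normal((6, 4))
y = moe_layer(x, experts, rng.standard_normal((4, 3)))
```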
arXiv Detail & Related papers (2022-08-31T00:44:27Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- 2-in-1 Accelerator: Enabling Random Precision Switch for Winning Both Adversarial Robustness and Efficiency [26.920864182619844]
We propose a 2-in-1 Accelerator that aims to win both adversarial robustness and efficiency for DNN accelerators.
Specifically, we first propose a Random Precision Switch (RPS) algorithm that can effectively defend DNNs against adversarial attacks.
Furthermore, we propose a new precision-scalable accelerator featuring (1) a new precision-scalable unit architecture.
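A minimal sketch of the Random Precision Switch idea, assuming a simple uniform quantizer: picking the inference bit-width at random means an attacker cannot tailor adversarial perturbations to one fixed quantization. The names and bit-widths below are illustrative, not the paper's actual design.

```python
import random
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def rps_forward(weights, x, bit_widths=(4, 6, 8), rng=random.Random(0)):
    """Random Precision Switch: draw a fresh bit-width per inference, so
    adversarial examples crafted against one precision transfer poorly."""
    bits = rng.choice(bit_widths)
    return x @ quantize(weights, bits), bits

w = np.random.default_rng(2).standard_normal((8, 4))
x = np.ones((1, 8))
y, used_bits = rps_forward(w, x)
```

The defense costs nothing extra at run time if, as in the switchable-precision setting above, quantized weight copies are already available for every supported bit-width.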
arXiv Detail & Related papers (2021-09-11T08:51:01Z)
- Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
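Confidence-thresholded early exiting, the mechanism these networks build on, can be sketched as follows; the stages, heads, and threshold here are hypothetical stand-ins for the paper's trained components.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def multi_exit_inference(x, stages, heads, threshold=0.9):
    """Run backbone stages in order; after each, a lightweight head produces
    class probabilities, and inference stops early once the top probability
    clears `threshold` (easier samples exit sooner)."""
    for i, (stage, head) in enumerate(zip(stages, heads)):
        x = stage(x)
        probs = softmax(head(x))
        if probs.max() >= threshold:
            return probs.argmax(), i      # exit index = how deep we went
    return probs.argmax(), len(stages) - 1

rng = np.random.default_rng(3)
# Toy 3-stage backbone with a 4-class head at each depth.
stages = [lambda v, W=rng.standard_normal((8, 8)): np.tanh(v @ W) for _ in range(3)]
heads = [lambda v, W=rng.standard_normal((8, 4)): v @ W for _ in range(3)]
label, exit_idx = multi_exit_inference(rng.standard_normal(8), stages, heads)
```

Co-optimising the number and placement of heads, as the paper does, amounts to choosing where in this loop the exits sit and how heavy each head may be for a given device budget.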
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
- Dynamic Slimmable Network [105.74546828182834]
We develop a dynamic network slimming regime named Dynamic Slimmable Network (DS-Net).
Our DS-Net is empowered with the ability of dynamic inference by the proposed double-headed dynamic gate.
It consistently outperforms its static counterparts as well as state-of-the-art static and dynamic model compression methods.
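A toy illustration of the dynamic-gate idea, assuming a single linear gate that picks one of several channel widths per input; the paper's double-headed gate and its training scheme are not reproduced here, and all names are hypothetical.

```python
import numpy as np

class DynamicSlimmableLayer:
    """A layer whose active output width is chosen per input by a small
    gate, spending less compute on easy samples."""

    def __init__(self, weight, width_ratios=(0.25, 0.5, 1.0), rng=None):
        self.weight = weight                 # (in, out); out = maximum width
        self.width_ratios = width_ratios
        rng = rng or np.random.default_rng(0)
        self.gate_w = rng.standard_normal((weight.shape[0], len(width_ratios)))

    def forward(self, x):
        # The gate scores each candidate width from the input itself.
        choice = int(np.argmax(x @ self.gate_w))
        k = max(1, int(self.weight.shape[1] * self.width_ratios[choice]))
        # Slimmable trick: narrower widths reuse the first k output channels.
        return x @ self.weight[:, :k], k

rng = np.random.default_rng(4)
layer = DynamicSlimmableLayer(rng.standard_normal((8, 16)), rng=rng)
y, active = layer.forward(rng.standard_normal(8))
```

Reusing a prefix of the channels means one set of weights serves every width, so the gate's decision changes compute without any model switching.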
arXiv Detail & Related papers (2021-03-24T15:25:20Z)
- EmotionNet Nano: An Efficient Deep Convolutional Neural Network Design for Real-time Facial Expression Recognition [75.74756992992147]
This study proposes EmotionNet Nano, an efficient deep convolutional neural network created through a human-machine collaborative design strategy.
Two different variants of EmotionNet Nano are presented, each with a different trade-off between architectural and computational complexity and accuracy.
We demonstrate that the proposed EmotionNet Nano networks achieved real-time inference speeds (e.g. $>25$ FPS and $>70$ FPS at 15W and 30W, respectively) and high energy efficiency.
arXiv Detail & Related papers (2020-06-29T00:48:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.