YONO: Modeling Multiple Heterogeneous Neural Networks on
Microcontrollers
- URL: http://arxiv.org/abs/2203.03794v1
- Date: Tue, 8 Mar 2022 01:24:36 GMT
- Title: YONO: Modeling Multiple Heterogeneous Neural Networks on
Microcontrollers
- Authors: Young D. Kwon, Jagmohan Chauhan, and Cecilia Mascolo
- Abstract summary: YONO is a product quantization (PQ) based approach that compresses multiple heterogeneous models and enables in-memory model execution and switching.
YONO shows remarkable performance as it can compress multiple heterogeneous models by up to 12.37× with negligible or no loss of accuracy.
- Score: 10.420617367363047
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: With the advancement of Deep Neural Networks (DNN) and large amounts of
sensor data from Internet of Things (IoT) systems, the research community has
worked to reduce the computational and resource demands of DNN to compute on
low-resourced microcontrollers (MCUs). However, most of the current work in
embedded deep learning focuses on solving a single task efficiently, while the
multi-tasking nature and applications of IoT devices demand systems that can
handle a diverse range of tasks (activity, voice, and context recognition) with
input from a variety of sensors, simultaneously.
In this paper, we propose YONO, a product quantization (PQ) based approach
that compresses multiple heterogeneous models and enables in-memory model
execution and switching for dissimilar multi-task learning on MCUs. We first
adopt PQ to learn codebooks that store weights of different models. Also, we
propose a novel network optimization and heuristics to maximize the compression
rate and minimize the accuracy loss. Then, we develop an online component of
YONO for efficient model execution and switching between multiple tasks on an
MCU at run time without relying on an external storage device.
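The offline stage described above can be sketched as follows: product quantization splits each weight tensor into short sub-vectors, clusters them with k-means into a shared codebook, and stores each model only as a table of one-byte codebook indices. This is a minimal illustrative sketch, not the authors' implementation; all function names, sizes, and hyperparameters are assumptions.

```python
# Hedged sketch of PQ-based weight compression (illustrative only; the
# paper's actual codebook learning uses additional optimizations and
# heuristics not shown here).
import numpy as np

def learn_codebook(weights, dim=8, k=16, iters=20):
    """Plain k-means over weight sub-vectors -> (k, dim) codebook."""
    subs = weights.reshape(-1, dim)                 # split into sub-vectors
    rng = np.random.default_rng(0)
    centers = subs[rng.choice(len(subs), k, replace=False)]
    for _ in range(iters):
        # assign each sub-vector to its nearest centroid
        d = ((subs[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for c in range(k):
            members = subs[assign == c]
            if len(members):
                centers[c] = members.mean(0)
    return centers

def encode(weights, codebook):
    """Replace each sub-vector by the index of its nearest codeword."""
    subs = weights.reshape(-1, codebook.shape[1])
    d = ((subs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(1).astype(np.uint8)             # one byte per sub-vector

def decode(codes, codebook, shape):
    """Reconstruct an approximate weight tensor from indices."""
    return codebook[codes].reshape(shape)

w = np.random.randn(64, 32).astype(np.float32)      # toy weight matrix
cb = learn_codebook(w)
codes = encode(w, cb)                               # compressed form
w_hat = decode(codes, cb, w.shape)                  # runtime reconstruction
```

With 8-dimensional sub-vectors stored as 1-byte indices, each model's weights shrink roughly 32× versus float32 storage; the shared codebook is amortized across all compressed models, which is what makes multi-model storage cheap.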
YONO shows remarkable performance as it can compress multiple heterogeneous
models by up to 12.37× with negligible or no loss of accuracy. Besides,
YONO's online component enables an efficient execution (latency of 16-159 ms
per operation) and reduces model loading/switching latency and energy
consumption by 93.3-94.5% and 93.9-95.0%, respectively, compared to external
storage access. Interestingly, YONO can compress various architectures trained
with datasets that were not seen during YONO's offline codebook learning phase,
demonstrating the generalizability of our method. To summarize, YONO shows great
potential and opens further doors to enable multi-task learning systems on
extremely resource-constrained devices.
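The online component's savings come from the fact that switching a task no longer means streaming full weights from external flash: each model is just a small per-model index table decoded against the shared codebook into one reusable RAM buffer. A minimal sketch of that idea, with all names and sizes hypothetical:

```python
# Illustrative sketch of in-memory model switching (not the paper's API):
# models share one codebook; switching decodes an index table into a
# single pre-allocated buffer instead of loading raw weights.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.standard_normal((16, 8)).astype(np.float32)  # shared codebook

models = {  # per-model index tables: tiny (uint8) compared to raw weights
    "audio": rng.integers(0, 16, 256, dtype=np.uint8),
    "imu":   rng.integers(0, 16, 256, dtype=np.uint8),
}

# one reusable RAM buffer, sized for the largest model's decoded weights
ram_buffer = np.empty((256, 8), dtype=np.float32)

def switch_to(name):
    """Decode the chosen model's weights in place; no external storage I/O."""
    np.take(codebook, models[name], axis=0, out=ram_buffer)
    return ram_buffer

weights = switch_to("audio")   # fast lookup, then run inference as usual
```

Because the decode is a table lookup rather than a flash read, switching cost is dominated by memory bandwidth, which is consistent with the large latency and energy reductions reported above.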
Related papers
- Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse
Multi-DNN Workloads [65.47816359465155]
Running multiple deep neural networks (DNNs) in parallel has become an emerging workload on edge devices.
We propose Dysta, a novel scheduler that utilizes both static sparsity patterns and dynamic sparsity information for the sparse multi-DNN scheduling.
Our proposed approach outperforms the state-of-the-art methods with up to 10% decrease in latency constraint violation rate and nearly 4X reduction in average normalized turnaround time.
arXiv Detail & Related papers (2023-10-17T09:25:17Z) - Enhancing Neural Architecture Search with Multiple Hardware Constraints
for Deep Learning Model Deployment on Tiny IoT Devices [17.919425885740793]
We propose a novel approach to incorporate multiple constraints into so-called Differentiable NAS optimization methods.
We show that, with a single search, it is possible to reduce memory and latency by 87.4% and 54.2%, respectively.
arXiv Detail & Related papers (2023-10-11T06:09:14Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs)
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural
Networks on Edge NPUs [74.83613252825754]
"smart ecosystems" are being formed where sensing happens concurrently rather than standalone.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit scheduling that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
arXiv Detail & Related papers (2022-09-27T15:04:01Z) - Energy-efficient Deployment of Deep Learning Applications on Cortex-M
based Microcontrollers using Deep Compression [1.4050836886292872]
This paper investigates the efficient deployment of deep learning models on resource-constrained microcontrollers.
We present a methodology for the systematic exploration of different DNN pruning, quantization, and deployment strategies.
We show that models can be compressed to below 10% of their original parameter count before their predictive quality decreases.
arXiv Detail & Related papers (2022-05-20T10:55:42Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around the 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% accuracy impact compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - Computational Intelligence and Deep Learning for Next-Generation
Edge-Enabled Industrial IoT [51.68933585002123]
We investigate how to deploy computational intelligence and deep learning (DL) in edge-enabled industrial IoT networks.
In this paper, we propose a novel multi-exit-based federated edge learning (ME-FEEL) framework.
In particular, the proposed ME-FEEL can achieve an accuracy gain of up to 32.7% in industrial IoT networks with severely limited resources.
arXiv Detail & Related papers (2021-10-28T08:14:57Z) - Differentiable Network Pruning for Microcontrollers [14.864940447206871]
We present a differentiable structured network pruning method for convolutional neural networks.
It integrates a model's MCU-specific resource usage and parameter importance feedback to obtain highly compressed yet accurate classification models.
arXiv Detail & Related papers (2021-10-15T20:26:15Z) - NL-CNN: A Resources-Constrained Deep Learning Model based on Nonlinear
Convolution [0.0]
A novel convolutional neural network model, abbreviated NL-CNN, is proposed, where nonlinear convolution is emulated in a cascade of convolution + nonlinearity layers.
Performance evaluation for several widely known datasets is provided, showing several relevant features.
arXiv Detail & Related papers (2021-01-30T13:38:42Z) - Dynamic Sparsity Neural Networks for Automatic Speech Recognition [44.352231175123215]
We present Dynamic Sparsity Neural Networks (DSNN) that, once trained, can instantly switch to any predefined sparsity configuration at run-time.
Our trained DSNN model, therefore, can greatly ease the training process and simplify deployment in diverse scenarios with resource constraints.
arXiv Detail & Related papers (2020-05-16T22:08:54Z)
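The DSNN idea above, switching a trained network to a predefined sparsity configuration at run time, can be pictured as keeping one dense weight tensor plus a set of precomputed binary masks. A hedged toy sketch, not the paper's implementation (mask construction and names are assumptions):

```python
# Illustrative sketch of run-time sparsity switching: one dense tensor,
# several predefined magnitude-based masks selectable instantly.
import numpy as np

rng = np.random.default_rng(0)
dense_w = rng.standard_normal((128, 128)).astype(np.float32)

def make_mask(sparsity):
    """Keep the largest-magnitude weights; zero out the rest."""
    keep = int(dense_w.size * (1.0 - sparsity))
    thresh = np.sort(np.abs(dense_w).ravel())[-keep]
    return (np.abs(dense_w) >= thresh).astype(np.float32)

# predefined sparsity configurations, chosen once and reused at run time
masks = {s: make_mask(s) for s in (0.5, 0.75, 0.9)}

def forward(x, sparsity):
    """Inference under the selected sparsity configuration."""
    return x @ (dense_w * masks[sparsity])
```

Switching configurations is then a dictionary lookup plus an elementwise mask, which is what makes deployment under varying resource budgets cheap.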
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.