YONO: Modeling Multiple Heterogeneous Neural Networks on
Microcontrollers
- URL: http://arxiv.org/abs/2203.03794v1
- Date: Tue, 8 Mar 2022 01:24:36 GMT
- Title: YONO: Modeling Multiple Heterogeneous Neural Networks on
Microcontrollers
- Authors: Young D. Kwon, Jagmohan Chauhan, and Cecilia Mascolo
- Abstract summary: YONO is a product quantization (PQ) based approach that compresses multiple heterogeneous models and enables in-memory model execution and switching.
YONO shows remarkable performance as it can compress multiple heterogeneous models by up to 12.37× with negligible or no loss of accuracy.
- Score: 10.420617367363047
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: With the advancement of Deep Neural Networks (DNN) and large amounts of
sensor data from Internet of Things (IoT) systems, the research community has
worked to reduce the computational and resource demands of DNN to compute on
low-resourced microcontrollers (MCUs). However, most of the current work in
embedded deep learning focuses on solving a single task efficiently, while the
multi-tasking nature and applications of IoT devices demand systems that can
handle a diverse range of tasks (activity, voice, and context recognition) with
input from a variety of sensors, simultaneously.
In this paper, we propose YONO, a product quantization (PQ) based approach
that compresses multiple heterogeneous models and enables in-memory model
execution and switching for dissimilar multi-task learning on MCUs. We first
adopt PQ to learn codebooks that store weights of different models. Also, we
propose a novel network optimization and heuristics to maximize the compression
rate and minimize the accuracy loss. Then, we develop an online component of
YONO for efficient model execution and switching between multiple tasks on an
MCU at run time without relying on an external storage device.
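The offline stage described above can be sketched as follows: product quantization splits each weight tensor into short sub-vectors, clusters them with k-means into a shared codebook, and stores each model only as a table of one-byte codebook indices. This is a minimal illustrative sketch, not the authors' implementation; all function names, sizes, and hyperparameters are assumptions.

```python
# Hedged sketch of PQ-based weight compression (illustrative only; the
# paper's actual codebook learning uses additional optimizations and
# heuristics not shown here).
import numpy as np

def learn_codebook(weights, dim=8, k=16, iters=20):
    """Plain k-means over weight sub-vectors -> (k, dim) codebook."""
    subs = weights.reshape(-1, dim)                 # split into sub-vectors
    rng = np.random.default_rng(0)
    centers = subs[rng.choice(len(subs), k, replace=False)]
    for _ in range(iters):
        # assign each sub-vector to its nearest centroid
        d = ((subs[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for c in range(k):
            members = subs[assign == c]
            if len(members):
                centers[c] = members.mean(0)
    return centers

def encode(weights, codebook):
    """Replace each sub-vector by the index of its nearest codeword."""
    subs = weights.reshape(-1, codebook.shape[1])
    d = ((subs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(1).astype(np.uint8)             # one byte per sub-vector

def decode(codes, codebook, shape):
    """Reconstruct an approximate weight tensor from indices."""
    return codebook[codes].reshape(shape)

w = np.random.randn(64, 32).astype(np.float32)      # toy weight matrix
cb = learn_codebook(w)
codes = encode(w, cb)                               # compressed form
w_hat = decode(codes, cb, w.shape)                  # runtime reconstruction
```

With 8-dimensional sub-vectors stored as 1-byte indices, each model's weights shrink roughly 32× versus float32 storage; the shared codebook is amortized across all compressed models, which is what makes multi-model storage cheap.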
YONO shows remarkable performance as it can compress multiple heterogeneous
models by up to 12.37× with negligible or no loss of accuracy. Besides,
YONO's online component enables an efficient execution (latency of 16-159 ms
per operation) and reduces model loading/switching latency and energy
consumption by 93.3-94.5% and 93.9-95.0%, respectively, compared to external
storage access. Interestingly, YONO can compress various architectures trained
with datasets that were not seen during YONO's offline codebook learning phase,
demonstrating the generalizability of our method. To summarize, YONO shows great
potential and opens further doors to enable multi-task learning systems on
extremely resource-constrained devices.
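The online component's savings come from the fact that switching a task no longer means streaming full weights from external flash: each model is just a small per-model index table decoded against the shared codebook into one reusable RAM buffer. A minimal sketch of that idea, with all names and sizes hypothetical:

```python
# Illustrative sketch of in-memory model switching (not the paper's API):
# models share one codebook; switching decodes an index table into a
# single pre-allocated buffer instead of loading raw weights.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.standard_normal((16, 8)).astype(np.float32)  # shared codebook

models = {  # per-model index tables: tiny (uint8) compared to raw weights
    "audio": rng.integers(0, 16, 256, dtype=np.uint8),
    "imu":   rng.integers(0, 16, 256, dtype=np.uint8),
}

# one reusable RAM buffer, sized for the largest model's decoded weights
ram_buffer = np.empty((256, 8), dtype=np.float32)

def switch_to(name):
    """Decode the chosen model's weights in place; no external storage I/O."""
    np.take(codebook, models[name], axis=0, out=ram_buffer)
    return ram_buffer

weights = switch_to("audio")   # fast lookup, then run inference as usual
```

Because the decode is a table lookup rather than a flash read, switching cost is dominated by memory bandwidth, which is consistent with the large latency and energy reductions reported above.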
Related papers
- Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse
Multi-DNN Workloads [65.47816359465155]
Running multiple deep neural networks (DNNs) in parallel has become an emerging workload on edge devices.
We propose Dysta, a novel scheduler that utilizes both static sparsity patterns and dynamic sparsity information for the sparse multi-DNN scheduling.
Our proposed approach outperforms the state-of-the-art methods with up to 10% decrease in latency constraint violation rate and nearly 4X reduction in average normalized turnaround time.
arXiv Detail & Related papers (2023-10-17T09:25:17Z) - Enhancing Neural Architecture Search with Multiple Hardware Constraints
for Deep Learning Model Deployment on Tiny IoT Devices [17.919425885740793]
We propose a novel approach to incorporate multiple constraints into so-called Differentiable NAS optimization methods.
We show that, with a single search, it is possible to reduce memory and latency by 87.4% and 54.2%, respectively.
arXiv Detail & Related papers (2023-10-11T06:09:14Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs)
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural
Networks on Edge NPUs [74.83613252825754]
"smart ecosystems" are being formed where sensing happens concurrently rather than standalone.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit scheduling that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
arXiv Detail & Related papers (2022-09-27T15:04:01Z) - Energy-efficient Deployment of Deep Learning Applications on Cortex-M
based Microcontrollers using Deep Compression [1.4050836886292872]
This paper investigates the efficient deployment of deep learning models on resource-constrained microcontrollers.
We present a methodology for the systematic exploration of different DNN pruning, quantization, and deployment strategies.
We show that models can be compressed to below 10% of their original parameter count before their predictive quality decreases.
arXiv Detail & Related papers (2022-05-20T10:55:42Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around the 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% accuracy impact compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - Computational Intelligence and Deep Learning for Next-Generation
Edge-Enabled Industrial IoT [51.68933585002123]
We investigate how to deploy computational intelligence and deep learning (DL) in edge-enabled industrial IoT networks.
In this paper, we propose a novel multi-exit-based federated edge learning (ME-FEEL) framework.
In particular, the proposed ME-FEEL can achieve an accuracy gain of up to 32.7% in industrial IoT networks with severely limited resources.
arXiv Detail & Related papers (2021-10-28T08:14:57Z) - Differentiable Network Pruning for Microcontrollers [14.864940447206871]
We present a differentiable structured network pruning method for convolutional neural networks.
It integrates a model's MCU-specific resource usage and parameter importance feedback to obtain highly compressed yet accurate classification models.
arXiv Detail & Related papers (2021-10-15T20:26:15Z) - NL-CNN: A Resources-Constrained Deep Learning Model based on Nonlinear
Convolution [0.0]
A novel convolutional neural network model, abbreviated NL-CNN, is proposed, where nonlinear convolution is emulated in a cascade of convolution + nonlinearity layers.
Performance evaluation for several widely known datasets is provided, showing several relevant features.
arXiv Detail & Related papers (2021-01-30T13:38:42Z) - Dynamic Sparsity Neural Networks for Automatic Speech Recognition [44.352231175123215]
We present Dynamic Sparsity Neural Networks (DSNN) that, once trained, can instantly switch to any predefined sparsity configuration at run-time.
Our trained DSNN model, therefore, can greatly ease the training process and simplify deployment in diverse scenarios with resource constraints.
arXiv Detail & Related papers (2020-05-16T22:08:54Z)
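The DSNN idea above, switching a trained network to a predefined sparsity configuration at run time, can be pictured as keeping one dense weight tensor plus a set of precomputed binary masks. A hedged toy sketch, not the paper's implementation (mask construction and names are assumptions):

```python
# Illustrative sketch of run-time sparsity switching: one dense tensor,
# several predefined magnitude-based masks selectable instantly.
import numpy as np

rng = np.random.default_rng(0)
dense_w = rng.standard_normal((128, 128)).astype(np.float32)

def make_mask(sparsity):
    """Keep the largest-magnitude weights; zero out the rest."""
    keep = int(dense_w.size * (1.0 - sparsity))
    thresh = np.sort(np.abs(dense_w).ravel())[-keep]
    return (np.abs(dense_w) >= thresh).astype(np.float32)

# predefined sparsity configurations, chosen once and reused at run time
masks = {s: make_mask(s) for s in (0.5, 0.75, 0.9)}

def forward(x, sparsity):
    """Inference under the selected sparsity configuration."""
    return x @ (dense_w * masks[sparsity])
```

Switching configurations is then a dictionary lookup plus an elementwise mask, which is what makes deployment under varying resource budgets cheap.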
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.