Human Activity Recognition on Microcontrollers with Quantized and
Adaptive Deep Neural Networks
- URL: http://arxiv.org/abs/2209.00839v1
- Date: Fri, 2 Sep 2022 06:32:11 GMT
- Title: Human Activity Recognition on Microcontrollers with Quantized and
Adaptive Deep Neural Networks
- Authors: Francesco Daghero, Alessio Burrello, Chen Xie, Marco Castellano, Luca
Gandolfi, Andrea Calimera, Enrico Macii, Massimo Poncino, Daniele Jahier
Pagliari
- Abstract summary: Human Activity Recognition (HAR) based on inertial data is an increasingly widespread task on embedded devices.
Most embedded HAR systems are based on simple and not-so-accurate classic machine learning algorithms.
This work proposes a set of efficient one-dimensional Convolutional Neural Networks (CNNs) deployable on general-purpose microcontrollers (MCUs).
- Score: 10.195581493173643
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human Activity Recognition (HAR) based on inertial data is an increasingly
widespread task on embedded devices, from smartphones to ultra-low-power sensors.
Due to the high computational complexity of deep learning models, most embedded
HAR systems are based on simple and not-so-accurate classic machine learning
algorithms. This work bridges the gap between on-device HAR and deep learning,
proposing a set of efficient one-dimensional Convolutional Neural Networks
(CNNs) deployable on general-purpose microcontrollers (MCUs). Our CNNs are
obtained by combining hyper-parameter optimization with sub-byte and
mixed-precision quantization, to find good trade-offs between classification
results and memory occupation. Moreover, we leverage adaptive inference as an
orthogonal optimization to tune the inference complexity at runtime based on
the processed input, hence producing a more flexible HAR system. With
experiments on four datasets, and targeting an ultra-low-power RISC-V MCU, we
show that (i) we are able to obtain a rich set of Pareto-optimal CNNs for HAR,
spanning more than 1 order of magnitude in terms of memory, latency and energy
consumption; (ii) Thanks to adaptive inference, we can derive >20 runtime
operating modes starting from a single CNN, differing by up to 10% in
classification scores and by more than 3x in inference complexity, with a
limited memory overhead; (iii) on three of the four benchmarks, we outperform
all previous deep learning methods, reducing the memory occupation by more than
100x. The few methods that obtain better performance (both shallow and deep)
are not compatible with MCU deployment. (iv) All our CNNs are compatible with
real-time on-device HAR, with an inference latency <16 ms. Their memory
occupation ranges from 0.05 to 23.17 kB, and their energy consumption from
0.005 to 61.59 uJ, allowing years of continuous operation on a small battery supply.
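The two optimizations the abstract describes can be sketched in a few lines of NumPy. The uniform quantization scheme and the two-stage early-exit model below are generic illustrations, not the paper's actual implementation; all function names, the toy classifiers, and the confidence threshold are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Sub-byte symmetric quantization --------------------------------------
# Generic uniform scheme; the paper's exact mixed-precision recipe may differ.
def quantize(w, n_bits):
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / qmax            # one scale per tensor
    codes = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return codes, scale

w = rng.standard_normal(1000).astype(np.float32)
for bits in (8, 4, 2):  # mixed precision assigns different widths per layer
    codes, scale = quantize(w, bits)
    print(f"{bits}-bit mean abs error: {np.abs(w - codes * scale).mean():.4f}")

# --- Adaptive (early-exit) inference --------------------------------------
# Run a cheap classifier first; invoke the full model only when the cheap
# prediction is not confident. Sweeping the threshold yields a family of
# runtime operating modes from a single trained model.
def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

W1 = rng.standard_normal((4, 6))              # toy "early exit" classifier
W2 = rng.standard_normal((4, 6))              # extra capacity of the full model

def cheap(x):
    return softmax(W1 @ x)

def full(x):
    return softmax((W1 + W2) @ x)

def adaptive_predict(x, threshold):
    probs = cheap(x)
    if probs.max() >= threshold:              # confident enough: stop early
        return int(probs.argmax()), "cheap"
    return int(full(x).argmax()), "full"      # otherwise pay for the full model

x = rng.standard_normal(6)
print(adaptive_predict(x, threshold=0.6))
```

Sweeping the threshold from 0 (always exit early) to 1 (always run the full model) is one plausible way a single network could expose many operating modes that trade classification score for inference complexity, as claim (ii) describes.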
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve a 1.45-9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- Optimizing DNN Inference on Multi-Accelerator SoCs at Training-time [5.05866540830123]
We present ODiMO, a hardware-aware tool that efficiently explores fine-grain mapping of Deep Neural Networks (DNNs) among various on-chip CUs.
We show that ODiMO reduces the latency of a DNN executed on the Darkside by up to 8x at iso-accuracy, compared to manual mappings.
When targeting energy, ODiMO produced up to 50.8x more efficient mappings, with minimal accuracy drop.
arXiv Detail & Related papers (2024-09-27T09:10:44Z)
- Accelerating TinyML Inference on Microcontrollers through Approximate Kernels [3.566060656925169]
In this work, we combine approximate computing and software kernel design to accelerate the inference of approximate CNN models on microcontrollers.
Our evaluation on an STM32-Nucleo board and 2 popular CNNs trained on the CIFAR-10 dataset shows that, compared to state-of-the-art exact inference, our solutions can feature on average 21% latency reduction.
arXiv Detail & Related papers (2024-09-25T11:10:33Z)
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, solutions are only found through the training procedure, whose gradients and regularizers limit the flexibility actually achieved.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs)
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Energy-efficient Deployment of Deep Learning Applications on Cortex-M based Microcontrollers using Deep Compression [1.4050836886292872]
This paper investigates the efficient deployment of deep learning models on resource-constrained microcontrollers.
We present a methodology for the systematic exploration of different DNN pruning, quantization, and deployment strategies.
We show that we can compress them to below 10% of their original parameter count before their predictive quality decreases.
arXiv Detail & Related papers (2022-05-20T10:55:42Z)
- YONO: Modeling Multiple Heterogeneous Neural Networks on Microcontrollers [10.420617367363047]
YONO is a product quantization (PQ) based approach that compresses multiple heterogeneous models and enables in-memory model execution and switching.
YONO shows remarkable performance: it can compress multiple heterogeneous models by up to 12.37x with negligible or no loss of accuracy.
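The product quantization idea behind YONO can be sketched with plain k-means in NumPy: split each weight row into sub-vectors, learn a small codebook per subspace, and store only integer codes. This is an illustrative sketch of generic PQ, not YONO's actual pipeline; all names and hyper-parameters are hypothetical.

```python
import numpy as np

def product_quantize(W, n_sub=4, k=16, iters=10, seed=0):
    """Compress matrix W with product quantization: one k-entry codebook
    per subspace (learned with plain Lloyd k-means), one small integer
    code per row per subspace."""
    rng = np.random.default_rng(seed)
    rows, dim = W.shape
    d = dim // n_sub                              # sub-vector length
    codebooks, codes = [], []
    for s in range(n_sub):
        X = W[:, s * d:(s + 1) * d]
        C = X[rng.choice(rows, size=k, replace=False)].copy()  # init centroids
        for _ in range(iters):                    # Lloyd iterations
            assign = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
            for j in range(k):
                pts = X[assign == j]
                if len(pts):
                    C[j] = pts.mean(0)
        assign = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)  # final codes
        codebooks.append(C)
        codes.append(assign)
    return codebooks, np.stack(codes, axis=1)

def reconstruct(codebooks, codes):
    return np.concatenate(
        [codebooks[s][codes[:, s]] for s in range(len(codebooks))], axis=1)

rng = np.random.default_rng(1)
W = rng.standard_normal((256, 16)).astype(np.float32)
codebooks, codes = product_quantize(W)
W_hat = reconstruct(codebooks, codes)
```

With k=16, each code fits in 4 bits, so the 256x16 float matrix shrinks to 256x4 nibble codes plus four small codebooks, which is the kind of saving that lets several models share MCU memory.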
arXiv Detail & Related papers (2022-03-08T01:24:36Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) approach, Soft Actor-Critic for discrete actions (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Hybrid In-memory Computing Architecture for the Training of Deep Neural Networks [5.050213408539571]
We propose a hybrid in-memory computing architecture for the training of deep neural networks (DNNs) on hardware accelerators.
We show that HIC-based training results in about 50% less inference model size to achieve baseline comparable accuracy.
Our simulations indicate HIC-based training naturally ensures that the number of write-erase cycles seen by the devices is a small fraction of the endurance limit of PCM.
arXiv Detail & Related papers (2021-02-10T05:26:27Z)
- Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver, generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.