Related papers: MicroFlow: An Efficient Rust-Based Inference Engine for TinyML

URL: http://arxiv.org/abs/2409.19432v1
Date: Sat, 28 Sep 2024 18:34:27 GMT
Title: MicroFlow: An Efficient Rust-Based Inference Engine for TinyML
Authors: Matteo Carnelos, Francesco Pasti, Nicola Bellotto
Abstract summary: MicroFlow is an open-source framework for the deployment of Neural Networks (NNs) on embedded systems using the Rust programming language. It is able to use less Flash and RAM memory than other state-of-the-art solutions for deploying NN reference models. It can also achieve faster inference compared to existing engines on medium-size NNs, and similar performance on bigger ones.
Score: 1.8902208722501446
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: MicroFlow is an open-source TinyML framework for the deployment of Neural Networks (NNs) on embedded systems using the Rust programming language, specifically designed for efficiency and robustness, which is suitable for applications in critical environments. To achieve these objectives, MicroFlow employs a compiler-based inference engine approach, coupled with Rust's memory safety and features. The proposed solution enables the successful deployment of NNs on highly resource-constrained devices, including bare-metal 8-bit microcontrollers with only 2kB of RAM. Furthermore, MicroFlow is able to use less Flash and RAM memory than other state-of-the-art solutions for deploying NN reference models (i.e. wake-word and person detection). It can also achieve faster inference compared to existing engines on medium-size NNs, and similar performance on bigger ones. The experimental results prove the efficiency and suitability of MicroFlow for the deployment of TinyML models in critical environments where resources are particularly limited.

Related papers

Energy-Aware FPGA Implementation of Spiking Neural Network with LIF Neurons [0.5243460995467893]
Spiking Neural Networks (SNNs) stand out as a cutting-edge solution for TinyML. This paper presents a novel SNN architecture based on the 1st Order Leaky Integrate-and-Fire (LIF) neuron model. A hardware-friendly LIF design is also proposed, and implemented on a Xilinx Artix-7 FPGA.
arXiv Detail & Related papers (2024-11-03T16:42:10Z)
TinyMetaFed: Efficient Federated Meta-Learning for TinyML [8.940139322528829]
We introduce TinyMetaFed, a model-agnostic meta-learning framework suitable for TinyML. TinyMetaFed facilitates collaborative training of a neural network. It offers communication savings and privacy protection through partial local reconstruction and Top-P% selective communication.
arXiv Detail & Related papers (2023-07-13T15:39:26Z)
MEMA Runtime Framework: Minimizing External Memory Accesses for TinyML on Microcontrollers [3.1823074562424756]
We present the MEMA framework for efficient inference runtimes that minimize external memory accesses for matrix multiplication on TinyML systems. We compare the performance of runtimes derived from MEMA to existing state-of-the-art libraries on ARM-based TinyML systems.
arXiv Detail & Related papers (2023-04-12T00:27:11Z)
TinyReptile: TinyML with Federated Meta-Learning [9.618821589196624]
We propose TinyReptile, a simple but efficient algorithm inspired by meta-learning and online learning. We demonstrate TinyReptile on Raspberry Pi 4 and Cortex-M4 MCU with only 256-KB RAM.
arXiv Detail & Related papers (2023-04-11T13:11:10Z)
Automated Federated Learning in Mobile Edge Networks -- Fast Adaptation and Convergence [83.58839320635956]
Federated Learning (FL) can be used in mobile edge networks to train machine learning models in a distributed manner. Recent FL has been interpreted within a Model-Agnostic Meta-Learning (MAML) framework, which brings FL significant advantages in fast adaptation and convergence over heterogeneous datasets. This paper addresses how much benefit MAML brings to FL and how to maximize such benefit over mobile edge networks.
arXiv Detail & Related papers (2023-03-23T02:42:10Z)
SlimFL: Federated Learning with Superposition Coding over Slimmable Neural Networks [56.68149211499535]
Federated learning (FL) is a key enabler for efficient communication and computing, leveraging devices' distributed computing capabilities. This paper proposes a novel learning framework by integrating FL and width-adjustable slimmable neural networks (SNNs). We propose a communication and energy-efficient SNN-based FL (named SlimFL) that jointly utilizes superposition coding (SC) for global model aggregation and superposition training (ST) for updating local models.
arXiv Detail & Related papers (2022-03-26T15:06:13Z)
Joint Superposition Coding and Training for Federated Learning over Multi-Width Neural Networks [52.93232352968347]
This paper aims to integrate two synergetic technologies: federated learning (FL) and width-adjustable slimmable neural networks (SNNs). FL preserves data privacy by exchanging the locally trained models of mobile devices. Combining FL and SNNs is, however, non-trivial, particularly under wireless connections with time-varying channel conditions. We propose a communication and energy-efficient SNN-based FL (named SlimFL) that jointly utilizes superposition coding (SC) for global model aggregation and superposition training (ST) for updating local models.
arXiv Detail & Related papers (2021-12-05T11:17:17Z)
MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs. We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory. We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
arXiv Detail & Related papers (2021-10-28T17:58:45Z)
A TinyML Platform for On-Device Continual Learning with Quantized Latent Replays [66.62377866022221]
Latent Replay-based Continual Learning (CL) techniques enable online, serverless adaptation in principle. We introduce a HW/SW platform for end-to-end CL based on a 10-core FP32-enabled parallel ultra-low-power processor. Our results show that by combining these techniques, continual learning can be achieved in practice using less than 64MB of memory.
arXiv Detail & Related papers (2021-10-20T11:01:23Z)
Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource-limited devices. Previous unstructured or structured weight pruning methods can hardly accelerate inference in practice. We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
Measuring what Really Matters: Optimizing Neural Networks for TinyML [7.455546102930911]
Neural networks (NNs) have experienced unprecedented growth in architectural and computational complexity. Introducing NNs to resource-constrained devices enables cost-efficient deployments, widespread availability, and the preservation of sensitive data. This work addresses the challenges of bringing Machine Learning to MCUs, where we focus on the ubiquitous ARM Cortex-M architecture.
arXiv Detail & Related papers (2021-04-21T17:14:06Z)
Neural Network-based Virtual Microphone Estimator [111.79608275698274]
We propose a neural network-based virtual microphone estimator (NN-VME). The NN-VME estimates virtual microphone signals directly in the time domain, by utilizing the precise estimation capability of recent time-domain neural networks. Experiments on the CHiME-4 corpus show that the proposed NN-VME achieves high virtual microphone estimation performance even for real recordings.
arXiv Detail & Related papers (2021-01-12T06:30:24Z)
MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers [18.662026553041937]
Machine learning on resource-constrained microcontrollers (MCUs) promises to drastically expand the application space of the Internet of Things (IoT). TinyML presents severe technical challenges, as deep neural network inference demands a large compute and memory budget. Neural architecture search (NAS) promises to help design accurate ML models that meet the tight MCU memory, latency and energy constraints.
arXiv Detail & Related papers (2020-10-21T19:39:39Z)
TensorFlow Lite Micro: Embedded Machine Learning on TinyML Systems [5.188829601887422]
Deep learning inference on embedded devices is a burgeoning field with myriad applications because tiny embedded devices are omnipresent. We introduce TensorFlow Lite Micro, an open-source ML inference framework for running deep-learning models on embedded systems.
arXiv Detail & Related papers (2020-10-17T00:44:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.