MinUn: Accurate ML Inference on Microcontrollers
- URL: http://arxiv.org/abs/2210.16556v1
- Date: Sat, 29 Oct 2022 10:16:12 GMT
- Title: MinUn: Accurate ML Inference on Microcontrollers
- Authors: Shikhar Jaiswal, Rahul Kiran Kranti Goli, Aayan Kumar, Vivek Seshadri
and Rahul Sharma
- Abstract summary: Running machine learning inference on tiny devices, known as TinyML, is an emerging research area.
We describe MinUn, the first TinyML framework that holistically addresses these issues to generate efficient code for ARM microcontrollers.
- Score: 2.2638536653874195
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Running machine learning inference on tiny devices, known as TinyML, is an
emerging research area. This task requires generating inference code that uses
memory frugally, a task that standard ML frameworks are ill-suited for. A
deployment framework for TinyML must a) be parametric in the number
representation to take advantage of emerging representations such as posits,
b) carefully assign high precision to a few tensors so that most tensors can
be kept in low precision while still maintaining model accuracy, and c) avoid
memory fragmentation. We describe MinUn, the first TinyML framework that
holistically addresses these issues to generate efficient code for ARM
microcontrollers (e.g., Arduino Uno, Due and STM32H747) that outperforms the
prior TinyML frameworks.
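To make requirement (c) concrete: because a compiled inference graph's tensor lifetimes are known ahead of time, a TinyML code generator can assign every tensor a fixed offset in one static arena instead of calling a heap allocator at runtime, eliminating fragmentation by construction. The C sketch below illustrates the idea with a hypothetical greedy first-fit planner; it is not MinUn's actual algorithm, and all names are illustrative.

/* Hypothetical compile-time memory planner sketch (not MinUn's actual
 * algorithm): given each tensor's size and [first_use, last_use] interval,
 * assign fixed offsets into one static arena so that tensors with
 * overlapping lifetimes never overlap in memory. */
#include <stdio.h>
#include <stddef.h>

typedef struct {
    const char *name;
    size_t size;        /* bytes                                 */
    int first_use;      /* index of the op that produces it      */
    int last_use;       /* index of the last op that reads it    */
    size_t offset;      /* assigned by the planner               */
} Tensor;

static int lifetimes_overlap(const Tensor *a, const Tensor *b) {
    return a->first_use <= b->last_use && b->first_use <= a->last_use;
}

/* Greedy first-fit: place each tensor at the lowest offset that does not
 * collide with any already-placed, lifetime-overlapping tensor. Returns
 * the arena size (peak memory) needed for the whole plan. */
static size_t plan_offsets(Tensor *t, int n) {
    size_t peak = 0;
    for (int i = 0; i < n; i++) {
        size_t off = 0;
        for (int retry = 1; retry; ) {
            retry = 0;
            for (int j = 0; j < i; j++) {
                if (lifetimes_overlap(&t[i], &t[j]) &&
                    off < t[j].offset + t[j].size &&
                    t[j].offset < off + t[i].size) {
                    off = t[j].offset + t[j].size;  /* bump past tensor j */
                    retry = 1;                      /* re-check everyone  */
                }
            }
        }
        t[i].offset = off;
        if (off + t[i].size > peak) peak = off + t[i].size;
    }
    return peak;
}

int main(void) {
    /* Toy three-op network: in -> conv -> relu. "in" dies after op 0,
     * so "relu_out" can reuse its arena space. */
    Tensor t[] = {
        { "in",       4096, 0, 0, 0 },
        { "conv_out", 8192, 0, 1, 0 },
        { "relu_out", 4096, 1, 2, 0 },
    };
    size_t peak = plan_offsets(t, 3);
    for (int i = 0; i < 3; i++)
        printf("%-8s offset=%5zu size=%5zu\n", t[i].name, t[i].offset, t[i].size);
    printf("arena (peak) = %zu bytes\n", peak);
    return 0;
}

On the toy network, "relu_out" is placed at offset 0, reusing the space freed by "in", so the peak arena (12288 bytes) is smaller than the sum of all tensor sizes.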
Related papers
- Tiny Machine Learning: Progress and Futures [24.76599651516217]
Tiny Machine Learning (TinyML) is a new frontier of machine learning.
TinyML is challenging due to hardware constraints.
We will first discuss the definition, challenges, and applications of TinyML.
arXiv Detail & Related papers (2024-03-28T00:34:56Z)
- MLonMCU: TinyML Benchmarking with Fast Retargeting [1.4319942396517]
It is non-trivial to choose the optimal combination of frameworks and targets for a given application.
This paper proposes a tool called MLonMCU and demonstrates it by effortlessly benchmarking the state-of-the-art TinyML frameworks TFLite for Microcontrollers and TVM.
arXiv Detail & Related papers (2023-06-15T08:44:35Z)
- MEMA Runtime Framework: Minimizing External Memory Accesses for TinyML on Microcontrollers [3.1823074562424756]
We present the MEMA framework for efficient inference runtimes that minimize external memory accesses for matrix multiplication on TinyML systems.
We compare the performance of runtimes derived from MEMA to existing state-of-the-art libraries on ARM-based TinyML systems.
arXiv Detail & Related papers (2023-04-12T00:27:11Z)
- TinyReptile: TinyML with Federated Meta-Learning [9.618821589196624]
We propose TinyReptile, a simple but efficient algorithm inspired by meta-learning and online learning.
We demonstrate TinyReptile on a Raspberry Pi 4 and a Cortex-M4 MCU with only 256 KB of RAM.
arXiv Detail & Related papers (2023-04-11T13:11:10Z)
- Incremental Online Learning Algorithms Comparison for Gesture and Visual Smart Sensors [68.8204255655161]
This paper compares four state-of-the-art algorithms in two real applications: gesture recognition based on accelerometer data and image classification.
Our results confirm these systems' reliability and the feasibility of deploying them in tiny-memory MCUs.
arXiv Detail & Related papers (2022-09-01T17:05:20Z)
- TinyML Platforms Benchmarking [0.0]
Recent advances in ultra-low-power embedded devices for machine learning (ML) have enabled a new class of products.
TinyML provides a unique solution by aggregating and analyzing data at the edge on low-power embedded devices.
Many TinyML frameworks have been developed for different platforms to facilitate the deployment of ML models.
arXiv Detail & Related papers (2021-11-30T15:26:26Z)
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory (a toy sketch of this idea appears after this list).
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
arXiv Detail & Related papers (2021-10-28T17:58:45Z)
- A TinyML Platform for On-Device Continual Learning with Quantized Latent Replays [66.62377866022221]
Latent Replay-based Continual Learning (CL) techniques enable online, serverless adaptation in principle.
We introduce a HW/SW platform for end-to-end CL based on a 10-core FP32-enabled parallel ultra-low-power processor.
Our results show that by combining these techniques, continual learning can be achieved in practice using less than 64MB of memory.
arXiv Detail & Related papers (2021-10-20T11:01:23Z)
- Memory-Based Optimization Methods for Model-Agnostic Meta-Learning and Personalized Federated Learning [56.17603785248675]
Model-agnostic meta-learning (MAML) has become a popular research area.
Existing MAML algorithms rely on the 'episode' idea by sampling a few tasks and data points to update the meta-model at each iteration.
This paper proposes memory-based algorithms for MAML that converge with vanishing error.
arXiv Detail & Related papers (2021-06-09T08:47:58Z)
- TinyTL: Reduce Activations, Not Trainable Parameters for Efficient On-Device Learning [78.80707950262214]
On-device learning enables edge devices to continually adapt the AI models to new data.
Existing work solves this problem by reducing the number of trainable parameters.
We present Tiny-Transfer-Learning (TinyTL) for memory-efficient on-device learning.
arXiv Detail & Related papers (2020-07-22T18:39:53Z)
- MCUNet: Tiny Deep Learning on IoT Devices [62.752899523628066]
We propose a framework that jointly designs the efficient neural architecture (TinyNAS) and the lightweight inference engine (TinyEngine).
TinyNAS adopts a two-stage neural architecture search approach that first optimizes the search space to fit the resource constraints, and then specializes the network architecture within the optimized search space.
TinyEngine adapts the memory scheduling according to the overall network topology rather than layer-wise optimization, reducing the memory usage by 4.8x.
arXiv Detail & Related papers (2020-07-20T17:59:01Z)
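As a companion to the MCUNetV2 entry above, here is a toy C sketch of patch-by-patch inference under stated assumptions: a two-stage pipeline (3x3 mean filter followed by 2x2 max pooling) where each output element recomputes only the small input window it depends on, so the full intermediate feature map is never materialized. All functions are illustrative and not MCUNetV2's actual code.

/* Hypothetical patch-by-patch inference sketch (not MCUNetV2's code):
 *   stage 1: 3x3 mean filter   (input -> intermediate)
 *   stage 2: 2x2 max pool      (intermediate -> output)
 * Running stage 1 to completion would materialize the full H x W
 * intermediate map; fusing the stages per output element keeps only
 * O(1) scratch values live at a time. */
#include <stdio.h>
#include <float.h>

#define H 8
#define W 8

/* One intermediate value: 3x3 mean around (y, x), clamped at borders. */
static float stage1_at(const float in[H][W], int y, int x) {
    float sum = 0.0f;
    int count = 0;
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++) {
            int yy = y + dy, xx = x + dx;
            if (yy >= 0 && yy < H && xx >= 0 && xx < W) {
                sum += in[yy][xx];
                count++;
            }
        }
    return sum / (float)count;
}

/* Patch-based fusion: each output pixel pulls exactly the 2x2 patch of
 * intermediate values it pools over; nothing else is ever materialized. */
static void run_patchwise(const float in[H][W], float out[H / 2][W / 2]) {
    for (int oy = 0; oy < H / 2; oy++)
        for (int ox = 0; ox < W / 2; ox++) {
            float best = -FLT_MAX;
            for (int dy = 0; dy < 2; dy++)
                for (int dx = 0; dx < 2; dx++) {
                    float v = stage1_at(in, 2 * oy + dy, 2 * ox + dx);
                    if (v > best) best = v;
                }
            out[oy][ox] = best;
        }
}

int main(void) {
    float in[H][W], out[H / 2][W / 2];
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            in[y][x] = (float)(y * W + x);
    run_patchwise(in, out);
    printf("out[0][0] = %.2f, out[3][3] = %.2f\n", out[0][0], out[3][3]);
    return 0;
}

The trade-off is recomputation of the overlapping halos between neighboring patches; per the abstract above, MCUNetV2 reduces this overhead by jointly optimizing the neural architecture and the inference schedule.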
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.