MinUn: Accurate ML Inference on Microcontrollers
- URL: http://arxiv.org/abs/2210.16556v1
- Date: Sat, 29 Oct 2022 10:16:12 GMT
- Title: MinUn: Accurate ML Inference on Microcontrollers
- Authors: Shikhar Jaiswal, Rahul Kiran Kranti Goli, Aayan Kumar, Vivek Seshadri, and Rahul Sharma
- Abstract summary: Running machine learning inference on tiny devices, known as TinyML, is an emerging research area.
We describe MinUn, the first TinyML framework that holistically addresses these issues to generate efficient code for ARM microcontrollers.
- Score: 2.2638536653874195
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract:   Running machine learning inference on tiny devices, known as TinyML, is an
emerging research area. It requires generating inference code that uses
memory frugally, a task for which standard ML frameworks are ill-suited. A
deployment framework for TinyML must a) be parametric in the number
representation to take advantage of emerging representations like posits,
b) carefully assign high precision to a few tensors so that most tensors can be
kept in low precision while still maintaining model accuracy, and c) avoid
memory fragmentation. We describe MinUn, the first TinyML framework that
holistically addresses these issues to generate efficient code for ARM
microcontrollers (e.g., Arduino Uno, Due and STM32H747) that outperforms
prior TinyML frameworks.
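
To make point b) concrete, the following is a minimal arithmetic sketch of why promoting only a few tensors to high precision costs little memory. It is not MinUn's algorithm: the tensor names, element counts, and the 8-bit/16-bit widths are all hypothetical values chosen for illustration, and the function only sums storage sizes.

```python
# Illustrative only: tensor names and element counts are hypothetical,
# not taken from the MinUn paper.
TENSOR_ELEMENTS = {
    "conv1_out": 4096,
    "conv2_out": 2048,
    "fc_weights": 1024,
    "logits": 10,
}


def footprint_bytes(bitwidths):
    """Total bytes needed to store every tensor at its assigned bit width."""
    total_bits = sum(TENSOR_ELEMENTS[name] * bits for name, bits in bitwidths.items())
    return (total_bits + 7) // 8  # round up to whole bytes


# Three assumed assignments: everything low, everything high, and a mixed
# one in which only the small "logits" tensor is promoted to 16 bits.
all_low = {name: 8 for name in TENSOR_ELEMENTS}
all_high = {name: 16 for name in TENSOR_ELEMENTS}
mixed = dict(all_low, logits=16)

print(footprint_bytes(all_low))   # 7178
print(footprint_bytes(all_high))  # 14356
print(footprint_bytes(mixed))     # 7188
```

In this toy setting the mixed assignment costs 10 bytes more than the all-8-bit one, while the all-16-bit one doubles the footprint. Which tensors actually need the higher precision is the hard part, and the sketch does not address it.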
 
      
Related papers
- Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful [71.96579951744897]
 Conventional wisdom dictates that small batch sizes make language model pretraining and fine-tuning unstable, motivating gradient accumulation.
In this work, we revisit small batch sizes all the way down to batch size one, and we propose a rule for scaling Adam hyperparameters to small batch sizes.
 arXiv  Detail & Related papers  (2025-07-09T17:57:36Z)
- MOM: Memory-Efficient Offloaded Mini-Sequence Inference for Long Context Language Models [72.61076288351201]
 We propose Memory-efficient Offloaded Mini-sequence Inference (MOM).
MOM partitions critical layers into smaller "mini-sequences" and integrates seamlessly with KV cache offloading.
On Meta-Llama-3.2-8B, MOM extends the maximum context length from 155k to 455k tokens on a single A100 80GB GPU.
 arXiv  Detail & Related papers  (2025-04-16T23:15:09Z)
- MicroFlow: An Efficient Rust-Based Inference Engine for TinyML [1.8902208722501446]
 We present MicroFlow, an open-source framework for the deployment of Neural Networks (NNs) on embedded systems using the Rust programming language.
The proposed framework enables the successful deployment of NNs on highly resource-constrained devices.
 arXiv  Detail & Related papers  (2024-09-28T18:34:27Z)
- Tiny Machine Learning: Progress and Futures [24.76599651516217]
 Tiny Machine Learning (TinyML) is a new frontier of machine learning.
TinyML is challenging due to hardware constraints.
We will first discuss the definition, challenges, and applications of TinyML.
 arXiv  Detail & Related papers  (2024-03-28T00:34:56Z)
- MLonMCU: TinyML Benchmarking with Fast Retargeting [1.4319942396517]
 It is non-trivial to choose the optimal combination of frameworks and targets for a given application.
A tool called MLonMCU is proposed in this paper and demonstrated by benchmarking the state-of-the-art TinyML frameworks TFLite for Microcontrollers and TVM effortlessly.
 arXiv  Detail & Related papers  (2023-06-15T08:44:35Z)
- MEMA Runtime Framework: Minimizing External Memory Accesses for TinyML on Microcontrollers [3.1823074562424756]
 We present the MEMA framework for efficient inference runtimes that minimize external memory accesses for matrix multiplication on TinyML systems.
We compare the performance of runtimes derived from MEMA to existing state-of-the-art libraries on ARM-based TinyML systems.
 arXiv  Detail & Related papers  (2023-04-12T00:27:11Z)
- TinyReptile: TinyML with Federated Meta-Learning [9.618821589196624]
 We propose TinyReptile, a simple but efficient algorithm inspired by meta-learning and online learning.
We demonstrate TinyReptile on Raspberry Pi 4 and Cortex-M4 MCU with only 256-KB RAM.
 arXiv  Detail & Related papers  (2023-04-11T13:11:10Z)
- Incremental Online Learning Algorithms Comparison for Gesture and Visual Smart Sensors [68.8204255655161]
 This paper compares four state-of-the-art algorithms in two real applications: gesture recognition based on accelerometer data and image classification.
Our results confirm these systems' reliability and the feasibility of deploying them in tiny-memory MCUs.
 arXiv  Detail & Related papers  (2022-09-01T17:05:20Z)
- TinyML Platforms Benchmarking [0.0]
 Recent advances in ultra-low power embedded devices for machine learning (ML) have permitted a new class of products.
TinyML provides a unique solution by aggregating and analyzing data at the edge on low-power embedded devices.
Many TinyML frameworks have been developed for different platforms to facilitate the deployment of ML models.
 arXiv  Detail & Related papers  (2021-11-30T15:26:26Z)
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
 We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
 arXiv  Detail & Related papers  (2021-10-28T17:58:45Z)
- A TinyML Platform for On-Device Continual Learning with Quantized Latent Replays [66.62377866022221]
 Latent Replay-based Continual Learning (CL) techniques enable online, serverless adaptation in principle.
We introduce a HW/SW platform for end-to-end CL based on a 10-core FP32-enabled parallel ultra-low-power processor.
Our results show that by combining these techniques, continual learning can be achieved in practice using less than 64MB of memory.
 arXiv  Detail & Related papers  (2021-10-20T11:01:23Z)
- Memory-Based Optimization Methods for Model-Agnostic Meta-Learning and Personalized Federated Learning [56.17603785248675]
 Model-agnostic meta-learning (MAML) has become a popular research area.
Existing MAML algorithms rely on the 'episode' idea by sampling a few tasks and data points to update the meta-model at each iteration.
This paper proposes memory-based algorithms for MAML that converge with vanishing error.
 arXiv  Detail & Related papers  (2021-06-09T08:47:58Z)
- TinyTL: Reduce Activations, Not Trainable Parameters for Efficient On-Device Learning [78.80707950262214]
 On-device learning enables edge devices to continually adapt the AI models to new data.
Existing work solves this problem by reducing the number of trainable parameters.
We present Tiny-Transfer-Learning (TinyTL) for memory-efficient on-device learning.
 arXiv  Detail & Related papers  (2020-07-22T18:39:53Z)
- MCUNet: Tiny Deep Learning on IoT Devices [62.752899523628066]
 We propose a framework that jointly designs the efficient neural architecture (TinyNAS) and the lightweight inference engine (TinyEngine).
TinyNAS adopts a two-stage neural architecture search approach that first optimizes the search space to fit the resource constraints, then specializes the network architecture in the optimized search space.
TinyEngine adapts the memory scheduling according to the overall network topology rather than layer-wise optimization, reducing the memory usage by 4.8x.
 arXiv  Detail & Related papers  (2020-07-20T17:59:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     