MLonMCU: TinyML Benchmarking with Fast Retargeting
- URL: http://arxiv.org/abs/2306.08951v1
- Date: Thu, 15 Jun 2023 08:44:35 GMT
- Title: MLonMCU: TinyML Benchmarking with Fast Retargeting
- Authors: Philipp van Kempen, Rafael Stahl, Daniel Mueller-Gritschneder, Ulf Schlichtmann
- Abstract summary: It is non-trivial to choose the optimal combination of frameworks and targets for a given application.
This paper proposes MLonMCU, a tool demonstrated by benchmarking the state-of-the-art TinyML frameworks TFLite for Microcontrollers and TVM with little effort.
- Score: 1.4319942396517
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While there exist many ways to deploy machine learning models on
microcontrollers, it is non-trivial to choose the optimal combination of
frameworks and targets for a given application. Thus, automating the end-to-end
benchmarking flow is of high relevance nowadays. A tool called MLonMCU is
proposed in this paper and demonstrated by benchmarking the state-of-the-art
TinyML frameworks TFLite for Microcontrollers and TVM effortlessly with a large
number of configurations in a low amount of time.
Related papers
- Large Multimodal Models as General In-Context Classifiers [73.11242790834383]
In this work, we argue that this answer overlooks an important capability of LMMs: in-context learning.
We benchmark state-of-the-art LMMs on diverse datasets for closed-world classification and find that, although their zero-shot performance is lower than CLIP's, LMMs with a few in-context examples can match or even surpass contrastive VLMs with cache-based adapters.
We extend this analysis to the open-world setting, where the generative nature of LMMs makes them more suitable for the task.
arXiv Detail & Related papers (2026-02-26T17:08:18Z)
- MemCtrl: Using MLLMs as Active Memory Controllers on Embodied Agents [53.44122827359892]
We propose MemCtrl, a framework that uses Multimodal Large Language Models (MLLMs) for pruning memory online.
-augmented MLLMs show an improvement of around 16% on average, with over 20% on specific instruction subsets.
arXiv Detail & Related papers (2026-01-28T18:31:17Z)
- On The Dynamic Ensemble Selection for TinyML-based Systems -- a Preliminary Study [0.9553819152637493]
Recent progress in TinyML technologies triggers the need to address the challenge of balancing inference time and classification quality.
This study examines a DES-Clustering approach for a multi-class computer vision task within TinyML systems.
Experiments have shown that a larger pool of classifiers for dynamic selection improves classification accuracy, and thus leads to an increase in average inference time on the TinyML device.
arXiv Detail & Related papers (2025-09-22T18:35:35Z)
- Real-Time Performance Benchmarking of TinyML Models in Embedded Systems (PICO: Performance of Inference, CPU, and Operations) [5.637804042390397]
PICO-TINYML-BENCHMARK is a framework for benchmarking the real-time performance of TinyML models on resource-constrained embedded systems.
We benchmark three representative TinyML models on two widely adopted platforms, BeagleBone AI64 and Raspberry Pi 4.
Results reveal critical trade-offs: the BeagleBone AI64 demonstrates consistent inference latency for AI-specific tasks, while the Raspberry Pi 4 excels in resource efficiency and cost-effectiveness.
arXiv Detail & Related papers (2025-09-05T00:30:39Z)
- RCR-Router: Efficient Role-Aware Context Routing for Multi-Agent LLM Systems with Structured Memory [57.449129198822476]
RCR is a role-aware context routing framework for multi-agent large language model (LLM) systems.
It dynamically selects semantically relevant memory subsets for each agent based on its role and task stage.
A lightweight scoring policy guides memory selection, and agent outputs are integrated into a shared memory store.
arXiv Detail & Related papers (2025-08-06T21:59:34Z)
- LatentLLM: Attention-Aware Joint Tensor Compression [50.33925662486034]
Large language models (LLMs) and large multi-modal models (LMMs) require a massive amount of computational and memory resources.
We propose a new framework to convert such LLMs/LMMs into a reduced-dimension latent structure.
arXiv Detail & Related papers (2025-05-23T22:39:54Z)
- AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning [70.95645743670062]
AtomThink is a framework for constructing long chains of thought (CoT) in a step-by-step manner, guiding MLLMs to perform complex reasoning.
AtomMATH is a large-scale multimodal dataset of long CoTs, and an atomic capability evaluation metric for mathematical tasks.
AtomThink significantly improves the performance of baseline MLLMs, achieving approximately 50% relative accuracy gains on MathVista and 120% on MathVerse.
arXiv Detail & Related papers (2024-11-18T11:54:58Z)
- Align$^2$LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation [56.75665429851673]
This paper introduces a novel instruction curation algorithm, derived from two unique perspectives, human and LLM preference alignment.
Experiments demonstrate that we can maintain or even improve model performance by compressing synthetic multimodal instructions by up to 90%.
arXiv Detail & Related papers (2024-09-27T08:20:59Z)
- MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases [81.70591346986582]
We introduce MobileAIBench, a benchmarking framework for evaluating Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices.
MobileAIBench assesses models across different sizes, quantization levels, and tasks, measuring latency and resource consumption on real devices.
arXiv Detail & Related papers (2024-06-12T22:58:12Z)
- LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit [55.73370804397226]
Quantization, a key compression technique, can effectively mitigate these demands by compressing and accelerating large language models.
We present LLMC, a plug-and-play compression toolkit, to fairly and systematically explore the impact of quantization.
Powered by this versatile toolkit, our benchmark covers three key aspects: calibration data, algorithms (three strategies), and data formats.
arXiv Detail & Related papers (2024-05-09T11:49:05Z)
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z)
- MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models [73.86954509967416]
Multimodal Large Language Model (MLLM) relies on the powerful LLM to perform multimodal tasks.
This paper presents the first comprehensive MLLM Evaluation benchmark MME.
It measures both perception and cognition abilities on a total of 14 subtasks.
arXiv Detail & Related papers (2023-06-23T09:22:36Z)
- MEMA Runtime Framework: Minimizing External Memory Accesses for TinyML on Microcontrollers [3.1823074562424756]
We present the MEMA framework for efficient inference runtimes that minimize external memory accesses for matrix multiplication on TinyML systems.
We compare the performance of runtimes derived from MEMA to existing state-of-the-art libraries on ARM-based TinyML systems.
arXiv Detail & Related papers (2023-04-12T00:27:11Z)
- MinUn: Accurate ML Inference on Microcontrollers [2.2638536653874195]
Running machine learning inference on tiny devices, known as TinyML, is an emerging research area.
We describe MinUn, the first TinyML framework that holistically addresses these issues to generate efficient code for ARM microcontrollers.
arXiv Detail & Related papers (2022-10-29T10:16:12Z)
- TinyML Platforms Benchmarking [0.0]
Recent advances in ultra-low power embedded devices for machine learning (ML) have permitted a new class of products.
TinyML provides a unique solution by aggregating and analyzing data at the edge on low-power embedded devices.
Many TinyML frameworks have been developed for different platforms to facilitate the deployment of ML models.
arXiv Detail & Related papers (2021-11-30T15:26:26Z)
- Memory-Based Optimization Methods for Model-Agnostic Meta-Learning and Personalized Federated Learning [56.17603785248675]
Model-agnostic meta-learning (MAML) has become a popular research area.
Existing MAML algorithms rely on the 'episode' idea by sampling a few tasks and data points to update the meta-model at each iteration.
This paper proposes memory-based algorithms for MAML that converge with vanishing error.
arXiv Detail & Related papers (2021-06-09T08:47:58Z)
- MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers [18.662026553041937]
Machine learning on resource-constrained microcontrollers (MCUs) promises to drastically expand the application space of the Internet of Things (IoT).
TinyML presents severe technical challenges, as deep neural network inference demands a large compute and memory budget.
Neural architecture search (NAS) promises to help design accurate ML models that meet the tight MCU memory, latency and energy constraints.
arXiv Detail & Related papers (2020-10-21T19:39:39Z)
- Benchmarking TinyML Systems: Challenges and Direction [10.193715318589812]
We present the current landscape of TinyML and discuss the challenges and direction towards developing a fair and useful hardware benchmark for TinyML workloads.
Our viewpoints reflect the collective thoughts of the TinyMLPerf working group that is comprised of over 30 organizations.
arXiv Detail & Related papers (2020-03-10T15:58:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.