Benchmarking Energy and Latency in TinyML: A Novel Method for Resource-Constrained AI
- URL: http://arxiv.org/abs/2505.15622v1
- Date: Wed, 21 May 2025 15:12:14 GMT
- Title: Benchmarking Energy and Latency in TinyML: A Novel Method for Resource-Constrained AI
- Authors: Pietro Bartoli, Christian Veronesi, Andrea Giudici, David Siorpaes, Diana Trojaniello, Franco Zappa
- Abstract summary: This work introduces an alternative benchmarking methodology that integrates energy and latency measurements. To evaluate our setup, we tested the STM32N6 MCU, which includes an NPU for executing neural networks. Our findings demonstrate that reducing the core voltage and clock frequency improves the efficiency of pre- and post-processing.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rise of IoT has increased the need for on-edge machine learning, with TinyML emerging as a promising solution for resource-constrained devices such as MCUs. However, evaluating their performance remains challenging due to diverse architectures and application scenarios, and current solutions have non-negligible limitations. This work introduces an alternative benchmarking methodology that integrates energy and latency measurements while distinguishing three execution phases: pre-inference, inference, and post-inference. Additionally, the setup ensures that the device operates without being powered by an external measurement unit, and automated testing can be leveraged to enhance statistical significance. To evaluate our setup, we tested the STM32N6 MCU, which includes an NPU for executing neural networks. Two configurations were considered: high-performance and low-power. The variation of the energy-delay product (EDP) was analyzed separately for each phase, providing insights into the impact of hardware configurations on energy efficiency. Each model was tested 1000 times to ensure statistically relevant results. Our findings demonstrate that reducing the core voltage and clock frequency improves the efficiency of pre- and post-processing without significantly affecting network execution performance. This approach can also be used for cross-platform comparisons to determine the most efficient inference platform and to quantify how pre- and post-processing overhead varies across different hardware implementations.
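As a rough illustration of the per-phase methodology described in the abstract, the sketch below shows how energy-delay products could be aggregated over repeated runs, with separate accumulators for the pre-inference, inference, and post-inference phases. The `measure_phase` callback, the phase names, and the `run_benchmark` driver are hypothetical placeholders, not the authors' released tooling; the only relationships taken from the abstract are EDP = energy x latency, the three-phase split, and the 1000-run repetition count.

```python
# Hypothetical sketch of per-phase EDP benchmarking (not the paper's tooling).
# Assumes an external function measure_phase(name) that executes the named phase
# once on the target MCU and returns (energy_in_joules, latency_in_seconds).

from statistics import mean, stdev

PHASES = ("pre_inference", "inference", "post_inference")

def run_benchmark(measure_phase, runs=1000):
    """Repeat the three execution phases `runs` times and summarize EDP per phase."""
    edp = {phase: [] for phase in PHASES}
    for _ in range(runs):
        for phase in PHASES:
            energy_j, latency_s = measure_phase(phase)
            # Energy-delay product: energy (J) multiplied by latency (s).
            edp[phase].append(energy_j * latency_s)
    return {phase: {"mean_EDP": mean(v), "std_EDP": stdev(v)} for phase, v in edp.items()}
```

Running such a loop once per hardware configuration (for example, high-performance versus low-power) and comparing the per-phase summaries would expose where voltage and frequency scaling pays off, mirroring the analysis reported in the abstract.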
Related papers
- Multi-Agent Reinforcement Learning for Sample-Efficient Deep Neural Network Mapping [54.65536245955678]
We present a decentralized multi-agent reinforcement learning (MARL) framework designed to overcome the challenge of sample inefficiency. We introduce an agent clustering algorithm that assigns similar mapping parameters to the same agents based on correlation analysis. Experimental results show our MARL approach improves sample efficiency by 30-300x over standard single-agent RL.
arXiv Detail & Related papers (2025-07-22T05:51:07Z) - Towards Edge-Based Idle State Detection in Construction Machinery Using Surveillance Cameras [0.0]
Underused construction machinery leads to increased operational costs and project delays. This paper presents the Edge-IMI framework for detecting idle construction machinery. The proposed solution consists of three components: object detection, tracking, and idle state identification.
arXiv Detail & Related papers (2025-06-01T08:43:33Z) - A Multi-Step Comparative Framework for Anomaly Detection in IoT Data Streams [0.9208007322096533]
Internet of Things (IoT) devices have introduced critical security challenges, underscoring the need for accurate anomaly detection. This paper presents a multi-step evaluation framework assessing the combined impact of preprocessing choices on three machine learning algorithms. Experiments on the IoTID20 dataset show that GBoosting consistently delivers superior accuracy across preprocessing configurations.
arXiv Detail & Related papers (2025-05-22T16:28:22Z) - The Larger the Merrier? Efficient Large AI Model Inference in Wireless Edge Networks [56.37880529653111]
The demand for large AI model (LAIM) services is driving a paradigm shift from traditional cloud-based inference to edge-based inference for low-latency, privacy-preserving applications. In this paper, we investigate an LAIM inference scheme in which a pre-trained LAIM is pruned and partitioned into on-device and on-server sub-models for deployment.
arXiv Detail & Related papers (2025-05-14T08:18:55Z) - State-Aware IoT Scheduling Using Deep Q-Networks and Edge-Based Coordination [3.4260861366674105]
This paper addresses the challenge of energy efficiency management faced by intelligent IoT devices in complex application environments. A novel optimization method is proposed, combining Deep Q-Network (DQN) with an edge collaboration mechanism. Experiments are conducted using real-world IoT data collected from the FastBee platform.
arXiv Detail & Related papers (2025-04-22T04:24:16Z) - USEFUSE: Uniform Stride for Enhanced Performance in Fused Layer Architecture of Deep Neural Networks [0.6435156676256051]
This study presents Sum-of-Products (SOP) units for convolution, which utilize low-latency left-to-right bit-serial arithmetic. An effective mechanism detects and skips inefficient convolutions after ReLU layers, minimizing power consumption. Two designs cater to varied demands: one focuses on minimal response time for mission-critical applications, and another targets resource-constrained devices with comparable latency.
arXiv Detail & Related papers (2024-12-18T11:04:58Z) - Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - POMONAG: Pareto-Optimal Many-Objective Neural Architecture Generator [4.09225917049674]
Transferable NAS has emerged, generalizing the search process from dataset-dependent to task-dependent.
This paper introduces POMONAG, extending DiffusionNAG via a many-objective diffusion process.
Results were validated on two search spaces -- NAS201 and MobileNetV3 -- and evaluated across 15 image classification datasets.
arXiv Detail & Related papers (2024-09-30T16:05:29Z) - AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer [54.713778961605115]
Vision Transformer (ViT) has become one of the most prevailing fundamental backbone networks in the computer vision community.
We propose a novel non-uniform quantizer, dubbed the Adaptive Logarithm (AdaLog) quantizer.
arXiv Detail & Related papers (2024-07-17T18:38:48Z) - Comparison of edge computing methods in Internet of Things architectures for efficient estimation of indoor environmental parameters with Machine Learning [0.0]
Two methods are proposed to implement lightweight Machine Learning models that estimate indoor environmental quality (IEQ) parameters.
Their implementation is based on centralised and distributed parallel IoT architectures connected via wireless links.
The training and testing of ML models is accomplished with experiments focused on small temperature and illuminance datasets.
arXiv Detail & Related papers (2024-02-07T21:15:18Z) - On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks [52.97107229149988]
We propose an On-Chip Hardware-Aware Quantization framework, performing hardware-aware mixed-precision quantization on deployed edge devices.
For efficiency metrics, we built an On-Chip Quantization Aware pipeline, which allows the quantization process to perceive the actual hardware efficiency of the quantization operator.
For accuracy metrics, we propose Mask-Guided Quantization Estimation technology to effectively estimate the accuracy impact of operators in the on-chip scenario.
arXiv Detail & Related papers (2023-09-05T04:39:34Z) - Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization that maximizes data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z) - MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge [87.41163540910854]
Deep neural network (DNN) latency characterization is a time-consuming process.
We propose MAPLE-X which extends MAPLE by incorporating explicit prior knowledge of hardware devices and DNN architecture latency.
arXiv Detail & Related papers (2022-05-25T11:08:20Z)