Related papers: Multi-modal On-Device Learning for Monocular Depth Estimation on Ultra-low-power MCUs

URL: http://arxiv.org/abs/2512.00086v1
Date: Wed, 26 Nov 2025 09:46:09 GMT
Title: Multi-modal On-Device Learning for Monocular Depth Estimation on Ultra-low-power MCUs
Authors: Davide Nadalini, Manuele Rusci, Elia Cereda, Luca Benini, Francesco Conti, Daniele Palossi
Abstract summary: Monocular depth estimation (MDE) plays a crucial role in enabling spatially-aware applications in Ultra-low-power (ULP) Internet-of-Things (IoT) platforms. We present a multi-modal On-Device Learning (ODL) technique, deployed on an IoT device integrating an 80 mW monocular camera and an 8 × 8-pixel depth sensor. Our in-field tests demonstrate, for the first time, that ODL for MDE can be performed in 17.8 minutes on the IoT node, reducing the root mean squared error from 4.9 to 0.6 m with only 3k self-labeled samples, collected in a real-life deployment scenario.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Monocular depth estimation (MDE) plays a crucial role in enabling spatially-aware applications in Ultra-low-power (ULP) Internet-of-Things (IoT) platforms. However, the limited number of parameters of Deep Neural Networks designed for the MDE task on IoT nodes results in severe accuracy drops when the sensor data observed in the field shifts significantly from the training dataset. To address this domain shift problem, we present a multi-modal On-Device Learning (ODL) technique, deployed on an IoT device integrating a Greenwaves GAP9 MicroController Unit (MCU), an 80 mW monocular camera, and an 8 × 8-pixel depth sensor, consuming ≈300 mW. In its normal operation, this setup feeds a tiny 107k-parameter μPyD-Net model with monocular images for inference. The depth sensor, usually deactivated to minimize energy consumption, is only activated alongside the camera to collect pseudo-labels when the system is placed in a new environment. The fine-tuning task is then performed entirely on the MCU, using the new data. To optimize our backpropagation-based on-device training, we introduce a novel memory-driven sparse update scheme, which reduces the fine-tuning memory to 1.2 MB, 2.2× less than a full update, while preserving accuracy (i.e., only 2% and 1.5% drops on the KITTI and NYUv2 datasets). Our in-field tests demonstrate, for the first time, that ODL for MDE can be performed in 17.8 minutes on the IoT node, reducing the root mean squared error from 4.9 to 0.6 m with only 3k self-labeled samples, collected in a real-life deployment scenario.
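
The abstract says the 8 × 8 depth sensor supplies pseudo-labels for a dense monocular depth prediction, but it does not state the loss or how the two sensors are registered. The Python sketch below illustrates one plausible way to compare the two, under explicit assumptions that are not taken from the paper: the predicted depth map is spatially aligned with the sensor's 8 × 8 zones, its height and width divide evenly by 8, and invalid sensor zones are marked as None. The function names `pool_to_zones` and `masked_rmse` are hypothetical, and RMSE is used only because it is the metric the abstract reports.

```python
import math
from typing import List, Optional

Grid = List[List[float]]


def pool_to_zones(depth: Grid, zones: int = 8) -> Grid:
    """Average-pool an H x W predicted depth map into a zones x zones grid.

    Assumption (hypothetical): the camera image and the depth sensor share
    the same field of view, so each output zone maps to one sensor zone.
    """
    if not depth or not depth[0]:
        raise ValueError("depth map must be non-empty")
    h, w = len(depth), len(depth[0])
    if h % zones or w % zones:
        raise ValueError("depth map size must be divisible by the zone count")
    bh, bw = h // zones, w // zones
    pooled = []
    for zr in range(zones):
        row = []
        for zc in range(zones):
            # Collect every predicted pixel that falls inside this zone.
            block = [
                depth[r][c]
                for r in range(zr * bh, (zr + 1) * bh)
                for c in range(zc * bw, (zc + 1) * bw)
            ]
            row.append(sum(block) / len(block))
        pooled.append(row)
    return pooled


def masked_rmse(pred: Grid, target: List[List[Optional[float]]]) -> float:
    """Root mean squared error over zones that carry a valid pseudo-label.

    Zones whose sensor reading is None (hypothetical invalid marker) are
    skipped, so they contribute neither error nor count.
    """
    sq_err, n = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t is not None:
                sq_err += (p - t) ** 2
                n += 1
    if n == 0:
        raise ValueError("no valid pseudo-labels")
    return math.sqrt(sq_err / n)


if __name__ == "__main__":
    # Toy example: a 16 x 16 prediction of 2 m everywhere versus a sensor
    # reading of 3 m in every zone except one invalid zone.
    prediction = [[2.0] * 16 for _ in range(16)]
    readings: List[List[Optional[float]]] = [[3.0] * 8 for _ in range(8)]
    readings[0][0] = None
    print(masked_rmse(pool_to_zones(prediction), readings))
```

Pooling the dense prediction down to the sensor resolution, rather than upsampling the sparse readings, avoids inventing depth values the sensor never measured. In an actual fine-tuning loop, a differentiable version of this comparison would be computed per sample and backpropagated; that part, and the paper's sparse update scheme, are outside the scope of this sketch.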

Related papers

NanoCockpit: Performance-optimized Application Framework for AI-based Autonomous Nanorobotics
A small form factor, i.e., a few tens of grams, severely limits onboard computational resources to sub-100 mW microcontroller units (MCUs). Our framework achieves ideal end-to-end latency, i.e., zero overhead due to serialized tasks, delivering quantifiable improvements in closed-loop control performance.
arXiv (2026-01-12T12:29:38Z)
End-to-End Efficiency in Keyword Spotting: A System-Level Approach for Embedded Microcontrollers
Keyword spotting (KWS) is a key enabling technology for hands-free interaction in embedded and IoT devices, where stringent memory and energy constraints challenge the deployment of AI-enabled devices. In this work, we evaluate and compare several state-of-the-art lightweight neural network architectures, including DS-CNN, LiCoNet, and TENet, alongside our proposed Typman-KWS (TKWS) architecture built upon MobileNet, specifically designed for efficient KWS on microcontroller units (MCUs). Our results show that TKWS with three residual blocks achieves up to 92.4% F1-score with only 14.4k parameters.
arXiv (2025-09-08T16:01:55Z)
Energy-Efficient Deep Learning for Traffic Classification on Microcontrollers
We present a practical deep learning (DL) approach for energy-efficient traffic classification on resource-limited microcontrollers. We develop a lightweight 1D-CNN, optimized via hardware-aware neural architecture search (HW-NAS), which achieves 96.59% accuracy on the ISCX VPN-Non-VPN dataset. We evaluate real-world inference performance on two microcontrollers.
arXiv (2025-06-12T16:10:22Z)
DSORT-MCU: Detecting Small Objects in Real-Time on Microcontroller Units
This paper proposes an adaptive tiling method for lightweight and energy-efficient object detection networks, including YOLO-based models and the popular FOMO network. The proposed tiling enables object detection on low-power MCUs with no compromise on accuracy compared to large-scale detection models.
arXiv (2024-10-22T07:37:47Z)
Enhancing Lightweight Neural Networks for Small Object Detection in IoT Applications
The paper proposes a novel adaptive tiling method that can be used on top of any existing object detector. Our experimental results show that the proposed tiling method can boost the F1-score by up to 225% while reducing the average object count error by up to 76%.
arXiv (2023-11-13T08:58:34Z)
Ultra-low Power Deep Learning-based Monocular Relative Localization Onboard Nano-quadrotors
This work presents a novel autonomous end-to-end system that addresses the monocular relative localization, through deep neural networks (DNNs), of two peer nano-drones. To cope with the ultra-constrained nano-drone platform, we propose a vertically-integrated framework, including dataset augmentation, quantization, and system optimizations. Experimental results show that our DNN can precisely localize a 10 cm target nano-drone using only low-resolution monochrome images, at distances of up to 2 m.
arXiv (2023-03-03T14:14:08Z)
On-Device Training Under 256KB Memory
We propose an algorithm-system co-design framework to make on-device training possible with only 256 KB of memory. Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256 KB of memory and 1 MB of Flash.
arXiv (2022-06-30T17:59:08Z)
MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs. We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory. We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
arXiv (2021-10-28T17:58:45Z)
PhiNets: a scalable backbone for low-power AI at the edge
We present PhiNets, a new scalable backbone optimized for deep-learning-based image processing on resource-constrained platforms. PhiNets are based on inverted residual blocks specifically designed to decouple the computational cost, working memory, and parameter memory. We demonstrate our approach on a prototype node based on a STM32H743 microcontroller.
arXiv (2021-10-01T12:03:25Z)
FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation
Dense optical flow estimation plays a key role in many robotic vision tasks. Current networks often have a large number of parameters and incur heavy computational costs. Our proposed FastFlowNet works in the well-known coarse-to-fine manner and introduces several innovations.
arXiv (2021-03-08T03:09:37Z)
Near-chip Dynamic Vision Filtering for Low-Bandwidth Pedestrian Detection
This paper presents a novel end-to-end system for pedestrian detection using Dynamic Vision Sensors (DVSs). We target applications where multiple sensors transmit data to a local processing unit, which executes a detection algorithm. Our detector is able to perform a detection every 450 ms, with an overall testing F1 score of 83%.
arXiv (2020-04-03T17:36:26Z)

This list is automatically generated from the titles and abstracts of the papers on this site.