DSORT-MCU: Detecting Small Objects in Real-Time on Microcontroller Units
- URL: http://arxiv.org/abs/2410.16769v1
- Date: Tue, 22 Oct 2024 07:37:47 GMT
- Title: DSORT-MCU: Detecting Small Objects in Real-Time on Microcontroller Units
- Authors: Liam Boyle, Julian Moosmann, Nicolas Baumann, Seonyeong Heo, Michele Magno,
- Abstract summary: This paper proposes an adaptive tiling method for lightweight and energy-efficient object detection networks, including YOLO-based models and the popular FOMO network.
The proposed tiling enables object detection on low-power MCUs with no compromise on accuracy compared to large-scale detection models.
- Score: 1.4447019135112429
- License:
- Abstract: Advances in lightweight neural networks have revolutionized computer vision in a broad range of IoT applications, encompassing remote monitoring and process automation. However, the detection of small objects, which is crucial for many of these applications, remains an underexplored area in current computer vision research, particularly for low-power embedded devices that host resource-constrained processors. To address said gap, this paper proposes an adaptive tiling method for lightweight and energy-efficient object detection networks, including YOLO-based models and the popular FOMO network. The proposed tiling enables object detection on low-power MCUs with no compromise on accuracy compared to large-scale detection models. The benefit of the proposed method is demonstrated by applying it to FOMO and TinyissimoYOLO networks on a novel RISC-V-based MCU with built-in ML accelerators. Extensive experimental results show that the proposed tiling method boosts the F1-score by up to 225% for both FOMO and TinyissimoYOLO networks while reducing the average object count error by up to 76% with FOMO and up to 89% for TinyissimoYOLO. Furthermore, the findings of this work indicate that using a soft F1 loss over the popular binary cross-entropy loss can serve as an implicit non-maximum suppression for the FOMO network. To evaluate the real-world performance, the networks are deployed on the RISC-V based GAP9 microcontroller from GreenWaves Technologies, showcasing the proposed method's ability to strike a balance between detection performance ($58% - 95%$ F1 score), low latency (0.6 ms/Inference - 16.2 ms/Inference}), and energy efficiency (31 uJ/Inference} - 1.27 mJ/Inference) while performing multiple predictions using high-resolution images on a MCU.
Related papers
- Towards Resource-Efficient Federated Learning in Industrial IoT for Multivariate Time Series Analysis [50.18156030818883]
Anomaly and missing data constitute a thorny problem in industrial applications.
Deep learning enabled anomaly detection has emerged as a critical direction.
The data collected in edge devices contain user privacy.
arXiv Detail & Related papers (2024-11-06T15:38:31Z) - Energy-Aware FPGA Implementation of Spiking Neural Network with LIF Neurons [0.5243460995467893]
Spiking Neural Networks (SNNs) stand out as a cutting-edge solution for TinyML.
This paper presents a novel SNN architecture based on the 1st Order Leaky Integrate-and-Fire (LIF) neuron model.
A hardware-friendly LIF design is also proposed, and implemented on a Xilinx Artix-7 FPGA.
arXiv Detail & Related papers (2024-11-03T16:42:10Z) - FedMSE: Federated learning for IoT network intrusion detection [0.0]
The rise of IoT has expanded the cyber attack surface, making traditional centralized machine learning methods insufficient due to concerns about data availability, computational resources, transfer costs, and especially privacy preservation.
A semi-supervised federated learning model was developed to overcome these issues, combining the Shrink Autoencoder and Centroid one-class classifier (SAE-CEN)
This approach enhances the performance of intrusion detection by effectively representing normal network data and accurately identifying anomalies in the decentralized strategy.
arXiv Detail & Related papers (2024-10-18T02:23:57Z) - Accelerating TinyML Inference on Microcontrollers through Approximate Kernels [3.566060656925169]
In this work, we combine approximate computing and software kernel design to accelerate the inference of approximate CNN models on microcontrollers.
Our evaluation on an STM32-Nucleo board and 2 popular CNNs trained on the CIFAR-10 dataset shows that, compared to state-of-the-art exact inference, our solutions can feature on average 21% latency reduction.
arXiv Detail & Related papers (2024-09-25T11:10:33Z) - Effective Intrusion Detection in Heterogeneous Internet-of-Things Networks via Ensemble Knowledge Distillation-based Federated Learning [52.6706505729803]
We introduce Federated Learning (FL) to collaboratively train a decentralized shared model of Intrusion Detection Systems (IDS)
FLEKD enables a more flexible aggregation method than conventional model fusion techniques.
Experiment results show that the proposed approach outperforms local training and traditional FL in terms of both speed and performance.
arXiv Detail & Related papers (2024-01-22T14:16:37Z) - Enhancing Lightweight Neural Networks for Small Object Detection in IoT
Applications [1.6932009464531739]
The paper proposes a novel adaptive tiling method that can be used on top of any existing object detector.
Our experimental results show that the proposed tiling method can boost the F1-score by up to 225% while reducing the average object count error by up to 76%.
arXiv Detail & Related papers (2023-11-13T08:58:34Z) - Efficient Decoder-free Object Detection with Transformers [75.00499377197475]
Vision transformers (ViTs) are changing the landscape of object detection approaches.
We propose a decoder-free fully transformer-based (DFFT) object detector.
DFFT_SMALL achieves high efficiency in both training and inference stages.
arXiv Detail & Related papers (2022-06-14T13:22:19Z) - Federated Learning for Energy-limited Wireless Networks: A Partial Model
Aggregation Approach [79.59560136273917]
limited communication resources, bandwidth and energy, and data heterogeneity across devices are main bottlenecks for federated learning (FL)
We first devise a novel FL framework with partial model aggregation (PMA)
The proposed PMA-FL improves 2.72% and 11.6% accuracy on two typical heterogeneous datasets.
arXiv Detail & Related papers (2022-04-20T19:09:52Z) - A lightweight and accurate YOLO-like network for small target detection
in Aerial Imagery [94.78943497436492]
We present YOLO-S, a simple, fast and efficient network for small target detection.
YOLO-S exploits a small feature extractor based on Darknet20, as well as skip connection, via both bypass and concatenation.
YOLO-S has an 87% decrease of parameter size and almost one half FLOPs of YOLOv3, making practical the deployment for low-power industrial applications.
arXiv Detail & Related papers (2022-04-05T16:29:49Z) - PhiNets: a scalable backbone for low-power AI at the edge [2.7910505923792646]
We present PhiNets, a new scalable backbone optimized for deep-learning-based image processing on resource-constrained platforms.
PhiNets are based on inverted residual blocks specifically designed to decouple the computational cost, working memory, and parameter memory.
We demonstrate our approach on a prototype node based on a STM32H743 microcontroller.
arXiv Detail & Related papers (2021-10-01T12:03:25Z) - Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scaled pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.