A Performance Analysis of You Only Look Once Models for Deployment on Constrained Computational Edge Devices in Drone Applications
- URL: http://arxiv.org/abs/2502.15737v1
- Date: Thu, 06 Feb 2025 17:22:01 GMT
- Title: A Performance Analysis of You Only Look Once Models for Deployment on Constrained Computational Edge Devices in Drone Applications
- Authors: Lucas Rey, Ana M. Bernardos, Andrzej D. Dobrzycki, David Carramiñana, Luca Bergesio, Juan A. Besada, José Ramón Casar,
- Abstract summary: This study evaluates the deployment of object detection models on resource-constrained edge devices and cloud environments.<n>The NVIDIA Jetson Orin Nano, Orin NX, and Raspberry Pi 5 (RPI5) devices have been tested to measure their detection accuracy, inference speed, and energy consumption.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Advancements in embedded systems and Artificial Intelligence (AI) have enhanced the capabilities of Unmanned Aircraft Vehicles (UAVs) in computer vision. However, the integration of AI techniques o-nboard drones is constrained by their processing capabilities. In this sense, this study evaluates the deployment of object detection models (YOLOv8n and YOLOv8s) on both resource-constrained edge devices and cloud environments. The objective is to carry out a comparative performance analysis using a representative real-time UAV image processing pipeline. Specifically, the NVIDIA Jetson Orin Nano, Orin NX, and Raspberry Pi 5 (RPI5) devices have been tested to measure their detection accuracy, inference speed, and energy consumption, and the effects of post-training quantization (PTQ). The results show that YOLOv8n surpasses YOLOv8s in its inference speed, achieving 52 FPS on the Jetson Orin NX and 65 fps with INT8 quantization. Conversely, the RPI5 failed to satisfy the real-time processing needs in spite of its suitability for low-energy consumption applications. An analysis of both the cloud-based and edge-based end-to-end processing times showed that increased communication latencies hindered real-time applications, revealing trade-offs between edge (low latency) and cloud processing (quick processing). Overall, these findings contribute to providing recommendations and optimization strategies for the deployment of AI models on UAVs.
Related papers
- TakuNet: an Energy-Efficient CNN for Real-Time Inference on Embedded UAV systems in Emergency Response Scenarios [1.6861784540485294]
TakuNet is a novel light-weight architecture which employs techniques such as depth-wise convolutions and an early downsampling stem.<n>It achieves near-state-of-the-art accuracy in classifying aerial images of emergency situations, despite its minimal parameter count.<n>It is suitable for real-time AI processing on resource-constrained platforms and advancing the applicability of drones in emergency scenarios.
arXiv Detail & Related papers (2025-01-10T11:32:56Z) - Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - What is YOLOv9: An In-Depth Exploration of the Internal Features of the Next-Generation Object Detector [0.0]
This study focuses on the YOLOv9 object detection model, focusing on its architectural innovations, training methodologies, and performance improvements.
Key advancements, such as the Generalized Efficient Layer Aggregation Network GELAN and Programmable Gradient Information PGI, significantly enhance feature extraction and gradient flow.
This paper provides the first in depth exploration of YOLOv9s internal features and their real world applicability, establishing it as a state of the art solution for real time object detection.
arXiv Detail & Related papers (2024-09-12T07:46:58Z) - Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation [2.3636539018632616]
This work empirically investigates the optimization of complex deep learning models to analyze their functionality on an embedded device.
It evaluates the effectiveness of the optimized models in terms of their inference speed for image classification and video action detection.
arXiv Detail & Related papers (2024-06-25T17:34:52Z) - Low Latency Visual Inertial Odometry with On-Sensor Accelerated Optical Flow for Resource-Constrained UAVs [13.037162115493393]
On-sensor hardware acceleration is a promising approach to enable low latency Visual Inertial Odometry (VIO)
This paper assesses the speed-up in a VIO sensor system exploiting a compact OF sensor consisting of a global shutter camera and an Application Specific Integrated Circuit (ASIC)
By replacing the feature tracking logic of the VINS-Mono pipeline with data from this OF camera, we demonstrate a 49.4% reduction in latency and a 53.7% reduction of compute load of the VIO pipeline over the original VINS-Mono implementation.
arXiv Detail & Related papers (2024-06-19T08:51:19Z) - SATAY: A Streaming Architecture Toolflow for Accelerating YOLO Models on
FPGA Devices [48.47320494918925]
This work tackles the challenges of deploying stateof-the-art object detection models onto FPGA devices for ultralow latency applications.
We employ a streaming architecture design for our YOLO accelerators, implementing the complete model on-chip in a deeply pipelined fashion.
We introduce novel hardware components to support the operations of YOLO models in a dataflow manner, and off-chip memory buffering to address the limited on-chip memory resources.
arXiv Detail & Related papers (2023-09-04T13:15:01Z) - Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural
Networks on Edge NPUs [74.83613252825754]
"smart ecosystems" are being formed where sensing happens concurrently rather than standalone.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit scheduling that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
arXiv Detail & Related papers (2022-09-27T15:04:01Z) - MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge [87.41163540910854]
Deep neural network (DNN) latency characterization is a time-consuming process.
We propose MAPLE-X which extends MAPLE by incorporating explicit prior knowledge of hardware devices and DNN architecture latency.
arXiv Detail & Related papers (2022-05-25T11:08:20Z) - MAPLE-Edge: A Runtime Latency Predictor for Edge Devices [80.01591186546793]
We propose MAPLE-Edge, an edge device-oriented extension of MAPLE, the state-of-the-art latency predictor for general purpose hardware.
Compared to MAPLE, MAPLE-Edge can describe the runtime and target device platform using a much smaller set of CPU performance counters.
We also demonstrate that unlike MAPLE which performs best when trained on a pool of devices sharing a common runtime, MAPLE-Edge can effectively generalize across runtimes.
arXiv Detail & Related papers (2022-04-27T14:00:48Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around the 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on the accuracy, if compared to its software, full precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.