Benchmarking Deep Learning Models for Object Detection on Edge Computing Devices
- URL: http://arxiv.org/abs/2409.16808v1
- Date: Wed, 25 Sep 2024 10:56:49 GMT
- Title: Benchmarking Deep Learning Models for Object Detection on Edge Computing Devices
- Authors: Daghash K. Alqahtani, Aamir Cheema, Adel N. Toosi
- Abstract summary: We evaluate state-of-the-art object detection models, including YOLOv8 (Nano, Small, Medium), EfficientDet Lite (Lite0, Lite1, Lite2), and SSD (SSD MobileNet V1, SSDLite MobileDet).
We deployed these models on popular edge devices like the Raspberry Pi 3, 4, and 5 with/without TPU accelerators, and the Jetson Orin Nano, collecting key performance metrics such as energy consumption, inference time, and Mean Average Precision (mAP).
Our findings highlight that lower mAP models such as SSD MobileNet V1 are more energy-efficient and faster in inference.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Modern applications, such as autonomous vehicles, require deploying deep learning algorithms on resource-constrained edge devices for real-time image and video processing. However, there is limited understanding of the efficiency and performance of various object detection models on these devices. In this paper, we evaluate state-of-the-art object detection models, including YOLOv8 (Nano, Small, Medium), EfficientDet Lite (Lite0, Lite1, Lite2), and SSD (SSD MobileNet V1, SSDLite MobileDet). We deployed these models on popular edge devices like the Raspberry Pi 3, 4, and 5 with/without TPU accelerators, and Jetson Orin Nano, collecting key performance metrics such as energy consumption, inference time, and Mean Average Precision (mAP). Our findings highlight that lower mAP models such as SSD MobileNet V1 are more energy-efficient and faster in inference, whereas higher mAP models like YOLOv8 Medium generally consume more energy and have slower inference, though with exceptions when accelerators like TPUs are used. Among the edge devices, Jetson Orin Nano stands out as the fastest and most energy-efficient option for request handling, despite having the highest idle energy consumption. These results emphasize the need to balance accuracy, speed, and energy efficiency when deploying deep learning models on edge devices, offering valuable guidance for practitioners and researchers selecting models and devices for their applications.
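The measurement recipe the abstract describes, repeated timed inference runs per model-device pair, can be illustrated with a short harness. The sketch below is hypothetical and not the authors' code: `dummy_detector` stands in for a real model such as YOLOv8 Nano or SSD MobileNet V1, and energy and mAP measurement (which need an external power meter and a COCO-style evaluation) are out of scope here.

```python
import time
import numpy as np

def benchmark(infer, frame, warmup=10, runs=100):
    """Report latency statistics and throughput for one inference callable."""
    for _ in range(warmup):              # warm caches and lazy initialization
        infer(frame)
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(frame)
        latencies.append(time.perf_counter() - start)
    lat = np.asarray(latencies)
    return {
        "mean_ms": lat.mean() * 1e3,
        "p95_ms": float(np.percentile(lat, 95)) * 1e3,
        "fps": 1.0 / lat.mean(),
    }

# Stand-in for a real detector forward pass (e.g. YOLOv8n or SSD MobileNet V1).
weights = np.random.rand(128 * 128 * 3, 64).astype(np.float32)
dummy_detector = lambda x: x.reshape(1, -1) @ weights

frame = np.random.rand(128, 128, 3).astype(np.float32)
print(benchmark(dummy_detector, frame))
```

Reporting a high percentile (p95) alongside the mean matters on edge devices, where thermal throttling and background load make tail latency diverge from the average.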
Related papers
- Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation [2.3636539018632616]
This work empirically investigates the optimization of complex deep learning models to analyze their functionality on an embedded device.
It evaluates the effectiveness of the optimized models in terms of their inference speed for image classification and video action detection.
arXiv Detail & Related papers (2024-06-25T17:34:52Z) - MELTing point: Mobile Evaluation of Language Transformers [8.238355633015068]
We explore the current state of mobile execution of Large Language Models (LLMs).
We have created our own automation infrastructure, MELT, which supports the headless execution and benchmarking of LLMs on device.
We evaluate popular instruction fine-tuned LLMs and leverage different frameworks to measure their end-to-end and granular performance.
arXiv Detail & Related papers (2024-03-19T15:51:21Z) - SATAY: A Streaming Architecture Toolflow for Accelerating YOLO Models on
FPGA Devices [48.47320494918925]
This work tackles the challenges of deploying state-of-the-art object detection models onto FPGA devices for ultra-low latency applications.
We employ a streaming architecture design for our YOLO accelerators, implementing the complete model on-chip in a deeply pipelined fashion.
We introduce novel hardware components to support the operations of YOLO models in a dataflow manner, and off-chip memory buffering to address the limited on-chip memory resources.
arXiv Detail & Related papers (2023-09-04T13:15:01Z) - EPAM: A Predictive Energy Model for Mobile AI [6.451060076703027]
We present a comprehensive study of mobile AI applications considering different deep neural network (DNN) models and processing sources.
We measure the latency, energy consumption, and memory usage of all the models using four processing sources.
Our study highlights important insights, such as how mobile AI behaves in different applications (vision and non-vision) using CPU, GPU, and NNAPI.
arXiv Detail & Related papers (2023-03-02T09:11:23Z) - Incremental Online Learning Algorithms Comparison for Gesture and Visual
Smart Sensors [68.8204255655161]
This paper compares four state-of-the-art algorithms in two real applications: gesture recognition based on accelerometer data and image classification.
Our results confirm these systems' reliability and the feasibility of deploying them in tiny-memory MCUs.
arXiv Detail & Related papers (2022-09-01T17:05:20Z) - EVE: Environmental Adaptive Neural Network Models for Low-power Energy
Harvesting System [8.16411986220709]
Energy harvesting, which draws power from the ambient environment, is a promising alternative to batteries for powering such devices.
This paper proposes EVE, an automated machine learning framework to search for desired multi-models with shared weights for energy harvesting IoT devices.
Experimental results show that the neural network models generated by EVE are on average 2.5x faster than baseline models without pruning and shared weights.
arXiv Detail & Related papers (2022-07-14T20:53:46Z) - ETAD: A Unified Framework for Efficient Temporal Action Detection [70.21104995731085]
- ETAD: A Unified Framework for Efficient Temporal Action Detection [70.21104995731085]
Untrimmed video understanding tasks such as temporal action detection (TAD) often suffer from a huge demand for computing resources.
We build a unified framework for efficient end-to-end temporal action detection (ETAD).
ETAD achieves state-of-the-art performance on both THUMOS-14 and ActivityNet-1.3.
arXiv Detail & Related papers (2022-05-14T21:16:21Z) - MAPLE-Edge: A Runtime Latency Predictor for Edge Devices [80.01591186546793]
We propose MAPLE-Edge, an edge device-oriented extension of MAPLE, the state-of-the-art latency predictor for general purpose hardware.
Compared to MAPLE, MAPLE-Edge can describe the runtime and target device platform using a much smaller set of CPU performance counters.
We also demonstrate that unlike MAPLE which performs best when trained on a pool of devices sharing a common runtime, MAPLE-Edge can effectively generalize across runtimes.
arXiv Detail & Related papers (2022-04-27T14:00:48Z) - Real-Time Execution of Large-scale Language Models on Mobile [49.32610509282623]
- Real-Time Execution of Large-scale Language Models on Mobile [49.32610509282623]
We find the best model structure of BERT for a given computation size to match specific devices.
Our framework can guarantee the identified model to meet both resource and real-time specifications of mobile devices.
Specifically, our model is 5.2x faster on CPU and 4.1x faster on GPU with 0.5-2% accuracy loss compared with BERT-base.
arXiv Detail & Related papers (2020-09-15T01:59:17Z) - MobileDets: Searching for Object Detection Architectures for Mobile
Accelerators [61.30355783955777]
Inverted bottleneck layers have been the predominant building blocks in state-of-the-art object detection models on mobile devices.
Regular convolutions, however, are a potent component for improving the latency-accuracy trade-off of object detection on accelerators.
We obtain a family of object detection models, MobileDets, that achieve state-of-the-art results across mobile accelerators.
arXiv Detail & Related papers (2020-04-30T00:21:30Z)