Benchmarking Edge Computing Devices for Grape Bunches and Trunks Detection using Accelerated Object Detection Single Shot MultiBox Deep Learning Models
- URL: http://arxiv.org/abs/2211.11647v1
- Date: Mon, 21 Nov 2022 17:02:33 GMT
- Title: Benchmarking Edge Computing Devices for Grape Bunches and Trunks Detection using Accelerated Object Detection Single Shot MultiBox Deep Learning Models
- Authors: Sandro Costa Magalhães, Filipe Neves Santos, Pedro Machado, António Paulo Moreira, and Jorge Dias
- Abstract summary: This work benchmarks the performance of different platforms for real-time object detection. The authors used RetinaNet ResNet-50 fine-tuned on the natural VineSet dataset.
- Score: 2.1922186455344796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Purpose: Visual perception enables robots to perceive the environment. Visual data is processed using computer vision algorithms that are usually time-consuming and require powerful devices to process the data in real time, which is infeasible for open-field robots with limited energy. This work benchmarks the performance of different heterogeneous platforms for real-time object detection. It covers three architectures: embedded GPUs (Graphics Processing Units), such as the NVIDIA Jetson Nano 2 GB and 4 GB and the NVIDIA Jetson TX2; TPUs (Tensor Processing Units), such as the Coral Dev Board TPU; and DPUs (Deep Learning Processor Units), such as those in the AMD-Xilinx ZCU104 Development Board and the AMD-Xilinx Kria KV260 Starter Kit. Method: The authors used RetinaNet ResNet-50 fine-tuned on the natural VineSet dataset. The trained model was then converted and compiled into target-specific hardware formats to improve execution efficiency. Conclusions and Results: The platforms were assessed in terms of evaluation-metric performance and efficiency (inference time). The GPUs were the slowest devices, running at 3 FPS to 5 FPS, and the Field Programmable Gate Arrays (FPGAs) were the fastest, running at 14 FPS to 25 FPS. The efficiency of the TPU is unremarkable and similar to that of the NVIDIA Jetson TX2. The TPU and GPUs are the most power-efficient, consuming about 5 W. The differences in evaluation metrics across devices are negligible, with an F1 score of about 70 % and a mean Average Precision (mAP) of about 60 %.
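The abstract compares boards by frames per second and by power draw. A minimal sketch of how those two measurements can be turned into latency, FPS, and energy-per-frame figures follows. `run_inference` is a hypothetical stand-in for whatever runtime each board uses, and the warm-up count and the use of mean latency for FPS are assumptions, not the authors' measurement protocol.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def benchmark_fps(run_inference: Callable[[object], object],
                  frames: Sequence[object],
                  warmup: int = 5) -> Dict[str, float]:
    """Time a hypothetical single-frame inference callable and derive FPS."""
    # Warm-up calls are not timed: first calls often pay one-off setup costs.
    for frame in frames[:warmup]:
        run_inference(frame)

    latencies = []
    for frame in frames:
        start = time.perf_counter()
        run_inference(frame)
        latencies.append(time.perf_counter() - start)

    mean_latency = statistics.mean(latencies)
    return {
        "mean_latency_s": mean_latency,
        "median_latency_s": statistics.median(latencies),
        "fps": 1.0 / mean_latency,  # sequential processing, one frame at a time
    }


def energy_per_frame(avg_power_w: float, fps: float) -> float:
    """Energy per processed frame in joules: watts (J/s) divided by frames/s."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return avg_power_w / fps


if __name__ == "__main__":
    # Dummy workload standing in for a real detector: sleeps about 10 ms per frame.
    result = benchmark_fps(lambda frame: time.sleep(0.01), frames=list(range(20)))
    print(f"{result['fps']:.1f} FPS")
    print(energy_per_frame(5.0, 5.0))  # 1.0 J per frame
```

With the abstract's approximate figures, a board drawing about 5 W at 3 FPS to 5 FPS spends roughly 1.0 J to 1.7 J per frame (5/5 and 5/3). Mean rather than median latency is used for FPS so that occasional slow frames still count against throughput.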
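The abstract also reports detection quality as F1 (about 70 %) and mAP (about 60 %). The sketch below shows one common way to obtain precision, recall, and F1 for a single image: predictions are sorted by confidence and greedily matched one-to-one to ground-truth boxes at an IoU threshold of 0.5. The box format, the threshold, and the greedy matching rule are assumptions for illustration, not the paper's evaluation protocol; mAP additionally requires averaging precision over a ranked precision-recall curve, which this sketch does not compute.

```python
from typing import List, Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max); an assumed convention.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def precision_recall_f1(preds: List[Tuple[Box, float]],
                        gts: List[Box],
                        iou_thr: float = 0.5) -> Tuple[float, float, float]:
    """Greedy one-to-one matching of (box, score) predictions to ground truth."""
    matched = set()
    true_pos = 0
    # Highest-confidence predictions claim ground-truth boxes first.
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_idx, best_iou = -1, iou_thr
        for idx, gt in enumerate(gts):
            if idx in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_idx, best_iou = idx, overlap
        if best_idx >= 0:
            matched.add(best_idx)
            true_pos += 1
    precision = true_pos / len(preds) if preds else 0.0
    recall = true_pos / len(gts) if gts else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
    predictions = [((0, 0, 10, 10), 0.9),     # exact match
                   ((21, 21, 31, 31), 0.8),   # IoU = 81 / 119, about 0.68
                   ((50, 50, 60, 60), 0.3)]   # false positive
    print(precision_recall_f1(predictions, ground_truth))
```

In the example, two of three predictions match, so precision is 2/3, recall is 1, and F1 is 0.8.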
Related papers
- Fast Object Detection with a Machine Learning Edge Device [0.0]
This machine learning study investigates a low-cost edge device integrated with an embedded system that provides computer vision.
A primary aim of this study was reducing inference time and achieving low power consumption.
Much of the information presented contributes to the final selection of Google's Coral-brand Edge TPU device.
(arXiv, 2024-10-05T14:37:58Z)
- Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers [56.37495946212932]
Vision transformers (ViTs) have demonstrated superior accuracy on computer vision tasks compared to convolutional neural networks (CNNs).
This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs.
(arXiv, 2024-07-25T16:35:46Z)
- Green AI: A Preliminary Empirical Study on Energy Consumption in DL Models Across Different Runtime Infrastructures [56.200335252600354]
It is common practice to deploy pre-trained models on environments distinct from their native development settings.
This led to the introduction of interchange formats such as ONNX, together with its own runtime infrastructure (ONNX Runtime), which serve as standard formats.
(arXiv, 2024-02-21T09:18:44Z)
- INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing [66.00729477511219]
Given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient.
We introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture.
We present results that demonstrate 1.8-4.8x and 1.5-3.6x speedup compared to CPU and GPU baselines respectively.
(arXiv, 2023-08-11T04:24:39Z)
- EdgeYOLO: An Edge-Real-Time Object Detector [69.41688769991482]
This paper proposes an efficient, low-complexity and anchor-free object detector based on the state-of-the-art YOLO framework.
We develop an enhanced data augmentation method to effectively suppress overfitting during training, and design a hybrid random loss function to improve the detection accuracy of small objects.
Our baseline model reaches 50.6% AP50:95 and 69.8% AP50 on the MS COCO 2017 dataset and 26.4% AP50:95 and 44.8% AP50 on the VisDrone 2019-DET dataset, and it meets real-time requirements (FPS>=30) on edge-computing device Nvidia
(arXiv, 2023-02-15T06:05:14Z)
- Benchmarking GPU and TPU Performance with Graph Neural Networks [0.0]
This work analyzes and compares the GPU and TPU performance training a Graph Neural Network (GNN) developed to solve a real-life pattern recognition problem.
Characterizing the new class of models acting on sparse data may prove helpful in optimizing the design of deep learning libraries and future AI accelerators.
(arXiv, 2022-10-21T21:03:40Z)
- MAPLE-Edge: A Runtime Latency Predictor for Edge Devices [80.01591186546793]
We propose MAPLE-Edge, an edge device-oriented extension of MAPLE, the state-of-the-art latency predictor for general purpose hardware.
Compared to MAPLE, MAPLE-Edge can describe the runtime and target device platform using a much smaller set of CPU performance counters.
We also demonstrate that unlike MAPLE which performs best when trained on a pool of devices sharing a common runtime, MAPLE-Edge can effectively generalize across runtimes.
(arXiv, 2022-04-27T14:00:48Z)
- Evaluation of Thermal Imaging on Embedded GPU Platforms for Application in Vehicular Assistance Systems [0.5156484100374058]
This study is focused on evaluating the real-time performance of thermal object detection for smart and safe vehicular systems.
A novel large-scale thermal dataset comprising more than 35,000 distinct frames was acquired.
The effectiveness of trained networks is validated on extensive test data using various quantitative metrics.
(arXiv, 2022-01-05T15:36:25Z)
- Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining [58.10436813430554]
Mini-batch training of graph neural networks (GNNs) requires a lot of computation and data movement.
We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment.
We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler.
We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised.
(arXiv, 2021-10-16T02:41:35Z)
- ReS2tAC -- UAV-Borne Real-Time SGM Stereo Optimized for Embedded ARM and CUDA Devices [0.36748639131154304]
FPGAs were the only processing hardware capable of high-performance computing for a long time.
Recent availability of embedded GPU-based systems allows for massively parallel embedded computing on graphics hardware.
We propose an approach for real-time embedded stereo processing on ARM and CUDA-enabled devices.
(arXiv, 2021-06-15T07:29:25Z)
- A Simple Model for Portable and Fast Prediction of Execution Time and Power Consumption of GPU Kernels [2.9853894456071077]
This model is built based on random forests using 189 individual compute kernels from benchmarks such as Parboil, Rodinia, Polybench-GPU and SHOC.
Evaluating the model with cross-validation yields a median Mean Absolute Percentage Error (MAPE) of 8.86-52.00% for time prediction and 1.84-2.94% for power prediction across five different GPUs, while the latency of a single prediction varies between 15 and 108 milliseconds.
(arXiv, 2020-01-20T13:40:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
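The Quasar-ViT entry concerns quantization-aware design, and the main paper likewise compiles its trained model into target-specific formats for the TPU and DPU boards. As background only, the sketch below shows symmetric per-tensor INT8 quantization, the basic real-to-integer mapping that such toolchains build on. It is not Quasar-ViT's method and not the exact scheme of any particular compiler; the single per-tensor scale and the [-127, 127] range are assumptions.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: q = clamp(round(x / scale), -127, 127)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # The largest magnitude maps to 127; an all-zero tensor gets scale 1.0.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Map integers back to approximate real values."""
    return [qi * scale for qi in q]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 0.013]
    q, scale = quantize_int8(weights)
    restored = dequantize_int8(q, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(q, scale, worst)  # rounding error is at most scale / 2
```

A symmetric scheme keeps zero exactly representable and needs only one scale per tensor; asymmetric schemes add a zero-point offset to use the integer range more fully when values are skewed.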
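Both the Green AI entry and the main paper compare devices or runtimes by power and energy. Neither summary says how energy was measured; a common approach, assumed here, is to sample instantaneous power from a meter or on-board sensor and integrate it over the run. The sketch applies the trapezoidal rule to hypothetical (time, power) samples.

```python
from typing import Sequence


def energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate sampled power over time with the trapezoidal rule (W * s = J)."""
    if len(timestamps_s) != len(power_w) or len(timestamps_s) < 2:
        raise ValueError("need at least two aligned (time, power) samples")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of one trapezoid: mean power over the interval times its length.
        total += 0.5 * (power_w[i - 1] + power_w[i]) * dt
    return total


if __name__ == "__main__":
    # Hypothetical readings taken once per second during an inference run.
    t = [0.0, 1.0, 2.0, 3.0]
    p = [4.8, 5.1, 5.0, 4.9]
    joules = energy_joules(t, p)
    print(joules, joules / (t[-1] - t[0]))  # total energy, average power
```

Dividing the total energy by the number of frames processed during the same window gives energy per frame, which is often a fairer comparison than average power alone when devices differ in speed.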
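The GNN training entry argues for mini-batch training with neighborhood sampling. The sketch below illustrates the basic idea with a uniform, fixed-fanout sampler over an adjacency list: at each hop, every frontier node keeps at most `fanout` randomly chosen neighbors, which bounds the work per mini-batch. It is a generic illustration, not the performance-engineered sampler the paper describes; the graph, seeds, and fanouts are hypothetical.

```python
import random
from typing import Dict, List, Set


def sample_neighborhood(adj: Dict[int, List[int]],
                        seeds: List[int],
                        fanouts: List[int],
                        rng: random.Random) -> List[Set[int]]:
    """Return the sampled node set reached at each hop from the seed nodes."""
    frontier: Set[int] = set(seeds)
    hops = [set(frontier)]
    for fanout in fanouts:
        next_frontier: Set[int] = set()
        for node in frontier:
            neighbours = adj.get(node, [])
            # Uniform sampling without replacement, capped at `fanout`.
            k = min(fanout, len(neighbours))
            next_frontier.update(rng.sample(neighbours, k))
        hops.append(next_frontier)
        frontier = next_frontier
    return hops


if __name__ == "__main__":
    graph = {0: [1, 2, 3, 4, 5], 1: [0, 6, 7], 2: [0, 8], 3: [0], 4: [0], 5: [0]}
    for depth, nodes in enumerate(sample_neighborhood(graph, [0], [2, 2], random.Random(0))):
        print(depth, sorted(nodes))
```

With fanouts `[2, 2]`, each seed touches at most 2 + 2 * 2 sampled nodes regardless of how large the true neighborhoods are, which is what keeps mini-batch cost predictable.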
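The GPU kernel prediction entry reports error as a median MAPE obtained with cross-validation. A minimal sketch of that arithmetic follows, assuming MAPE is computed per evaluation fold and the median is taken over those values; the summary does not describe the exact aggregation, and the numbers in the example are made up.

```python
import statistics
from typing import List, Sequence, Tuple


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error in percent; actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 * statistics.mean(
        abs((a - p) / a) for a, p in zip(actual, predicted))


def median_mape(folds: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Median of per-fold MAPE values; each fold is (actual, predicted)."""
    return statistics.median(mape(a, p) for a, p in folds)


if __name__ == "__main__":
    # Made-up measured vs. predicted kernel times (ms) for three folds.
    folds = [([10.0, 20.0], [11.0, 18.0]),
             ([5.0, 8.0], [5.5, 8.0]),
             ([40.0, 50.0], [30.0, 55.0])]
    print(median_mape(folds))
```

Reporting the median rather than the mean of per-fold errors keeps a single badly predicted fold from dominating the summary figure.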