Evaluating the Single-Shot MultiBox Detector and YOLO Deep Learning
Models for the Detection of Tomatoes in a Greenhouse
- URL: http://arxiv.org/abs/2109.00810v1
- Date: Thu, 2 Sep 2021 09:39:12 GMT
- Title: Evaluating the Single-Shot MultiBox Detector and YOLO Deep Learning
Models for the Detection of Tomatoes in a Greenhouse
- Authors: Sandro A. Magalhães, Luís Castro, Germano Moreira, Filipe N.
Santos, Mário Cunha, Jorge Dias and António P. Moreira
- Abstract summary: This paper contributes an annotated visual dataset of green and reddish tomatoes.
Given our robotic platform specifications, only the Single-Shot MultiBox Detector (SSD) and YOLO architectures were considered.
The results show that the system can detect green and reddish tomatoes, even those occluded by leaves.
- Score: 2.949270275392492
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The development of robotic solutions for agriculture requires advanced
perception capabilities that can work reliably at any crop stage. For example,
to automate the tomato harvesting process in greenhouses, the visual
perception system needs to detect the tomato at any stage of its life cycle
(from flower to ripe tomato). The state of the art in visual tomato detection
focuses mainly on ripe tomatoes, which have a distinctive colour against the
background. This paper contributes an annotated visual dataset of green and
reddish tomatoes. Datasets of this kind are uncommon and not available for
research purposes. This dataset will enable further developments in edge
artificial intelligence for the in situ, real-time visual tomato detection
required for the development of harvesting robots. Using this dataset, five
deep learning models were selected, trained and benchmarked to detect green
and reddish tomatoes grown in greenhouses. Given our robotic platform's
specifications, only the Single-Shot MultiBox Detector (SSD) and YOLO
architectures were considered. The results show that the system can detect
green and reddish tomatoes, even those occluded by leaves. SSD MobileNet v2
had the best performance compared against SSD Inception v2, SSD ResNet 50,
SSD ResNet 101 and YOLOv4 Tiny, reaching an F1-score of 66.15%, an mAP of
51.46% and an inference time of 16.44 ms on an NVIDIA Tesla T4 (Turing
architecture, 12 GB). YOLOv4 Tiny also achieved impressive results,
particularly inference times of about 5 ms.
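The reported F1-score, mAP and inference time are standard object-detection benchmarks. As a rough, hypothetical illustration of how such figures are typically obtained (this is not the authors' evaluation code, and the counts below are made up to land near the paper's F1), the following minimal Python sketch computes an F1-score from true/false positive/negative counts and measures mean per-image latency, with a `detect` stub standing in for any trained SSD or YOLO model:

```python
import time

def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

def detect(image):
    """Stand-in for a trained detector such as SSD MobileNet v2;
    returns a list of (box, class, confidence) predictions."""
    time.sleep(0.016)  # pretend a forward pass takes ~16 ms
    return []

def mean_inference_ms(images, warmup: int = 5) -> float:
    """Average wall-clock latency per image, excluding warm-up runs."""
    for img in images[:warmup]:
        detect(img)  # warm-up (cache/initialisation effects not timed)
    start = time.perf_counter()
    for img in images:
        detect(img)
    return (time.perf_counter() - start) / len(images) * 1000.0

if __name__ == "__main__":
    # Hypothetical counts; these happen to give an F1 near 66%.
    print("F1 = %.2f%%" % (100.0 * f1_score(tp=620, fp=310, fn=320)))
    print("latency = %.2f ms/image" % mean_inference_ms([None] * 20))
```

mAP additionally requires matching predictions to ground-truth boxes at an IoU threshold and averaging precision over recall levels (and classes), which is why it is usually computed with a library such as pycocotools rather than by hand.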
Related papers
- Deep learning-based approach for tomato classification in complex scenes [0.8287206589886881]
We have proposed a tomato ripening monitoring approach based on deep learning in complex scenes.
The objective is to detect mature tomatoes and harvest them in a timely manner.
Experiments are based on images collected from the internet through searches for tomato ripeness states in diverse languages.
arXiv Detail & Related papers (2024-01-26T18:33:57Z)
- Tomato Maturity Recognition with Convolutional Transformers [5.220581005698766]
The authors propose a novel method for tomato maturity classification using a convolutional transformer.
A new tomato dataset named KUTomaData is designed to train deep-learning models for tomato segmentation and classification.
The authors show that the convolutional transformer outperforms state-of-the-art methods for tomato maturity classification.
arXiv Detail & Related papers (2023-07-04T07:33:53Z)
- Fast GraspNeXt: A Fast Self-Attention Neural Network Architecture for Multi-task Learning in Computer Vision Tasks for Robotic Grasping on the Edge [80.88063189896718]
High architectural and computational complexity can result in poor suitability for deployment on embedded devices.
Fast GraspNeXt is a fast self-attention neural network architecture tailored for embedded multi-task learning in computer vision tasks for robotic grasping.
arXiv Detail & Related papers (2023-04-21T18:07:14Z)
- Ultra-low Power Deep Learning-based Monocular Relative Localization Onboard Nano-quadrotors [64.68349896377629]
This work presents a novel autonomous end-to-end system that addresses monocular relative localization between two peer nano-drones using deep neural networks (DNNs).
To cope with the ultra-constrained nano-drone platform, we propose a vertically-integrated framework, including dataset augmentation, quantization, and system optimizations.
Experimental results show that our DNN can precisely localize a 10 cm target nano-drone using only low-resolution monochrome images, at up to 2 m distance.
arXiv Detail & Related papers (2023-03-03T14:14:08Z)
- Detection of Tomato Ripening Stages using Yolov3-tiny [0.0]
We use a neural network-based model for tomato classification and detection.
Our experiments showed an F1-score of 90.0% in the localization and classification of ripening stages on a custom dataset.
arXiv Detail & Related papers (2023-02-01T00:57:58Z)
- ProcTHOR: Large-Scale Embodied AI Using Procedural Generation [55.485985317538194]
ProcTHOR is a framework for procedural generation of Embodied AI environments.
We demonstrate state-of-the-art results across 6 embodied AI benchmarks for navigation, rearrangement, and arm manipulation.
arXiv Detail & Related papers (2022-06-14T17:09:35Z)
- MetaGraspNet: A Large-Scale Benchmark Dataset for Vision-driven Robotic Grasping via Physics-based Metaverse Synthesis [78.26022688167133]
We present a large-scale benchmark dataset for vision-driven robotic grasping via physics-based metaverse synthesis.
The proposed dataset contains 100,000 images and 25 different object types.
We also propose a new layout-weighted performance metric alongside the dataset for evaluating object detection and segmentation performance.
arXiv Detail & Related papers (2021-12-29T17:23:24Z)
- Deep-CNN based Robotic Multi-Class Under-Canopy Weed Control in Precision Farming [2.6085535710135654]
Real-time multi-class weed identification enables species-specific treatment of weeds and significantly reduces the amount of herbicide use.
Here, we present a baseline for classification performance using five benchmark CNN models.
We deploy MobileNetV2 onto our own compact autonomous robot, SAMBot, for real-time weed detection.
arXiv Detail & Related papers (2021-12-28T03:51:55Z)
- INVIGORATE: Interactive Visual Grounding and Grasping in Clutter [56.00554240240515]
INVIGORATE is a robot system that interacts with humans through natural language and grasps a specified object in clutter.
We train separate neural networks for object detection, for visual grounding, for question generation, and for OBR detection and grasping.
We build a partially observable Markov decision process (POMDP) that integrates the learned neural network modules (see the sketch after this list).
arXiv Detail & Related papers (2021-08-25T07:35:21Z)
- Geometry-Based Grasping of Vine Tomatoes [6.547498821163685]
We propose a geometry-based grasping method for vine tomatoes.
It relies on a computer-vision pipeline to identify the required geometric features of the tomatoes and of the truss stem.
The grasping method then uses a geometric model of the robotic hand and the truss to determine a suitable grasping location on the stem.
arXiv Detail & Related papers (2021-03-01T19:33:51Z)
- Towards Palmprint Verification On Smartphones [62.279124220123286]
Studies in the past two decades have shown that palmprints have outstanding merits in uniqueness and permanence.
We built a DCNN-based palmprint verification system named DeepMPV+ for smartphones.
The efficiency and efficacy of DeepMPV+ have been corroborated by extensive experiments.
arXiv Detail & Related papers (2020-03-30T08:31:03Z)
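The INVIGORATE entry above describes a pattern worth making concrete: separately trained perception modules are treated as noisy observation sources inside a POMDP. The following minimal Python sketch is a hypothetical illustration of that pattern, not INVIGORATE's actual system; a discrete belief over which candidate object the user means is updated by Bayes' rule from a stubbed grounding network's scores, and the robot grasps only once sufficiently confident, otherwise asking a clarifying question:

```python
import random

def grounding_scores(n_objects, true_target):
    """Stub for a visual-grounding network: noisy per-object scores,
    higher on average for the object the user actually referred to."""
    return [random.gauss(0.8 if i == true_target else 0.2, 0.15)
            for i in range(n_objects)]

def update_belief(belief, scores):
    """Bayes update: treat each (clipped) score as the likelihood that
    the corresponding object is the referred target."""
    posterior = [b * max(s, 1e-6) for b, s in zip(belief, scores)]
    total = sum(posterior)
    return [p / total for p in posterior]

def act(belief, grasp_threshold=0.9):
    """Grasp once confident enough; otherwise gather more information."""
    best = max(belief)
    if best >= grasp_threshold:
        return "grasp object %d" % belief.index(best)
    return "ask a clarifying question"

if __name__ == "__main__":
    n_objects, target = 4, 2
    belief = [1.0 / n_objects] * n_objects  # uniform prior over candidates
    for step in range(5):
        belief = update_belief(belief, grounding_scores(n_objects, target))
        print("step %d: belief=%s -> %s"
              % (step, ["%.2f" % b for b in belief], act(belief)))
```

In a real system the observation likelihoods, question policy and grasp model would all come from learned modules like those the summary mentions, rather than hand-written stubs.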
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.