YOLO11 and Vision Transformers based 3D Pose Estimation of Immature Green Fruits in Commercial Apple Orchards for Robotic Thinning
- URL: http://arxiv.org/abs/2410.19846v1
- Date: Mon, 21 Oct 2024 17:00:03 GMT
- Title: YOLO11 and Vision Transformers based 3D Pose Estimation of Immature Green Fruits in Commercial Apple Orchards for Robotic Thinning
- Authors: Ranjan Sapkota, Manoj Karkee
- Abstract summary: A method for 3D pose estimation of immature green apples (fruitlets) in commercial orchards was developed.
It combines the YOLO11 object detection and pose estimation algorithm with Vision Transformers (ViT) for depth estimation.
YOLO11n surpassed all configurations of YOLO11 and YOLOv8 in terms of box precision and pose precision.
- Score: 0.4143603294943439
- Abstract: In this study, a robust method for 3D pose estimation of immature green apples (fruitlets) in commercial orchards was developed, utilizing the YOLO11 object detection and pose estimation algorithm alongside Vision Transformers (ViT) for depth estimation (Dense Prediction Transformer (DPT) and Depth Anything V2). For object detection and pose estimation, performance comparisons of YOLO11 (YOLO11n, YOLO11s, YOLO11m, YOLO11l and YOLO11x) and YOLOv8 (YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l and YOLOv8x) were made under identical hyperparameter settings across all configurations. YOLO11n surpassed all configurations of YOLO11 and YOLOv8 in box precision and pose precision, achieving scores of 0.91 and 0.915, respectively. Conversely, YOLOv8n exhibited the highest box and pose recall scores of 0.905 and 0.925, respectively. Regarding mean average precision at 50% intersection over union (mAP@50), YOLO11s led all configurations with a box mAP@50 score of 0.94, while YOLOv8n achieved the highest pose mAP@50 score of 0.96. In terms of image processing speed, YOLO11n outperformed all configurations with an inference speed of 2.7 ms, significantly faster than the quickest YOLOv8 configuration, YOLOv8n, which processed images in 7.8 ms. Subsequent integration of ViTs for pose depth estimation of the green fruits revealed that Depth Anything V2 outperformed the Dense Prediction Transformer in 3D pose length validation, achieving the lowest Root Mean Square Error (RMSE) of 1.52 and Mean Absolute Error (MAE) of 1.28, demonstrating exceptional precision in estimating immature green fruit lengths. The integration of YOLO11 and the Depth Anything model provides a promising solution for 3D pose estimation of immature green fruits in robotic thinning applications.
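The paper does not include code, so the following Python sketch only illustrates the two-stage pipeline the abstract describes, assuming the Ultralytics YOLO API and the Hugging Face depth-estimation pipeline. The fruitlet pose weights file and image path are hypothetical, the small Depth Anything V2 checkpoint is chosen for illustration (the paper does not name a variant), and Depth Anything V2 produces relative rather than metric depth, so the third coordinate would still need the paper's length-calibration step.

```python
# Minimal sketch (not the authors' code) of the two-stage pipeline from the
# abstract: YOLO11 pose estimation for 2D fruitlet keypoints, then a ViT
# depth model (Depth Anything V2) to lift those keypoints toward 3D.
import numpy as np
from PIL import Image
from transformers import pipeline
from ultralytics import YOLO

pose_model = YOLO("yolo11n-fruitlet-pose.pt")  # hypothetical fine-tuned weights
depth_model = pipeline(
    "depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf"
)

image = Image.open("orchard_frame.jpg")  # hypothetical input frame

# Stage 1: detect fruitlets and regress their 2D pose keypoints.
result = pose_model(image)[0]
keypoints = result.keypoints.xy.cpu().numpy()  # (num_fruitlets, num_kpts, 2)

# Stage 2: dense relative depth for the whole frame (needs metric calibration).
depth = np.asarray(depth_model(image)["depth"], dtype=np.float32)

# Fuse: sample depth at each keypoint to get a coarse (x, y, z) pose.
poses_3d = []
for kpts in keypoints:
    xs = np.clip(kpts[:, 0].astype(int), 0, depth.shape[1] - 1)
    ys = np.clip(kpts[:, 1].astype(int), 0, depth.shape[0] - 1)
    poses_3d.append(np.column_stack([kpts, depth[ys, xs]]))

# Validation metrics of the kind the abstract reports for fruit lengths
# (RMSE 1.52, MAE 1.28): predicted versus field-measured values.
def rmse(pred, true):
    pred, true = np.asarray(pred), np.asarray(true)
    return float(np.sqrt(np.mean((pred - true) ** 2)))

def mae(pred, true):
    pred, true = np.asarray(pred), np.asarray(true)
    return float(np.mean(np.abs(pred - true)))
```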
Related papers
- Evaluating the Evolution of YOLO (You Only Look Once) Models: A Comprehensive Benchmark Study of YOLO11 and Its Predecessors [0.0]
This study presents a benchmark analysis of various YOLO (You Only Look Once) algorithms, from YOLOv3 to the newest addition, YOLO11.
It evaluates their performance on three diverse datasets: Traffic Signs (with varying object sizes), African Wildlife (with diverse aspect ratios and at least one instance of the object per image), and Ships and Vessels (with small-sized objects of a single class).
arXiv Detail & Related papers (2024-10-31T20:45:00Z)
- Comparing YOLO11 and YOLOv8 for instance segmentation of occluded and non-occluded immature green fruits in complex orchard environment [0.4143603294943439]
This study focused on YOLO11 and YOLOv8's instance segmentation capabilities for immature green apples in orchard environments.
YOLO11n-seg achieved the highest mask precision across all categories with a notable score of 0.831.
YOLO11m-seg consistently outperformed the other configurations, registering the highest scores for both box and mask segmentation.
arXiv Detail & Related papers (2024-10-24T00:12:20Z)
- Quantizing YOLOv7: A Comprehensive Study [0.0]
This paper studies the effectiveness of a variety of quantization schemes on the pre-trained weights of the state-of-the-art YOLOv7 model.
Results show that 4-bit quantization, combined with different quantization granularities, yields 3.92x and 3.86x memory savings for uniform and non-uniform quantization, respectively (a minimal quantization sketch appears after this list).
arXiv Detail & Related papers (2024-07-06T03:23:04Z)
- Comprehensive Performance Evaluation of YOLO11, YOLOv10, YOLOv9 and YOLOv8 on Detecting and Counting Fruitlet in Complex Orchard Environments [0.9565934024763958]
This study extensively evaluated You Only Look Once (YOLO) object detection algorithms across all 22 configurations of YOLOv8, YOLOv9, YOLOv10, and YOLO11 for green fruit detection in commercial orchards.
The research also validated in-field fruitlet counting using an iPhone and machine vision sensors across four apple varieties: Scifresh, Scilate, Honeycrisp and Cosmic Crisp.
arXiv Detail & Related papers (2024-07-01T17:59:55Z)
- YOLOv10: Real-Time End-to-End Object Detection [68.28699631793967]
YOLOs have emerged as the predominant paradigm in the field of real-time object detection.
Reliance on non-maximum suppression (NMS) for post-processing hampers the end-to-end deployment of YOLOs (a reference NMS sketch appears after this list).
We introduce a holistic efficiency-accuracy driven model design strategy for YOLOs.
arXiv Detail & Related papers (2024-05-23T11:44:29Z)
- YOLO-World: Real-Time Open-Vocabulary Object Detection [87.08732047660058]
We introduce YOLO-World, an innovative approach that enhances YOLO with open-vocabulary detection capabilities.
Our method excels in detecting a wide range of objects in a zero-shot manner with high efficiency.
YOLO-World achieves 35.4 AP with 52.0 FPS on V100, which outperforms many state-of-the-art methods in terms of both accuracy and speed.
arXiv Detail & Related papers (2024-01-30T18:59:38Z)
- YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection [80.11152626362109]
We provide an efficient and performant object detector, termed YOLO-MS.
We train our YOLO-MS on the MS COCO dataset from scratch without relying on any other large-scale datasets.
Our work can also be used as a plug-and-play module for other YOLO models.
arXiv Detail & Related papers (2023-08-10T10:12:27Z)
- Real-time Strawberry Detection Based on Improved YOLOv5s Architecture for Robotic Harvesting in open-field environment [0.0]
This study proposed a YOLOv5-based custom object detection model to detect strawberries in an outdoor environment.
The highest mean average precision of 80.3% was achieved using the proposed architecture.
The model is fast enough for real-time strawberry detection and localization for robotic picking.
arXiv Detail & Related papers (2023-08-08T02:28:48Z)
- MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection [89.26380781863665]
Fusing LiDAR and camera information is essential for achieving accurate and reliable 3D object detection in autonomous driving systems.
Recent approaches aim at exploring the semantic densities of camera features through lifting points in 2D camera images into 3D space for fusion.
We propose a novel framework that focuses on the multi-scale progressive interaction of the multi-granularity LiDAR and camera features.
arXiv Detail & Related papers (2022-09-07T12:29:29Z)
- A lightweight and accurate YOLO-like network for small target detection in Aerial Imagery [94.78943497436492]
We present YOLO-S, a simple, fast and efficient network for small target detection.
YOLO-S exploits a small feature extractor based on Darknet20, as well as skip connections via both bypass and concatenation.
YOLO-S has an 87% smaller parameter size and roughly half the FLOPs of YOLOv3, making deployment practical for low-power industrial applications.
arXiv Detail & Related papers (2022-04-05T16:29:49Z)
- Workshop on Autonomous Driving at CVPR 2021: Technical Report for Streaming Perception Challenge [57.647371468876116]
We introduce our real-time 2D object detection system for the realistic autonomous driving scenario.
Our detector is built on a newly designed YOLO model, called YOLOX.
On the Argoverse-HD dataset, our system achieves 41.0 streaming AP, surpassing second place by 7.8 and 6.1 points on the detection-only and full-stack tracks, respectively.
arXiv Detail & Related papers (2021-07-27T06:36:06Z)
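As referenced in the Quantizing YOLOv7 entry above, the reported memory savings come from storing weights at 4 bits instead of 32, with the quantization scale chosen at different granularities. The PyTorch sketch below shows generic symmetric uniform 4-bit quantization with a per-tensor versus per-channel switch; it illustrates the general technique, not the paper's specific scheme, and the function name is my own.

```python
# Generic symmetric uniform 4-bit quantization sketch (not the paper's code).
import torch

def quantize_uniform_4bit(w: torch.Tensor, per_channel: bool = True):
    """Quantize weights to signed 4-bit integers in [-8, 7]."""
    qmin, qmax = -8, 7
    if per_channel:
        # One scale per output channel (dim 0): finer granularity.
        reduce_dims = tuple(range(1, w.dim()))
        max_abs = w.abs().amax(dim=reduce_dims, keepdim=True)
    else:
        # A single scale for the whole tensor: coarser granularity.
        max_abs = w.abs().max()
    scale = max_abs.clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(w / scale), qmin, qmax)
    # Stored in int8 here for simplicity; real 4-bit storage packs two
    # values per byte, which is where the ~4x memory saving comes from.
    return q.to(torch.int8), scale  # dequantize with q * scale
```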
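And as referenced in the YOLOv10 entry, NMS is the post-processing step whose removal enables end-to-end deployment. Below is a standard greedy NMS reference implementation (generic, not taken from the paper): keep the highest-scoring box, drop boxes that overlap it above the IoU threshold, and repeat.

```python
# Standard greedy non-maximum suppression, for reference.
import numpy as np

def iou_one_vs_many(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """IoU of one (x1, y1, x2, y2) box against an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    areas_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + areas_b - inter + 1e-9)

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5):
    """Return indices of boxes kept after greedy NMS."""
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        overlaps = iou_one_vs_many(boxes[i], boxes[rest])
        order = rest[overlaps <= iou_thresh]
    return keep
```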
This list is automatically generated from the titles and abstracts of the papers in this site.