GDA-YOLO11: Amodal Instance Segmentation for Occlusion-Robust Robotic Fruit Harvesting
- URL: http://arxiv.org/abs/2602.23953v1
- Date: Fri, 27 Feb 2026 11:56:33 GMT
- Title: GDA-YOLO11: Amodal Instance Segmentation for Occlusion-Robust Robotic Fruit Harvesting
- Authors: Caner Beldek, Emre Sariyildiz, Son Lam Phung, Gursel Alici
- Abstract summary: We propose a harvesting framework using a new amodal segmentation model, GDA-YOLO11, which incorporates architectural improvements and an updated asymmetric mask loss. The framework attains harvesting success rates of 92.59%, 85.18%, 48.14%, and 22.22% at zero to high occlusion levels.
- Score: 3.088218970976369
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Occlusion remains a critical challenge in robotic fruit harvesting, as undetected or inaccurately localised fruits often result in substantial crop losses. To mitigate this issue, we propose a harvesting framework using a new amodal segmentation model, GDA-YOLO11, which incorporates architectural improvements and an updated asymmetric mask loss. The proposed model is trained on a modified version of a public citrus dataset and evaluated on both the base dataset and occlusion-sensitive subsets with varying occlusion levels. Within the framework, full fruit masks, including invisible regions, are inferred by GDA-YOLO11, and picking points are subsequently estimated using the Euclidean distance transform. These points are then projected into 3D coordinates for robotic harvesting execution. Experiments were conducted using real citrus fruits in a controlled environment simulating occlusion scenarios. Notably, to the best of our knowledge, this study provides the first practical demonstration of amodal instance segmentation in robotic fruit harvesting. GDA-YOLO11 achieves a precision of 0.844, recall of 0.846, mAP@50 of 0.914, and mAP@50:95 of 0.636, outperforming YOLO11n by 5.1%, 1.3%, and 1.0% in precision, mAP@50, and mAP@50:95, respectively. The framework attains harvesting success rates of 92.59%, 85.18%, 48.14%, and 22.22% at zero to high occlusion levels, improving success by 3.5% under medium and high occlusion. These findings demonstrate that GDA-YOLO11 enhances occlusion-robust segmentation and streamlines perception-to-action integration, paving the way for more reliable autonomous systems in agriculture.
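The abstract's perception-to-action step (amodal mask, then a picking point from the Euclidean distance transform, then a 3D target) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `picking_point` and `backproject`, the brute-force distance computation, and the pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy` are all assumptions made for illustration; the abstract does not say how the 3D projection is done. In practice a library routine such as `scipy.ndimage.distance_transform_edt` would replace the brute-force loop.

```python
import math


def picking_point(mask):
    """Return ((row, col), distance) for the mask pixel farthest from background.

    Brute-force Euclidean distance transform over a small binary mask
    (list of lists of 0/1). Illustrative only: cost grows with
    foreground_pixels * background_pixels.
    """
    background = [(r, c) for r, row in enumerate(mask)
                  for c, v in enumerate(row) if not v]
    foreground = [(r, c) for r, row in enumerate(mask)
                  for c, v in enumerate(row) if v]
    if not background or not foreground:
        raise ValueError("mask needs both foreground and background pixels")

    best, best_d = None, -1.0
    for r, c in foreground:
        # Distance from this fruit pixel to the nearest non-fruit pixel.
        d = min(math.hypot(r - br, c - bc) for br, bc in background)
        if d > best_d:
            best, best_d = (r, c), d
    return best, best_d


def backproject(u, v, depth, fx, fy, cx, cy):
    """Map pixel (u, v) with known depth to camera-frame (X, Y, Z).

    Assumes an ideal pinhole camera; the intrinsics are hypothetical inputs.
    """
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


# Toy amodal mask: a 3x3 "fruit" inside a 5x5 image.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
(row, col), dist = picking_point(mask)
# Column is the u coordinate, row is the v coordinate.
target = backproject(col, row, depth=0.5, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
```

Because the distance transform peaks at the pixel deepest inside the mask, running it on the full amodal mask rather than only the visible region keeps the picking point near the centre of the whole fruit even when part of it is hidden.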
Related papers
- Detection of On-Ground Chestnuts Using Artificial Intelligence Toward Automated Picking [0.09176056742068812]
Traditional mechanized chestnut harvesting is too costly for small producers. Accurate, reliable detection of chestnuts on the orchard floor is crucial for developing low-cost, vision-guided automated harvesting technology. This study collected 319 images of chestnuts on the orchard floor, containing 6524 annotated chestnuts.
arXiv Detail & Related papers (2026-02-15T13:28:23Z)
- A Domain-Adapted Lightweight Ensemble for Resource-Efficient Few-Shot Plant Disease Classification [0.0]
We present a few-shot learning approach that combines domain-adapted MobileNetV2 and MobileNetV3 models as feature extractors. For the classification task, the fused features are passed through a Bi-LSTM classifier enhanced with attention mechanisms. It consistently improved performance across 1 to 15 shot scenarios, reaching 98.23 ± 0.33% at 15 shot. Notably, it also outperformed the previous SOTA accuracy of 96.4% on six diseases from PlantVillage, achieving 99.72% with only 15-shot learning.
arXiv Detail & Related papers (2025-12-15T15:17:29Z)
- DAONet-YOLOv8: An Occlusion-Aware Dual-Attention Network for Tea Leaf Pest and Disease Detection [2.661320179262946]
We propose an enhanced YOLOv8 variant with three key improvements to accurately detect tea leaf pests and diseases. DAONet-YOLOv8 achieves 92.97% precision, 92.80% recall, 97.10% mAP@50 and 76.90% mAP@50:95, outperforming the YOLOv8n baseline by 2.34, 4.68, 1.40 and 1.80 percentage points respectively.
arXiv Detail & Related papers (2025-11-28T14:28:30Z)
- DRBD-Mamba for Robust and Efficient Brain Tumor Segmentation with Analytical Insights [54.87947751720332]
Accurate brain tumor segmentation is significant for clinical diagnosis and treatment. Mamba-based State Space Models have demonstrated promising performance. We propose a dual-resolution bi-directional Mamba that captures multi-scale long-range dependencies with minimal computational overhead.
arXiv Detail & Related papers (2025-10-16T07:31:21Z)
- YOLOv11-Litchi: Efficient Litchi Fruit Detection based on UAV-Captured Agricultural Imagery in Complex Orchard Environments [6.862722449907841]
This paper introduces YOLOv11-Litchi, a lightweight and robust detection model specifically designed for UAV-based litchi detection. YOLOv11-Litchi achieves a parameter size of 6.35 MB, 32.5% smaller than the YOLOv11 baseline. The model achieves a frame rate of 57.2 FPS, meeting real-time detection requirements.
arXiv Detail & Related papers (2025-10-11T09:44:00Z)
- Efficient Federated Learning with Heterogeneous Data and Adaptive Dropout [62.73150122809138]
Federated Learning (FL) is a promising distributed machine learning approach that enables collaborative training of a global model using multiple edge devices. We propose the FedDHAD FL framework, which comes with two novel methods: Dynamic Heterogeneous model aggregation (FedDH) and Adaptive Dropout (FedAD). The combination of these two methods makes FedDHAD significantly outperform state-of-the-art solutions in terms of accuracy (up to 6.7% higher), efficiency (up to 2.02 times faster), and cost (up to 15.0% smaller).
arXiv Detail & Related papers (2025-07-14T16:19:00Z)
- AppleGrowthVision: A large-scale stereo dataset for phenological analysis, fruit detection, and 3D reconstruction in apple orchards [3.9494466926597487]
We present AppleGrowthVision, a large-scale dataset comprising two subsets. The first includes 9,317 high resolution stereo images collected from a farm in Brandenburg (Germany), covering six agriculturally validated growth stages over a full growth cycle. The second subset consists of 1,125 densely annotated images from the same farm in Brandenburg and one in Pillnitz (Germany), containing a total of 31,084 apple labels. AppleGrowthVision provides stereo-image data with agriculturally validated growth stages, enabling precise phenological analysis and 3D reconstructions.
arXiv Detail & Related papers (2025-05-20T07:29:22Z)
- CRTRE: Causal Rule Generation with Target Trial Emulation Framework [47.2836994469923]
We introduce a novel method called causal rule generation with target trial emulation framework (CRTRE).
CRTRE applies randomized trial design principles to estimate the causal effect of association rules.
We then incorporate such association rules into downstream applications such as the prediction of disease onsets.
arXiv Detail & Related papers (2024-11-10T02:40:06Z)
- Handling Geometric Domain Shifts in Semantic Segmentation of Surgical RGB and Hyperspectral Images [67.66644395272075]
We present the first analysis of state-of-the-art semantic segmentation models when faced with geometric out-of-distribution data.
We propose an augmentation technique called "Organ Transplantation" to enhance generalizability.
Our augmentation technique improves SOA model performance by up to 67% for RGB data and 90% for HSI data, achieving performance at the level of in-distribution performance on real OOD test data.
arXiv Detail & Related papers (2024-08-27T19:13:15Z)
- Tightening Classification Boundaries in Open Set Domain Adaptation through Unknown Exploitation [45.74830585715129]
Convolutional Neural Networks (CNNs) have brought revolutionary advances to many research areas.
But when those methods are applied to non-controllable environments, many different factors can degrade the model's expected performance.
We propose a novel way to improve OSDA approaches by extracting a high-confidence set of unknown instances.
arXiv Detail & Related papers (2023-09-16T11:33:40Z)
- Automating global landslide detection with heterogeneous ensemble deep-learning classification [44.99833362998488]
Landslides threaten infrastructure, including roads, railways, buildings, and human life.
Hazard-based spatial planning and early warning systems are cost-effective strategies to reduce the risk to society from landslides.
Deep learning models have recently been applied for landslide mapping using medium- to high-resolution satellite images as input.
arXiv Detail & Related papers (2023-09-12T10:56:16Z)
- Patch-Level Contrasting without Patch Correspondence for Accurate and Dense Contrastive Representation Learning [79.43940012723539]
ADCLR is a self-supervised learning framework for learning accurate and dense vision representation.
Our approach achieves new state-of-the-art performance for contrastive methods.
arXiv Detail & Related papers (2023-06-23T07:38:09Z)
- A CNN Approach to Simultaneously Count Plants and Detect Plantation-Rows from UAV Imagery [56.10033255997329]
We propose a novel deep learning method based on a Convolutional Neural Network (CNN).
It simultaneously detects and geolocates plantation-rows while counting its plants considering highly-dense plantation configurations.
The proposed method achieved state-of-the-art performance for counting and geolocating plants and plant-rows in UAV images from different types of crops.
arXiv Detail & Related papers (2020-12-31T18:51:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.