Integrating YOLO11 and Convolution Block Attention Module for Multi-Season Segmentation of Tree Trunks and Branches in Commercial Apple Orchards
- URL: http://arxiv.org/abs/2412.05728v1
- Date: Sat, 07 Dec 2024 19:36:22 GMT
- Title: Integrating YOLO11 and Convolution Block Attention Module for Multi-Season Segmentation of Tree Trunks and Branches in Commercial Apple Orchards
- Authors: Ranjan Sapkota, Manoj Karkee
- Abstract summary: This study developed a customized instance segmentation model by integrating the Convolutional Block Attention Module with the YOLO11 architecture. After training YOLO11-CBAM on a mixed dataset collected over the two seasons, the model was validated separately on dormant and canopy season images. Trained on both dormant and canopy season imagery, the approach demonstrated the potential of the YOLO11-CBAM integration.
- Score: 0.4143603294943439
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this study, we developed a customized instance segmentation model by integrating the Convolutional Block Attention Module (CBAM) with the YOLO11 architecture. The model, trained on a mixed dataset of dormant and canopy season apple orchard images, was designed to enhance the segmentation of tree trunks and branches under seasonal conditions that vary throughout the year. After training YOLO11-CBAM on the mixed two-season dataset, the model was validated separately on dormant season and canopy season images, with additional testing during the pre-bloom, flower bloom, fruit thinning, and harvest periods. The highest recall and precision metrics were observed with YOLO11x-seg-CBAM and YOLO11m-seg-CBAM, respectively. In particular, YOLO11m-seg with CBAM achieved the highest training precision for the Trunk class at 0.83, compared with 0.80 for YOLO11m-seg without CBAM. Likewise, for the Branch class, YOLO11m-seg with CBAM achieved the highest precision at 0.75, compared with 0.73 without CBAM. In dormant season validation, YOLO11x-seg exhibited the highest precision at 0.91. In canopy season validation, YOLO11s-seg showed superior precision across all classes, achieving 0.516 for Branch and 0.64 for Trunk. Trained on the two-season dataset of dormant and canopy images, the approach demonstrated the potential of the YOLO11-CBAM integration to detect and segment tree trunks and branches year-round across all seasonal variations. Keywords: YOLOv11, YOLOv11 Tree Detection, YOLOv11 Branch Detection and Segmentation, Machine Vision, Deep Learning, Machine Learning
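The abstract does not include implementation details, but the attention mechanism it names, a standard CBAM block (Woo et al., 2018), can be sketched in PyTorch as below. This is a generic sketch of the module itself, not the authors' code; where it attaches inside the YOLO11 backbone or neck is not specified in the abstract.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: pool away spatial dims, reweight each channel."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pool -> shared MLP
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pool -> shared MLP
        scale = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * scale

class SpatialAttention(nn.Module):
    """Spatial attention: pool over channels, reweight each location."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        scale = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * scale

class CBAM(nn.Module):
    """CBAM = channel attention followed by spatial attention."""
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        return self.sa(self.ca(x))
```

In typical integrations, such a block is dropped in after selected convolutional stages so that channel and spatial attention reweight feature maps before they reach the segmentation head.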
Related papers
- CLIPure: Purification in Latent Space via CLIP for Adversarially Robust Zero-Shot Classification [65.46685389276443]
We ground our work on CLIP, a vision-language pre-trained encoder model that can perform zero-shot classification by matching an image with text prompts.
We then formulate purification risk as the KL divergence between the joint distributions of the purification process and the attack process.
We propose two variants of our CLIPure approach: CLIPure-Diff, which models the likelihood of images' latent vectors, and CLIPure-Cos, which models the likelihood with the cosine similarity between the embeddings of an image and the prompt "a photo of a."
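The KL formulation stated above can be written schematically; this is our notation for the stated idea, not a formula copied from the paper, with x_{0:T} denoting the trajectory of latent states across the purification (resp. attack) steps:

```latex
\mathcal{R}_{\text{purify}}
  = D_{\mathrm{KL}}\!\left(
      p\big(x^{\text{purification}}_{0:T}\big)
      \,\Big\|\,
      p\big(x^{\text{attack}}_{0:T}\big)
    \right)
```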
arXiv Detail & Related papers (2025-02-25T13:09:34Z)
- Zero-Shot Automatic Annotation and Instance Segmentation using LLM-Generated Datasets: Eliminating Field Imaging and Manual Annotation for Deep Learning Model Development [0.36868085124383626]
This study presents a novel method for deep learning-based instance segmentation of apples in commercial orchards.
We synthetically generated orchard images and automatically annotated them using the Segment Anything Model (SAM) integrated with a YOLO11 base model.
The results showed that the automatically generated annotations achieved a Dice Coefficient of 0.9513 and an IoU of 0.9303, validating the accuracy and overlap of the mask annotations.
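For reference, the two mask-agreement metrics quoted above can be computed for binary masks as follows; this is a generic sketch, not the study's evaluation code.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return 2.0 * intersection / total if total else 1.0

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU = |A ∩ B| / |A ∪ B| for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 1.0
```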
arXiv Detail & Related papers (2024-11-18T05:11:29Z)
- Robust Fine-tuning of Zero-shot Models via Variance Reduction [56.360865951192324]
When fine-tuning zero-shot models, our desideratum is for the fine-tuned model to excel on both in-distribution (ID) and out-of-distribution (OOD) data.
We propose a sample-wise ensembling technique that can simultaneously attain the best ID and OOD accuracy without the trade-offs.
arXiv Detail & Related papers (2024-11-11T13:13:39Z)
- Comparing YOLO11 and YOLOv8 for instance segmentation of occluded and non-occluded immature green fruits in complex orchard environment [0.4143603294943439]
This study focused on YOLO11 and YOLOv8's instance segmentation capabilities for immature green apples in orchard environments.
YOLO11n-seg achieved the highest mask precision across all categories with a notable score of 0.831.
YOLO11m-seg consistently outperformed the other configurations, registering the highest scores for both box and mask segmentation.
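Both model families in this comparison are exposed through the Ultralytics Python API, so a head-to-head run can be sketched as below; the checkpoint names follow standard Ultralytics naming conventions, and the image path is a placeholder.

```python
from ultralytics import YOLO

# Compare the two segmentation families on the same image at conf=0.5.
for weights in ("yolo11n-seg.pt", "yolov8n-seg.pt"):
    model = YOLO(weights)
    result = model.predict("orchard_image.jpg", conf=0.5, verbose=False)[0]
    n_masks = 0 if result.masks is None else result.masks.data.shape[0]
    print(f"{weights}: {n_masks} instance masks")
```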
arXiv Detail & Related papers (2024-10-24T00:12:20Z)
- YOLO11 and Vision Transformers based 3D Pose Estimation of Immature Green Fruits in Commercial Apple Orchards for Robotic Thinning [0.4143603294943439]
A method for 3D pose estimation of immature green apples (fruitlets) in commercial orchards was developed, combining the YOLO11 (or YOLOv11) object detection and pose estimation algorithm with Vision Transformers (ViT) for depth estimation.
YOLO11n surpassed all configurations of YOLO11 and YOLOv8 in terms of box precision and pose precision.
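The abstract pairs 2D detection/pose outputs with ViT-based depth; the standard way to lift a detected 2D keypoint to 3D given a depth estimate is pinhole back-projection. The sketch below illustrates that step only, since the paper's exact pipeline isn't given here; the camera intrinsics and pixel values are hypothetical placeholders.

```python
import numpy as np

def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Lift a pixel (u, v) with metric depth to camera-frame XYZ
    via the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# Example: a detected fruitlet keypoint at pixel (640, 360) with 1.8 m depth
# (hypothetical intrinsics for illustration).
point_3d = backproject(640, 360, 1.8, fx=900.0, fy=900.0, cx=640.0, cy=360.0)
print(point_3d)  # [0.0, 0.0, 1.8]
```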
arXiv Detail & Related papers (2024-10-21T17:00:03Z)
- Comprehensive Performance Evaluation of YOLOv12, YOLO11, YOLOv10, YOLOv9 and YOLOv8 on Detecting and Counting Fruitlet in Complex Orchard Environments [0.9565934024763958]
This study systematically performed a real-world evaluation of the performance of the YOLOv8, YOLOv9, YOLOv10, YOLO11 (or YOLOv11), and YOLOv12 object detection algorithms.
YOLOv12l recorded the highest recall rate at 0.90, compared to all other configurations of YOLO models.
YOLOv11n achieved the fastest inference speed at 2.4 ms, outperforming YOLOv8n (4.1 ms), YOLOv9 Gelan-s (11.5 ms), YOLOv10n (5.5 ms), and YOLOv12n
arXiv Detail & Related papers (2024-07-01T17:59:55Z)
- YOLOv10: Real-Time End-to-End Object Detection [68.28699631793967]
YOLOs have emerged as the predominant paradigm in the field of real-time object detection.
The reliance on non-maximum suppression (NMS) for post-processing hampers the end-to-end deployment of YOLOs.
We introduce a holistic efficiency-accuracy driven model design strategy for YOLOs.
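As context for why removing NMS matters: classical YOLO pipelines emit overlapping candidate boxes and prune them with a post-processing step like the sketch below. This is generic greedy NMS, not YOLOv10's method, which instead trains with consistent dual assignments to avoid the step entirely.

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5):
    """Greedy NMS: keep the highest-scoring box, drop overlaps, repeat.
    boxes: (N, 4) as [x1, y1, x2, y2]; returns kept indices."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        # IoU of the top box against the remaining candidates
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thresh]
    return keep
```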
arXiv Detail & Related papers (2024-05-23T11:44:29Z)
- Comparing YOLOv8 and Mask RCNN for object segmentation in complex orchard environments [0.36868085124383626]
This study compares the one-stage YOLOv8 and the two-stage Mask R-CNN machine learning models for instance segmentation.
YOLOv8 performed better than Mask R-CNN, achieving good precision and near-perfect recall across both datasets at a confidence threshold of 0.5.
arXiv Detail & Related papers (2023-12-13T07:29:24Z)
- To be Critical: Self-Calibrated Weakly Supervised Learning for Salient Object Detection [95.21700830273221]
Weakly-supervised salient object detection (WSOD) aims to develop saliency models using image-level annotations.
We propose a self-calibrated training strategy by explicitly establishing a mutual calibration loop between pseudo labels and network predictions.
We prove that even a much smaller dataset with well-matched annotations can facilitate models to achieve better performance as well as generalizability.
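The "mutual calibration loop" can be illustrated schematically: pseudo labels supervise the network, and the network's predictions in turn refine the pseudo labels. The sketch below is a hypothetical paraphrase of that training pattern, not the authors' actual loss or update rule; the blending weight alpha is an assumption.

```python
import torch
import torch.nn.functional as F

def self_calibrated_step(model, optimizer, images, pseudo_labels, alpha=0.7):
    """One hypothetical calibration step: train on pseudo labels,
    then blend predictions back into the labels for the next round."""
    model.train()
    preds = torch.sigmoid(model(images))           # predicted saliency maps
    loss = F.binary_cross_entropy(preds, pseudo_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():                          # refine pseudo labels
        refined = alpha * pseudo_labels + (1 - alpha) * torch.sigmoid(model(images))
    return loss.item(), refined
```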
arXiv Detail & Related papers (2021-09-04T02:45:22Z)
- Semi-supervised Contrastive Learning with Similarity Co-calibration [72.38187308270135]
We propose a novel training strategy, termed Semi-supervised Contrastive Learning (SsCL).
SsCL combines the well-known contrastive loss in self-supervised learning with the cross entropy loss in semi-supervised learning.
We show that SsCL produces more discriminative representations and is beneficial to few-shot learning.
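Schematically, the combination described amounts to a weighted sum of a supervised cross-entropy term on labeled data and a contrastive term on unlabeled data. The form below is our shorthand, with lambda a mixing weight; the paper's actual co-calibration between the two branches is more involved.

```latex
\mathcal{L}_{\text{SsCL}}
  = \mathcal{L}_{\text{CE}}(\text{labeled})
  + \lambda \, \mathcal{L}_{\text{contrastive}}(\text{unlabeled})
```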
arXiv Detail & Related papers (2021-05-16T09:13:56Z)
- A CNN Approach to Simultaneously Count Plants and Detect Plantation-Rows from UAV Imagery [56.10033255997329]
We propose a novel deep learning method based on a Convolutional Neural Network (CNN) that simultaneously detects and geolocates plantation-rows while counting their plants, even in highly dense plantation configurations.
The proposed method achieved state-of-the-art performance for counting and geolocating plants and plant-rows in UAV images from different types of crops.
arXiv Detail & Related papers (2020-12-31T18:51:17Z)