BloomNet: Exploring Single vs. Multiple Object Annotation for Flower Recognition Using YOLO Variants
- URL: http://arxiv.org/abs/2602.18585v1
- Date: Fri, 20 Feb 2026 19:47:45 GMT
- Title: BloomNet: Exploring Single vs. Multiple Object Annotation for Flower Recognition Using YOLO Variants
- Authors: Safwat Nusrat, Prithwiraj Bhattacharjee
- Abstract summary: This paper benchmarks several YOLO architectures, such as YOLOv5s, YOLOv8n/s/m, and YOLOv12n, for flower object detection under two annotation regimes. The FloralSix dataset, comprising 2,816 high-resolution photos of six different flower species, is also introduced.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Precise localization and recognition of flowers are crucial for advancing automated agriculture, particularly in plant phenotyping, crop estimation, and yield monitoring. This paper benchmarks several YOLO architectures, such as YOLOv5s, YOLOv8n/s/m, and YOLOv12n, for flower object detection under two annotation regimes: single-image single-bounding box (SISBB) and single-image multiple-bounding box (SIMBB). The FloralSix dataset, comprising 2,816 high-resolution photos of six different flower species, is also introduced. It is annotated for both dense (clustered) and sparse (isolated) scenarios. The models were evaluated using Precision, Recall, and Mean Average Precision (mAP) at IoU thresholds of 0.5 (mAP@0.5) and 0.5-0.95 (mAP@0.5:0.95). In SISBB, YOLOv8m (SGD) achieved the best results, with Precision 0.956, Recall 0.951, mAP@0.5 0.978, and mAP@0.5:0.95 0.865, illustrating strong accuracy in detecting isolated flowers. In the more complicated SIMBB scenario, YOLOv12n (SGD) led with mAP@0.5 0.934 and mAP@0.5:0.95 0.752, demonstrating robustness in dense, multi-object detection. Results show how annotation density, IoU thresholds, and model size interact: recall-optimized models perform better in crowded environments, whereas precision-oriented models perform best in sparse scenarios. In both cases, the Stochastic Gradient Descent (SGD) optimizer consistently outperformed the alternatives. Such density-sensitive detectors are helpful for non-destructive crop analysis, growth tracking, robotic pollination, and stress evaluation.
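As a concrete illustration of this benchmarking setup, the sketch below trains and validates one of the listed variants with the Ultralytics YOLO API. It is a minimal sketch, not the authors' code: the dataset config name floralsix.yaml, the epoch count, and the image size are assumptions, not details from the paper.

```python
# Minimal sketch of the benchmarking setup, assuming the Ultralytics API.
# "floralsix.yaml" is a hypothetical dataset config pointing at either the
# SISBB or the SIMBB labels; epochs and image size are assumed, not reported.
from ultralytics import YOLO

model = YOLO("yolov8m.pt")   # best-performing SISBB model in the paper
model.train(
    data="floralsix.yaml",   # hypothetical dataset config
    epochs=100,              # assumed training budget
    imgsz=640,               # assumed input resolution
    optimizer="SGD",         # SGD consistently beat the alternatives here
)

metrics = model.val()        # evaluate on the validation split
print(metrics.box.mp)        # mean Precision
print(metrics.box.mr)        # mean Recall
print(metrics.box.map50)     # mAP@0.5
print(metrics.box.map)       # mAP@0.5:0.95
```

Swapping the data config between the SISBB and SIMBB label sets is what would reproduce the two annotation regimes compared above.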
Related papers
- Detection of On-Ground Chestnuts Using Artificial Intelligence Toward Automated Picking
Traditional mechanized chestnut harvesting is too costly for small producers. Accurate, reliable detection of chestnuts on the orchard floor is crucial for developing low-cost, vision-guided automated harvesting technology. This study collected 319 images of chestnuts on the orchard floor, containing 6,524 annotated chestnuts.
arXiv Detail & Related papers (2026-02-15T13:28:23Z)
- YOLO-DS: Fine-Grained Feature Decoupling via Dual-Statistic Synergy Operator for Object Detection
We propose YOLO-DS, a framework built around a novel Dual-Statistic Synergy Operator (DSO). YOLO-DS decouples object features by jointly modeling the channel-wise mean and the peak-to-mean difference. On the MS-COCO benchmark, YOLO-DS consistently outperforms YOLOv8 across five model scales.
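To make the two statistics concrete, here is a minimal PyTorch sketch of the channel-wise mean and the peak-to-mean difference over a feature map (shapes assumed); how the actual DSO combines them is not specified in the summary, so the function only computes the two signals.

```python
import torch

def dual_statistics(feat: torch.Tensor):
    """Compute the two channel statistics named above.

    feat: feature map of assumed shape (B, C, H, W). Returns the channel-wise
    spatial mean and the peak-to-mean difference; the synergy step that
    YOLO-DS builds on top of them is not described in the summary.
    """
    mean = feat.mean(dim=(2, 3))   # (B, C) channel-wise spatial mean
    peak = feat.amax(dim=(2, 3))   # (B, C) channel-wise peak response
    return mean, peak - mean       # mean and peak-to-mean difference
```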
arXiv Detail & Related papers (2026-01-26T05:50:32Z)
- YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception
We propose YOLOv13, an accurate and lightweight object detector. It is built on a Hypergraph-based Adaptive Correlation Enhancement (HyperACE) mechanism and a Full-Pipeline Aggregation-and-Distribution (FullPAD) paradigm.
arXiv Detail & Related papers (2025-06-21T15:15:03Z)
- RF-DETR Object Detection vs YOLOv12: A Study of Transformer-based and CNN-based Architectures for Single-Class and Multi-Class Greenfruit Detection in Complex Orchard Environments Under Label Ambiguity
This study conducts a detailed comparison of the RF-DETR base object detection model and YOLOv12 object detection model configurations. A custom dataset was developed featuring both single-class (greenfruit) and multi-class (occluded and non-occluded greenfruits) annotations. The RF-DETR model, utilizing a DINOv2 backbone and deformable attention, excelled in global context modeling. YOLOv12 leveraged CNN-based attention for enhanced local feature extraction, optimizing it for computational efficiency and edge deployment.
arXiv Detail & Related papers (2025-04-17T17:08:11Z)
- Assessing the Capability of YOLO- and Transformer-based Object Detectors for Real-time Weed Detection
All available models of YOLOv8, YOLOv9, YOLOv10, and RT-DETR are trained and evaluated with images from a real field situation. The results demonstrate that while all models perform equally well on the metrics evaluated, the YOLOv9 models stand out in terms of their strong recall scores. RT-DETR models, especially RT-DETR-l, excel in precision, reaching 82.44% on dataset 1 and 81.46% on dataset 2.
arXiv Detail & Related papers (2025-01-29T02:39:57Z)
- Robust Fine-tuning of Zero-shot Models via Variance Reduction
When fine-tuning zero-shot models, our desideratum is for the fine-tuned model to excel in both in-distribution (ID) and out-of-distribution (OOD) accuracy.
We propose a sample-wise ensembling technique that can simultaneously attain the best ID and OOD accuracy without the trade-offs.
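The one-line summary does not give the paper's actual per-sample weighting rule, but the general shape of a sample-wise ensemble can be sketched as below; the confidence-based weight is a hypothetical stand-in for whatever rule the paper uses.

```python
import torch
import torch.nn.functional as F

def sample_wise_ensemble(logits_zs: torch.Tensor, logits_ft: torch.Tensor) -> torch.Tensor:
    """Blend zero-shot and fine-tuned logits with a per-sample weight.

    Illustrative only: the weight here is the fine-tuned model's max softmax
    confidence, a hypothetical stand-in for the paper's weighting rule.
    """
    conf = F.softmax(logits_ft, dim=-1).amax(dim=-1, keepdim=True)  # (B, 1)
    return conf * logits_ft + (1.0 - conf) * logits_zs              # (B, K)
```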
arXiv Detail & Related papers (2024-11-11T13:13:39Z)
- Fall Detection for Industrial Setups Using YOLOv8 Variants
The YOLOv8m model, consisting of 25.9 million parameters and 79.1 GFLOPs, demonstrated a respectable balance between computational efficiency and detection performance.
Although the YOLOv8l and YOLOv8x models presented higher precision and recall, their higher computational demands and model size make them less suitable for resource-constrained environments.
arXiv Detail & Related papers (2024-08-08T17:24:54Z)
- Comparing YOLOv8 and Mask R-CNN for instance segmentation in complex orchard environments
This study compares the one-stage YOLOv8 model with the two-stage Mask R-CNN model for instance segmentation. Results showed YOLOv8 outperformed Mask R-CNN with higher precision and near-perfect recall at a confidence threshold of 0.5.
arXiv Detail & Related papers (2023-12-13T07:29:24Z)
- Two Scalable Approaches for Burned-Area Mapping Using U-Net and Landsat Imagery
This study explores two proposed approaches based on the U-Net model for automating and optimizing the burned-area mapping process.
Tests based on 195 representative images of the study area show that increasing dataset balance using the AS model yields better performance.
arXiv Detail & Related papers (2023-11-29T05:42:25Z)
- Hessian-Aware Pruning and Optimal Neural Implant
Pruning is an effective method to reduce the memory footprint and FLOPs associated with neural network models.
We introduce a new Hessian-Aware Pruning method coupled with a Neural Implant approach that uses second-order sensitivity as a metric for structured pruning.
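As a rough illustration of second-order sensitivity, the sketch below estimates the Hessian trace of one parameter tensor with Hutchinson's estimator; this is a generic stand-in for ranking structures by curvature, not the paper's exact scoring of structured groups.

```python
import torch

def hessian_trace(loss: torch.Tensor, param: torch.Tensor, n_samples: int = 8) -> float:
    """Hutchinson estimate of trace(H) for one parameter tensor.

    Uses Rademacher probes v and Hessian-vector products: E[v^T H v] = tr(H).
    A generic second-order sensitivity score in the spirit of Hessian-aware
    pruning, not the method's exact structured scoring.
    """
    grad = torch.autograd.grad(loss, param, create_graph=True)[0]
    trace = 0.0
    for _ in range(n_samples):
        v = torch.empty_like(param).bernoulli_(0.5).mul_(2).sub_(1)  # +/-1 probe
        hv = torch.autograd.grad((grad * v).sum(), param, retain_graph=True)[0]
        trace += (hv * v).sum().item()
    return trace / n_samples
```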
arXiv Detail & Related papers (2021-01-22T04:08:03Z)
- A CNN Approach to Simultaneously Count Plants and Detect Plantation-Rows from UAV Imagery
We propose a novel deep learning method based on a Convolutional Neural Network (CNN). It simultaneously detects and geolocates plantation-rows while counting their plants, considering highly dense plantation configurations.
The proposed method achieved state-of-the-art performance for counting and geolocating plants and plant-rows in UAV images from different types of crops.
arXiv Detail & Related papers (2020-12-31T18:51:17Z)
- TraDE: Transformers for Density Estimation
TraDE is a self-attention-based architecture for auto-regressive density estimation.
We present a suite of tasks such as regression using generated samples, out-of-distribution detection, and robustness to noise in the training data.
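For intuition about this architecture class, here is a minimal self-attention autoregressive density estimator in PyTorch: a causally masked transformer reads the prefix x_<i and a Gaussian head scores each x_i. This is a generic sketch under those assumptions, not TraDE's actual architecture or output parameterization.

```python
import math
import torch
import torch.nn as nn

class TinyAutoregressiveDensity(nn.Module):
    """Sketch of a self-attention autoregressive density estimator.

    Not TraDE itself: each dimension is embedded, a causally masked
    transformer encodes the prefix, and a Gaussian head predicts p(x_i | x_<i).
    """

    def __init__(self, dim: int, d_model: int = 64):
        super().__init__()
        self.dim = dim
        self.embed = nn.Linear(1, d_model)
        self.pos = nn.Parameter(torch.zeros(dim, d_model))      # learned positions
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 2)                       # mean, log-variance
        self.start = nn.Parameter(torch.zeros(1, 1, d_model))   # start token

    def log_prob(self, x: torch.Tensor) -> torch.Tensor:       # x: (B, dim)
        B = x.size(0)
        tok = self.embed(x[:, :-1].unsqueeze(-1))               # shift right by one
        tok = torch.cat([self.start.expand(B, 1, -1), tok], dim=1) + self.pos
        mask = nn.Transformer.generate_square_subsequent_mask(self.dim).to(x.device)
        h = self.encoder(tok, mask=mask)                        # causal encoding
        mu, logvar = self.head(h).chunk(2, dim=-1)              # per-step Gaussian
        nll = 0.5 * ((x.unsqueeze(-1) - mu) ** 2 / logvar.exp()
                     + logvar + math.log(2 * math.pi))
        return -nll.sum(dim=(1, 2))                             # (B,) log-densities
```

Training would minimize -log_prob(x) over the data; the downstream tasks named above (regression on generated samples, OOD detection) all derive from such a learned density.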
arXiv Detail & Related papers (2020-04-06T07:32:51Z)