A Comparative Benchmark of Real-time Detectors for Blueberry Detection towards Precision Orchard Management
- URL: http://arxiv.org/abs/2509.20580v2
- Date: Sat, 04 Oct 2025 18:40:50 GMT
- Title: A Comparative Benchmark of Real-time Detectors for Blueberry Detection towards Precision Orchard Management
- Authors: Xinyang Mu, Yuzhen Lu, Boyang Deng
- Abstract summary: This study presents a novel comparative benchmark analysis of advanced real-time object detectors. The dataset comprises 661 canopy images collected with smartphones during the 2022-2023 seasons. Among the YOLO models, YOLOv12m achieved the best accuracy with a mAP@50 of 93.3%.
- Score: 2.667064587590596
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Blueberry detection in natural environments remains challenging due to variable lighting, occlusions, and motion blur caused by environmental factors and imaging devices. Deep learning-based object detectors promise to address these challenges, but they demand a large-scale, diverse dataset that captures real-world complexities. Moreover, deploying these models in practical scenarios often requires the right accuracy/speed/memory trade-off in model selection. This study presents a novel comparative benchmark analysis of advanced real-time object detectors, including the YOLO (You Only Look Once) (v8-v12) and RT-DETR (Real-Time Detection Transformers) (v1-v2) families, comprising 36 model variants, evaluated on a newly curated dataset for blueberry detection. The dataset comprises 661 canopy images collected with smartphones during the 2022-2023 seasons, consisting of 85,879 labelled instances (including 36,256 ripe and 49,623 unripe blueberries) across a wide range of lighting conditions, occlusions, and fruit maturity stages. Among the YOLO models, YOLOv12m achieved the best accuracy with a mAP@50 of 93.3%, while RT-DETRv2-X obtained a mAP@50 of 93.6%, the highest among all the RT-DETR variants. Inference time varied with model scale and complexity, and the mid-sized models appeared to offer a good accuracy-speed balance. To further enhance detection performance, all the models were fine-tuned using Unbiased Mean Teacher-based semi-supervised learning (SSL) on a separate set of 1,035 unlabeled images acquired by a ground-based machine vision platform in 2024. This resulted in accuracy changes ranging from -1.4% to 2.9%, with RT-DETRv2-X achieving the best mAP@50 of 94.8%. More in-depth research into SSL is needed to better leverage cross-domain unlabeled data. Both the dataset and software programs of this study are made publicly available to support further research.
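The mAP@50 metric reported throughout this benchmark counts a predicted box as a true positive when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. A minimal sketch of that matching step follows; the box format, score-sorted greedy matching, and the toy boxes are standard conventions and illustrative values, not taken from the paper's code:

```python
# Sketch of the IoU >= 0.5 matching behind mAP@50 (illustrative, not the
# authors' implementation). Boxes are (x1, y1, x2, y2); preds are (score, box).

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_at_50(preds, gts):
    """Greedily match score-sorted predictions to ground truth at IoU >= 0.5.
    Returns (true_positives, false_positives, false_negatives)."""
    preds = sorted(preds, key=lambda p: p[0], reverse=True)
    unmatched = list(gts)
    tp = fp = 0
    for _, box in preds:
        best = max(unmatched, key=lambda g: iou(box, g), default=None)
        if best is not None and iou(box, best) >= 0.5:
            tp += 1
            unmatched.remove(best)  # each ground truth matches at most once
        else:
            fp += 1
    return tp, fp, len(unmatched)

# Example: one prediction overlaps a ground-truth box well, one is spurious.
tp, fp, fn = match_at_50(
    preds=[(0.9, (1, 1, 10, 10)), (0.8, (50, 50, 60, 60))],
    gts=[(0, 0, 10, 10), (20, 20, 30, 30)],
)
print(tp, fp, fn)  # 1 true positive, 1 false positive, 1 missed berry
```

Averaging precision over recall levels per class, then over classes, at this single 0.5 threshold yields the mAP@50 figures quoted above.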
Related papers
- Detection of On-Ground Chestnuts Using Artificial Intelligence Toward Automated Picking [0.09176056742068812]
Traditional mechanized chestnut harvesting is too costly for small producers. Accurate, reliable detection of chestnuts on the orchard floor is crucial for developing low-cost, vision-guided automated harvesting technology. This study collected 319 images of chestnuts on the orchard floor, containing 6524 annotated chestnuts.
arXiv Detail & Related papers (2026-02-15T13:28:23Z) - DeepRed: an architecture for redshift estimation [42.231769414215435]
We show how a deep learning pipeline can estimate redshifts from images of galaxies, gravitational lenses, and supernovae. Our approach achieves state-of-the-art results on all datasets. These findings suggest that deep learning is a scalable, robust, and interpretable solution for redshift estimation in large-scale surveys.
arXiv Detail & Related papers (2026-02-11T19:00:10Z) - A Domain-Adapted Lightweight Ensemble for Resource-Efficient Few-Shot Plant Disease Classification [0.0]
We present a few-shot learning approach that combines domain-adapted MobileNetV2 and MobileNetV3 models as feature extractors. For the classification task, the fused features are passed through a Bi-LSTM classifier enhanced with attention mechanisms. It consistently improved performance across 1- to 15-shot scenarios, reaching 98.23±0.33% at 15-shot. Notably, it also outperformed the previous SOTA accuracy of 96.4% on six diseases from PlantVillage, achieving 99.72% with only 15-shot learning.
arXiv Detail & Related papers (2025-12-15T15:17:29Z) - FireSentry: A Multi-Modal Spatio-temporal Benchmark Dataset for Fine-Grained Wildfire Spread Forecasting [41.82363110982653]
We present FireSentry, a provincial-scale multi-modal wildfire dataset characterized by sub-meter spatial and sub-second temporal resolution. FireSentry provides visible and infrared video streams, in-situ environmental measurements, and manually validated fire masks. Building on FireSentry, we establish a comprehensive benchmark encompassing physics-based, data-driven, and generative models.
arXiv Detail & Related papers (2025-12-03T02:02:47Z) - OmniGround: A Comprehensive Spatio-Temporal Grounding Benchmark for Real-World Complex Scenarios [39.58602686069029]
We introduce OmniGround, a comprehensive benchmark with 3,475 videos spanning 81 categories and complex real-world queries. We also introduce DeepSTG, a systematic evaluation framework quantifying dataset quality across four complementary dimensions. Experiments demonstrate PG-TAF achieves 25.6% and 35.6% improvements in m_tIoU and m_vIoU with consistent gains across four benchmarks.
arXiv Detail & Related papers (2025-11-21T04:23:04Z) - Maize Seedling Detection Dataset (MSDD): A Curated High-Resolution RGB Dataset for Seedling Maize Detection and Benchmarking with YOLOv9, YOLO11, YOLOv12 and Faster-RCNN [0.28647133890966986]
Stand counting determines how many plants germinated, guiding timely decisions such as replanting or adjusting inputs. We introduce MSDD, a high-quality aerial image dataset for maize seedling stand counting, with applications in early-season crop monitoring, yield prediction, and in-field management. MSDD contains three classes (single, double, and triple plants), capturing diverse growth stages, planting setups, soil types, lighting conditions, camera angles, and densities, ensuring robustness for real-world use.
arXiv Detail & Related papers (2025-09-18T17:41:59Z) - YOLO for Knowledge Extraction from Vehicle Images: A Baseline Study [0.20482269513546458]
This study evaluates the effectiveness of three state-of-the-art deep learning approaches: YOLO-v11, YOLO-World, and YOLO-Classification. The dataset was collected under challenging and unconstrained conditions by NSW Police Highway Patrol Vehicles. It was concluded that there is a need to use MVI to get usable models within such complex real-world datasets.
arXiv Detail & Related papers (2025-07-25T05:31:21Z) - BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation [55.13521733366838]
The 6th in a series of public competitions organized to capture the state of the art in 6D object pose estimation and related tasks. In 2024, we introduced new model-free tasks, where no 3D object models are available and methods need to onboard objects just from provided reference videos. We defined a new, more practical 6D object detection task where identities of objects visible in a test image are not provided as input.
arXiv Detail & Related papers (2025-04-03T17:55:19Z) - Assessing the Capability of YOLO- and Transformer-based Object Detectors for Real-time Weed Detection [0.0]
All available models of YOLOv8, YOLOv9, YOLOv10, and RT-DETR are trained and evaluated with images from a real field situation. The results demonstrate that while all models perform equally well in the metrics evaluated, the YOLOv9 models stand out in terms of their strong recall scores. RT-DETR models, especially RT-DETR-l, excel in precision, reaching 82.44% on dataset 1 and 81.46% on dataset 2.
arXiv Detail & Related papers (2025-01-29T02:39:57Z) - A Novel Adaptive Fine-Tuning Algorithm for Multimodal Models: Self-Optimizing Classification and Selection of High-Quality Datasets in Remote Sensing [46.603157010223505]
We propose an adaptive fine-tuning algorithm for multimodal large models.
We train the model on two 3090 GPUs using one-third of the GeoChat multimodal remote sensing dataset.
The model achieved scores of 89.86 and 77.19 on the UCMerced and AID evaluation datasets.
arXiv Detail & Related papers (2024-09-20T09:19:46Z) - Exploring the Effectiveness of Dataset Synthesis: An application of Apple Detection in Orchards [68.95806641664713]
We explore the usability of Stable Diffusion 2.1-base for generating synthetic datasets of apple trees for object detection.
We train a YOLOv5m object detection model to predict apples in a real-world apple detection dataset.
Results demonstrate that the model trained on generated data is slightly underperforming compared to a baseline model trained on real-world images.
arXiv Detail & Related papers (2023-06-20T09:46:01Z) - DeepSeaNet: Improving Underwater Object Detection using EfficientDet [0.0]
This project involves implementing and evaluating various object detection models on an annotated underwater dataset.
The dataset comprises annotated image sequences of fish, crabs, starfish, and other aquatic animals captured in Limfjorden water with limited visibility.
I compare the results of YOLOv3 (31.10% mean Average Precision (mAP)), YOLOv4 (83.72% mAP), YOLOv5 (97.6% mAP), YOLOv8 (98.20% mAP), EfficientDet (98.56% mAP), and Detectron2 (95.20% mAP) on the same dataset.
arXiv Detail & Related papers (2023-05-26T13:41:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.