A Multi-Strategy Framework for Enhancing Shatian Pomelo Detection in Real-World Orchards
- URL: http://arxiv.org/abs/2510.09948v1
- Date: Sat, 11 Oct 2025 01:30:48 GMT
- Title: A Multi-Strategy Framework for Enhancing Shatian Pomelo Detection in Real-World Orchards
- Authors: Pan Wang, Yihao Hu, Xiaodong Bai, Aiping Yang, Xiangxiang Li, Meiping Ding, Jianguo Yao,
- Abstract summary: This study identifies four key challenges that affect the accuracy of Shatian pomelo detection.<n>To mitigate these challenges, a multi-strategy framework is proposed in this paper.<n>Our proposed network demonstrates superior performance compared to other state-of-the-art detection methods.
- Score: 5.779478641472218
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a specialty agricultural product with a large market scale, Shatian pomelo necessitates the adoption of automated detection to ensure accurate quantity and meet commercial demands for lean production. Existing research often involves specialized networks tailored for specific theoretical or dataset scenarios, but these methods tend to degrade performance in real-world. Through analysis of factors in this issue, this study identifies four key challenges that affect the accuracy of Shatian pomelo detection: imaging devices, lighting conditions, object scale variation, and occlusion. To mitigate these challenges, a multi-strategy framework is proposed in this paper. Firstly, to effectively solve tone variation introduced by diverse imaging devices and complex orchard environments, we utilize a multi-scenario dataset, STP-AgriData, which is constructed by integrating real orchard images with internet-sourced data. Secondly, to simulate the inconsistent illumination conditions, specific data augmentations such as adjusting contrast and changing brightness, are applied to the above dataset. Thirdly, to address the issues of object scale variation and occlusion in fruit detection, an REAS-Det network is designed in this paper. For scale variation, RFAConv and C3RFEM modules are designed to expand and enhance the receptive fields. For occlusion variation, a multi-scale, multi-head feature selection structure (MultiSEAM) and soft-NMS are introduced to enhance the handling of occlusion issues to improve detection accuracy. The results of these experiments achieved a precision(P) of 87.6%, a recall (R) of 74.9%, a mAP@.50 of 82.8%, and a mAP@.50:.95 of 53.3%. Our proposed network demonstrates superior performance compared to other state-of-the-art detection methods.
Related papers
- Detecting Deepfakes with Multivariate Soft Blending and CLIP-based Image-Text Alignment [4.34685509565816]
The proliferation of highly realistic facial forgeries necessitates robust detection methods.<n>Existing approaches often suffer from limited accuracy and poor generalization due to significant distribution shifts among samples generated by diverse forgery techniques.<n>Our method leverages the multimodal alignment capabilities of CLIP to capture subtle forgery traces.
arXiv Detail & Related papers (2026-02-14T09:53:35Z) - TCLeaf-Net: a transformer-convolution framework with global-local attention for robust in-field lesion-level plant leaf disease detection [13.963787476506292]
We release Daylily-Leaf, a paired lesion-level dataset comprising 1,746 RGB images and 7,839 lesions captured under both ideal and in-field conditions.<n>We propose TCLeaf-Net, a transformer-convolution hybrid detector optimized for real-field use.
arXiv Detail & Related papers (2025-12-13T15:03:48Z) - A Comparative Benchmark of Real-time Detectors for Blueberry Detection towards Precision Orchard Management [2.667064587590596]
This study presents a novel comparative benchmark analysis of advanced real-time object detectors.<n>This dataset comprises 661 canopy images collected with smartphones during the 2022-2023 seasons.<n>Among the YOLO models, YOLOv12m achieved the best accuracy with a mAP@50 of 93.3%.
arXiv Detail & Related papers (2025-09-24T21:42:24Z) - Agent4FaceForgery: Multi-Agent LLM Framework for Realistic Face Forgery Detection [108.5042835056188]
This work introduces Agent4FaceForgery to address two fundamental problems.<n>How to capture the diverse intents and iterative processes of human forgery creation.<n>How to model the complex, often adversarial, text-image interactions that accompany forgeries in social media.
arXiv Detail & Related papers (2025-09-16T01:05:01Z) - MSSDF: Modality-Shared Self-supervised Distillation for High-Resolution Multi-modal Remote Sensing Image Learning [25.381211868583826]
We propose a multi-modal self-supervised learning framework that leverages high-resolution RGB images, multi-spectral data, and digital surface models (DSM) for pre-training.<n>We evaluate the proposed method on multiple downstream tasks, covering typical remote sensing applications such as scene classification, semantic segmentation, change detection, object detection, and depth estimation.
arXiv Detail & Related papers (2025-06-11T02:01:36Z) - Category-Level 6D Object Pose Estimation in Agricultural Settings Using a Lattice-Deformation Framework and Diffusion-Augmented Synthetic Data [22.68237431620023]
We develop a novel framework for category 6D estimation that relies purely on RGB input.<n>We demonstrate the effectiveness of our framework on a challenging benchmark evaluation of various shapes, sizes, and ripeness status.
arXiv Detail & Related papers (2025-05-30T14:25:52Z) - RAAD-LLM: Adaptive Anomaly Detection Using LLMs and RAG Integration [2.879328762187361]
We present RAAD-LLM, a novel framework for adaptive anomaly detection.<n>By effectively utilizing domain-specific knowledge, RAAD-LLM enhances the detection of anomalies in time series data.<n>Results show significant improvements over our previous model with an accuracy increase from 70.7% to 88.6% on the real-world dataset.
arXiv Detail & Related papers (2025-03-04T17:20:43Z) - Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets.
We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z) - Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for
Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head.
The proposed model achieves a mean average precision (MAP) of approximately 45.7%, which is a significant improvement.
This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z) - The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease
detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation.
We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of challenging problem in healthcare.
Within this framework, we train predictive 15 models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z) - Wild Face Anti-Spoofing Challenge 2023: Benchmark and Results [73.98594459933008]
Face anti-spoofing (FAS) is an essential mechanism for safeguarding the integrity of automated face recognition systems.
This limitation can be attributed to the scarcity and lack of diversity in publicly available FAS datasets.
We introduce the Wild Face Anti-Spoofing dataset, a large-scale, diverse FAS dataset collected in unconstrained settings.
arXiv Detail & Related papers (2023-04-12T10:29:42Z) - AdaZoom: Adaptive Zoom Network for Multi-Scale Object Detection in Large
Scenes [57.969186815591186]
Detection in large-scale scenes is a challenging problem due to small objects and extreme scale variation.
We propose a novel Adaptive Zoom (AdaZoom) network as a selective magnifier with flexible shape and focal length to adaptively zoom the focus regions for object detection.
arXiv Detail & Related papers (2021-06-19T03:30:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.