TCLeaf-Net: a transformer-convolution framework with global-local attention for robust in-field lesion-level plant leaf disease detection
- URL: http://arxiv.org/abs/2512.12357v1
- Date: Sat, 13 Dec 2025 15:03:48 GMT
- Title: TCLeaf-Net: a transformer-convolution framework with global-local attention for robust in-field lesion-level plant leaf disease detection
- Authors: Zishen Song, Yongjian Zhu, Dong Wang, Hongzhan Liu, Lingyu Jiang, Yongxing Duan, Zehua Zhang, Sihan Li, Jiarui Li
- Abstract summary: We release Daylily-Leaf, a paired lesion-level dataset comprising 1,746 RGB images and 7,839 lesions captured under both ideal and in-field conditions. We propose TCLeaf-Net, a transformer-convolution hybrid detector optimized for real-field use.
- Score: 13.963787476506292
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Timely and accurate detection of foliar diseases is vital for safeguarding crop growth and reducing yield losses. Yet, in real-field conditions, cluttered backgrounds, domain shifts, and limited lesion-level datasets hinder robust modeling. To address these challenges, we release Daylily-Leaf, a paired lesion-level dataset comprising 1,746 RGB images and 7,839 lesions captured under both ideal and in-field conditions, and propose TCLeaf-Net, a transformer-convolution hybrid detector optimized for real-field use. TCLeaf-Net is designed to tackle three major challenges. To mitigate interference from complex backgrounds, the transformer-convolution module (TCM) couples global context with locality-preserving convolution to suppress non-leaf regions. To reduce information loss during downsampling, the raw-scale feature recalling and sampling (RSFRS) block combines bilinear resampling and convolution to preserve fine spatial detail. To handle variations in lesion scale and feature shifts, the deformable alignment block with FPN (DFPN) employs offset-based alignment and multi-receptive-field perception to strengthen multi-scale fusion. Experimental results show that on the in-field split of the Daylily-Leaf dataset, TCLeaf-Net improves mAP@50 by 5.4 percentage points over the baseline model, reaching 78.2%, while reducing computation by 7.5 GFLOPs and GPU memory usage by 8.7%. Moreover, the model outperforms recent YOLO and RT-DETR series in both precision and recall, and demonstrates strong performance on the PlantDoc, Tomato-Leaf, and Rice-Leaf datasets, validating its robustness and generalizability to other plant disease detection scenarios.
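As a reading aid for the mAP@50 figures in the abstract, the following is a minimal sketch of the matching rule that metric usually rests on: a predicted box counts as a hit when its intersection over union (IoU) with a ground-truth box is at least 0.5. The function names, the `(x1, y1, x2, y2)` box layout, and the example boxes are illustrative assumptions, not taken from the paper, and some evaluation toolkits use a strict `>` rather than `>=`. The last lines only redo the abstract's arithmetic: a gain of 5.4 percentage points that reaches 78.2% implies a baseline of about 72.8%.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred_box, gt_box):
    """Assumed matching rule behind mAP@50: IoU of at least 0.5 counts as a hit."""
    return iou(pred_box, gt_box) >= 0.5


# Hypothetical lesion boxes, for illustration only (not from Daylily-Leaf).
gt = (10, 10, 50, 50)
pred = (15, 15, 55, 55)
print(is_match_at_50(pred, gt))

# Arithmetic from the abstract: 78.2% after a gain of 5.4 percentage points.
reported_map50 = 78.2
gain_points = 5.4
implied_baseline_map50 = round(reported_map50 - gain_points, 1)
print(implied_baseline_map50)
```

The full metric additionally ranks detections by confidence and averages precision over recall levels and classes; this sketch covers only the per-box matching step.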
Related papers
- Dynamic Meta-Ensemble Framework for Efficient and Accurate Deep Learning in Plant Leaf Disease Detection on Resource-Constrained Edge Devices [0.0]
We introduce a novel Dynamic Meta-Ensemble Framework (DMEF) for high-accuracy plant disease diagnosis under resource constraints. DMEF employs an adaptive weighting mechanism that dynamically combines the predictions of three lightweight convolutional neural networks. Experiments on benchmark datasets for potato and maize diseases demonstrate state-of-the-art classification accuracies of 99.53% and 96.61%, respectively.
arXiv Detail & Related papers (2026-01-24T03:57:49Z) - A Domain-Adapted Lightweight Ensemble for Resource-Efficient Few-Shot Plant Disease Classification [0.0]
We present a few-shot learning approach that combines domain-adapted MobileNetV2 and MobileNetV3 models as feature extractors. For the classification task, the fused features are passed through a Bi-LSTM classifier enhanced with attention mechanisms. It consistently improved performance across 1- to 15-shot scenarios, reaching 98.23±0.33% at 15 shots. Notably, it also outperformed the previous SOTA accuracy of 96.4% on six diseases from PlantVillage, achieving 99.72% with only 15-shot learning.
arXiv Detail & Related papers (2025-12-15T15:17:29Z) - DAONet-YOLOv8: An Occlusion-Aware Dual-Attention Network for Tea Leaf Pest and Disease Detection [2.661320179262946]
We propose an enhanced YOLOv8 variant with three key improvements to accurately detect tea leaf pests and diseases. DAONet-YOLOv8 achieves 92.97% precision, 92.80% recall, 97.10% mAP@50 and 76.90% mAP@50:95, outperforming the YOLOv8n baseline by 2.34, 4.68, 1.40 and 1.80 percentage points, respectively.
arXiv Detail & Related papers (2025-11-28T14:28:30Z) - A Multi-Strategy Framework for Enhancing Shatian Pomelo Detection in Real-World Orchards [5.779478641472218]
This study identifies four key challenges that affect the accuracy of Shatian pomelo detection. To mitigate these challenges, a multi-strategy framework is proposed in this paper. Our proposed network demonstrates superior performance compared to other state-of-the-art detection methods.
arXiv Detail & Related papers (2025-10-11T01:30:48Z) - YOLO11-CR: a Lightweight Convolution-and-Attention Framework for Accurate Fatigue Driving Detection [0.0]
This paper introduces YOLO11-CR, a lightweight and efficient object detection model tailored for real-time fatigue monitoring. YOLO11-CR introduces two key modules: the Convolution-and-Attention Fusion Module (CAFM) and the Rectangular Module (RCM). Experiments on the DSM dataset demonstrated that YOLO11-CR achieves a precision of 87.17%, recall of 83.86%, mAP@50 of 88.09%, and mAP@50-95 of 55.93%.
arXiv Detail & Related papers (2025-08-16T07:19:04Z) - Interpretable AI for Time-Series: Multi-Model Heatmap Fusion with Global Attention and NLP-Generated Explanations [1.331812695405053]
We present a novel framework for enhancing model interpretability by integrating heatmaps produced by ResNet and a restructured 2D Transformer with globally weighted input saliency. Our method merges gradient-weighted activation maps (ResNet) and Transformer attention rollout into a unified visualization, achieving full spatial-temporal alignment. Empirical evaluations on clinical (ECG arrhythmia detection) and industrial datasets demonstrate significant improvements.
arXiv Detail & Related papers (2025-06-30T20:04:35Z) - ReconMOST: Multi-Layer Sea Temperature Reconstruction with Observations-Guided Diffusion [48.540756751934836]
ReconMOST is a data-driven guided diffusion model framework for multi-layer sea temperature reconstruction. Our method extends ML-based SST reconstruction to a global, multi-layer setting, handling over 92.5% missing data.
arXiv Detail & Related papers (2025-06-12T06:27:22Z) - Loss-Guided Model Sharing and Local Learning Correction in Decentralized Federated Learning for Crop Disease Classification [3.344876133162209]
We introduce a novel Decentralized Federated Learning (DFL) framework that uses validation loss (Loss_val) to guide model sharing between peers and to correct local training via an adaptive loss function controlled by a weighting parameter. Results demonstrate that our DFL approach not only improves accuracy and convergence speed, but also ensures better generalization and robustness across heterogeneous data environments.
arXiv Detail & Related papers (2025-05-29T04:12:53Z) - Leveraging Frequency Domain Learning in 3D Vessel Segmentation [50.54833091336862]
In this study, we leverage Fourier domain learning as a substitute for multi-scale convolutional kernels in 3D hierarchical segmentation models.
We show that our novel network achieves remarkable dice performance (84.37% on ASACA500 and 80.32% on ImageCAS) in tubular vessel segmentation tasks.
arXiv Detail & Related papers (2024-01-11T19:07:58Z) - Transformer Multivariate Forecasting: Less is More? [42.558736426375056]
The paper focuses on reducing redundant information to elevate forecasting accuracy while optimizing runtime efficiency.
The framework is evaluated by five state-of-the-art (SOTA) models and four diverse real-world datasets.
From the model perspective, one of the PCA-enhanced models: PCA+Crossformer, reduces mean square errors (MSE) by 33.3% and decreases runtime by 49.2% on average.
arXiv Detail & Related papers (2023-12-30T13:44:23Z) - The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation.
We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare.
Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z) - TinyAD: Memory-efficient anomaly detection for time series data in Industrial IoT [43.207210990362825]
We propose a novel framework named Tiny Anomaly Detection (TinyAD) to efficiently facilitate onboard inference of CNNs for real-time anomaly detection.
To reduce the peak memory consumption of CNNs, we explore two complementary strategies, in-place, and patch-by-patch memory rescheduling.
Our framework can reduce peak memory consumption by 2-5x with negligible overhead.
arXiv Detail & Related papers (2023-03-07T02:56:15Z) - The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers [59.87030906486969]
This paper studies the curious phenomenon for machine learning models with Transformer architectures that their activation maps are sparse.
We show that sparsity is a prevalent phenomenon that occurs for both natural language processing and vision tasks.
We discuss how sparsity immediately implies a way to significantly reduce the FLOP count and improve efficiency for Transformers.
arXiv Detail & Related papers (2022-10-12T15:25:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.