Related papers: Plant Disease Detection through Multimodal Large Language Models and Convolutional Neural Networks

URL: http://arxiv.org/abs/2504.20419v1
Date: Tue, 29 Apr 2025 04:31:58 GMT
Title: Plant Disease Detection through Multimodal Large Language Models and Convolutional Neural Networks
Authors: Konstantinos I. Roumeliotis, Ranjan Sapkota, Manoj Karkee, Nikolaos D. Tselikas, Dimitrios K. Nasiopoulos
Abstract summary: This study investigates the effectiveness of combining multimodal Large Language Models (LLMs) with Convolutional Neural Networks (CNNs) for automated plant disease classification using leaf imagery. We evaluate model performance across zero-shot, few-shot, and progressive fine-tuning scenarios.
Score: 0.5009853409756729
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Automation in agriculture plays a vital role in addressing challenges related to crop monitoring and disease management, particularly through early detection systems. This study investigates the effectiveness of combining multimodal Large Language Models (LLMs), specifically GPT-4o, with Convolutional Neural Networks (CNNs) for automated plant disease classification using leaf imagery. Leveraging the PlantVillage dataset, we systematically evaluate model performance across zero-shot, few-shot, and progressive fine-tuning scenarios. A comparative analysis between GPT-4o and the widely used ResNet-50 model was conducted across three resolutions (100, 150, and 256 pixels) and two plant species (apple and corn). Results indicate that fine-tuned GPT-4o models achieved slightly better performance compared to the performance of ResNet-50, achieving up to 98.12% classification accuracy on apple leaf images, compared to 96.88% achieved by ResNet-50, with improved generalization and near-zero training loss. However, zero-shot performance of GPT-4o was significantly lower, underscoring the need for minimal training. Additional evaluations on cross-resolution and cross-plant generalization revealed the models' adaptability and limitations when applied to new domains. The findings highlight the promise of integrating multimodal LLMs into automated disease detection pipelines, enhancing the scalability and intelligence of precision agriculture systems while reducing the dependence on large, labeled datasets and high-resolution sensor infrastructure.
Keywords: Large Language Models, Vision Language Models, LLMs and CNNs, Disease Detection with Vision Language Models, VLMs

Related papers

FundusGAN: A Hierarchical Feature-Aware Generative Framework for High-Fidelity Fundus Image Generation [35.46876389599076]
FundusGAN is a novel hierarchical feature-aware generative framework specifically designed for high-fidelity fundus image synthesis. We show that FundusGAN consistently outperforms state-of-the-art methods across multiple metrics.
arXiv Detail & Related papers (2025-03-22T18:08:07Z)
Design and Implementation of FourCropNet: A CNN-Based System for Efficient Multi-Crop Disease Detection and Management [3.4161054453684705]
This study proposes FourCropNet, a novel deep learning model designed to detect diseases in multiple crops. FourCropNet achieved the highest accuracy of 99.7% for Grape, 99.5% for Corn, and 95.3% for the combined dataset.
arXiv Detail & Related papers (2025-03-11T12:00:56Z)
Object Detection for Medical Image Analysis: Insights from the RT-DETR Model [40.593685087097995]
This paper focuses on the application of a novel detection framework based on the RT-DETR model for analyzing intricate image data. The proposed RT-DETR model, built on a Transformer-based architecture, excels at processing high-dimensional and complex visual data with enhanced robustness and accuracy.
arXiv Detail & Related papers (2025-01-27T20:02:53Z)
Explainable AI-Enhanced Deep Learning for Pumpkin Leaf Disease Detection: A Comparative Analysis of CNN Architectures [1.472830326343432]
This study employs the "Pumpkin Leaf Disease dataset", which comprises 2000 high-resolution images separated into five categories. The dataset was rigorously assembled from several agricultural fields to ensure a strong representation for model training. We explored many proficient deep learning architectures, including DenseNet201, DenseNet121, DenseNet169, Xception, ResNet50, ResNet101 and InceptionResNetV2, and observed that ResNet50 performed most effectively, with an accuracy of 90.5% and comparable precision, recall, and F1-Score.
arXiv Detail & Related papers (2025-01-09T18:59:35Z)
Implementing Trust in Non-Small Cell Lung Cancer Diagnosis with a Conformalized Uncertainty-Aware AI Framework in Whole-Slide Images [37.3701890138561]
TRUECAM is a framework designed to ensure both data and model trustworthiness in non-small cell lung cancer subtyping with whole-slide images. An AI model wrapped with TRUECAM significantly outperforms models that lack such guidance, in terms of classification accuracy, robustness, interpretability, and data efficiency.
arXiv Detail & Related papers (2024-12-28T02:22:47Z)
Comparative Analysis of Multi-Omics Integration Using Advanced Graph Neural Networks for Cancer Classification [40.45049709820343]
Multi-omics data integration poses significant challenges due to the high dimensionality, data complexity, and distinct characteristics of various omics types. This study evaluates three graph neural network architectures for multi-omics (MO) integration based on graph-convolutional networks (GCN), graph-attention networks (GAT), and graph-transformer networks (GTN).
arXiv Detail & Related papers (2024-10-05T16:17:44Z)
Automated Disease Diagnosis in Pumpkin Plants Using Advanced CNN Models [0.0]
Pumpkin is a vital crop cultivated globally, and its productivity is crucial for food security, especially in developing regions. Recent advancements in machine learning and deep learning offer promising solutions for automating and improving the accuracy of plant disease detection. This paper presents a comprehensive analysis of state-of-the-art Convolutional Neural Network (CNN) models for classifying diseases in pumpkin plant leaves.
arXiv Detail & Related papers (2024-09-29T14:31:23Z)
Phikon-v2, A large and public feature extractor for biomarker prediction [42.52549987351643]
We train a vision transformer using DINOv2 and publicly release one iteration of this model for further experimentation, coined Phikon-v2. While trained on publicly available histology slides, Phikon-v2 surpasses our previously released model (Phikon) and performs on par with other histopathology foundation models (FM) trained on proprietary data.
arXiv Detail & Related papers (2024-09-13T20:12:29Z)
RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness [102.06442250444618]
We introduce RLAIF-V, a novel framework that aligns MLLMs in a fully open-source paradigm. RLAIF-V maximally explores open-source MLLMs from two perspectives, including high-quality feedback data generation. Experiments on six benchmarks in both automatic and human evaluation show that RLAIF-V substantially enhances the trustworthiness of models.
arXiv Detail & Related papers (2024-05-27T14:37:01Z)
From Training-Free to Adaptive: Empirical Insights into MLLMs' Understanding of Detection Information [32.57246173437492]
Vision detection models excel at recognizing fine-grained image details. One effective strategy is to infuse detection information in text format, which has proven simple and effective. This paper addresses the question: How does training impact MLLMs' understanding of infused textual detection information?
arXiv Detail & Related papers (2024-01-31T16:38:32Z)
Machine Learning-Based Jamun Leaf Disease Detection: A Comprehensive Review [0.0]
Jamun leaf diseases pose a significant threat to agricultural productivity. The advent of machine learning has opened up new avenues for tackling these diseases effectively. Various automated systems have been implemented for similar types of disease detection using image processing techniques.
arXiv Detail & Related papers (2023-11-27T11:46:30Z)
Dual-Activated Lightweight Attention ResNet50 for Automatic Histopathology Breast Cancer Image Classification [0.0]
This study introduces a novel method for breast cancer classification, the Dual-Activated Lightweight Attention ResNet50 model. It integrates a pre-trained ResNet50 model with a lightweight attention mechanism, embedding an attention module in the fourth layer of ResNet50. The DALAResNet50 method was tested on breast cancer histopathology images from the BreakHis Database across magnification factors of 40X, 100X, 200X, and 400X, achieving accuracies of 98.5%, 98.7%, 97.9%, and 94.3%, respectively.
arXiv Detail & Related papers (2023-08-25T03:08:41Z)
Explainable vision transformer enabled convolutional neural network for plant disease identification: PlantXViT [11.623005206620498]
Plant diseases are the primary cause of crop losses globally, with an impact on the world economy. In this study, a Vision Transformer enabled Convolutional Neural Network model called "PlantXViT" is proposed for plant disease identification. The proposed model has a lightweight structure with only 0.8 million trainable parameters, which makes it suitable for IoT-based smart agriculture services.
arXiv Detail & Related papers (2022-07-16T12:05:06Z)
A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification [78.04177357888284]
We propose a novel neural model compression strategy combining data augmentation, knowledge transfer, pruning, and quantization for device-robust acoustic scene classification (ASC). We report an efficient joint framework for low-complexity multi-device ASC, called Acoustic Lottery.
arXiv Detail & Related papers (2021-07-03T16:25:24Z)
RetiNerveNet: Using Recursive Deep Learning to Estimate Pointwise 24-2 Visual Field Data based on Retinal Structure [109.33721060718392]
Glaucoma is the leading cause of irreversible blindness in the world, affecting over 70 million people. Due to the Standard Automated Perimetry (SAP) test's innate difficulty and its high test-retest variability, we propose the RetiNerveNet.
arXiv Detail & Related papers (2020-10-15T03:09:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.