Related papers: Self-Consistency in Vision-Language Models for Precision Agriculture: Multi-Response Consensus for Crop Disease Management

URL: http://arxiv.org/abs/2507.08024v1
Date: Tue, 08 Jul 2025 18:32:21 GMT
Title: Self-Consistency in Vision-Language Models for Precision Agriculture: Multi-Response Consensus for Crop Disease Management
Authors: Mihir Gupta, Abhay Mangla, Ross Greer, Pratik Desai,
Abstract summary: This work presents a domain-aware framework for agricultural image processing that combines prompt-based expert evaluation with self-consistency mechanisms. We introduce two key innovations: (1) a prompt-based evaluation protocol that configures a language model as an expert plant pathologist for scalable assessment of image analysis outputs, and (2) a cosine-consistency self-voting mechanism that generates multiple candidate responses from agricultural images. Our approach improves diagnostic accuracy from 82.2% to 87.8%, symptom analysis from 38.9% to 52.2%, and treatment recommendation from 27.8% to 43.3%.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Precision agriculture relies heavily on accurate image analysis for crop disease identification and treatment recommendation, yet existing vision-language models (VLMs) often underperform in specialized agricultural domains. This work presents a domain-aware framework for agricultural image processing that combines prompt-based expert evaluation with self-consistency mechanisms to enhance VLM reliability in precision agriculture applications. We introduce two key innovations: (1) a prompt-based evaluation protocol that configures a language model as an expert plant pathologist for scalable assessment of image analysis outputs, and (2) a cosine-consistency self-voting mechanism that generates multiple candidate responses from agricultural images and selects the most semantically coherent diagnosis using domain-adapted embeddings. Applied to maize leaf disease identification from field images using a fine-tuned PaliGemma model, our approach improves diagnostic accuracy from 82.2% to 87.8%, symptom analysis from 38.9% to 52.2%, and treatment recommendation from 27.8% to 43.3% compared to standard greedy decoding. The system remains compact enough for deployment on mobile devices, supporting real-time agricultural decision-making in resource-constrained environments. These results demonstrate significant potential for AI-driven precision agriculture tools that can operate reliably in diverse field conditions.
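
The abstract describes the cosine-consistency self-voting step only at a high level, so the following is a minimal sketch rather than the authors' implementation. It assumes that each candidate response has already been embedded as a vector, and that "most semantically coherent" means the candidate with the highest mean cosine similarity to the other candidates; that selection rule is an assumption, as are the function names `cosine` and `select_most_consistent`, the placeholder candidate strings, and the toy two-dimensional embeddings.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_most_consistent(embeddings: List[Sequence[float]]) -> int:
    """Return the index of the candidate whose embedding has the highest
    mean cosine similarity to all other candidates.

    Assumed selection rule; the source abstract does not spell it out.
    """
    if not embeddings:
        raise ValueError("need at least one candidate")
    if len(embeddings) == 1:
        return 0
    best_index, best_score = 0, -math.inf
    for i, e_i in enumerate(embeddings):
        sims = [cosine(e_i, e_j) for j, e_j in enumerate(embeddings) if j != i]
        score = sum(sims) / len(sims)
        if score > best_score:
            best_index, best_score = i, score
    return best_index


# Hypothetical candidates and toy embeddings, for illustration only.
candidates = ["candidate A", "candidate B", "candidate C"]
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
chosen = candidates[select_most_consistent(toy_embeddings)]
print(chosen)
```

In a real pipeline the candidates would come from sampling the vision-language model several times on the same image, and the vectors from a text embedding model; both are outside the scope of this sketch.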

Related papers

Agri-R1: Empowering Generalizable Agricultural Reasoning in Vision-Language Models with Reinforcement Learning [22.34625628938106]
We propose Agri-R1, a reasoning-enhanced large model for agriculture. Our framework generates high-quality reasoning data via vision-language synthesis and LLM-based filtering. We show a +23.2% relative gain in disease recognition accuracy, +33.3% in agricultural knowledge QA, and a +26.10-point improvement in cross-domain generalization.
arXiv Detail & Related papers (2026-01-08T07:34:37Z)
A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis [82.01597026329158]
We introduce a Correlation-Regulated Alignment Framework for Tissue Synthesis (CRAFTS) for pathology-specific text-to-image synthesis. CRAFTS incorporates a novel alignment mechanism that suppresses semantic drift to ensure biological accuracy. This model generates diverse pathological images spanning 30 cancer types, with quality rigorously validated by objective metrics and pathologist evaluations.
arXiv Detail & Related papers (2025-12-15T10:22:43Z)
Explainable AI for Diabetic Retinopathy Detection Using Deep Learning with Attention Mechanisms and Fuzzy Logic-Based Interpretability [0.0]
This paper proposes a hybrid deep learning framework recipe for weed detection. A Generative Adversarial Network (GAN)-based augmentation method was applied to balance classes and improve the model's robustness and generalization. Experiments yield superior results, with 99.33% accuracy, precision, recall, and F1-score on multi-benchmark datasets.
arXiv Detail & Related papers (2025-11-20T12:17:00Z)
Agro-Consensus: Semantic Self-Consistency in Vision-Language Models for Crop Disease Management in Developing Countries [2.2727733134290813]
Agricultural disease management in developing countries faces significant challenges due to limited access to expert plant pathologists. This work introduces a cost-effective self-consistency framework to improve vision-language model (VLM) reliability for agricultural image captioning.
arXiv Detail & Related papers (2025-10-11T19:41:07Z)
AgriDoctor: A Multimodal Intelligent Assistant for Agriculture [45.77373971125537]
AgriDoctor is a modular and multimodal framework designed for intelligent crop disease diagnosis and agricultural knowledge interaction. To facilitate effective training and evaluation, we construct AgriMM, a benchmark comprising 400,000 annotated disease images, 831 expert-curated knowledge entries, and 300,000 bilingual prompts for intent-driven tool selection. Experiments demonstrate that AgriDoctor, trained on AgriMM, significantly outperforms state-of-the-art LVLMs on fine-grained agricultural tasks.
arXiv Detail & Related papers (2025-09-21T11:51:57Z)
Automated Multi-Class Crop Pathology Classification via Convolutional Neural Networks: A Deep Learning Approach for Real-Time Precision Agriculture [0.0]
This research introduces a Convolutional Neural Network (CNN)-based image classification system designed to automate the detection and classification of eight common crop diseases. The solution is deployed on an open-source, mobile-compatible platform, enabling real-time image-based diagnostics for farmers in remote areas.
arXiv Detail & Related papers (2025-07-12T18:45:50Z)
FakeScope: Large Multimodal Expert Model for Transparent AI-Generated Image Forensics [66.14786900470158]
We propose FakeScope, an expert large multimodal model (LMM) tailored for AI-generated image forensics. FakeScope identifies AI-synthetic images with high accuracy and provides rich, interpretable, and query-driven forensic insights. FakeScope achieves state-of-the-art performance in both closed-ended and open-ended forensic scenarios.
arXiv Detail & Related papers (2025-03-31T16:12:48Z)
Design and Implementation of FourCropNet: A CNN-Based System for Efficient Multi-Crop Disease Detection and Management [3.4161054453684705]
This study proposes FourCropNet, a novel deep learning model designed to detect diseases in multiple crops. FourCropNet achieved the highest accuracy of 99.7% for Grape, 99.5% for Corn, and 95.3% for the combined dataset.
arXiv Detail & Related papers (2025-03-11T12:00:56Z)
A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis [5.006697347461899]
We present the crop disease domain multimodal dataset, a pioneering resource designed to advance the field of agricultural research. The dataset comprises 137,000 images of various crop diseases, accompanied by 1 million question-answer pairs that span a broad spectrum of agricultural knowledge. We demonstrate the utility of the dataset by finetuning state-of-the-art multimodal models, showcasing significant improvements in crop disease diagnosis.
arXiv Detail & Related papers (2025-03-10T06:37:42Z)
VICCA: Visual Interpretation and Comprehension of Chest X-ray Anomalies in Generated Report Without Human Feedback [1.5839621757142595]
We propose a novel framework designed to enhance the semantic alignment and localization accuracy of AI-generated medical reports. By comparing features between the original and generated images, we introduce a dual-scoring system. This approach significantly outperforms existing methods, achieving state-of-the-art results in pathology localization and text-to-image alignment.
arXiv Detail & Related papers (2025-01-29T16:02:16Z)
Generating Diverse Agricultural Data for Vision-Based Farming Applications [74.79409721178489]
This model is capable of simulating distinct growth stages of plants, diverse soil conditions, and randomized field arrangements under varying lighting conditions. Our dataset includes 12,000 images with semantic labels, offering a comprehensive resource for computer vision tasks in precision agriculture.
arXiv Detail & Related papers (2024-03-27T08:42:47Z)
Crop Disease Classification using Support Vector Machines with Green Chromatic Coordinate (GCC) and Attention based feature extraction for IoT based Smart Agricultural Applications [0.0]
Plant diseases can negatively affect leaves during agricultural cultivation, resulting in significant losses in crop output and economic value. Various machine learning (ML) and deep learning (DL) algorithms have been developed and studied for plant disease detection. This article presents a novel classification method that builds on prior work by utilising attention-based feature extraction, RGB channel-based chromatic analysis, and Support Vector Machines (SVM) for improved performance.
arXiv Detail & Related papers (2023-11-01T10:44:49Z)
Generative AI in Agriculture: Creating Image Datasets Using DALL.E's Advanced Large Language Model Capabilities [0.4143603294943439]
The study used both approaches of image generation: text-to-image and image-to-image (variation). The image-to-image generation exhibited a 5.78% increase in average PSNR over text-to-image methods, signifying superior image clarity and quality. Human evaluation also showed that images generated using the image-to-image method were more realistic than those generated with the text-to-image approach.
arXiv Detail & Related papers (2023-07-17T19:17:10Z)
Domain Generalization for Crop Segmentation with Standardized Ensemble Knowledge Distillation [42.39035033967183]
Service robots need a real-time perception system that understands their surroundings and identifies their targets in the wild. Existing methods, however, often fall short in generalizing to new crops and environmental conditions. We propose a novel approach to enhance domain generalization using knowledge distillation.
arXiv Detail & Related papers (2023-04-03T14:28:29Z)
End-to-end deep learning for directly estimating grape yield from ground-based imagery [53.086864957064876]
This study demonstrates the application of proximal imaging combined with deep learning for yield estimation in vineyards. Three model architectures were tested: object detection, CNN regression, and transformer models. The study showed the applicability of proximal imaging and deep learning for prediction of grapevine yield on a large scale.
arXiv Detail & Related papers (2022-08-04T01:34:46Z)
Unsupervised deep learning techniques for powdery mildew recognition based on multispectral imaging [63.62764375279861]
This paper presents a deep learning approach to automatically recognize powdery mildew on cucumber leaves. We focus on unsupervised deep learning techniques applied to multispectral imaging data. We propose the use of autoencoder architectures to investigate two strategies for disease detection.
arXiv Detail & Related papers (2021-12-20T13:29:13Z)
An Interpretable Multiple-Instance Approach for the Detection of referable Diabetic Retinopathy from Fundus Images [72.94446225783697]
We propose a machine learning system for the detection of referable Diabetic Retinopathy in fundus images. By extracting local information from image patches and combining it efficiently through an attention mechanism, our system is able to achieve high classification accuracy. We evaluate our approach on publicly available retinal image datasets, in which it exhibits near state-of-the-art performance.
arXiv Detail & Related papers (2021-03-02T13:14:15Z)
Estimating Crop Primary Productivity with Sentinel-2 and Landsat 8 using Machine Learning Methods Trained with Radiative Transfer Simulations [58.17039841385472]
We take advantage of all parallel developments in mechanistic modeling and satellite data availability for advanced monitoring of crop productivity. Our model successfully estimates gross primary productivity across a variety of C3 crop types and environmental conditions even though it does not use any local information from the corresponding sites. This highlights its potential to map crop productivity from new satellite sensors at a global scale with the help of current Earth observation cloud computing platforms.
arXiv Detail & Related papers (2020-12-07T16:23:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.