Agro-Consensus: Semantic Self-Consistency in Vision-Language Models for Crop Disease Management in Developing Countries
- URL: http://arxiv.org/abs/2510.21757v1
- Date: Sat, 11 Oct 2025 19:41:07 GMT
- Title: Agro-Consensus: Semantic Self-Consistency in Vision-Language Models for Crop Disease Management in Developing Countries
- Authors: Mihir Gupta, Pratik Desai, Ross Greer,
- Abstract summary: Agricultural disease management in developing countries faces significant challenges due to limited access to expert plant pathologists.<n>This work introduces a cost-effective self-consistency framework to improve vision-language model (VLM) reliability for agricultural image captioning.
- Score: 2.2727733134290813
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Agricultural disease management in developing countries such as India, Kenya, and Nigeria faces significant challenges due to limited access to expert plant pathologists, unreliable internet connectivity, and cost constraints that hinder the deployment of large-scale AI systems. This work introduces a cost-effective self-consistency framework to improve vision-language model (VLM) reliability for agricultural image captioning. The proposed method employs semantic clustering, using a lightweight (80MB) pre-trained embedding model to group multiple candidate responses. It then selects the most coherent caption -- containing a diagnosis, symptoms, analysis, treatment, and prevention recommendations -- through a cosine similarity-based consensus. A practical human-in-the-loop (HITL) component is incorporated, wherein user confirmation of the crop type filters erroneous generations, ensuring higher-quality input for the consensus mechanism. Applied to the publicly available PlantVillage dataset using a fine-tuned 3B-parameter PaliGemma model, our framework demonstrates improvements over standard decoding methods. Evaluated on 800 crop disease images with up to 21 generations per image, our single-cluster consensus method achieves a peak accuracy of 83.1% with 10 candidate generations, compared to the 77.5% baseline accuracy of greedy decoding. The framework's effectiveness is further demonstrated when considering multiple clusters; accuracy rises to 94.0% when a correct response is found within any of the top four candidate clusters, outperforming the 88.5% achieved by a top-4 selection from the baseline.
Related papers
- FUGC: Benchmarking Semi-Supervised Learning Methods for Cervical Segmentation [63.7829089874007]
This paper introduces the Fetal Ultrasound Grand Challenge (FUGC), the first benchmark for semi-supervised learning in cervical segmentation.<n>FUGC provides a dataset of 890 TVS images, including 500 training images, 90 validation images, and 300 test images.<n> Methods were evaluated using the Dice Similarity Coefficient (DSC), Hausdorff Distance (HD), and runtime (RT), with a weighted combination of 0.4/0.4/0.2.
arXiv Detail & Related papers (2026-01-22T01:34:39Z) - A Domain-Adapted Lightweight Ensemble for Resource-Efficient Few-Shot Plant Disease Classification [0.0]
We present a few-shot learning approach that combines domain-adapted MobileNetV2 and MobileNetV3 models as feature extractors.<n>For the classification task, the fused features are passed through a Bi-LSTM classifier enhanced with attention mechanisms.<n>It consistently improved performance across 1 to 15 shot scenarios, reaching 98.23+-0.33% at 15 shot.<n> Notably, it also outperformed the previous SOTA accuracy of 96.4% on six diseases from PlantVillage, achieving 99.72% with only 15-shot learning.
arXiv Detail & Related papers (2025-12-15T15:17:29Z) - A Reproducible Framework for Neural Topic Modeling in Focus Group Analysis [0.0]
We present a systematic framework for applying BERTopic to focus group transcripts using data from ten focus groups in Tunisia.<n> bootstrap stability analysis, performance metrics, and comparison with LDA baseline.<n>Findings demonstrate that transformer-based topic modeling can extract interpretable themes from small focus group transcript corpora.
arXiv Detail & Related papers (2025-11-24T07:30:15Z) - Colorectal Cancer Histopathological Grading using Multi-Scale Federated Learning [0.0]
We propose a scalable, privacy-preserving federated learning framework for colorectal cancer grading.<n>Our framework achieves an overall accuracy of 83.5%, outperforming a comparable centralized model.<n>The proposed modular pipeline establishes a foundational step toward deployable, privacy-aware clinical AI for digital pathology.
arXiv Detail & Related papers (2025-11-05T18:18:09Z) - Weed Detection in Challenging Field Conditions: A Semi-Supervised Framework for Overcoming Shadow Bias and Data Scarcity [7.019137213828947]
This study tackles both issues through a diagnostic-driven, semi-supervised framework.<n>We use a unique dataset of approximately 975 labeled and 10,000 unlabeled images of Guinea Grass in sugarcane.<n>Our work provides a clear and field-tested framework for developing, diagnosing, and improving robust computer vision systems.
arXiv Detail & Related papers (2025-08-27T01:55:47Z) - Self-Consistency in Vision-Language Models for Precision Agriculture: Multi-Response Consensus for Crop Disease Management [0.0]
This work presents a domain-aware framework for agricultural image processing that combines prompt-based expert evaluation with self-consistency mechanisms.<n>We introduce two key innovations: (1) a prompt-based evaluation protocol that configures a language model as an expert plant pathologist for scalable assessment of image analysis outputs, and (2) a cosine-consistency self-voting mechanism that generates multiple candidate responses from agricultural images.<n>Our approach improves diagnostic accuracy from 82.2% to 87.8%, symptom analysis from 38.9% to 52.2%, and treatment recommendation from 27.8% to 43.3
arXiv Detail & Related papers (2025-07-08T18:32:21Z) - DeepSeqCoco: A Robust Mobile Friendly Deep Learning Model for Detection of Diseases in Cocos nucifera [0.0]
Coconut tree diseases are a serious risk to agricultural yield, particularly in developing countries.<n>DeepSeqCoco is a deep learning based model for accurate and automatic disease identification from coconut tree images.
arXiv Detail & Related papers (2025-05-15T07:25:43Z) - A Clinician-Friendly Platform for Ophthalmic Image Analysis Without Technical Barriers [51.45596445363302]
GlobeReady is a clinician-friendly AI platform that enables fundus disease diagnosis without retraining, fine-tuning, or the needs for technical expertise.<n>We demonstrate high accuracy across imaging modalities: 93.9-98.5% for 11 fundus diseases using color fundus photographs (CPFs) and 87.2-92.7% for 15 fundus diseases using optic coherence tomography ( OCT) scans.<n>By leveraging training-free local feature augmentation, GlobeReady platform effectively mitigates domain shifts across centers and populations.
arXiv Detail & Related papers (2025-04-22T14:17:22Z) - Design and Implementation of FourCropNet: A CNN-Based System for Efficient Multi-Crop Disease Detection and Management [3.4161054453684705]
This study proposes FourCropNet, a novel deep learning model designed to detect diseases in multiple crops.<n>FourCropNet achieved the highest accuracy of 99.7% for Grape, 99.5% for Corn, and 95.3% for the combined dataset.
arXiv Detail & Related papers (2025-03-11T12:00:56Z) - A Knowledge-enhanced Pathology Vision-language Foundation Model for Cancer Diagnosis [58.85247337449624]
We propose a knowledge-enhanced vision-language pre-training approach that integrates disease knowledge into the alignment within hierarchical semantic groups.<n>KEEP achieves state-of-the-art performance in zero-shot cancer diagnostic tasks.
arXiv Detail & Related papers (2024-12-17T17:45:21Z) - Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z) - AIGIQA-20K: A Large Database for AI-Generated Image Quality Assessment [54.93996119324928]
We create the largest AIGI subjective quality database to date with 20,000 AIGIs and 420,000 subjective ratings, known as AIGIQA-20K.
We conduct benchmark experiments on this database to assess the correspondence between 16 mainstream AIGI quality models and human perception.
arXiv Detail & Related papers (2024-04-04T12:12:24Z) - Group-Conditional Conformal Prediction via Quantile Regression
Calibration for Crop and Weed Classification [0.0]
This article presents the conformal prediction framework that provides valid statistical guarantees on the predictive performance of any black box prediction machine.
The framework is exposed with a focus on its practical aspects and special attention accorded to the Adaptive Prediction Sets (APS) approach.
To tackle this shortcoming, group-conditional conformal approaches are presented.
arXiv Detail & Related papers (2023-08-29T08:02:41Z) - Generative models improve fairness of medical classifiers under
distribution shifts [49.10233060774818]
We show that learning realistic augmentations automatically from data is possible in a label-efficient manner using generative models.
We demonstrate that these learned augmentations can surpass ones by making models more robust and statistically fair in- and out-of-distribution.
arXiv Detail & Related papers (2023-04-18T18:15:38Z) - Improving Visual Grounding by Encouraging Consistent Gradient-based
Explanations [58.442103936918805]
We show that Attention Mask Consistency produces superior visual grounding results than previous methods.
AMC is effective, easy to implement, and is general as it can be adopted by any vision-language model.
arXiv Detail & Related papers (2022-06-30T17:55:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.