AgriDoctor: A Multimodal Intelligent Assistant for Agriculture
- URL: http://arxiv.org/abs/2509.17044v1
- Date: Sun, 21 Sep 2025 11:51:57 GMT
- Title: AgriDoctor: A Multimodal Intelligent Assistant for Agriculture
- Authors: Mingqing Zhang, Zhuoning Xu, Peijie Wang, Rongji Li, Liang Wang, Qiang Liu, Jian Xu, Xuyao Zhang, Shu Wu, Liang Wang
- Abstract summary: AgriDoctor is a modular and multimodal framework designed for intelligent crop disease diagnosis and agricultural knowledge interaction. To facilitate effective training and evaluation, we construct AgriMM, a benchmark comprising 400000 annotated disease images, 831 expert-curated knowledge entries, and 300000 bilingual prompts for intent-driven tool selection. Experiments demonstrate that AgriDoctor, trained on AgriMM, significantly outperforms state-of-the-art LVLMs on fine-grained agricultural tasks.
- Score: 45.77373971125537
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate crop disease diagnosis is essential for sustainable agriculture and global food security. Existing methods, which primarily rely on unimodal models such as image-based classifiers and object detectors, are limited in their ability to incorporate domain-specific agricultural knowledge and lack support for interactive, language-based understanding. Recent advances in large language models (LLMs) and large vision-language models (LVLMs) have opened new avenues for multimodal reasoning. However, their performance in agricultural contexts remains limited due to the absence of specialized datasets and insufficient domain adaptation. In this work, we propose AgriDoctor, a modular and extensible multimodal framework designed for intelligent crop disease diagnosis and agricultural knowledge interaction. As a pioneering effort to introduce agent-based multimodal reasoning into the agricultural domain, AgriDoctor offers a novel paradigm for building interactive and domain-adaptive crop health solutions. It integrates five core components: a router, classifier, detector, knowledge retriever and LLMs. To facilitate effective training and evaluation, we construct AgriMM, a comprehensive benchmark comprising 400000 annotated disease images, 831 expert-curated knowledge entries, and 300000 bilingual prompts for intent-driven tool selection. Extensive experiments demonstrate that AgriDoctor, trained on AgriMM, significantly outperforms state-of-the-art LVLMs on fine-grained agricultural tasks, establishing a new paradigm for intelligent and sustainable farming applications.
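The abstract names five components (router, classifier, detector, knowledge retriever, LLMs) and intent-driven tool selection, but gives no implementation details. The following is only a minimal sketch of how such a dispatch loop could be wired together. Everything in it is an assumption made for illustration: the `Query` type, the stub tools, the `route` and `answer` functions, and especially the keyword rule, which stands in for the trained router the paper describes. The LLM step is replaced by simple string joining.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Query:
    text: str
    image_path: Optional[str] = None  # hypothetical: path to a crop photo


# Stub tools (hypothetical). A real system would wrap trained models or an index.
def classify(query: Query) -> str:
    return "classifier: predicted disease label (stub)"


def detect(query: Query) -> str:
    return "detector: lesion bounding boxes (stub)"


def retrieve(query: Query) -> str:
    return "retriever: matching knowledge entry (stub)"


TOOLS: Dict[str, Callable[[Query], str]] = {
    "classifier": classify,
    "detector": detect,
    "retriever": retrieve,
}


def route(query: Query) -> List[str]:
    """Keyword rule standing in for a learned intent router (assumption)."""
    text = query.text.lower()
    selected: List[str] = []
    if query.image_path is not None:
        # Localization-style questions go to the detector, others to the classifier.
        if any(word in text for word in ("where", "locate", "spot")):
            selected.append("detector")
        else:
            selected.append("classifier")
    # Explanation or treatment questions also consult the knowledge retriever.
    if any(word in text for word in ("how", "why", "treat", "prevent")):
        selected.append("retriever")
    # Fall back to knowledge retrieval when no rule fires.
    return selected or ["retriever"]


def answer(query: Query) -> str:
    evidence = [TOOLS[name](query) for name in route(query)]
    # Stand-in for the LLM step: join the tool outputs into one reply.
    return " | ".join(evidence)


if __name__ == "__main__":
    print(answer(Query("How do I treat this leaf?", image_path="leaf.jpg")))
```

Keeping the tools in a plain dictionary means a new tool can be registered without touching the dispatch logic, which is one way to read the "modular and extensible" claim; whether AgriDoctor is organized this way is not stated in the abstract.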
Related papers
- AgriGPT: a Large Language Model Ecosystem for Agriculture [16.497060004913806]
AgriGPT is a domain-specialized large language model (LLM) ecosystem for agricultural use. At its core, we design a scalable data engine that compiles credible data sources into Agri-342K, a high-quality, standardized question-answer dataset. We employ Tri-RAG, a three-channel Retrieval-Augmented Generation framework combining dense retrieval, sparse retrieval, and multi-hop knowledge graph reasoning.
arXiv Detail & Related papers (2025-08-12T04:51:08Z)
- AI in Agriculture: A Survey of Deep Learning Techniques for Crops, Fisheries and Livestock [77.95897723270453]
Crops, fisheries and livestock form the backbone of global food production, essential to feed the ever-growing global population. Addressing these issues requires efficient, accurate, and scalable technological solutions, highlighting the importance of artificial intelligence (AI). This survey presents a systematic and thorough review of more than 200 research works covering conventional machine learning approaches, advanced deep learning techniques, and recent vision-language foundation models.
arXiv Detail & Related papers (2025-07-29T17:59:48Z)
- Leveraging Synthetic Data for Question Answering with Multilingual LLMs in the Agricultural Domain [1.0144032120138065]
This study generates multilingual (English, Hindi, Punjabi) synthetic datasets from agriculture-specific documents from India. Evaluation on human-created datasets demonstrates significant improvements in factuality, relevance, and agricultural consensus.
arXiv Detail & Related papers (2025-07-22T19:25:10Z)
- OpenAg: Democratizing Agricultural Intelligence [0.0]
OpenAg is a comprehensive framework designed to advance agricultural artificial general intelligence (AGI). It combines domain-specific foundation models, neural knowledge graphs, multi-agent reasoning, causal explainability, and adaptive transfer learning. It aims to bridge the gap between scientific knowledge and the tacit expertise of experienced farmers to support scalable and locally relevant agricultural decision-making.
arXiv Detail & Related papers (2025-06-05T02:44:38Z)
- Multimodal Agricultural Agent Architecture (MA3): A New Paradigm for Intelligent Agricultural Decision-Making [32.62816270192696]
Modern agriculture faces dual challenges: optimizing production efficiency and achieving sustainable development. To address these challenges, this study proposes an innovative Multimodal Agricultural Agent Architecture (MA3). This study constructs a multimodal agricultural agent dataset encompassing five major tasks: classification, detection, Visual Question Answering (VQA), tool selection, and agent evaluation.
arXiv Detail & Related papers (2025-04-07T07:32:41Z)
- A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis [5.006697347461899]
We present the crop disease domain multimodal dataset, a pioneering resource designed to advance the field of agricultural research. The dataset comprises 137,000 images of various crop diseases, accompanied by 1 million question-answer pairs that span a broad spectrum of agricultural knowledge. We demonstrate the utility of the dataset by finetuning state-of-the-art multimodal models, showcasing significant improvements in crop disease diagnosis.
arXiv Detail & Related papers (2025-03-10T06:37:42Z)
- Agri-LLaVA: Knowledge-Infused Large Multimodal Assistant on Agricultural Pests and Diseases [49.782064512495495]
We construct the first multimodal instruction-following dataset in the agricultural domain. This dataset covers over 221 types of pests and diseases with approximately 400,000 data entries. We propose a knowledge-infused training method to develop Agri-LLaVA, an agricultural multimodal conversation system.
arXiv Detail & Related papers (2024-12-03T04:34:23Z)
- Generating Diverse Agricultural Data for Vision-Based Farming Applications [74.79409721178489]
This model is capable of simulating distinct growth stages of plants, diverse soil conditions, and randomized field arrangements under varying lighting conditions.
Our dataset includes 12,000 images with semantic labels, offering a comprehensive resource for computer vision tasks in precision agriculture.
arXiv Detail & Related papers (2024-03-27T08:42:47Z)
- Empowering Agrifood System with Artificial Intelligence: A Survey of the Progress, Challenges and Opportunities [86.89427012495457]
We review how AI techniques can transform agrifood systems and contribute to the modern agrifood industry.
We present a progress review of AI methods in agrifood systems, specifically in agriculture, animal husbandry, and fishery.
We highlight potential challenges and promising research opportunities for transforming modern agrifood systems with AI.
arXiv Detail & Related papers (2023-05-03T05:16:54Z)
- Domain Generalization for Crop Segmentation with Standardized Ensemble Knowledge Distillation [42.39035033967183]
Service robots need a real-time perception system that understands their surroundings and identifies their targets in the wild.
Existing methods, however, often fall short in generalizing to new crops and environmental conditions.
We propose a novel approach to enhance domain generalization using knowledge distillation.
arXiv Detail & Related papers (2023-04-03T14:28:29Z)
- Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis [110.30849704592592]
We present Agriculture-Vision: a large-scale aerial farmland image dataset for semantic segmentation of agricultural patterns.
Each image consists of RGB and Near-infrared (NIR) channels with resolution as high as 10 cm per pixel.
We annotate nine types of field anomaly patterns that are most important to farmers.
arXiv Detail & Related papers (2020-01-05T20:19:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.