AgriGPT: a Large Language Model Ecosystem for Agriculture
- URL: http://arxiv.org/abs/2508.08632v1
- Date: Tue, 12 Aug 2025 04:51:08 GMT
- Title: AgriGPT: a Large Language Model Ecosystem for Agriculture
- Authors: Bo Yang, Yu Zhang, Lanfei Feng, Yunkui Chen, Jianyu Zhang, Xiao Xu, Nueraili Aierken, Yurui Li, Yuxuan Chen, Guijun Yang, Yong He, Runhe Huang, Shijian Li
- Abstract summary: AgriGPT is a domain-specialized Large Language Model ecosystem for agricultural use. At its core, we design a scalable data engine that compiles credible data sources into Agri-342K, a high-quality, standardized question-answer dataset. We employ Tri-RAG, a three-channel Retrieval-Augmented Generation framework combining dense retrieval, sparse retrieval, and multi-hop knowledge graph reasoning.
- Score: 16.497060004913806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the rapid progress of Large Language Models (LLMs), their application in agriculture remains limited due to the lack of domain-specific models, curated datasets, and robust evaluation frameworks. To address these challenges, we propose AgriGPT, a domain-specialized LLM ecosystem for agricultural usage. At its core, we design a multi-agent scalable data engine that systematically compiles credible data sources into Agri-342K, a high-quality, standardized question-answer (QA) dataset. Trained on this dataset, AgriGPT supports a broad range of agricultural stakeholders, from practitioners to policy-makers. To enhance factual grounding, we employ Tri-RAG, a three-channel Retrieval-Augmented Generation framework combining dense retrieval, sparse retrieval, and multi-hop knowledge graph reasoning, thereby improving the LLM's reasoning reliability. For comprehensive evaluation, we introduce AgriBench-13K, a benchmark suite comprising 13 tasks with varying types and complexities. Experiments demonstrate that AgriGPT significantly outperforms general-purpose LLMs on both domain adaptation and reasoning. Beyond the model itself, AgriGPT represents a modular and extensible LLM ecosystem for agriculture, comprising structured data construction, retrieval-enhanced generation, and domain-specific evaluation. This work provides a generalizable framework for developing scientific and industry-specialized LLMs. All models, datasets, and code will be released to empower agricultural communities, especially in underserved regions, and to promote open, impactful research.
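The abstract says Tri-RAG combines three retrieval channels but does not state how their results are merged. As a minimal sketch of one common way to merge ranked lists, the example below uses reciprocal rank fusion (RRF). This is an assumption for illustration, not the authors' method; the function name, the constant `k=60`, and the document ids are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    so documents ranked highly by several channels rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of the three channels for one query.
dense_hits = ["doc_rice_blast", "doc_irrigation", "doc_soil_ph"]   # embedding search
sparse_hits = ["doc_rice_blast", "doc_soil_ph", "doc_fertilizer"]  # keyword search
graph_hits = ["doc_fungicide", "doc_rice_blast"]                   # knowledge-graph hops

fused = reciprocal_rank_fusion([dense_hits, sparse_hits, graph_hits])
print(fused)
```

In this toy input, `doc_rice_blast` is returned by all three channels and therefore ranks first; a real system would pass the top fused documents to the LLM as context.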
Related papers
- AgriWorld:A World Tools Protocol Framework for Verifiable Agricultural Reasoning with Code-Executing LLM Agents [17.904008870689964]
We present a Python execution environment, AgriWorld, exposing unified tools for queries over field parcels, remote-sensing time-series analytics, crop growth simulation, and task-specific predictors (e.g., yield, stress, and disease risk).
On top of this environment, we design a multi-turn AgroReflective agent that iteratively writes code, observes execution results, and refines its analysis via an execute-observe-refine loop.
arXiv Detail & Related papers (2026-02-17T03:12:57Z) - AgriDoctor: A Multimodal Intelligent Assistant for Agriculture [45.77373971125537]
AgriDoctor is a modular and multimodal framework designed for intelligent crop disease diagnosis and agricultural knowledge interaction.
To facilitate effective training and evaluation, we construct AgriMM, a benchmark comprising 400,000 annotated disease images, 831 expert-curated knowledge entries, and 300,000 bilingual prompts for intent-driven tool selection.
Experiments demonstrate that AgriDoctor, trained on AgriMM, significantly outperforms state-of-the-art LVLMs on fine-grained agricultural tasks.
arXiv Detail & Related papers (2025-09-21T11:51:57Z) - AI in Agriculture: A Survey of Deep Learning Techniques for Crops, Fisheries and Livestock [77.95897723270453]
Crops, fisheries and livestock form the backbone of global food production, essential to feed the ever-growing global population.
Addressing these issues requires efficient, accurate, and scalable technological solutions, highlighting the importance of artificial intelligence (AI).
This survey presents a systematic and thorough review of more than 200 research works covering conventional machine learning approaches, advanced deep learning techniques, and recent vision-language foundation models.
arXiv Detail & Related papers (2025-07-29T17:59:48Z) - AgriEval: A Comprehensive Chinese Agricultural Benchmark for Large Language Models [19.265932725554833]
We propose AgriEval, the first comprehensive Chinese agricultural benchmark with three main characteristics.
AgriEval covers six major agriculture categories and 29 subcategories within agriculture, addressing four core cognitive scenarios.
AgriEval comprises 14,697 multiple-choice questions and 2,167 open-ended question-and-answer questions, establishing it as the most extensive agricultural benchmark available to date.
arXiv Detail & Related papers (2025-07-29T12:58:27Z) - Leveraging Synthetic Data for Question Answering with Multilingual LLMs in the Agricultural Domain [1.0144032120138065]
This study generates multilingual (English, Hindi, Punjabi) synthetic datasets from agriculture-specific documents from India.
Evaluation on human-created datasets demonstrates significant improvements in factuality, relevance, and agricultural consensus.
arXiv Detail & Related papers (2025-07-22T19:25:10Z) - Agri-LLaVA: Knowledge-Infused Large Multimodal Assistant on Agricultural Pests and Diseases [49.782064512495495]
We construct the first multimodal instruction-following dataset in the agricultural domain.
This dataset covers over 221 types of pests and diseases with approximately 400,000 data entries.
We propose a knowledge-infused training method to develop Agri-LLaVA, an agricultural multimodal conversation system.
arXiv Detail & Related papers (2024-12-03T04:34:23Z) - AgriBench: A Hierarchical Agriculture Benchmark for Multimodal Large Language Models [4.12825661607328]
AgriBench is the first benchmark designed to evaluate MultiModal Large Language Models (MM-LLMs) for agriculture applications.
We propose MM-LUCAS, a multimodal agriculture dataset that includes 1,784 landscape images, segmentation masks, depth maps, and detailed annotations.
This work presents a groundbreaking perspective in advancing agriculture MM-LLMs and is still in progress, offering valuable insights for future developments and innovations in specific expert knowledge-based MM-LLMs.
arXiv Detail & Related papers (2024-11-30T12:59:03Z) - SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains [45.349645606978434]
Retrieval-augmented generation (RAG) enhances the question-answering abilities of large language models (LLMs).
We propose SimRAG, a self-training approach that equips the LLM with joint capabilities of question answering and question generation for domain adaptation.
Experiments on 11 datasets, spanning two backbone sizes and three domains, demonstrate that SimRAG outperforms baselines by 1.2%--8.6%.
arXiv Detail & Related papers (2024-10-23T15:24:16Z) - AgroGPT: Efficient Agricultural Vision-Language Model with Expert Tuning [30.034193330398292]
We propose an approach to construct instruction-tuning data that harnesses vision-only data for the agriculture domain.
We utilize diverse agricultural datasets spanning multiple domains, curate class-specific information, and employ large language models (LLMs) to construct an expert-tuning set.
We expert-tuned and created AgroGPT, an efficient LMM that can hold complex agriculture-related conversations and provide useful insights.
arXiv Detail & Related papers (2024-10-10T22:38:26Z) - Generating Diverse Agricultural Data for Vision-Based Farming Applications [74.79409721178489]
This model is capable of simulating distinct growth stages of plants, diverse soil conditions, and randomized field arrangements under varying lighting conditions.
Our dataset includes 12,000 images with semantic labels, offering a comprehensive resource for computer vision tasks in precision agriculture.
arXiv Detail & Related papers (2024-03-27T08:42:47Z) - Domain Generalization for Crop Segmentation with Standardized Ensemble Knowledge Distillation [42.39035033967183]
Service robots need a real-time perception system that understands their surroundings and identifies their targets in the wild.
Existing methods, however, often fall short in generalizing to new crops and environmental conditions.
We propose a novel approach to enhance domain generalization using knowledge distillation.
arXiv Detail & Related papers (2023-04-03T14:28:29Z) - Jalisco's multiclass land cover analysis and classification using a novel lightweight convnet with real-world multispectral and relief data [51.715517570634994]
We present our novel lightweight (only 89k parameters) Convolutional Neural Network (ConvNet) for land cover (LC) classification and analysis.
In this work, we combine three real-world open data sources to obtain 13 channels.
Our embedded analysis anticipates the limited performance in some classes and gives us the opportunity to group the most similar.
arXiv Detail & Related papers (2022-01-26T14:58:51Z)