Related papers: AgriEval: A Comprehensive Chinese Agricultural Benchmark for Large Language Models

URL: http://arxiv.org/abs/2507.21773v1
Date: Tue, 29 Jul 2025 12:58:27 GMT
Title: AgriEval: A Comprehensive Chinese Agricultural Benchmark for Large Language Models
Authors: Lian Yan, Haotian Wang, Chen Tang, Haifeng Liu, Tianyang Sun, Liangliang Liu, Yi Guan, Jingchi Jiang
Abstract summary: We propose AgriEval, the first comprehensive Chinese agricultural benchmark with three main characteristics. AgriEval covers six major agriculture categories and 29 subcategories within agriculture, addressing four core cognitive scenarios. AgriEval comprises 14,697 multiple-choice questions and 2,167 open-ended question-and-answer questions, establishing it as the most extensive agricultural benchmark available to date.
Score: 19.265932725554833
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: In the agricultural domain, the deployment of large language models (LLMs) is hindered by the lack of training data and evaluation benchmarks. To mitigate this issue, we propose AgriEval, the first comprehensive Chinese agricultural benchmark with three main characteristics: (1) Comprehensive Capability Evaluation. AgriEval covers six major agriculture categories and 29 subcategories within agriculture, addressing four core cognitive scenarios: memorization, understanding, inference, and generation. (2) High-Quality Data. The dataset is curated from university-level examinations and assignments, providing a natural and robust benchmark for assessing the capacity of LLMs to apply knowledge and make expert-like decisions. (3) Diverse Formats and Extensive Scale. AgriEval comprises 14,697 multiple-choice questions and 2,167 open-ended question-and-answer questions, establishing it as the most extensive agricultural benchmark available to date. We also present comprehensive experimental results over 51 open-source and commercial LLMs. The experimental results reveal that most existing LLMs struggle to achieve 60% accuracy, underscoring the developmental potential in agricultural LLMs. Additionally, we conduct extensive experiments to investigate factors influencing model performance and propose strategies for enhancement. AgriEval is available at https://github.com/YanPioneer/AgriEval/.

Related papers

AI in Agriculture: A Survey of Deep Learning Techniques for Crops, Fisheries and Livestock [77.95897723270453]
Crops, fisheries and livestock form the backbone of global food production, essential to feed the ever-growing global population. Addressing these issues requires efficient, accurate, and scalable technological solutions, highlighting the importance of artificial intelligence (AI). This survey presents a systematic and thorough review of more than 200 research works covering conventional machine learning approaches, advanced deep learning techniques, and recent vision-language foundation models.
arXiv Detail & Related papers (2025-07-29T17:59:48Z)
AgroBench: Vision-Language Model Benchmark in Agriculture [25.52955831089068]
We introduce AgroBench, a benchmark for evaluating vision-language models (VLMs) across seven agricultural topics. Our AgroBench covers a state-of-the-art range of categories, including 203 crop categories and 682 disease categories, to thoroughly evaluate VLM capabilities.
arXiv Detail & Related papers (2025-07-28T04:58:29Z)
Leveraging Synthetic Data for Question Answering with Multilingual LLMs in the Agricultural Domain [1.0144032120138065]
Large language models (LLMs) in agriculture typically offer generic advisories, lacking precision in local and multilingual contexts. This study generates multilingual synthetic agricultural datasets (English, Hindi, Punjabi) from agriculture-specific documents and fine-tunes language-specific LLMs. Our evaluation on curated multilingual datasets demonstrates significant improvements in factual accuracy, relevance, and agricultural consensus.
arXiv Detail & Related papers (2025-07-22T19:25:10Z)
AgriCHN: A Comprehensive Cross-domain Resource for Chinese Agricultural Named Entity Recognition [30.51577375197722]
We present AgriCHN, a comprehensive open-source Chinese resource designed to promote the accuracy of automated agricultural entity annotation. The dataset has been meticulously curated from a wealth of agricultural articles, comprising a total of 4,040 sentences and encapsulating 15,799 agricultural entity mentions. A benchmark task has also been constructed using several state-of-the-art neural NER models.
arXiv Detail & Related papers (2025-06-21T04:21:11Z)
Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey [49.1574468325115]
We conduct a comprehensive survey and propose a systematic taxonomy for LALM evaluations. We provide detailed overviews within each category and highlight challenges in this field. We will release the collection of the surveyed papers and actively maintain it to support ongoing advancements in the field.
arXiv Detail & Related papers (2025-05-21T19:17:29Z)
AgroLLM: Connecting Farmers and Agricultural Practices through Large Language Models for Enhanced Knowledge Transfer and Practical Application [1.9643850583333375]
AgroLLM is designed to enhance knowledge-sharing and education in agriculture using Large Language Models (LLMs) and a Retrieval-Augmented Generation (RAG) framework. A comparative study of three advanced models was conducted to evaluate performance across four key agricultural domains. ChatGPT-4o Mini with RAG achieved the highest accuracy at 93%.
arXiv Detail & Related papers (2025-02-28T04:13:18Z)
Agri-LLaVA: Knowledge-Infused Large Multimodal Assistant on Agricultural Pests and Diseases [49.782064512495495]
We construct the first multimodal instruction-following dataset in the agricultural domain. This dataset covers over 221 types of pests and diseases with approximately 400,000 data entries. We propose a knowledge-infused training method to develop Agri-LLaVA, an agricultural multimodal conversation system.
arXiv Detail & Related papers (2024-12-03T04:34:23Z)
AgriBench: A Hierarchical Agriculture Benchmark for Multimodal Large Language Models [4.12825661607328]
AgriBench is the first benchmark designed to evaluate MultiModal Large Language Models (MM-LLMs) for agriculture applications. We propose MM-LUCAS, a multimodal agriculture dataset that includes 1,784 landscape images, segmentation masks, depth maps, and detailed annotations. This work presents a groundbreaking perspective in advancing agriculture MM-LLMs and is still in progress, offering valuable insights for future developments and innovations in specific expert knowledge-based MM-LLMs.
arXiv Detail & Related papers (2024-11-30T12:59:03Z)
Generating Diverse Agricultural Data for Vision-Based Farming Applications [74.79409721178489]
This model is capable of simulating distinct growth stages of plants, diverse soil conditions, and randomized field arrangements under varying lighting conditions. Our dataset includes 12,000 images with semantic labels, offering a comprehensive resource for computer vision tasks in precision agriculture.
arXiv Detail & Related papers (2024-03-27T08:42:47Z)
GPT-4 as an Agronomist Assistant? Answering Agriculture Exams Using Large Language Models [1.3999521658236698]
Large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding across various domains. We present a comprehensive evaluation of popular LLMs, such as Llama 2 and GPT, on their ability to answer agriculture-related questions. We selected agriculture exams and benchmark datasets from three of the largest agriculture producer countries: Brazil, India, and the USA.
arXiv Detail & Related papers (2023-10-10T00:39:04Z)
Empowering Agrifood System with Artificial Intelligence: A Survey of the Progress, Challenges and Opportunities [86.89427012495457]
We review how AI techniques can transform agrifood systems and contribute to the modern agrifood industry. We present a progress review of AI methods in agrifood systems, specifically in agriculture, animal husbandry, and fishery. We highlight potential challenges and promising research opportunities for transforming modern agrifood systems with AI.
arXiv Detail & Related papers (2023-05-03T05:16:54Z)
Jalisco's multiclass land cover analysis and classification using a novel lightweight convnet with real-world multispectral and relief data [51.715517570634994]
We present our novel lightweight (only 89k parameters) Convolutional Neural Network (ConvNet) for land cover (LC) classification and analysis. In this work, we combine three real-world open data sources to obtain 13 channels. Our embedded analysis anticipates the limited performance in some classes and gives us the opportunity to group the most similar classes.
arXiv Detail & Related papers (2022-01-26T14:58:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.