A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis
- URL: http://arxiv.org/abs/2503.06973v1
- Date: Mon, 10 Mar 2025 06:37:42 GMT
- Title: A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis
- Authors: Xiang Liu, Zhaoxiang Liu, Huan Hu, Zezhou Chen, Kohou Wang, Kai Wang, Shiguo Lian,
- Abstract summary: We present the crop disease domain multimodal dataset, a pioneering resource designed to advance the field of agricultural research. The dataset comprises 137,000 images of various crop diseases, accompanied by 1 million question-answer pairs that span a broad spectrum of agricultural knowledge. We demonstrate the utility of the dataset by finetuning state-of-the-art multimodal models, showcasing significant improvements in crop disease diagnosis.
- Score: 5.006697347461899
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: While conversational generative AI has shown considerable potential in enhancing decision-making for agricultural professionals, its exploration has predominantly been anchored in text-based interactions. The evolution of multimodal conversational AI, leveraging vast amounts of image-text data from diverse sources, marks a significant stride forward. However, the application of such advanced vision-language models in the agricultural domain, particularly for crop disease diagnosis, remains underexplored. In this work, we present the crop disease domain multimodal (CDDM) dataset, a pioneering resource designed to advance the field of agricultural research through the application of multimodal learning techniques. The dataset comprises 137,000 images of various crop diseases, accompanied by 1 million question-answer pairs that span a broad spectrum of agricultural knowledge, from disease identification to management practices. By integrating visual and textual data, CDDM facilitates the development of sophisticated question-answering systems capable of providing precise, useful advice to farmers and agricultural professionals. We demonstrate the utility of the dataset by finetuning state-of-the-art multimodal models, showcasing significant improvements in crop disease diagnosis. Specifically, we employed a novel finetuning strategy that utilizes low-rank adaptation (LoRA) to finetune the visual encoder, adapter and language model simultaneously. Our contributions include not only the dataset but also a finetuning strategy and a benchmark to stimulate further research in agricultural technology, aiming to bridge the gap between advanced AI techniques and practical agricultural applications. The dataset is available at https://github.com/UnicomAI/UnicomBenchmark/tree/main/CDDMBench.
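The abstract's finetuning strategy applies LoRA to the visual encoder, adapter, and language model jointly. The core idea of LoRA — freezing a pretrained weight matrix W and learning only a low-rank update scaled by alpha/r — can be sketched in a few lines. This is a generic NumPy illustration of the mechanism, not the authors' implementation; layer shapes and hyperparameters (r=4, alpha=8) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update: y = x W^T + (alpha/r) x A^T B^T."""

    def __init__(self, w, r=4, alpha=8):
        d_out, d_in = w.shape
        self.w = w                                 # pretrained weight, kept frozen
        self.a = rng.normal(0.0, 0.01, (r, d_in))  # trainable down-projection, small random init
        self.b = np.zeros((d_out, r))              # trainable up-projection, zero init
        self.scale = alpha / r

    def __call__(self, x):
        # Base (frozen) path plus the scaled low-rank adaptation path.
        return x @ self.w.T + self.scale * (x @ self.a.T) @ self.b.T

# Because B starts at zero, the adapted layer initially matches the frozen layer exactly;
# only A and B (a tiny fraction of the parameters) would receive gradient updates.
w = rng.normal(size=(3, 5))
layer = LoRALinear(w)
x = rng.normal(size=(2, 5))
print(np.allclose(layer(x), x @ w.T))  # True at initialization
```

In practice this update is attached to selected projection matrices of each component (here, visual encoder, adapter, and language model), which is what allows all three to be adapted simultaneously at modest memory cost.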
Related papers
- Multimodal Agricultural Agent Architecture (MA3): A New Paradigm for Intelligent Agricultural Decision-Making [32.62816270192696]
Modern agriculture faces dual challenges: optimizing production efficiency and achieving sustainable development.
To address these challenges, this study proposes an innovative Multimodal Agricultural Agent Architecture (MA3).
This study constructs a multimodal agricultural agent dataset encompassing five major tasks: classification, detection, Visual Question Answering (VQA), tool selection, and agent evaluation.
arXiv Detail & Related papers (2025-04-07T07:32:41Z) - Agri-LLaVA: Knowledge-Infused Large Multimodal Assistant on Agricultural Pests and Diseases [49.782064512495495]
We construct the first multimodal instruction-following dataset in the agricultural domain. This dataset covers over 221 types of pests and diseases with approximately 400,000 data entries. We propose a knowledge-infused training method to develop Agri-LLaVA, an agricultural multimodal conversation system.
arXiv Detail & Related papers (2024-12-03T04:34:23Z) - MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models [49.765466293296186]
Recent progress in Medical Large Vision-Language Models (Med-LVLMs) has opened up new possibilities for interactive diagnostic tools.
Med-LVLMs often suffer from factual hallucination, which can lead to incorrect diagnoses.
We propose a versatile multimodal RAG system, MMed-RAG, designed to enhance the factuality of Med-LVLMs.
arXiv Detail & Related papers (2024-10-16T23:03:27Z) - AgroGPT: Efficient Agricultural Vision-Language Model with Expert Tuning [30.034193330398292]
We propose an approach to construct instruction-tuning data that harnesses vision-only data for the agriculture domain. We utilize diverse agricultural datasets spanning multiple domains, curate class-specific information, and employ large language models (LLMs) to construct an expert-tuning set. Through this expert tuning we created AgroGPT, an efficient LMM that can hold complex agriculture-related conversations and provide useful insights.
arXiv Detail & Related papers (2024-10-10T22:38:26Z) - ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features [54.37042005469384]
We announce MVKL, the first multimodal mammography dataset encompassing multi-view images, detailed manifestations and reports.
Based on this dataset, we focus on the challenging task of unsupervised pretraining.
We propose ViKL, a framework that synergizes Visual, Knowledge, and Linguistic features.
arXiv Detail & Related papers (2024-09-24T05:01:23Z) - MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine [53.01393667775077]
This paper introduces MedTrinity-25M, a comprehensive, large-scale multimodal dataset for medicine.
It covers over 25 million images across 10 modalities, with multigranular annotations for more than 65 diseases.
Unlike existing approaches, which are limited by the availability of image-text pairs, we have developed the first automated pipeline.
arXiv Detail & Related papers (2024-08-06T02:09:35Z) - Information Fusion in Smart Agriculture: Machine Learning Applications and Future Research Directions [6.060623947643556]
This review focuses on how machine learning (ML) techniques, combined with multi-source data fusion, enhance precision agriculture by improving predictive accuracy and decision-making.
This review bridges the gap between AI research and agricultural applications, offering a roadmap for researchers, industry professionals, and policymakers to harness information fusion and ML for advancing precision agriculture.
arXiv Detail & Related papers (2024-05-23T17:53:31Z) - Generating Diverse Agricultural Data for Vision-Based Farming Applications [74.79409721178489]
This model is capable of simulating distinct growth stages of plants, diverse soil conditions, and randomized field arrangements under varying lighting conditions.
Our dataset includes 12,000 images with semantic labels, offering a comprehensive resource for computer vision tasks in precision agriculture.
arXiv Detail & Related papers (2024-03-27T08:42:47Z) - Explainable AI in Grassland Monitoring: Enhancing Model Performance and
Domain Adaptability [0.6131022957085438]
Grasslands are known for their high biodiversity and ability to provide multiple ecosystem services.
Challenges in automating the identification of indicator plants are key obstacles to large-scale grassland monitoring.
This paper delves into the latter two challenges, with a specific focus on transfer learning and XAI approaches to grassland monitoring.
arXiv Detail & Related papers (2023-12-13T10:17:48Z) - Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis [110.30849704592592]
We present Agriculture-Vision: a large-scale aerial farmland image dataset for semantic segmentation of agricultural patterns.
Each image consists of RGB and Near-infrared (NIR) channels with resolution as high as 10 cm per pixel.
We annotate nine types of field anomaly patterns that are most important to farmers.
arXiv Detail & Related papers (2020-01-05T20:19:33Z)