Fugu-MT Paper Translation (Abstract): PlantVillageVQA: A Visual Question Answering Dataset for Benchmarking Vision-Language Models in Plant Science

Paper overview: PlantVillageVQA: A Visual Question Answering Dataset for Benchmarking Vision-Language Models in Plant Science

arxiv url: http://arxiv.org/abs/2508.17117v1
Date: Sat, 23 Aug 2025 19:04:57 GMT
Status: translation complete
Last updated in system: 2025-08-26 18:43:45.34863
Title: PlantVillageVQA: A Visual Question Answering Dataset for Benchmarking Vision-Language Models in Plant Science
Title (reference translation): PlantVillageVQA: A Visual Question Answering Dataset for Benchmarking Vision-Language Models in Plant Science
Authors: Syed Nazmus Sakib, Nafiul Haque, Mohammad Zabed Hossain, Shifat E. Arman
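Abstract summary: The PlantVillageVQA dataset comprises 193,609 high-quality question-answer (QA) pairs grounded in 55,448 images. The dataset was iteratively reviewed by domain experts for scientific accuracy and relevance. The objective is to provide a publicly available, standardised, expert-verified database to improve diagnostic accuracy in plant disease identification.
Reference score (attention score computed independently by this site): 0.0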
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: PlantVillageVQA is a large-scale visual question answering (VQA) dataset derived from the widely used PlantVillage image corpus. It was designed to advance the development and evaluation of vision-language models for agricultural decision-making and analysis. The PlantVillageVQA dataset comprises 193,609 high-quality question-answer (QA) pairs grounded over 55,448 images spanning 14 crop species and 38 disease conditions. Questions are organised into 3 levels of cognitive complexity and 9 distinct categories. Each question category was phrased manually following expert guidance and generated via an automated two-stage pipeline: (1) template-based QA synthesis from image metadata and (2) multi-stage linguistic re-engineering. The dataset was iteratively reviewed by domain experts for scientific accuracy and relevancy. The final dataset was evaluated using three state-of-the-art models for quality assessment. Our objective remains to provide a publicly available, standardised and expert-verified database to enhance diagnostic accuracy for plant disease identifications and advance scientific research in the agricultural domain. Our dataset will be open-sourced at https://huggingface.co/datasets/SyedNazmusSakib/PlantVillageVQA.
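Abstract (reference translation): PlantVillageVQA is a large-scale visual question answering (VQA) dataset derived from the widely used PlantVillage image corpus. It was designed to advance the development and evaluation of vision-language models for agricultural decision-making and analysis. The PlantVillageVQA dataset comprises 193,609 high-quality question-answer (QA) pairs. Questions are organised into 3 levels of cognitive complexity and 9 distinct categories. The questions were generated via an automated two-stage pipeline: (1) template-based QA synthesis from image metadata and (2) multi-stage linguistic re-engineering. The dataset was iteratively reviewed by domain experts for scientific accuracy and relevance. The final dataset was evaluated using three state-of-the-art models for quality assessment. The objective is to provide a publicly available, standardised, expert-verified database to improve diagnostic accuracy in plant disease identification and to advance scientific research in the agricultural domain. The dataset will be open-sourced at https://huggingface.co/datasets/SyedNazmusSakib/PlantVillageVQA.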
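
The first pipeline stage named in the abstract, template-based QA synthesis from image metadata, can be illustrated with a minimal sketch. This is not the authors' implementation: the `ImageMeta` fields, the three templates, the category names, and the `synthesize_qa` helper are all hypothetical, chosen only to show how fixed question templates can be filled from per-image labels such as crop and condition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageMeta:
    """Hypothetical metadata record for one image (field names are assumptions)."""
    image_id: str
    crop: str
    condition: str  # "healthy" or a disease name


# Hypothetical templates: (category, question template, answer function, applies-to predicate).
# The real dataset uses 9 categories; these three are illustrative only.
TEMPLATES = [
    ("crop_identification",
     "Which crop is shown in this image?",
     lambda m: m.crop,
     lambda m: True),
    ("health_status",
     "Is this {crop} leaf healthy?",
     lambda m: "yes" if m.condition == "healthy" else "no",
     lambda m: True),
    ("disease_identification",
     "Which disease affects this {crop} leaf?",
     lambda m: m.condition,
     lambda m: m.condition != "healthy"),  # only asked for diseased leaves
]


def synthesize_qa(meta: ImageMeta) -> list[dict]:
    """Fill every applicable template from one image's metadata."""
    pairs = []
    for category, template, answer_fn, applies in TEMPLATES:
        if not applies(meta):
            continue
        pairs.append({
            "image_id": meta.image_id,
            "category": category,
            "question": template.format(crop=meta.crop),
            "answer": answer_fn(meta),
        })
    return pairs


if __name__ == "__main__":
    for pair in synthesize_qa(ImageMeta("img_0001", "tomato", "late blight")):
        print(pair)
```

In a pipeline like the one described, the second stage (linguistic re-engineering) would then rephrase these templated questions; that step is not sketched here because the abstract gives no detail on how it is performed.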


Related papers