Fugu-MT Paper Translation (Abstract): DRAGON: A Benchmark for Evidence-Grounded Visual Reasoning over Diagrams

Paper overview: DRAGON: A Benchmark for Evidence-Grounded Visual Reasoning over Diagrams

arxiv url: http://arxiv.org/abs/2604.25231v1
Date: Tue, 28 Apr 2026 05:24:05 GMT
Status: Translation complete
System update date: 2026-04-29 16:49:17.721918
Title: DRAGON: A Benchmark for Evidence-Grounded Visual Reasoning over Diagrams
Title (reference translation): DRAGON: A Benchmark for Evidence-Grounded Visual Reasoning over Diagrams
Authors: Anirudh Iyengar Kaniyar Narayana Iyengar, Tampu Ravi Kumar, Gaurav Najpande, Manan Suri, Dinesh Manocha, Puneet Mathur, Vivek Gupta
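Abstract summary: We introduce DRAGON, a benchmark for evaluating evidence-grounded visual reasoning in diagrams. Given a diagram, a question, and the correct answer, a model must predict bounding boxes that correspond to the visual elements required to justify the answer. The DRAGON dataset contains 11,664 annotated question instances collected from six diagram QA datasets.
Reference score (independently computed attention score): 54.39165467997251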
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diagram question answering (DQA) requires models to interpret structured visual representations such as charts, maps, infographics, circuit schematics, and scientific diagrams. Recent vision-language models (VLMs) often achieve high answer accuracy on these tasks, yet correct answers do not guarantee that models ground their reasoning in the diagram regions that support the prediction. Models may instead rely on textual correlations or dataset artifacts without identifying the visual evidence required to verify the answer. This limitation prevents reliable evaluation of diagram reasoning and reduces interpretability. We introduce DRAGON, a benchmark for evaluating evidence-grounded visual reasoning in diagrams. Given a diagram, a question, and the correct answer, a model must predict bounding boxes that correspond to the visual elements required to justify the answer. These evidence regions may include answer-bearing components, textual labels, legends, axes, connectors, and other supporting structures involved in the reasoning process. The DRAGON dataset contains 11,664 annotated question instances collected from six diagram QA datasets: ChartQA, Circuit-VQA, InfographicsVQA, MapIQ, MapWise, and AI2D. We release a 2,445-instance benchmark test set with human-verified reasoning evidence annotations and a standardized evaluation framework. We evaluate eight recent VLMs and analyze their ability to localize reasoning evidence across diverse diagram domains. DRAGON enables systematic evaluation of diagram reasoning and supports future research on models that ground their predictions in visual evidence.
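Abstract (reference translation): Diagram question answering (DQA) requires models that interpret structured visual representations such as charts, maps, infographics, circuit diagrams, and scientific diagrams. Recent vision-language models (VLMs) often achieve high answer accuracy on these tasks, but a correct answer does not guarantee that the model grounds its reasoning in the diagram regions that support the prediction. Models may instead rely on textual correlations or dataset artifacts without identifying the visual evidence needed to verify the answer. This limitation prevents reliable evaluation of diagram reasoning and reduces interpretability. We introduce DRAGON, a benchmark for evaluating evidence-grounded visual reasoning in diagrams. Given a diagram, a question, and the correct answer, a model must predict bounding boxes that correspond to the visual elements required to justify the answer. These evidence regions include answer-bearing components, textual labels, legends, axes, connectors, and other supporting structures involved in the reasoning process. The DRAGON dataset contains 11,664 annotated question instances collected from six diagram QA datasets: ChartQA, Circuit-VQA, InfographicsVQA, MapIQ, MapWise, and AI2D. We release a 2,445-instance benchmark set with human-verified reasoning evidence annotations and a standardized evaluation framework. We evaluate eight recent VLMs and analyze their ability to localize reasoning evidence across diverse diagram domains. DRAGON enables systematic evaluation of diagram reasoning and supports future research on models that ground their predictions in visual evidence.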
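
The abstract describes the task as predicting bounding boxes for the evidence regions but does not state how predictions are scored. As an illustration only, the sketch below shows one common way such predictions could be compared with annotated boxes: intersection-over-union (IoU) with greedy one-to-one matching. The box format `(x_min, y_min, x_max, y_max)`, the 0.5 threshold, and the function names `iou` and `match_boxes` are assumptions for this example, not the DRAGON evaluation framework.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_boxes(pred: List[Box], gold: List[Box],
                threshold: float = 0.5) -> Tuple[float, float]:
    """Greedily match predicted boxes to annotated evidence boxes.

    Each annotated box can be matched at most once. Returns
    (precision, recall). The threshold is an assumed default.
    """
    unmatched = list(gold)
    hits = 0
    for p in pred:
        # Pick the remaining annotated box with the highest overlap.
        best = max(unmatched, key=lambda g: iou(p, g), default=None)
        if best is not None and iou(p, best) >= threshold:
            unmatched.remove(best)
            hits += 1
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall
```

With this sketch, a model that returns one correct evidence box and one spurious box against two annotated boxes would get a precision and recall of 0.5 each.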

Related papers