Fugu-MT Paper Translation (Summary): DashboardQA: Benchmarking Multimodal Agents for Question Answering on Interactive Dashboards

Paper summary: DashboardQA: Benchmarking Multimodal Agents for Question Answering on Interactive Dashboards

arxiv url: http://arxiv.org/abs/2508.17398v1
Date: Sun, 24 Aug 2025 15:11:44 GMT
Status: Translation complete
Last updated in system: 2025-08-26 18:43:45.505324
Title: DashboardQA: Benchmarking Multimodal Agents for Question Answering on Interactive Dashboards
Authors: Aaryaman Kartha, Ahmed Masry, Mohammed Saidul Islam, Thinh Lang, Shadikur Rahman, Ridwan Mahbub, Mizanur Rahman, Mahir Ahmed, Md Rizwan Parvez, Enamul Hoque, Shafiq Joty
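Abstract summary: DashboardQA is a benchmark designed to evaluate how vision-language GUI agents understand and interact with real-world dashboards. It includes 112 interactive dashboards from Tableau Public and 405 question-answer pairs spanning five categories: multiple-choice, factoid, hypothetical, multi-dashboard, and conversational. The findings indicate that interactive dashboard reasoning is a challenging task overall for all the VLMs evaluated.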
Reference score (independently computed attention score): 44.69783955774917
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Dashboards are powerful visualization tools for data-driven decision-making, integrating multiple interactive views that allow users to explore, filter, and navigate data. Unlike static charts, dashboards support rich interactivity, which is essential for uncovering insights in real-world analytical workflows. However, existing question-answering benchmarks for data visualizations largely overlook this interactivity, focusing instead on static charts. This limitation severely constrains their ability to evaluate the capabilities of modern multimodal agents designed for GUI-based reasoning. To address this gap, we introduce DashboardQA, the first benchmark explicitly designed to assess how vision-language GUI agents comprehend and interact with real-world dashboards. The benchmark includes 112 interactive dashboards from Tableau Public and 405 question-answer pairs with interactive dashboards spanning five categories: multiple-choice, factoid, hypothetical, multi-dashboard, and conversational. By assessing a variety of leading closed- and open-source GUI agents, our analysis reveals their key limitations, particularly in grounding dashboard elements, planning interaction trajectories, and performing reasoning. Our findings indicate that interactive dashboard reasoning is a challenging task overall for all the VLMs evaluated. Even the top-performing agents struggle; for instance, the best agent based on Gemini-Pro-2.5 achieves only 38.69% accuracy, while the OpenAI CUA agent reaches just 22.69%, demonstrating the benchmark's significant difficulty. We release DashboardQA at https://github.com/vis-nlp/DashboardQA
