Fugu-MT Paper Translation (Abstract): ChartAB: A Benchmark for Chart Grounding & Dense Alignment

Paper overview: ChartAB: A Benchmark for Chart Grounding & Dense Alignment

arXiv URL: http://arxiv.org/abs/2510.26781v1
Date: Thu, 30 Oct 2025 17:56:31 GMT
Status: Translation complete
System update date: 2025-10-31 16:05:09.963589
Title: ChartAB: A Benchmark for Chart Grounding & Dense Alignment
Title (reference translation): ChartAB: A Benchmark for Chart Grounding and Dense Alignment
Authors: Aniruddh Bansal, Davit Soselia, Dang Nguyen, Tianyi Zhou
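Abstract summary: We introduce a new ChartAlign Benchmark (ChartAB) that provides a comprehensive evaluation of vision-language models (VLMs). By incorporating a novel two-stage inference workflow, the benchmark can further evaluate VLMs' ability to align and compare elements/attributes across two charts. Evaluations of recent VLMs yield new insights into their perception biases, weaknesses, robustness, and hallucinations.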
Reference score (independently computed attention score): 17.16234793106
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Charts play an important role in visualization, reasoning, data analysis, and the exchange of ideas among humans. However, existing vision-language models (VLMs) still lack accurate perception of details and struggle to extract fine-grained structures from charts. Such limitations in chart grounding also hinder their ability to compare multiple charts and reason over them. In this paper, we introduce a novel "ChartAlign Benchmark (ChartAB)" to provide a comprehensive evaluation of VLMs in chart grounding tasks, i.e., extracting tabular data, localizing visualization elements, and recognizing various attributes from charts of diverse types and complexities. We design a JSON template to facilitate the calculation of evaluation metrics specifically tailored for each grounding task. By incorporating a novel two-stage inference workflow, the benchmark can further evaluate VLMs' capability to align and compare elements/attributes across two charts. Our analysis of evaluations on several recent VLMs reveals new insights into their perception biases, weaknesses, robustness, and hallucinations in chart understanding. These findings highlight the fine-grained discrepancies among VLMs in chart understanding tasks and point to specific skills that need to be strengthened in current models.
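Abstract (reference translation): Charts play an important role in visualization, reasoning, data analysis, and the exchange of ideas among humans. However, existing vision-language models (VLMs) do not perceive details accurately and struggle to extract fine-grained structures from charts. These limitations in chart grounding hinder their ability to compare multiple charts and to reason over them. In this paper, we propose a new ChartAlign Benchmark (ChartAB) that provides a comprehensive evaluation of VLMs on chart grounding tasks, such as extracting tabular data, localizing visualization elements, and recognizing various attributes from charts of diverse types and complexities. We design a JSON template to facilitate the calculation of evaluation metrics suited to each grounding task. By incorporating a novel two-stage inference workflow, the benchmark can further evaluate VLMs' ability to align and compare elements/attributes across two charts. Evaluations of recent VLMs yield new insights into their perception biases, weaknesses, robustness, and hallucinations. These results point to the fine-grained discrepancies among VLMs in chart understanding tasks and to specific skills that need to be strengthened in current models.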

Related papers