Fugu-MT Paper Translation (Abstract): Do MLLMs Really Understand the Charts?

Paper overview: Do MLLMs Really Understand the Charts?

arxiv url: http://arxiv.org/abs/2509.04457v1
Date: Wed, 27 Aug 2025 09:17:42 GMT
Status: Translation complete
Last updated in system: 2025-09-14 20:41:04.877161
Title: Do MLLMs Really Understand the Charts?
Authors: Xiao Zhang, Dongyuan Li, Liuyu Xiang, Yao Zhang, Cheng Zhong, Zhaofeng He
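Abstract summary: We argue that MLLMs rely primarily on recognition rather than reasoning to interpret charts. To steer MLLMs toward reasoned chart understanding, we propose ChartReasoner, which mimics human behavior by grounding its estimation in chart understanding.
Reference score (independently computed attention score): 30.848420807347896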
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Although Multimodal Large Language Models (MLLMs) have demonstrated increasingly impressive performance in chart understanding, most of them exhibit alarming hallucinations and significant performance degradation when handling non-annotated charts. Therefore, a question arises: Do MLLMs really understand the charts? Since humans can understand charts and estimate values by visual reasoning, we first establish a comprehensive Chart Reasoning Benchmark, CRBench, to rigorously evaluate the visual reasoning abilities of MLLMs on non-annotated charts. We argue that MLLMs rely primarily on recognition rather than reasoning to interpret charts. To steer MLLMs toward reasoned chart understanding, we propose ChartReasoner, which mimics human behavior by grounding its estimation in chart understanding. Extensive results on the proposed CRBench show that ChartReasoner-3B/7B achieve superior performance in chart reasoning, even compared to GPT-4o and Gemini-2.5-Flash. More importantly, ChartReasoner also demonstrates visual reasoning abilities in general chart comprehension on public benchmarks, leading to significant performance gains and enabling MLLMs to understand charts rationally. The code and dataset will be publicly available upon publication.

Related papers