Fugu-MT Paper Translation (Abstract): CharTool: Tool-Integrated Visual Reasoning for Chart Understanding

Paper summary: CharTool: Tool-Integrated Visual Reasoning for Chart Understanding

arxiv url: http://arxiv.org/abs/2604.02794v1
Date: Fri, 03 Apr 2026 07:02:13 GMT
Status: Translation complete
Last updated in system: 2026-04-06 17:20:24.366342
Title: CharTool: Tool-Integrated Visual Reasoning for Chart Understanding
Title (reference translation): CharTool: Tool-Integrated Visual Reasoning for Chart Understanding
Authors: Situo Zhang, Yifan Zhang, Zichen Zhu, Da Ma, Lei Pan, Danyang Zhang, Zihan Zhao, Lu Chen, Kai Yu
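Abstract summary: We propose DuoChart, a scalable dual-source data pipeline that combines synthesized charts with real-world charts. We then introduce CharTool, which equips MLLMs with external tools, including image cropping for localized visual perception and code-based computation.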
Reference score (independently computed attention score): 24.815732262963294
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Charts are ubiquitous in scientific and financial literature for presenting structured data. However, chart reasoning remains challenging for multimodal large language models (MLLMs) due to the lack of high-quality training data, as well as the need for fine-grained visual grounding and precise numerical computation. To address these challenges, we first propose DuoChart, a scalable dual-source data pipeline that combines synthesized charts with real-world charts to construct diverse, high-quality chart training data. We then introduce CharTool, which equips MLLMs with external tools, including image cropping for localized visual perception and code-based computation for accurate numerical reasoning. Through agentic reinforcement learning on DuoChart, CharTool learns tool-integrated reasoning grounded in chart content. Extensive experiments on six chart benchmarks show that our method consistently improves over strong MLLM baselines across model scales. Notably, CharTool-7B outperforms the base model by **+8.0%** on CharXiv (Reasoning) and **+9.78%** on ChartQAPro, while achieving competitive performance with substantially larger or proprietary models. Moreover, CharTool demonstrates positive generalization to out-of-domain visual math reasoning benchmarks.
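Abstract (reference translation): Charts are widely used in scientific and financial literature to present structured data. However, chart reasoning remains difficult for multimodal large language models (MLLMs) because of the lack of high-quality training data and the need for fine-grained visual grounding and precise numerical computation. To address these challenges, we first propose DuoChart, a scalable dual-source data pipeline that combines synthesized charts with real-world charts to construct diverse, high-quality chart training data. We then introduce CharTool, which equips MLLMs with external tools, including image cropping for localized visual perception and code-based computation for accurate numerical reasoning. Through agentic reinforcement learning on DuoChart, CharTool learns tool-integrated reasoning grounded in chart content. Experiments on six chart benchmarks show that the proposed method consistently improves over strong MLLM baselines across model scales. CharTool-7B outperforms the base model by **+8.0%** on CharXiv (Reasoning) and **+9.78%** on ChartQAPro, and is competitive with substantially larger or proprietary models. In addition, CharTool generalizes positively to out-of-domain visual math reasoning benchmarks.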

Related papers