Fugu-MT Paper Translation (Abstract): Chart-FR1: Visual Focus-Driven Fine-Grained Reasoning on Dense Charts

Paper overview: Chart-FR1: Visual Focus-Driven Fine-Grained Reasoning on Dense Charts

arxiv url: http://arxiv.org/abs/2605.01882v1
Date: Sun, 03 May 2026 13:57:23 GMT
Last updated in system: 2026-05-05 20:33:49.979187
Title: Chart-FR1: Visual Focus-Driven Fine-Grained Reasoning on Dense Charts
Authors: Hongkun Pan, Yuwei Wu, Wanyi Hong, Shenghui Hu, Qitong Yan, Yi Yang, Rufei Han, Changju Zhou, Minfeng Zhu, Dongming Han, Wei Chen
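Abstract summary: The paper presents Chart-FR1, a focus-driven fine-grained chart reasoning model, to improve perception, focusing efficiency, and adaptive deep reasoning on HID charts. Specifically, it proposes Focus-CoT, a visual focusing chain-of-thought. To fill the gap in benchmarks for HID charts, the authors build HID-Chart, a challenging benchmark with an information-density metric designed to evaluate fine-grained chart reasoning capabilities.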
Reference score (site-computed attention score): 11.918727404835934
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multimodal large language models (MLLMs) have shown considerable potential in chart understanding and reasoning tasks. However, they still struggle with high information density (HID) charts characterized by multiple subplots, legends, and dense annotations due to three major challenges: (1) limited fine-grained perception results in the omission of critical visual cues; (2) redundant or noisy visual information undermines the performance of multimodal reasoning; (3) lack of adaptive deep reasoning relative to the amount of visual information. To tackle these challenges, we present a novel focus-driven fine-grained chart reasoning model, Chart-FR1, to improve perception, focusing efficiency, and adaptive deep reasoning on HID charts. Specifically, we propose Focus-CoT, a visual focusing chain-of-thought that enhances fine-grained perception by explicitly linking reasoning steps to key visual cues, such as local image regions and OCR signals. Building on this, we introduce Focus-GRPO, a focus-driven reinforcement learning algorithm with an information-efficiency reward that compresses redundant visual information for efficient focusing, and an adaptive KL penalty mechanism that enables flexible control over reasoning depth as more visual cues are discovered. Furthermore, to fill the gap in benchmarks for HID charts, we build HID-Chart, a challenging benchmark with an information-density metric designed to evaluate fine-grained chart reasoning capabilities. Extensive experiments on multiple chart benchmarks demonstrate that Chart-FR1 outperforms state-of-the-art MLLMs in chart understanding and reasoning. Code is available at https://github.com/phkhub/Chart-FR1.
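The abstract names the ingredients of Focus-GRPO but gives no formulas, so the following is only a rough sketch of how such pieces could fit together, not the authors' implementation. The group-relative advantage is the usual GRPO-style normalization within a group of sampled responses. The reward shape, the `budget`, `weight`, `base`, and `decay` parameters, and the direction in which the KL coefficient changes are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def info_efficiency_reward(correct, used_regions, budget=4, weight=0.1):
    """Hypothetical reward: correctness minus a penalty for each visual
    region consulted beyond an assumed budget (not taken from the paper)."""
    excess = max(0, used_regions - budget)
    return (1.0 if correct else 0.0) - weight * excess


def adaptive_kl_coeff(num_cues, base=0.04, decay=0.5):
    """Hypothetical schedule: loosen the KL penalty as more visual cues are
    found. The paper may use a different rule or the opposite direction."""
    return base / (1.0 + decay * num_cues)


# One group of three sampled responses to the same chart question.
rewards = [
    info_efficiency_reward(True, used_regions=3),
    info_efficiency_reward(True, used_regions=6),
    info_efficiency_reward(False, used_regions=2),
]
advantages = group_relative_advantages(rewards)
```

In this toy setup, a correct answer that stays within the region budget receives the largest advantage, a correct but wasteful answer a smaller one, and an incorrect answer a negative one. The repository linked in the abstract is the place to check the actual reward and penalty definitions.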


Related papers