Fugu-MT Paper Translation (Summary): Chart-RL: Policy Optimization Reinforcement Learning for Enhanced Visual Reasoning in Chart Question Answering with Vision Language Models

Paper Overview: Chart-RL: Policy Optimization Reinforcement Learning for Enhanced Visual Reasoning in Chart Question Answering with Vision Language Models

arxiv url: http://arxiv.org/abs/2604.03157v1
Date: Fri, 03 Apr 2026 16:28:03 GMT
Status: Translation complete
Last updated in system: 2026-04-06 17:20:24.53447
Title: Chart-RL: Policy Optimization Reinforcement Learning for Enhanced Visual Reasoning in Chart Question Answering with Vision Language Models
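Title (reference translation): Chart-RL: Policy Optimization Reinforcement Learning for Enhanced Visual Reasoning in Chart Question Answering Using Vision Language Models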
Authors: Yunfei Bai, Amit Dhanda, Shekhar Jain
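Abstract summary: Vision language models (VLMs) have demonstrated progress toward true intelligence, which requires robust reasoning capabilities. Current VLMs face significant limitations in chart question answering (CQA) tasks involving complex data visualizations. This paper proposes Chart-RL, a novel reinforcement learning framework that enhances chart understanding through feedback-driven policy optimization.
Reference score (attention score computed in-house): 2.1902845922631435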
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advancements in Vision Language Models (VLMs) have demonstrated progress toward true intelligence, which requires robust reasoning capabilities. Beyond pattern recognition, linguistic reasoning must be integrated with visual comprehension, particularly for Chart Question Answering (CQA) tasks involving complex data visualizations. Current VLMs face significant limitations in CQA, including imprecise numerical extraction, difficulty interpreting implicit visual relationships, and inadequate attention mechanisms for capturing spatial relationships in charts. In this work, we address these challenges by presenting Chart-RL, a novel reinforcement learning framework that enhances VLMs' chart understanding through feedback-driven policy optimization of visual perception and logical inference. Our key innovation is a comprehensive framework that integrates policy-optimization-based Reinforcement Learning (RL) with adaptive reward functions; it demonstrates superior performance compared to baseline foundation models and competitive results against larger state-of-the-art architectures. We also integrated Parameter-Efficient Fine-Tuning through Low-Rank Adaptation (LoRA) into the RL framework, which requires only a single-GPU configuration while preserving performance integrity. We conducted extensive benchmarking across open-source, proprietary, and state-of-the-art closed-source models using the ChartQAPro dataset. The RL fine-tuned Qwen3-VL-4B-Instruct model achieved an answer accuracy of 0.634, surpassing the 0.580 accuracy of the Qwen3-VL-8B-Instruct foundation model despite having half the parameter count, while simultaneously reducing inference latency from 31 seconds to 9 seconds.
Abstract (reference translation): Recent advances in vision language models (VLMs) show progress toward true intelligence, which requires robust reasoning capabilities. Beyond pattern recognition, linguistic reasoning must be integrated with visual understanding. Current VLMs face significant limitations in CQA, such as imprecise numerical extraction, difficulty interpreting implicit visual relationships, and inadequate attention mechanisms for capturing spatial relationships within charts. In this paper, we address these challenges by proposing Chart-RL, a novel reinforcement learning framework that strengthens VLMs' chart understanding through feedback-driven policy optimization of visual perception and logical reasoning. Our key innovation is a comprehensive framework that integrates reinforcement learning (RL) based on policy optimization techniques with adaptive reward functions, showing better performance than baseline foundation models and competitive results against larger state-of-the-art architectures. We also incorporated parameter-efficient fine-tuning via Low-Rank Adaptation (LoRA) into the RL framework, which requires only a single-GPU configuration while maintaining performance integrity. We conducted extensive benchmarks across open-source, proprietary, and state-of-the-art closed-source models using the ChartQAPro dataset. The RL fine-tuned Qwen3-VL-4B-Instruct model achieved an answer accuracy of 0.634, surpassing the 0.580 accuracy of the Qwen3-VL-8B-Instruct foundation model with half the parameter count, while also reducing inference latency from 31 seconds to 9 seconds.
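
The abstract does not say which policy-optimization algorithm or which reward terms Chart-RL uses. As a minimal sketch, assuming a GRPO-style setup (a common policy-optimization variant, swapped in here purely for illustration) and a hypothetical `<answer>...</answer>` output tag, the code below scores several sampled answers to one chart question and then standardizes the rewards within that group to get per-sample advantages. The reward gives a small format bonus plus full credit for a correct answer, treating numbers as correct within a 5% relative tolerance in the spirit of ChartQA-style relaxed accuracy. The names `chart_qa_reward`, `group_relative_advantages`, `rel_tol`, and `format_bonus` are hypothetical.

```python
import re
import statistics

# Hypothetical output format; the abstract does not describe Chart-RL's prompt or answer format.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def _to_float(text):
    """Parse strings such as '1,234.5' or '45%' as numbers; return None if not numeric."""
    cleaned = text.strip().replace(",", "").rstrip("%")
    try:
        return float(cleaned)
    except ValueError:
        return None


def chart_qa_reward(completion, gold, rel_tol=0.05, format_bonus=0.1):
    """Hypothetical reward: format bonus for a well-formed answer tag, plus 1.0 if the answer is correct.

    Numeric answers count as correct when within rel_tol of the gold value (relaxed accuracy);
    text answers use case-insensitive exact match.
    """
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0  # no parsable answer: no reward at all
    pred = match.group(1).strip()
    pred_num, gold_num = _to_float(pred), _to_float(gold)
    if pred_num is not None and gold_num is not None:
        if gold_num == 0:
            correct = pred_num == 0
        else:
            correct = abs(pred_num - gold_num) / abs(gold_num) <= rel_tol
    else:
        correct = pred.lower() == gold.strip().lower()
    return format_bonus + (1.0 if correct else 0.0)


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one group of samples for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four hypothetical sampled completions for one chart question whose gold answer is 42.
samples = [
    "The bar for 2020 is tallest. <answer>41.5</answer>",
    "<answer>43</answer>",
    "<answer>60</answer>",
    "I think it is 42",
]
rewards = [chart_qa_reward(s, "42") for s in samples]
advantages = group_relative_advantages(rewards)
```

Completions that beat the group average get positive advantages and the rest negative ones; a policy-gradient update would then raise the likelihood of the former. The abstract's "adaptive reward functions" are not detailed, so the fixed `rel_tol` and `format_bonus` values here are placeholders.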

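LoRA keeps each pretrained weight matrix W frozen and trains only a low-rank update B·A, which is what makes single-GPU fine-tuning practical. The plain-Python sketch below shows the LoRA forward pass y = W x + (alpha / r) B A x and compares trainable parameter counts for a hypothetical 4096 x 4096 projection at rank 16; these shapes are illustrative assumptions, not Qwen3-VL's actual dimensions or the paper's LoRA settings, and `lora_forward` and the other helpers are hypothetical names.

```python
def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, rank):
    """y = W x + (alpha / rank) * B (A x).

    W (d_out x d_in) stays frozen; only A (rank x d_in) and B (d_out x rank) would be trained.
    """
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base, delta)]


def lora_trainable_params(d_in, d_out, rank):
    """Parameters in A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)


def full_trainable_params(d_in, d_out):
    """Parameters updated when fine-tuning the full d_out x d_in matrix."""
    return d_in * d_out


# Tiny example: with B initialized to zero, the adapted layer reproduces the frozen layer exactly.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, -0.5]]
B_zero = [[0.0], [0.0]]
x = [1.0, 1.0]
y0 = lora_forward(W, A, B_zero, x, alpha=2.0, rank=1)

# Trainable parameters for a hypothetical 4096 x 4096 projection at rank 16.
lora_params = lora_trainable_params(4096, 4096, rank=16)
full_params = full_trainable_params(4096, 4096)
```

For that layer the adapter has 131,072 trainable parameters versus 16,777,216 for the full matrix (1/128). Initializing B to zero, as in the original LoRA paper, means training starts from exactly the base model's behavior.
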

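The headline comparison can be checked with simple arithmetic. Assuming, as the final sentence of the abstract implies, that the 31 s latency belongs to the Qwen3-VL-8B-Instruct foundation model and the 9 s latency to the RL fine-tuned Qwen3-VL-4B-Instruct model, the accuracy gain is 0.054 absolute (about 9.3% relative) and the speedup is about 3.4x, roughly a 71% latency reduction.

```python
# Figures reported in the abstract (ChartQAPro answer accuracy; inference latency in seconds).
acc_rl_4b = 0.634
acc_base_8b = 0.580
latency_8b_s = 31.0  # assumed to be the 8B foundation model
latency_4b_s = 9.0   # assumed to be the RL fine-tuned 4B model

abs_gain = acc_rl_4b - acc_base_8b                     # absolute accuracy gain
rel_gain = abs_gain / acc_base_8b                      # gain relative to the 8B baseline
speedup = latency_8b_s / latency_4b_s                  # how many times faster
latency_reduction = 1.0 - latency_4b_s / latency_8b_s  # fraction of latency removed

print(f"absolute gain: {abs_gain:.3f}")
print(f"relative gain: {rel_gain:.1%}")
print(f"speedup: {speedup:.2f}x")
print(f"latency reduction: {latency_reduction:.0%}")
```
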
Related Papers