Fugu-MT Paper Translation (Abstract): Understanding the Role of Hallucination in Reinforcement Post-Training of Multimodal Reasoning Models

Paper summary: Understanding the Role of Hallucination in Reinforcement Post-Training of Multimodal Reasoning Models

arxiv url: http://arxiv.org/abs/2604.03179v1
Date: Fri, 03 Apr 2026 16:56:34 GMT
Status: Translation complete
Last updated in system: 2026-04-06 17:20:24.540781
Title: Understanding the Role of Hallucination in Reinforcement Post-Training of Multimodal Reasoning Models
Title (reference translation): Understanding the Role of Hallucination in Reinforcement Post-Training of Multimodal Reasoning Models
Authors: Gengwei Zhang, Jie Peng, Zhen Tan, Mufan Qiu, Hossein Nourkhiz Mahjoub, Vaishnav Tadiparthi, Kwonjoon Lee, Yanyong Zhang, Tianlong Chen,
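Abstract summary: The success of reinforcement learning (RL) has inspired growing adoption of RL for post-training Multimodal Large Language Models (MLLMs) to enhance their visual reasoning capabilities. This paper proposes the Hallucination-as-Cue Framework and uses it to examine the effects of RL-based post-training on multimodal reasoning models from the perspective of model hallucination.
Reference score (independently computed attention score): 54.50728814348712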
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The recent success of reinforcement learning (RL) in large reasoning models has inspired the growing adoption of RL for post-training Multimodal Large Language Models (MLLMs) to enhance their visual reasoning capabilities. Although many studies have reported improved performance, it remains unclear whether RL training truly enables models to learn from visual information. In this work, we propose the Hallucination-as-Cue Framework, an analytical framework designed to investigate the effects of RL-based post-training on multimodal reasoning models from the perspective of model hallucination. Specifically, we introduce hallucination-inductive, modality-specific corruptions that remove or replace essential information required to derive correct answers, thereby forcing the model to reason by hallucination. By applying these corruptions during both training and evaluation, our framework provides a unique perspective for diagnosing RL training dynamics and understanding the intrinsic properties of datasets. Through extensive experiments and analyses across multiple multimodal reasoning benchmarks, we reveal that the role of model hallucination for RL-training is more significant than previously recognized. For instance, we find that RL post-training under purely hallucination-inductive settings can still significantly improve models' reasoning performance, and in some cases even outperform standard training. These findings challenge prevailing assumptions about MLLM reasoning training and motivate the development of more modality-aware RL-based training designs.
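Abstract (reference translation): The success of reinforcement learning (RL) in large reasoning models has spurred growing adoption of RL for post-training Multimodal Large Language Models (MLLMs) to enhance their visual reasoning capabilities. Although many studies report improved performance, it remains unclear whether RL training actually enables models to learn from visual information. This paper proposes the Hallucination-as-Cue Framework, an analytical framework for investigating the effects of RL-based post-training on multimodal reasoning models from the perspective of model hallucination. Specifically, it introduces hallucination-inductive, modality-specific corruptions that remove or replace the essential information needed to derive the correct answer, forcing the model to reason by hallucination. Applying these corruptions during both training and evaluation gives the framework a unique perspective for diagnosing RL training dynamics and understanding the intrinsic properties of datasets. Extensive experiments and analyses across multiple multimodal reasoning benchmarks reveal that model hallucination plays a more significant role in RL training than previously recognized. For example, RL post-training under purely hallucination-inductive settings can still significantly improve a model's reasoning performance, and in some cases even outperforms standard training. These findings challenge prevailing assumptions about MLLM reasoning training and motivate the development of more modality-aware RL-based training designs.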
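
The abstract describes "modality-specific corruptions that remove or replace essential information" but gives no implementation details. The following is only a minimal sketch of what such corruptions could look like on a toy dataset, not the authors' method. The `Sample` type, the function names `blank_image`, `swap_image`, and `drop_question`, and the choice to keep the original answer label unchanged are all assumptions made for illustration.

```python
import copy
import random
from dataclasses import dataclass
from typing import List

# Toy grayscale image: rows of pixel intensities (hypothetical representation).
Image = List[List[int]]


@dataclass
class Sample:
    image: Image
    question: str
    answer: str  # assumption: the label is left untouched by every corruption


def blank_image(sample: Sample) -> Sample:
    """Remove visual information: every pixel becomes 0, size is preserved."""
    blank = [[0] * len(row) for row in sample.image]
    return Sample(image=blank, question=sample.question, answer=sample.answer)


def swap_image(sample: Sample, pool: List[Sample], rng: random.Random) -> Sample:
    """Replace visual information with the image of a different sample."""
    donors = [s for s in pool if s is not sample]
    if not donors:
        raise ValueError("need at least one other sample to swap with")
    donor = rng.choice(donors)
    return Sample(
        image=copy.deepcopy(donor.image),
        question=sample.question,
        answer=sample.answer,
    )


def drop_question(sample: Sample, placeholder: str = "") -> Sample:
    """Remove textual information: the question is replaced by a placeholder."""
    return Sample(
        image=copy.deepcopy(sample.image),
        question=placeholder,
        answer=sample.answer,
    )


if __name__ == "__main__":
    rng = random.Random(0)
    data = [
        Sample([[1, 2], [3, 4]], "How many dots are shown?", "4"),
        Sample([[9, 9], [9, 9]], "What colour is the square?", "red"),
    ]
    # Each corruption returns a new sample and leaves the original intact.
    print(blank_image(data[0]))
    print(swap_image(data[0], data, rng))
    print(drop_question(data[0]))
```

Under these assumptions, applying one of the functions to every sample in both the training and evaluation splits would yield a setting in which the correct answer can no longer be derived from the corrupted modality.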

Related papers