Fugu-MT Paper Translation (Abstract): OmniVerifier-M1: Multimodal Meta-Verifier with Explicit Structured Recalibration

Paper overview: OmniVerifier-M1: Multimodal Meta-Verifier with Explicit Structured Recalibration

arxiv url: http://arxiv.org/abs/2605.28805v1
Date: Wed, 27 May 2026 17:56:04 GMT
Status: Translation complete
Last updated in system: 2026-05-28 17:38:56.261913
Title: OmniVerifier-M1: Multimodal Meta-Verifier with Explicit Structured Recalibration
Authors: Xinchen Zhang, Bowei Liu, Jiale Liu, Chufan Shi, Yizhen Zhang, Junhong Liu, Youliang Zhang, Zhiheng Li, Yujiu Yang, Ling Yang
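Abstract summary: We investigate multimodal meta-verification, which leverages verifier-generated rationales rather than decision-only signals. We train OmniVerifier-M1, a generalist visual verifier that leverages symbolic meta-verification and decoupled reinforcement learning. This approach paves the way for more reliable, interpretable, and fine-grained multimodal verification, supporting safer and more controllable foundation model deployment.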
Reference score (independently computed attention score): 48.11927189422178
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Visual outcomes are increasingly central to multimodal large language models, making reliable and fine-grained verification essential for scaling generalist foundation models. In this work, we investigate multimodal meta-verification, which leverages verifier-generated rationales rather than decision-only signals, and explore how to effectively incorporate meta-verification feedback into multimodal verifier training. We identify two key findings. First, symbolic verifier outputs (e.g., bounding boxes) outperform textual explanations as meta-verification rationales, enabling efficient rule-based reinforcement learning rewards while avoiding reliance on model-based rewards from auxiliary judge models. Second, decoupling reinforcement learning objectives for binary judgment and meta-verification substantially outperforms joint reward optimization, due to intrinsic differences in output structure and learning dynamics. Based on these insights, we train OmniVerifier-M1, a generalist visual verifier leveraging symbolic meta-verification and decoupled reinforcement learning. OmniVerifier-M1 provides robust verification and fine-grained error localization, and further enables M1-TTS, a verifier-driven agentic generation system achieving dynamic region-level self-correction. This approach paves the way for more reliable, interpretable, and fine-grained multimodal verification, supporting safer and more controllable foundation model deployment.
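The abstract says symbolic outputs such as bounding boxes allow rule-based reinforcement learning rewards, and that the binary-judgment and meta-verification objectives are decoupled rather than jointly optimized. It does not give the reward formulas. The following is a minimal sketch under stated assumptions: intersection-over-union (IoU) is used here as one common rule-based score for box overlap, and the function names `iou`, `judgment_reward`, and `localization_reward` are hypothetical, not taken from the paper.

```python
from typing import Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def _area(box: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as zero."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_box = (max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3]))
    inter = _area(inter_box)
    union = _area(a) + _area(b) - inter
    return inter / union if union > 0 else 0.0


def judgment_reward(pred_ok: bool, gold_ok: bool) -> float:
    """Rule-based reward for the binary verdict (assumed exact-match rule)."""
    return 1.0 if pred_ok == gold_ok else 0.0


def localization_reward(pred_box: Box, gold_box: Box) -> float:
    """Rule-based reward for the symbolic rationale (assumed to be IoU)."""
    return iou(pred_box, gold_box)


if __name__ == "__main__":
    # The two rewards are reported separately rather than summed, which is
    # only an illustration of keeping the objectives decoupled.
    print(judgment_reward(True, True))
    print(localization_reward((0, 0, 2, 2), (1, 1, 3, 3)))
```

Both rewards are computed from rules alone, so no auxiliary judge model is needed to score a rollout. How the paper actually schedules or weights the two objectives is not stated in the abstract.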


Related papers