Fugu-MT Paper Translation (Summary): AnE: Pushing the Reasoning Frontier of Multimodal LLMs via Anchor Evolution

Paper summary: AnE: Pushing the Reasoning Frontier of Multimodal LLMs via Anchor Evolution

arxiv url: http://arxiv.org/abs/2605.25571v1
Date: Mon, 25 May 2026 08:26:34 GMT
Status: Translation complete
Last updated in system: 2026-05-26 19:50:19.47143
Title: AnE: Pushing the Reasoning Frontier of Multimodal LLMs via Anchor Evolution
Authors: Zehao Wang, Yihan Zeng, Zidong Gong, Yuanfan Guo, Feng Zhu, Hongzhi Zhang, Wei Zhang, Wangmeng Zuo
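Abstract summary: Post-training via Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) is essential for enhancing reasoning in multimodal large language models (MLLMs), yet existing paradigms often hit a performance bottleneck due to the limitations of static data. We propose Anchor Evolution (AnE), a new paradigm that integrates truth-anchored data curation and model evolution.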
Reference score (site-computed attention score): 61.593935260052795
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Post-training via Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) is crucial for enhancing reasoning in Multimodal Large Language Models (MLLMs), yet existing paradigms often reach a performance bottleneck due to the limitations of static data. While current methods leverage self-reflection or self-evolution to push these boundaries, they still suffer from cognitive drift and hallucinated reasoning paths caused by low-quality synthetic data. To address these challenges, we propose Anchor Evolution (AnE), a new paradigm that integrates truth-anchored data curation and model evolution, achieving faithful and steady performance gains at the reasoning frontier. Specifically, we propose Truth Anchor Expansion, which pinpoints the model's failing frontier via trajectory rollouts and leverages ground-truth databases to retrieve high-fidelity anchors for faithful data curation. Subsequently, we introduce the Scaffold-Stripping Mechanism to internalize reasoning capabilities. This mechanism first anchors reasoning paths via scaffold-augmented supervision to mitigate the learning complexity and distribution drift of direct SFT on raw data, then leverages RL to strip the scaffold template, thereby effectively transitioning the reasoning paths into intrinsic model capabilities. Experimental results on multimodal reasoning benchmarks show that our method substantially advances the model performance frontier, improving the base model by 10.3% across eight multimodal benchmarks and achieving state-of-the-art results. The code will be made publicly available.
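The abstract describes Truth Anchor Expansion only at a high level. Below is a minimal Python sketch of one plausible reading, under stated assumptions. It assumes the "failing frontier" is the set of problems whose rollout pass rate is at or below a threshold (here, 0 correct out of k rollouts). It also assumes the ground-truth database can be looked up by a problem identifier. `Problem`, `pass_rate`, `failing_frontier`, `expand_with_anchors`, `k`, `max_rate`, and `truth_db` are all hypothetical names, not the authors' code, which the abstract says has not yet been released. The Scaffold-Stripping stage (scaffold-augmented SFT followed by RL) is not modeled here.

```python
# Hypothetical sketch of frontier selection + anchor retrieval.
# NOT the AnE reference implementation; all names are illustrative.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Problem:
    pid: str       # hypothetical problem identifier (database lookup key)
    question: str
    answer: str    # ground-truth final answer


# A sampler maps (question, rollout_index) -> the model's final answer.
Sampler = Callable[[str, int], str]


def pass_rate(problem: Problem, sample: Sampler, k: int) -> float:
    """Fraction of k rollouts whose final answer matches the ground truth."""
    hits = sum(sample(problem.question, i) == problem.answer for i in range(k))
    return hits / k


def failing_frontier(problems: List[Problem], sample: Sampler,
                     k: int = 8, max_rate: float = 0.0) -> List[Problem]:
    """Keep problems the model solves in at most `max_rate` of its rollouts."""
    return [p for p in problems if pass_rate(p, sample, k) <= max_rate]


def expand_with_anchors(frontier: List[Problem],
                        truth_db: Dict[str, str]) -> List[Tuple[Problem, str]]:
    """Pair each frontier problem with a verified solution from the database.

    Problems without a ground-truth entry are skipped instead of being filled
    with model-generated text, the kind of low-quality synthetic data the
    abstract links to hallucinated reasoning paths.
    """
    return [(p, truth_db[p.pid]) for p in frontier if p.pid in truth_db]


if __name__ == "__main__":
    problems = [
        Problem("p1", "2 + 3 = ?", "5"),
        Problem("p2", "Area of a 3x4 rectangle?", "12"),
        Problem("p3", "Angles in a triangle sum to?", "180"),
    ]

    # Toy deterministic "model": always right on p1, right on even
    # rollouts only for p2, never right on p3.
    def toy_sampler(question: str, i: int) -> str:
        if question.startswith("2 + 3"):
            return "5"
        if question.startswith("Area"):
            return "12" if i % 2 == 0 else "7"
        return "90"

    truth_db = {"p3": "Interior angles of any triangle sum to 180 degrees."}
    frontier = failing_frontier(problems, toy_sampler)
    anchored = expand_with_anchors(frontier, truth_db)
    print([p.pid for p in frontier])  # ['p3']
    print([(p.pid, a) for p, a in anchored])
```

With `max_rate=0.0`, only problems the model never solves are selected. Raising it would also admit problems solved occasionally (p2 above, at 0.5). The abstract does not say which threshold or rollout count the authors actually use.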


Related papers