Fugu-MT Paper Translation (Abstract): AIR: Adaptive Interleaved Reasoning with Code in MLLMs

Paper overview: AIR: Adaptive Interleaved Reasoning with Code in MLLMs

arxiv url: http://arxiv.org/abs/2606.23678v1
Date: Mon, 22 Jun 2026 17:58:54 GMT
Status: Translation complete
Last updated in system: 2026-06-24 17:12:18.633449
Title: AIR: Adaptive Interleaved Reasoning with Code in MLLMs
Title (reference translation): AIR: Adaptive Interleaved Reasoning with Code in MLLMs
Authors: Cong Han, Xiaohan Lan, Haibo Qiu, Yujie Zhong,
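Abstract summary: Interleaved reasoning with code to enhance multimodal large language models (MLLMs) has become a pivotal research frontier. This paper equips MLLMs with adaptive interleaved reasoning capabilities through reinforcement learning training on code-augmented complex numerical computation tasks. Experiments show that after reinforcement learning training with a group-constrained reward function, performance improves by an average of 6.1 percentage points (pp) on the evaluation benchmarks.
Reference score (independently computed attention score): 26.910280225921934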
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Following the paradigm shift initiated by OpenAI o3, interleaved reasoning with code to enhance multimodal large language models (MLLMs) has become a pivotal research frontier. The existing literature focuses primarily on tool-use within vision-perception tasks. However, such approaches typically rely on predefined heuristics for visual manipulation and are inherently incapable of addressing numerical computation problems due to their exclusive focus on visual operations. This paper empowers MLLMs with adaptive interleaved reasoning capabilities through extended reinforcement learning training on code-augmented complex numerical computation tasks. To this end, we propose a comprehensive three-component solution consisting of: a two-stage cold-start data construction pipeline, data filtering strategies for RL dataset curation, and an adaptive tool-invocation strategy leveraging a group-constrained reward function for interleaved reasoning trajectories. Extensive experiments demonstrate that after Reinforcement Learning training with the group-constrained reward function, performance improves by an average of 6.1 percentage points (pp) on evaluation benchmarks. Specifically, the accuracy for interleaved reasoning samples increases by 9.9 pp, and the overall success rate of tool-use exceeds 95%. Our data and code are available at: https://github.com/CongHan0808/AIR.git.
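Abstract (reference translation): Following the paradigm shift initiated by OpenAI o3, interleaved reasoning with code to enhance multimodal large language models (MLLMs) has become an important research frontier. The existing literature focuses mainly on tool use in vision-perception tasks. However, such approaches generally rely on predefined heuristics for visual manipulation and, because they focus exclusively on visual operations, are inherently unable to address numerical computation problems. This paper equips MLLMs with adaptive interleaved reasoning capabilities through reinforcement learning training on code-augmented complex numerical computation tasks. To this end, it proposes a three-component solution consisting of a two-stage cold-start data construction pipeline, data filtering strategies for RL dataset curation, and an adaptive tool-invocation strategy that uses a group-constrained reward function. After reinforcement learning training with the group-constrained reward function, performance improves by an average of 6.1 percentage points (pp) on the evaluation benchmarks. Specifically, the accuracy on interleaved reasoning samples increases by 9.9 pp, and the overall success rate of tool use exceeds 95%. The data and code are available at https://github.com/CongHan0808/AIR.git.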

Related papers