Fugu-MT Paper Translation (Abstract): IAPO: Input Attribution-Aware Policy Optimization for Tool Use in Small Multimodal Agents

Paper overview: IAPO: Input Attribution-Aware Policy Optimization for Tool Use in Small Multimodal Agents

arxiv url: http://arxiv.org/abs/2606.11652v1
Date: Wed, 10 Jun 2026 04:30:37 GMT
Status: Translation complete
Last updated in system: 2026-06-11 16:42:38.294556
Title: IAPO: Input Attribution-Aware Policy Optimization for Tool Use in Small Multimodal Agents
Authors: Yifan Yang, Zhen Zhang, Jiayi Tian, Liyan Tan, Zheng Zhang
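Abstract summary: This paper investigates reinforcement learning (RL) methods for improving tool-calling capabilities in small language models (SLMs). It proposes IAPO, an RL algorithm that improves tool use in multimodal SLMs by aligning the model's attribution across input components with that of a stronger teacher. Experiments on Qwen2.5-VL-3B show that the proposed method improves visual question answering accuracy by an average of 3% across six test sets compared with existing visual tool use work.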
Reference score (independently computed attention score): 12.019312046941396
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper investigates reinforcement learning (RL) methods for improving tool-calling capabilities in multimodal small language model (SLM) agents. While existing works have explored various reward designs to improve agentic tool-calling ability, these approaches face inherent limitations for SLM training, especially in multimodal scenarios. First, many existing methods evaluate tool use correctness through exact matching against a ground truth or a predefined format. However, this assumption is often unsuitable for multimodal tasks, where multiple tool use paths may be valid and annotated tool trajectories are typically unavailable. Second, such sparse and brittle binary rewards provide little guidance on how to improve the underlying decision process, making them particularly difficult for multimodal SLMs to learn from. To address these issues, we propose Input Attribution-Aware Policy Optimization (IAPO), an RL algorithm that improves tool use in multimodal SLMs by aligning the model's attribution across input components with that of a stronger teacher. Experiments on Qwen2.5-VL-3B show that, by helping the model attend to the most relevant input evidence, the proposed method improves visual question answering accuracy by an average of 3% across six test sets compared with existing visual tool use work.

Related papers