Fugu-MT paper translation (abstract): MUA-RL: Multi-turn User-interacting Agent Reinforcement Learning for agentic tool use

Paper overview: MUA-RL: Multi-turn User-interacting Agent Reinforcement Learning for agentic tool use

arxiv url: http://arxiv.org/abs/2508.18669v1
Date: Tue, 26 Aug 2025 04:26:29 GMT
Status: translation complete
System update date: 2025-08-27 17:42:38.677807
Title: MUA-RL: Multi-turn User-interacting Agent Reinforcement Learning for agentic tool use
Title (reference translation): MUA-RL: Multi-turn user-interacting agent reinforcement learning for agentic tool use
Authors: Weikang Zhao, Xili Wang, Chengdi Ma, Lingbin Kong, Zhaohua Yang, Mingxiang Tuo, Xiaowei Shi, Yitao Zhai, Xunliang Cai
Abstract summary: The paper introduces MUA-RL (Multi-turn User-interacting Agent Reinforcement Learning for agentic tool use). MUA-RL integrates LLM-simulated users into the reinforcement learning loop. MUA-RL-32B scores 67.3 on TAU2 Retail, 45.4 on TAU2 Airline, 28.3 on TAU2 Telecom, 28.4 on BFCL-V3 Multi Turn, and 82.5 on ACEBench Agent.
Reference score (independently computed attention metric): 13.2154672798075
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the recent rapid advancement of Agentic Intelligence, agentic tool use in LLMs has become increasingly important. During multi-turn interactions between agents and users, the dynamic, uncertain, and stochastic nature of user demands poses significant challenges to the agent's tool invocation capabilities. Agents are no longer expected to simply call tools to deliver a result; rather, they must iteratively refine their understanding of user needs through communication while simultaneously invoking tools to resolve user queries. Existing reinforcement learning (RL) approaches for tool use lack the integration of genuinely dynamic users during the RL training process. To bridge this gap, we introduce MUA-RL (Multi-turn User-interacting Agent Reinforcement Learning for agentic tool use), a novel reinforcement learning framework that, for the first time in the field of agentic tool use, integrates LLM-simulated users into the reinforcement learning loop. MUA-RL aims to enable autonomous learning of models to communicate with users efficiently and use various tools to solve practical problems in dynamic multi-turn interactions. Evaluations are done on several multi-turn tool-using benchmarks (see Figure 1). Specifically, MUA-RL-32B achieves 67.3 on TAU2 Retail, 45.4 on TAU2 Airline, 28.3 on TAU2 Telecom, 28.4 on BFCL-V3 Multi Turn, and 82.5 on ACEBench Agent -- outperforming or matching the performance of larger open-source models such as DeepSeek-V3-0324 and Qwen3-235B-A22B in non-thinking settings.
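Abstract (reference translation): With the recent rapid progress of agentic intelligence, agentic tool use in LLMs has become increasingly important. During multi-turn interactions between agents and users, the dynamic, uncertain, and stochastic nature of user demands poses significant challenges to the agent's tool invocation capabilities. Rather than simply calling tools to deliver a result, agents must iteratively refine their understanding of user needs through communication while also invoking tools to resolve user queries. Existing reinforcement learning (RL) approaches for tool use do not integrate genuinely dynamic users into the RL training process. To bridge this gap, we introduce MUA-RL (Multi-turn User-interacting Agent Reinforcement Learning for agentic tool use). MUA-RL aims to let models learn autonomously to communicate with users efficiently and to use various tools to solve practical problems in dynamic multi-turn interactions. Evaluation is carried out on several multi-turn tool-use benchmarks (see Figure 1). Specifically, MUA-RL-32B achieves 67.3 on TAU2 Retail, 45.4 on TAU2 Airline, 28.3 on TAU2 Telecom, 28.4 on BFCL-V3 Multi Turn, and 82.5 on ACEBench Agent.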
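
The idea of placing a simulated user inside the training loop can be illustrated with a toy rollout. The following is a minimal sketch only, not the authors' implementation: `simulated_user`, `agent_policy`, the `cancel_order` tool, the `###STOP###` marker, and the binary outcome reward are all hypothetical stand-ins chosen for illustration, since the abstract does not specify these details. In a real setup both the user and the agent would be LLM calls, and the returned reward would feed a policy-gradient update of one's choice.

```python
from dataclasses import dataclass

STOP = "###STOP###"  # hypothetical end-of-conversation marker


@dataclass
class Turn:
    role: str      # "user", "agent", or "tool"
    content: str


def simulated_user(history, goal):
    """Hypothetical stand-in for an LLM-simulated user.

    Reveals the goal gradually, so the agent has to ask before acting.
    """
    agent_turns = sum(1 for t in history if t.role == "agent")
    if agent_turns == 0:
        return "I need help with my order."
    if agent_turns == 1:
        return f"I want to {goal}."
    return STOP


def cancel_order(db):
    """Hypothetical tool: mutates the environment state."""
    db["order_status"] = "cancelled"
    return "order cancelled"


TOOLS = {"cancel_order": cancel_order}


def agent_policy(history):
    """Hypothetical stand-in for the policy being trained."""
    last_user_message = history[-1].content
    if "cancel" in last_user_message:
        return ("tool", "cancel_order")
    return ("message", "Could you tell me more about what you need?")


def rollout(goal, max_turns=8):
    """Run one multi-turn episode: user speaks, agent replies or calls a tool."""
    history = []
    db = {"order_status": "active"}
    for _ in range(max_turns):
        user_msg = simulated_user(history, goal)
        if user_msg == STOP:
            break
        history.append(Turn("user", user_msg))
        kind, payload = agent_policy(history)
        if kind == "tool":
            result = TOOLS[payload](db)
            history.append(Turn("tool", result))
            history.append(Turn("agent", f"Done: {result}"))
        else:
            history.append(Turn("agent", payload))
    # Assumed outcome-based reward: 1 if the final state matches the target.
    reward = 1.0 if db["order_status"] == "cancelled" else 0.0
    return history, reward


if __name__ == "__main__":
    episode, reward = rollout("cancel my order")
    for turn in episode:
        print(f"{turn.role}: {turn.content}")
    print("reward:", reward)
```

The point of the sketch is only the control flow: the user's next message depends on what the agent said, so each trajectory the policy is scored on includes turns that a fixed dataset of single-shot tool calls would not contain.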

Related papers