Fugu-MT Paper Translation (Abstract): Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning

Paper overview: Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning

arxiv url: http://arxiv.org/abs/2511.14460v1
Date: Tue, 18 Nov 2025 13:03:15 GMT
Status: Translation complete
Last updated in system: 2025-11-19 16:23:53.122375
Title: Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
Title (reference translation): Agent-R1: Training powerful LLM agents with end-to-end reinforcement learning
Authors: Mingyue Cheng, Jie Ouyang, Shuo Yu, Ruiran Yan, Yucong Luo, Zirui Liu, Daoyu Wang, Qi Liu, Enhong Chen
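Abstract summary: Large language models (LLMs) are increasingly being explored for building agents capable of active environmental interaction (for example, tool use) to solve complex problems. This paper first revisits and clarifies reinforcement learning methodology for LLM agents by systematically extending the Markov Decision Process (MDP) framework. It then introduces Agent-R1, a modular, flexible, and user-friendly training framework for RL-based LLM agents.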
Reference score (independently computed attention score): 45.88626187315028
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) are increasingly being explored for building Agents capable of active environmental interaction (e.g., via tool use) to solve complex problems. Reinforcement Learning (RL) is considered a key technology with significant potential for training such Agents; however, the effective application of RL to LLM Agents is still in its nascent stages and faces considerable challenges. Currently, this emerging field lacks in-depth exploration into RL approaches specifically tailored for the LLM Agent context, alongside a scarcity of flexible and easily extensible training frameworks designed for this purpose. To help advance this area, this paper first revisits and clarifies Reinforcement Learning methodologies for LLM Agents by systematically extending the Markov Decision Process (MDP) framework to comprehensively define the key components of an LLM Agent. Secondly, we introduce Agent-R1, a modular, flexible, and user-friendly training framework for RL-based LLM Agents, designed for straightforward adaptation across diverse task scenarios and interactive environments. We conducted experiments on Multihop QA benchmark tasks, providing initial validation for the effectiveness of our proposed methods and framework.
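Abstract (reference translation): Large language models (LLMs) are increasingly being explored for building agents capable of active environmental interaction (for example, tool use) to solve complex problems. Reinforcement learning (RL) is regarded as a key technology with significant potential for training such agents, but applying RL effectively to LLM agents is still at an early stage and faces considerable challenges. At present, this emerging field lacks in-depth investigation of RL approaches tailored to the LLM agent context, and flexible, easily extensible training frameworks designed for this purpose are scarce. This paper first revisits and clarifies reinforcement learning methodology for LLM agents by systematically extending the MDP framework to comprehensively define the key components of an LLM agent. It then introduces Agent-R1, a modular, flexible, and user-friendly training framework for RL-based LLM agents. We conducted experiments on multi-hop QA benchmark tasks to validate the effectiveness of the proposed methods and framework.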


Related papers