Fugu-MT Paper Translation (Abstract): EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL

Paper overview: EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL

arxiv url: http://arxiv.org/abs/2605.18703v1
Date: Mon, 18 May 2026 17:37:40 GMT
Status: Translation complete
Last updated in system: 2026-05-19 17:57:50.208321
Title: EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL
Title (reference translation): EnvFactory: Scaling Tool-Use Agents via Executable Environment Synthesis and Robust RL
Authors: Minrui Xu, Zilin Wang, Mengyi DENG, Zhiwei Li, Zhicheng Yang, Xiao Zhu, Yinhong Liu, Boyu Zhu, Baiyu Huang, Chao Chen, Heyuan Deng, Fei Mi, Lifeng Shang, Xingshan Zeng, Zhijiang Guo
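Abstract summary: This paper introduces EnvFactory, a fully automated framework for Agentic Reinforcement Learning (Agentic RL) training. EnvFactory autonomously explores stateful, executable tool environments from authentic resources and synthesizes natural multi-turn trajectories through topology-aware sampling and calibrated refinement. It improves training efficiency and downstream performance, raising Qwen3-series models by up to +15% on BFCLv3, +8.6% on MCP-Atlas, and +6% on VitaBench.
Reference score (independently computed attention level): 54.09410318521061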
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Equipping LLMs with tool-use capabilities via Agentic Reinforcement Learning (Agentic RL) is bottlenecked by two challenges: the lack of scalable, robust execution environments and the scarcity of realistic training data that captures implicit human reasoning. Existing approaches depend on costly real-world APIs, hallucination-prone LLM simulators, or synthetic environments that are often single-turn or depend on pre-collected documents. Moreover, synthetic trajectories are frequently over-specified, resembling instruction sequences rather than natural human intents, which reduces their effectiveness for RL training. We introduce EnvFactory, a fully automated framework that addresses both challenges. EnvFactory autonomously explores and verifies stateful, executable tool environments from authentic resources, and synthesizes natural multi-turn trajectories through topology-aware sampling and calibrated refinement, producing grounded queries with implicit intents. Using only 85 verified environments across 7 domains, EnvFactory generates 2,575 SFT and RL trajectories. Despite using significantly fewer environments than prior work, which often relies on 5 times as many, EnvFactory achieves superior training efficiency and downstream performance, improving Qwen3-series models by up to +15% on BFCLv3, +8.6% on MCP-Atlas, and +6% on conversational benchmarks including $τ^2$-Bench and VitaBench. By fully automating both environment construction and trajectory synthesis, EnvFactory provides a scalable, extensible, and robust foundation for Agentic RL.
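Abstract (reference translation): Equipping LLMs with tool-use capabilities through Agentic Reinforcement Learning (Agentic RL) is bottlenecked by two challenges: the lack of scalable, robust execution environments and the scarcity of realistic training data that captures implicit human reasoning. Existing approaches rely on expensive real-world APIs, hallucination-prone LLM simulators, or synthetic environments that are single-turn or depend on pre-collected documents. In addition, synthetic trajectories are often over-specified and resemble instruction sequences more than natural human intents, which lowers their effectiveness for RL training. We introduce EnvFactory, a fully automated framework that addresses both challenges. EnvFactory autonomously explores and verifies stateful, executable tool environments from authentic resources, and synthesizes natural multi-turn trajectories through topology-aware sampling and calibrated refinement, generating grounded queries with implicit intents. Using 85 verified environments across 7 domains, EnvFactory generates 2,575 SFT and RL trajectories. EnvFactory improves training efficiency and downstream performance, raising Qwen3-series models by up to +15% on BFCLv3, +8.6% on MCP-Atlas, and +6% on conversational benchmarks such as $τ^2$-Bench and VitaBench. By fully automating both environment construction and trajectory synthesis, EnvFactory provides a scalable, extensible, and robust foundation for Agentic RL.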

Related papers