Fugu-MT Paper Translation (Abstract): Scaling Agent Learning via Experience Synthesis

Paper overview: Scaling Agent Learning via Experience Synthesis

arxiv url: http://arxiv.org/abs/2511.03773v1
Date: Wed, 05 Nov 2025 18:58:48 GMT
Status: Translation complete
Last updated in system: 2025-11-07 20:17:53.179747
Title: Scaling Agent Learning via Experience Synthesis
Authors: Zhaorun Chen, Zhuokai Zhao, Kai Zhang, Bo Liu, Qi Qi, Yifan Wu, Tarun Kalluri, Sara Cao, Yuanhao Xiong, Haibo Tong, Huaxiu Yao, Hengduo Li, Jiacheng Zhu, Xian Li, Dawn Song, Bo Li, Jason Weston, Dat Huynh
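Abstract summary: Reinforcement learning (RL) can empower large language model (LLM) agents by enabling self-improvement through interaction. We introduce DreamGym, the first unified framework designed to synthesize diverse experiences with scalability in mind. Rather than relying on expensive real-environment rollouts, DreamGym distills environment dynamics into a reasoning-based experience model.
Reference score (independently computed attention score): 100.42712232390532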
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While reinforcement learning (RL) can empower large language model (LLM) agents by enabling self-improvement through interaction, its practical adoption remains challenging due to costly rollouts, limited task diversity, unreliable reward signals, and infrastructure complexity, all of which obstruct the collection of scalable experience data. To address these challenges, we introduce DreamGym, the first unified framework designed to synthesize diverse experiences with scalability in mind to enable effective online RL training for autonomous agents. Rather than relying on expensive real-environment rollouts, DreamGym distills environment dynamics into a reasoning-based experience model that derives consistent state transitions and feedback signals through step-by-step reasoning, enabling scalable agent rollout collection for RL. To improve the stability and quality of transitions, DreamGym leverages an experience replay buffer initialized with offline real-world data and continuously enriched with fresh interactions to actively support agent training. To improve knowledge acquisition, DreamGym adaptively generates new tasks that challenge the current agent policy, enabling more effective online curriculum learning. Experiments across diverse environments and agent backbones demonstrate that DreamGym substantially improves RL training, both in fully synthetic settings and in sim-to-real transfer scenarios. On non-RL-ready tasks like WebArena, DreamGym outperforms all baselines by over 30%. And in RL-ready but costly settings, it matches GRPO and PPO performance using only synthetic interactions. When transferring a policy trained purely on synthetic experiences to real-environment RL, DreamGym yields significant additional performance gains while requiring far fewer real-world interactions, providing a scalable warm-start strategy for general-purpose RL.
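The abstract describes three interacting pieces: an experience model that produces state transitions and feedback, a replay buffer seeded with offline data and enriched with fresh interactions, and adaptive selection of tasks that challenge the current policy. The following is a minimal Python sketch of how such pieces might fit together. It is not the authors' implementation: every name here (`Transition`, `ReplayBuffer`, `toy_experience_model`, `synthetic_rollout`, `pick_challenging_task`) is hypothetical, the experience model is a fixed rule standing in for an LLM, and the "success rate nearest 0.5" heuristic is an assumption made only for illustration.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    state: str
    action: str
    next_state: str
    reward: float


class ReplayBuffer:
    """Bounded buffer seeded with offline data, then enriched online."""

    def __init__(self, offline, capacity=1000):
        self._items = deque(offline, maxlen=capacity)

    def add(self, transition):
        self._items.append(transition)

    def sample(self, k, rng):
        # Return at most k stored transitions, chosen without replacement.
        items = list(self._items)
        return rng.sample(items, min(k, len(items)))

    def __len__(self):
        return len(self._items)


def toy_experience_model(state, action, examples):
    """Hypothetical stand-in for a reasoning-based experience model.

    A real system would presumably condition a language model on
    `examples`; this fixed rule ignores them so the sketch runs offline.
    """
    if action == "submit":
        return Transition(state, action, "done", 1.0)
    return Transition(state, action, f"{state}>{action}", 0.0)


def synthetic_rollout(policy, task, buffer, rng, max_steps=5):
    """Collect one trajectory without touching a real environment."""
    state, trajectory = task, []
    for _ in range(max_steps):
        action = policy(state)
        examples = buffer.sample(2, rng)  # retrieved context (unused by the toy model)
        step = toy_experience_model(state, action, examples)
        buffer.add(step)  # enrich the buffer with the fresh interaction
        trajectory.append(step)
        state = step.next_state
        if state == "done":
            break
    return trajectory


def pick_challenging_task(success_rates):
    """Assumed heuristic: prefer the task whose success rate is nearest 0.5."""
    return min(success_rates, key=lambda name: abs(success_rates[name] - 0.5))
```

A caller would seed `ReplayBuffer` with a few recorded transitions, run `synthetic_rollout` with the current policy, and pass per-task success rates to `pick_challenging_task` to choose what to train on next. The resulting trajectories would then feed whichever RL update is in use, which this sketch leaves out.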

List of related papers