Fugu-MT Paper Translation (Summary): RollArt: Scaling Agentic RL Training via Disaggregated Infrastructure

Paper summary: RollArt: Scaling Agentic RL Training via Disaggregated Infrastructure

arxiv url: http://arxiv.org/abs/2512.22560v1
Date: Sat, 27 Dec 2025 11:14:23 GMT
Status: Translation complete
Last updated in system: 2025-12-30 22:37:30.112796
Title: RollArt: Scaling Agentic RL Training via Disaggregated Infrastructure
Title (reference translation): RollArt: Scaling Agentic RL Training via Disaggregated Infrastructure
Authors: Wei Gao, Yuheng Zhao, Tianyuan Wu, Shaopan Xiong, Weixun Wang, Dakai An, Lunxi Cao, Dilxat Muhtar, Zichen Liu, Haizhou Zhao, Ju Huang, Siran Yang, Yongbin Li, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng, Wei Wang
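Abstract summary: Agentic reinforcement learning (RL) enables large language models (LLMs) to perform autonomous decision-making and long-term planning. We present RollArc, a distributed system designed to maximize throughput for multi-task agentic RL on disaggregated infrastructure.
Reference score (independently computed attention level): 49.88201789074532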
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Agentic Reinforcement Learning (RL) enables Large Language Models (LLMs) to perform autonomous decision-making and long-term planning. Unlike standard LLM post-training, agentic RL workloads are highly heterogeneous, combining compute-intensive prefill phases, bandwidth-bound decoding, and stateful, CPU-heavy environment simulations. We argue that efficient agentic RL training requires disaggregated infrastructure to leverage specialized, best-fit hardware. However, naive disaggregation introduces substantial synchronization overhead and resource underutilization due to the complex dependencies between stages. We present RollArc, a distributed system designed to maximize throughput for multi-task agentic RL on disaggregated infrastructure. RollArc is built on three core principles: (1) hardware-affinity workload mapping, which routes compute-bound and bandwidth-bound tasks to best-fit GPU devices, (2) fine-grained asynchrony, which manages execution at the trajectory level to mitigate resource bubbles, and (3) statefulness-aware computation, which offloads stateless components (e.g., reward models) to serverless infrastructure for elastic scaling. Our results demonstrate that RollArc effectively improves training throughput and achieves 1.35-2.05\(\times\) end-to-end training time reduction compared to monolithic and synchronous baselines. We also evaluate RollArc by training a hundreds-of-billions-parameter MoE model for the Qoder product on an Alibaba cluster with more than 3,000 GPUs, further demonstrating RollArc's scalability and robustness. The code is available at https://github.com/alibaba/ROLL.
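Abstract (reference translation): Agentic reinforcement learning (RL) lets large language models (LLMs) make decisions autonomously and plan over long horizons. Unlike ordinary LLM post-training, agentic RL workloads are highly heterogeneous: they combine compute-intensive prefill phases, bandwidth-bound decoding, and stateful, CPU-heavy environment simulation. We argue that training agentic RL efficiently requires disaggregated infrastructure so that each stage can use specialized, best-fit hardware. Naive disaggregation, however, causes substantial synchronization overhead and resource underutilization because of the complex dependencies between stages. We present RollArc, a distributed system that maximizes the throughput of multi-task agentic RL on disaggregated infrastructure. RollArc is built on three core principles: (1) hardware-affinity workload mapping, which routes compute-bound and bandwidth-bound tasks to the best-fit GPU devices; (2) fine-grained asynchrony, which manages execution at the trajectory level to reduce resource bubbles; and (3) statefulness-aware computation, which offloads stateless components (for example, reward models) to serverless infrastructure for elastic scaling. The results show that RollArc improves training throughput and shortens end-to-end training time by 1.35-2.05\(\times\) compared with monolithic and synchronous baselines. We also evaluate RollArc by training a hundreds-of-billions-parameter MoE model for the Qoder product on an Alibaba cluster with more than 3,000 GPUs, which demonstrates RollArc's scalability and robustness. The code is available at https://github.com/alibaba/ROLL.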
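
To illustrate the second principle in the abstract (fine-grained asynchrony at the trajectory level), the following is a minimal Python sketch. It is not taken from RollArc or the ROLL repository. The names `rollout`, `score`, `run_trajectory`, and `run_batch`, the sleep-based turn costs, and the constant reward are all assumptions invented for this example. The idea it shows is only that each trajectory is scored as soon as it finishes, rather than after the slowest trajectory in the batch completes.

```python
import asyncio


async def rollout(traj_id: int, num_turns: int, turn_cost: float) -> dict:
    """Simulate one multi-turn trajectory (hypothetical stand-in)."""
    for _ in range(num_turns):
        # Each sleep stands in for one LLM generation plus one environment step.
        await asyncio.sleep(turn_cost)
    return {"id": traj_id, "turns": num_turns}


async def score(traj: dict) -> dict:
    """Hypothetical stateless reward call; a real one might be a remote request."""
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    return {**traj, "reward": 1.0}  # placeholder reward value


async def run_trajectory(traj_id, num_turns, turn_cost, finished):
    traj = await rollout(traj_id, num_turns, turn_cost)
    scored = await score(traj)  # scored immediately, without waiting for the batch
    finished.append(scored["id"])  # record the order in which trajectories complete
    return scored


async def run_batch(turns_per_traj, turn_cost=0.02):
    finished = []
    tasks = [
        run_trajectory(i, n, turn_cost, finished)
        for i, n in enumerate(turns_per_traj)
    ]
    results = await asyncio.gather(*tasks)  # results come back in submission order
    return results, finished


# Three trajectories with 3, 1, and 2 turns respectively.
results, finished = asyncio.run(run_batch([3, 1, 2]))
print(finished)
```

In this toy setup the shorter trajectories are scored first, so the scoring stage does not idle while the longest rollout is still running. A batch-synchronous variant would instead wait for every `rollout` to return before calling `score` on any of them.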

Related papers