Fugu-MT Paper Translation (Abstract): Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation

Paper abstract: Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation

arxiv url: http://arxiv.org/abs/2509.23866v1
Date: Sun, 28 Sep 2025 13:19:20 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:19.493716
Title: Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation
Title (reference translation): Efficient multi-turn RL for GUI agents via decoupled learning and adaptive data curation
Authors: Pengxiang Li, Zechen Hu, Zirui Shang, Jingrong Wu, Yang Liu, Hui Liu, Zhi Gao, Chenrui Shi, Bofei Zhang, Zihao Zhang, Xiaochuan Shi, Zedong YU, Yuwei Wu, Xinxiao Wu, Yunde Jia, Liuyu Xiang, Zhaofeng He, Qing Li
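Abstract summary: Vision-language model (VLM) based GUI agents show promise for automating complex tasks, but face significant challenges in applying reinforcement learning (RL). We propose DART, a Decoupled Agentic RL Training framework for GUI agents, which coordinates heterogeneous modules in a highly decoupled manner. On the OSWorld benchmark, DART-GUI-7B achieves a 42.13% task success rate, a 14.61% absolute gain over the base model, and 7.34% higher than open-source SOTA.
Reference score (independently computed attention score): 65.3648667980258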
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Vision-language model (VLM) based GUI agents show promise for automating complex desktop and mobile tasks, but face significant challenges in applying reinforcement learning (RL): (1) slow multi-turn interactions with GUI environments for policy rollout, and (2) insufficient high-quality agent-environment interactions for policy learning. To address these challenges, we propose DART, a Decoupled Agentic RL Training framework for GUI agents, which coordinates heterogeneous modules in a highly decoupled manner. DART separates the training system into four asynchronous modules: environment cluster, rollout service, data manager, and trainer. This design enables non-blocking communication, asynchronous training, rollout-wise trajectory sampling, and per-worker model synchronization, significantly improving the system efficiency: 1.6× GPU utilization for rollout, 1.9× training throughput, and 5.5× environment utilization. To facilitate effective learning from abundant samples, we introduce an adaptive data curation scheme: (1) pre-collecting successful trajectories for challenging tasks to supplement sparse success in online sampling; (2) dynamically adjusting rollout numbers and trajectory lengths based on task difficulty; (3) training selectively on high-entropy steps to prioritize critical decisions; (4) stabilizing learning via truncated importance sampling for policy mismatch between policy rollout and updating. On the OSWorld benchmark, DART-GUI-7B achieves a 42.13% task success rate, a 14.61% absolute gain over the base model, and 7.34% higher than open-source SOTA. We will fully open-source our training framework, data, and model checkpoints via computer-use-agents.github.io/dart-gui, which we believe is a timely contribution to the open-source community of agentic RL training.
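Illustrative sketch (not from the paper): the decoupling described in the abstract can be pictured as producers and a consumer joined by queues, so that slow GUI episodes never block training. The Python below is a minimal toy under stated assumptions. The names `PolicyStore`, `env_worker`, `trainer`, and `run_pipeline` are hypothetical, the environment cluster and rollout service are merged into one worker coroutine, an `asyncio.Queue` stands in for the data manager, and a version counter stands in for model weights. It is not DART's implementation.

```python
import asyncio
import random
from typing import List, Tuple

# (task_id, policy_version_used, reward); a hypothetical trajectory record.
Trajectory = Tuple[int, int, float]


class PolicyStore:
    """Stand-in for model weights: only tracks the latest policy version."""

    def __init__(self) -> None:
        self.version = 0


async def env_worker(tasks: asyncio.Queue, store: PolicyStore,
                     out_queue: asyncio.Queue, rng: random.Random) -> None:
    """Toy environment + rollout worker: runs episodes until no tasks remain."""
    while True:
        try:
            task_id = tasks.get_nowait()
        except asyncio.QueueEmpty:
            return
        # Per-worker synchronization: read the latest version before each episode.
        version = store.version
        # Simulated slow multi-turn GUI interaction.
        await asyncio.sleep(rng.uniform(0.001, 0.005))
        reward = float(rng.random() < 0.5)
        # Rollout-wise sampling: publish each trajectory as soon as it finishes.
        await out_queue.put((task_id, version, reward))


async def trainer(store: PolicyStore, in_queue: asyncio.Queue,
                  total: int, batch_size: int) -> List[Trajectory]:
    """Consume whatever trajectories are ready; never wait for a full sweep."""
    seen: List[Trajectory] = []
    while len(seen) < total:
        batch = [await in_queue.get()]
        while len(batch) < batch_size and not in_queue.empty():
            batch.append(in_queue.get_nowait())
        seen.extend(batch)
        store.version += 1  # stand-in for one gradient update
    return seen


async def run_pipeline(num_tasks: int = 8, num_workers: int = 3,
                       batch_size: int = 2, seed: int = 0):
    rng = random.Random(seed)
    tasks: asyncio.Queue = asyncio.Queue()
    for task_id in range(num_tasks):
        tasks.put_nowait(task_id)
    trajectories: asyncio.Queue = asyncio.Queue()  # plays the data manager role
    store = PolicyStore()
    workers = [asyncio.create_task(env_worker(tasks, store, trajectories, rng))
               for _ in range(num_workers)]
    seen = await trainer(store, trajectories, num_tasks, batch_size)
    await asyncio.gather(*workers)
    return seen, store.version


if __name__ == "__main__":
    seen, version = asyncio.run(run_pipeline())
    print(f"{len(seen)} trajectories consumed, final policy version {version}")
```

Because workers read `store.version` at the start of each episode, some trajectories are generated by an older policy than the one being updated. That is the rollout/update mismatch the abstract addresses with truncated importance sampling.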
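Abstract (reference translation): Vision-language model (VLM) based GUI agents show promise for automating complex desktop and mobile tasks, but face significant challenges in applying reinforcement learning (RL). To address these challenges, we propose DART (Decoupled Agentic RL Training framework) for GUI agents. DART separates the training system into four asynchronous modules: environment cluster, rollout service, data manager, and trainer. This design enables non-blocking communication, asynchronous training, rollout-wise trajectory sampling, and per-worker model synchronization, significantly improving system efficiency. To facilitate effective learning from abundant samples, we introduce an adaptive data curation scheme: (1) pre-collecting successful trajectories to supplement sparse success in online sampling; (2) dynamically adjusting rollout numbers and trajectory lengths based on task difficulty; (3) training selectively on high-entropy steps to prioritize critical decisions; (4) stabilizing learning via truncated importance sampling for the policy mismatch between policy rollout and updating. On the OSWorld benchmark, DART-GUI-7B achieves a 42.13% task success rate, a 14.61% absolute gain over the base model, and 7.34% higher than open-source SOTA. We will fully open-source our training framework, data, and model checkpoints via computer-use-agents.github.io/dart-gui, a timely contribution to the open-source community of agentic RL training.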
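Illustrative sketch (not from the paper): items (3) and (4) of the data curation scheme can be sketched as per-step loss weights. The Python below is a hedged toy. The `Step` record, the function names, and the `keep_fraction` and `cap` values are assumptions chosen for illustration, not values reported by the authors. Truncated importance sampling is shown in its generic form, min(π_train/π_rollout, cap).

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Step:
    """Hypothetical per-step record from one trajectory."""
    entropy: float       # policy entropy at this step
    logp_trainer: float  # log-prob of the action under the policy being updated
    logp_rollout: float  # log-prob of the action under the policy that sampled it


def select_high_entropy_steps(entropies: Sequence[float],
                              keep_fraction: float) -> List[int]:
    """Indices of the top `keep_fraction` steps by entropy, in original order."""
    if not entropies:
        return []
    k = max(1, math.ceil(keep_fraction * len(entropies)))
    ranked = sorted(range(len(entropies)),
                    key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:k])


def truncated_is_weight(logp_trainer: float, logp_rollout: float,
                        cap: float) -> float:
    """Importance ratio pi_train / pi_rollout, truncated from above at `cap`."""
    return min(math.exp(logp_trainer - logp_rollout), cap)


def step_weights(steps: Sequence[Step], keep_fraction: float,
                 cap: float) -> List[float]:
    """Zero weight for low-entropy steps, truncated IS weight for the rest."""
    kept = set(select_high_entropy_steps([s.entropy for s in steps],
                                         keep_fraction))
    return [
        truncated_is_weight(s.logp_trainer, s.logp_rollout, cap)
        if i in kept else 0.0
        for i, s in enumerate(steps)
    ]
```

In a real trainer these weights would multiply each step's policy-gradient term. Truncating the ratio bounds the variance contributed by steps where the rollout policy and the updated policy disagree strongly.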

Related papers