Fugu-MT Paper Translation (Abstract): Scaling Synthetic Task Generation for Agents via Exploration

Paper overview: Scaling Synthetic Task Generation for Agents via Exploration

arxiv url: http://arxiv.org/abs/2509.25047v1
Date: Mon, 29 Sep 2025 17:00:02 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:20.146071
Title: Scaling Synthetic Task Generation for Agents via Exploration
Title (reference translation): Scaling Task Generation for Agents via Exploration
Authors: Ram Ramrakhya, Andrew Szot, Omar Attia, Yuhao Yang, Anh Nguyen, Bogdan Mazoure, Zhe Gan, Harsh Agrawal, Alexander Toshev
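Abstract summary: Post-training multimodal large language models (MLLMs) to build interactive agents holds promise across domains such as computer use, web navigation, and robotics. Existing approaches to task generation rely heavily on human annotation or on prompting MLLMs with limited downstream environment information. This paper introduces AutoPlay, a scalable pipeline for task generation.
Reference score (independently calculated attention score): 67.70129766322985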
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Post-training Multimodal Large Language Models (MLLMs) to build interactive agents holds promise across domains such as computer-use, web navigation, and robotics. A key challenge in scaling such post-training is the lack of high-quality downstream agentic task datasets with tasks that are diverse, feasible, and verifiable. Existing approaches for task generation rely heavily on human annotation or on prompting MLLMs with limited downstream environment information, which is either costly or poorly scalable, as it yields tasks with limited coverage. To remedy this, we present AutoPlay, a scalable pipeline for task generation that explicitly explores interactive environments to discover possible interactions and current state information to synthesize environment-grounded tasks. AutoPlay operates in two stages: (i) an exploration phase, where an MLLM explorer agent systematically uncovers novel environment states and functionalities, and (ii) a task generation phase, where a task generator leverages exploration trajectories and a set of task guideline prompts as context to synthesize diverse, executable, and verifiable tasks. We show AutoPlay generates 20k tasks across 20 Android applications and 10k tasks across 13 Ubuntu applications to train mobile-use and computer-use agents. AutoPlay-generated tasks enable large-scale task demonstration synthesis without human annotation by employing an MLLM task executor and verifier. This data enables training MLLM-based UI agents that improve success rates by up to $20.0\%$ on mobile-use and $10.9\%$ on computer-use scenarios. In addition, AutoPlay-generated tasks combined with MLLM verifier-based rewards enable scaling reinforcement learning training of UI agents, leading to an additional $5.7\%$ gain. These results establish AutoPlay as a scalable approach for post-training capable MLLM agents, reducing reliance on human annotation.
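Abstract (reference translation): Post-training multimodal large language models (MLLMs) to build interactive agents holds promise across domains such as computer use, web navigation, and robotics. A key challenge in scaling up such post-training is the lack of high-quality downstream agentic task datasets whose tasks are diverse, feasible, and verifiable. Existing approaches to task generation rely heavily on human annotation or on prompting MLLMs with limited downstream environment information. To address this, we introduce AutoPlay, a scalable pipeline for task generation that explicitly explores interactive environments to discover possible interactions and current state information and to synthesize environment-grounded tasks. AutoPlay operates in two stages: (i) an exploration phase, in which an MLLM explorer agent systematically uncovers novel environment states and functionalities, and (ii) a task generation phase, in which a task generator uses the exploration trajectories and a set of task guideline prompts as context to synthesize diverse, executable, and verifiable tasks. To train mobile-use and computer-use agents, AutoPlay generates 20k tasks across 20 Android applications and 10k tasks across 13 applications. AutoPlay-generated tasks enable large-scale synthesis of task demonstrations without human annotation by using an MLLM task executor and verifier. This data makes it possible to train MLLM-based UI agents that improve success rates by up to 20.0% on mobile-use and 10.9% on computer-use scenarios. In addition, combining AutoPlay-generated tasks with MLLM verifier-based rewards enables scaling reinforcement learning training of UI agents. These results establish AutoPlay as a scalable approach for post-training capable MLLM agents that reduces reliance on human annotation.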
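
The two-stage structure described in the abstract (explore first, then generate tasks grounded in what was found, then verify) can be illustrated with a toy sketch. This is not the authors' implementation: the paper uses MLLM agents for exploration, task generation, execution, and verification, whereas this sketch swaps in a breadth-first search, a string template, and a replay check. The `TOY_APP` graph, the `Task` class, and the functions `explore`, `generate_tasks`, and `verify` are all hypothetical names introduced only for illustration.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical toy environment: UI screens as nodes, actions as labeled edges.
# Nothing here comes from the paper; it only stands in for a real application.
TOY_APP = {
    "home": {"open_settings": "settings", "open_notes": "notes"},
    "settings": {"toggle_dark_mode": "settings_dark", "back": "home"},
    "settings_dark": {"back": "home"},
    "notes": {"create_note": "note_editor", "back": "home"},
    "note_editor": {"back": "notes"},
}


@dataclass(frozen=True)
class Task:
    instruction: str
    actions: tuple  # action sequence that reaches the goal from the start state
    goal_state: str


def explore(app, start):
    """Stage (i) stand-in: breadth-first search recording one trajectory per novel state."""
    trajectories = {start: ()}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for action, next_state in app[state].items():
            if next_state not in trajectories:  # keep only novel states
                trajectories[next_state] = trajectories[state] + (action,)
                queue.append(next_state)
    return trajectories


def generate_tasks(trajectories,
                   guideline="Navigate the app until you reach the '{goal}' screen."):
    """Stage (ii) stand-in: fill a guideline template for each discovered state."""
    return [
        Task(guideline.format(goal=goal), actions, goal)
        for goal, actions in trajectories.items()
        if actions  # skip the start state, which needs no actions
    ]


def verify(app, start, task):
    """Verifier stand-in: replay the actions and check that the goal state is reached."""
    state = start
    for action in task.actions:
        if action not in app[state]:
            return False  # the action is not available in this state
        state = app[state][action]
    return state == task.goal_state


trajectories = explore(TOY_APP, "home")
tasks = generate_tasks(trajectories)
verified = [t for t in tasks if verify(TOY_APP, "home", t)]
```

Because every task is derived from a trajectory that was actually observed, each one is feasible by construction, which is the intuition behind grounding task generation in exploration rather than prompting without environment information.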
