Fugu-MT Paper Translation (Abstract): GTA: Generating Long-Horizon Tasks for Web Agents at Scale

Paper overview: GTA: Generating Long-Horizon Tasks for Web Agents at Scale

arxiv url: http://arxiv.org/abs/2605.29218v1
Date: Thu, 28 May 2026 01:05:50 GMT
Status: Translation complete
Last updated in system: 2026-05-30 02:45:55.576148
Title: GTA: Generating Long-Horizon Tasks for Web Agents at Scale
Authors: Tenghao Huang, Kung-Hsiang Huang, Prafulla Kumar Choubey, Yilun Zhou, Muhao Chen, Jonathan May, Chien-Sheng Wu
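Abstract summary: We introduce GTA, a scalable framework that integrates crawling, retrieval-based seeding, in-context generation, and automated quality control. We instantiate the pipeline on over 50 websites covering e-commerce, government, forums, and news, with multilingual and multi-hop coverage. Our contributions are three-fold: (i) formalizing multi-hop web-agent task generation, (ii) proposing an efficient and validated pipeline for automatic data creation, and (iii) releasing a dynamic benchmark with reproducible evaluation.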
Reference score (independently computed attention score): 82.43869456830664
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Web agents, which couple language models with browsing and tool-use capabilities, show promise as open web assistants. Yet progress is increasingly limited by the lack of scalable, process-level supervision. Existing benchmarks are largely manually constructed, providing only coarse start-goal annotations without intermediate trajectories, while recent automatic generation efforts remain expensive, biased, and shallow. These limitations prevent reliable training and evaluation of agents that must generalize to realistic, multi-hop, cross-page tasks. We introduce a scalable framework, GTA, that integrates crawling, retrieval-based seeding, in-context generation, and automated quality control to produce realistic tasks paired with executable trajectories. This design decouples crawling from generation for greater efficiency, grounds tasks in the site graph to enforce compositionality, and ensures dense supervision through deterministic replays and systematic validation. We instantiate the pipeline on over 50 websites covering e-commerce, government, forums, and news, with multilingual and multi-hop coverage. The resulting benchmark reveals a significant human-agent performance gap and enables detailed diagnostics. Our contributions are three-fold: (i) formalizing multi-hop web-agent task generation, (ii) proposing an efficient and validated pipeline for automatic data creation, and (iii) releasing a dynamic benchmark with reproducible evaluation.
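The abstract describes grounding tasks in the site graph and validating trajectories through deterministic replay, but gives no implementation details. The following is only a minimal sketch of that general idea, not the authors' pipeline: the toy graph, the page paths, and the helper names `multi_hop_paths` and `replay_is_valid` are all hypothetical and introduced here purely for illustration.

```python
from typing import Dict, List

# Hypothetical toy site graph: page -> pages reachable by one click.
# (Assumption for illustration; a real crawl would produce this structure.)
SiteGraph = Dict[str, List[str]]


def multi_hop_paths(graph: SiteGraph, start: str, hops: int) -> List[List[str]]:
    """Enumerate simple paths with exactly `hops` edges starting at `start`."""
    paths: List[List[str]] = []

    def walk(path: List[str]) -> None:
        if len(path) - 1 == hops:
            paths.append(path)
            return
        for nxt in graph.get(path[-1], []):
            if nxt not in path:  # do not revisit a page within one path
                walk(path + [nxt])

    walk([start])
    return paths


def replay_is_valid(graph: SiteGraph, trajectory: List[str]) -> bool:
    """Re-check, step by step, that every transition follows an existing link."""
    return all(b in graph.get(a, []) for a, b in zip(trajectory, trajectory[1:]))


graph: SiteGraph = {
    "/": ["/products", "/forum"],
    "/products": ["/products/42"],
    "/products/42": ["/cart"],
    "/forum": ["/forum/thread/7"],
}

# Candidate multi-hop trajectories, kept only if they replay cleanly.
candidates = multi_hop_paths(graph, "/", hops=3)
validated = [p for p in candidates if replay_is_valid(graph, p)]
```

In this sketch a candidate trajectory is kept only if each of its steps corresponds to a link that exists in the crawled graph, which is one simple way to read "deterministic replays and systematic validation"; the actual checks used by GTA may differ.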
