Fugu-MT Paper Translation (Abstract): WebXSkill: Skill Learning for Autonomous Web Agents

Paper overview: WebXSkill: Skill Learning for Autonomous Web Agents

arxiv url: http://arxiv.org/abs/2604.13318v1
Date: Tue, 14 Apr 2026 21:48:15 GMT
Status: Translation complete
Last updated in system: 2026-04-16 20:38:32.311879
Title: WebXSkill: Skill Learning for Autonomous Web Agents
Title (reference translation): WebXSkill: Skill Learning for Autonomous Web Agents
Authors: Zhaoyang Wang, Qianhui Wu, Xuchao Zhang, Chaoyun Zhang, Wenlin Yao, Fazle Elahi Faisal, Baolin Peng, Si Qin, Suman Nath, Qingwei Lin, Chetan Bansal, Dongmei Zhang, Saravan Rajmohan, Jianfeng Gao, Huaxiu Yao
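Abstract summary: WebXSkill is a framework that bridges the gap between code-based skills and natural language guidance. On WebArena and WebVoyager, WebXSkill improves task success rate by up to 9.8 and 12.9 points over the baseline, respectively.
Reference score (independently computed attention score): 104.76374637691212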
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Autonomous web agents powered by large language models (LLMs) have shown promise in completing complex browser tasks, yet they still struggle with long-horizon workflows. A key bottleneck is the grounding gap in existing skill formulations: textual workflow skills provide natural language guidance but cannot be directly executed, while code-based skills are executable but opaque to the agent, offering no step-level understanding for error recovery or adaptation. We introduce WebXSkill, a framework that bridges this gap with executable skills, each pairing a parameterized action program with step-level natural language guidance, enabling both direct execution and agent-driven adaptation. WebXSkill operates in three stages: skill extraction mines reusable action subsequences from readily available synthetic agent trajectories and abstracts them into parameterized skills, skill organization indexes skills into a URL-based graph for context-aware retrieval, and skill deployment exposes two complementary modes, grounded mode for fully automated multi-step execution and guided mode where skills serve as step-by-step instructions that the agent follows with its native planning. On WebArena and WebVoyager, WebXSkill improves task success rate by up to 9.8 and 12.9 points over the baseline, respectively, demonstrating the effectiveness of executable skills for web agents. The code is publicly available at https://github.com/aiming-lab/WebXSkill.
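Abstract (reference translation): Autonomous web agents built on large language models (LLMs) have shown promise in completing complex browser tasks, yet they still struggle with long-horizon workflows. Textual workflow skills provide natural language guidance but cannot be executed directly, while code-based skills are executable but opaque to the agent and offer no step-level understanding for error recovery or adaptation. This paper introduces WebXSkill, a framework that bridges this gap with executable skills. WebXSkill operates in three stages: skill extraction mines reusable action subsequences from readily available synthetic agent trajectories and abstracts them into parameterized skills; skill organization indexes skills into a URL-based graph for context-aware retrieval; and skill deployment exposes two complementary modes, a grounded mode for fully automated multi-step execution and a guided mode in which skills serve as step-by-step instructions that the agent follows with its native planning. On WebArena and WebVoyager, WebXSkill improves task success rate by up to 9.8 and 12.9 points over the baseline, respectively, demonstrating the effectiveness of executable skills for web agents. The code is publicly available at https://github.com/aiming-lab/WebXSkill.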
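
The idea of an executable skill that pairs a parameterized action program with step-level guidance can be illustrated with a minimal sketch. This is not the authors' implementation: the class names (`Step`, `Skill`, `SkillIndex`), the field layout, the example site `shop.example.com`, and the action tuples are all hypothetical, and a simple URL-prefix lookup stands in for the URL-based graph described in the abstract.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


class _KeepMissing(dict):
    """Leaves unbound placeholders such as {query} untouched when formatting."""

    def __missing__(self, key):
        return "{" + key + "}"


def _bind(template: str, params: dict) -> str:
    return template.format_map(_KeepMissing(params))


@dataclass
class Step:
    action: str        # hypothetical action name, e.g. "type" or "click"
    target: str        # description of the page element to act on
    guidance: str      # step-level natural language guidance
    value: str = ""    # optional value template, e.g. "{query}"


@dataclass
class Skill:
    name: str
    url_prefix: str    # assumption: skills are keyed by host + path prefix
    params: list[str]
    steps: list[Step]

    def grounded(self, **params) -> list[tuple[str, str, str]]:
        """Grounded mode: bind parameters and return concrete actions to execute."""
        missing = [p for p in self.params if p not in params]
        if missing:
            raise ValueError(f"missing parameters: {missing}")
        return [(s.action, _bind(s.target, params), _bind(s.value, params))
                for s in self.steps]

    def guided(self, **params) -> list[str]:
        """Guided mode: return numbered instructions for the agent to follow."""
        return [f"{i}. {_bind(s.guidance, params)}"
                for i, s in enumerate(self.steps, start=1)]


class SkillIndex:
    """Prefix lookup by URL (a simplification of a URL-based graph)."""

    def __init__(self) -> None:
        self._by_prefix: dict[str, list[Skill]] = {}

    def add(self, skill: Skill) -> None:
        self._by_prefix.setdefault(skill.url_prefix, []).append(skill)

    def retrieve(self, url: str) -> list[Skill]:
        parsed = urlparse(url)
        key = parsed.netloc + parsed.path
        return [skill
                for prefix, skills in self._by_prefix.items()
                if key.startswith(prefix)
                for skill in skills]


# Hypothetical skill for an imaginary shopping site.
search = Skill(
    name="search_product",
    url_prefix="shop.example.com/",
    params=["query"],
    steps=[
        Step("type", "search box", "Type '{query}' into the search box.", "{query}"),
        Step("click", "search button", "Click the search button to submit."),
    ],
)
index = SkillIndex()
index.add(search)

for skill in index.retrieve("https://shop.example.com/catalog?page=2"):
    print(skill.grounded(query="red mug"))
    print(skill.guided(query="red mug"))
```

In this sketch both modes read from the same step list, so the program an executor would run and the instructions an agent would read cannot drift apart. That shared source is the property the abstract attributes to executable skills.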


Related papers