Fugu-MT 論文翻訳(概要): Web Agents Should Adopt the Plan-Then-Execute Paradigm

論文の概要: Web Agents Should Adopt the Plan-Then-Execute Paradigm

arxiv url: http://arxiv.org/abs/2605.14290v1
Date: Thu, 14 May 2026 02:48:57 GMT
ステータス: 翻訳完了
システム内更新日: 2026-05-15 21:45:34.588666
Title: Web Agents Should Adopt the Plan-Then-Execute Paradigm
Title（参考訳）: WebエージェントはPlan-Then-Execute Paradigmを採用するべきだ
Authors: Julien Piet, Annabella Chow, Yiwei Hou, Muxi Lyu, Sylvie Venuto, Jinhao Zhu, Raluca Ada Popa, David Wagner,
Abstract要約: 我々は、WebエージェントがデフォルトでReActではなく plan-then-executeにすべきであると主張している。信頼できないデータは、事前に定義された実行グラフ内の値やブランチに影響を与える可能性がある。我々は、Web上でプラン-then-executeを採用する上での主要な障壁を特定します。
参考スコア（独自算出の注目度）: 9.920367562132336
License: http://creativecommons.org/licenses/by/4.0/
Abstract: ReAct has become the default architecture across LLM agents, and many existing web agents follow this paradigm. We argue that it is the wrong default for web agents. Instead, web agents should default to plan-then-execute: commit to a task-specific program before observing runtime web content, then execute it. The reason is that web content mixes inputs from many parties. An e-commerce product page may combine a seller's listing, customer reviews and sponsored advertisements. Under ReAct, all of this content flows into the model when deciding on the next action, creating a direct path for prompt injections to steer the agent's control flow. Plan-then-execute changes this boundary: untrusted data may influence values or branches inside a predefined execution graph, but it cannot redefine the user task or cause the model to synthesize new actions at runtime. We analyze WebArena, a popular web agent benchmark, and find that all tasks are compatible with plan-then-execute, while 80% can be completed with a purely programmatic plan, without any runtime LLM subroutine. We identify the main barrier to adopting plan-then-execute on the web: For it to work well, tools must map cleanly to semantic actions, with effects known before execution, so agents have enough information to plan. The web does not naturally expose that interface. Browser tools such as click, type, and scroll have page-dependent meanings. Planning at this layer is near-sighted: the agent can only see actions on the current page, and later actions appear only after it acts. Closing this gap requires typed interfaces that turn website interactions from clicks and keystrokes to task-level operations. This is an infrastructure problem, not a modeling problem. Web tasks do not need reactivity by default; they need typed, complete, auditable website APIs.
Abstract（参考訳）: ReActはLLMエージェントのデフォルトアーキテクチャとなり、多くの既存のWebエージェントがこのパラダイムに従っている。私たちは、Webエージェントのデフォルトが間違っていると論じています。代わりに、Webエージェントはデフォルトで plan-then-execute: 実行中のWebコンテンツを観察する前にタスク固有のプログラムにコミットし、実行します。理由は、Webコンテンツは多くの関係者からのインプットが混在しているからだ。 eコマース製品ページは、販売者のリスト、顧客レビュー、スポンサー付き広告を組み合わせることができる。 ReActでは、これらのコンテンツは次のアクションを決定するときにモデルに流れ込み、エージェントの制御フローを操縦するためにインジェクションをプロンプトするための直接パスを作成します。信頼されていないデータは、事前に定義された実行グラフ内の値やブランチに影響を与える可能性があるが、ユーザータスクを再定義したり、実行時にモデルに新しいアクションを合成させることはできない。我々は、人気のあるWebエージェントベンチマークであるWebArenaを分析し、すべてのタスクがプラン-then-executeと互換性があるのに対して、80%は、ランタイムのLLMサブルーチンなしで、純粋にプログラム的なプランで完了可能であることを発見した。うまく機能するためには、ツールはセマンティックアクションにきれいにマッピングされなければなりません。 Webはそのインターフェースを自然に公開しない。クリック、タイプ、スクロールなどのブラウザツールはページ依存の意味を持つ。エージェントは現在のページでのみアクションを見ることができ、後のアクションはそれが動作した後にのみ現れる。このギャップを埋めるには、Webサイトのインタラクションをクリックやキーストロークからタスクレベルの操作に変換する、型付きインターフェースが必要です。これはインフラストラクチャの問題であり、モデリングの問題ではありません。 Webタスクはデフォルトではリアクティビティを必要としない。

論文の概要: Web Agents Should Adopt the Plan-Then-Execute Paradigm

関連論文リスト