Fugu-MT Paper Translation (Abstract): MCP vs RAG vs NLWeb vs HTML: A Comparison of the Effectiveness and Efficiency of Different Agent Interfaces to the Web (Technical Report)

Paper overview: MCP vs RAG vs NLWeb vs HTML: A Comparison of the Effectiveness and Efficiency of Different Agent Interfaces to the Web (Technical Report)

arxiv url: http://arxiv.org/abs/2511.23281v1
Date: Fri, 28 Nov 2025 15:32:15 GMT
Status: Translation complete
Last updated in system: 2025-12-01 19:47:55.965268
Title: MCP vs RAG vs NLWeb vs HTML: A Comparison of the Effectiveness and Efficiency of Different Agent Interfaces to the Web (Technical Report)
Title (reference translation): MCP vs. RAG vs. NLWeb vs. HTML: A comparison of the effectiveness and efficiency of different agent interfaces (technical report)
Authors: Aaron Steiner, Ralph Peeters, Christian Bizer
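Abstract summary: We introduce a testbed of four simulated e-shops, each offering its products via HTML, MCP, and NLWeb interfaces. For each interface (HTML, RAG, MCP, and NLWeb), we develop specialized agents that perform the same sets of tasks. We evaluate the agents with GPT 4.1, GPT 5, GPT 5 mini, and Claude Sonnet 4 as the underlying LLMs.
Reference score (independently computed attention score): 3.1427994341585688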
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language model agents are increasingly used to automate web tasks such as product search, offer comparison, and checkout. Current research explores different interfaces through which these agents interact with websites, including traditional HTML browsing, retrieval-augmented generation (RAG) over pre-crawled content, communication via Web APIs using the Model Context Protocol (MCP), and natural-language querying through the NLWeb interface. However, no prior work has compared these four architectures within a single controlled environment using identical tasks. To address this gap, we introduce a testbed consisting of four simulated e-shops, each offering its products via HTML, MCP, and NLWeb interfaces. For each interface (HTML, RAG, MCP, and NLWeb) we develop specialized agents that perform the same sets of tasks, ranging from simple product searches and price comparisons to complex queries for complementary or substitute products and checkout processes. We evaluate the agents using GPT 4.1, GPT 5, GPT 5 mini, and Claude Sonnet 4 as underlying LLM. Our evaluation shows that the RAG, MCP and NLWeb agents outperform HTML on both effectiveness and efficiency. Averaged over all tasks, F1 rises from 0.67 for HTML to between 0.75 and 0.77 for the other agents. Token usage falls from about 241k for HTML to between 47k and 140k per task. The runtime per task drops from 291 seconds to between 50 and 62 seconds. The best overall configuration is RAG with GPT 5 achieving an F1 score of 0.87 and a completion rate of 0.79. Also taking cost into consideration, RAG with GPT 5 mini offers a good compromise between API usage fees and performance. Our experiments show the choice of the interaction interface has a substantial impact on both the effectiveness and efficiency of LLM-based web agents.
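Abstract (reference translation): Large language model agents are increasingly used to automate web tasks such as product search, offer comparison, and checkout. Current research examines the different interfaces through which these agents interact with websites: traditional HTML browsing, retrieval-augmented generation (RAG) over pre-crawled content, communication via Web APIs using the Model Context Protocol (MCP), and natural-language querying through the NLWeb interface. However, no earlier study has compared these four architectures in a single controlled environment using identical tasks. To close this gap, we introduce a testbed of four simulated e-shops, each of which offers its products via HTML, MCP, and NLWeb interfaces. For each interface (HTML, RAG, MCP, and NLWeb), we develop specialized agents that perform the same sets of tasks, ranging from simple product searches and price comparisons to complex queries for complementary or substitute products and checkout processes. We evaluate the agents with GPT 4.1, GPT 5, GPT 5 mini, and Claude Sonnet 4 as the underlying LLMs. The evaluation shows that the RAG, MCP, and NLWeb agents outperform the HTML agent in both effectiveness and efficiency. Averaged over all tasks, F1 rises from 0.67 for HTML to between 0.75 and 0.77 for the other agents. Token usage per task falls from about 241k for HTML to between 47k and 140k. Runtime per task drops from 291 seconds to between 50 and 62 seconds. The best overall configuration is RAG with GPT 5, which achieves an F1 score of 0.87 and a completion rate of 0.79. When cost is also taken into account, RAG with GPT 5 mini offers a good compromise between API usage fees and performance. The experiments show that the choice of interaction interface has a substantial impact on both the effectiveness and the efficiency of LLM-based web agents.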
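
To put the averaged figures from the abstract in proportion, the following minimal sketch computes the relative savings of the non-HTML agents over the HTML agent. The numbers are the ones quoted in the abstract; the function and variable names are illustrative assumptions and do not come from the paper, and the ranges are taken at face value without knowing which agent produced which endpoint.

```python
# Figures quoted in the abstract, averaged over all tasks.
HTML_TOKENS_K = 241           # tokens per task for the HTML agent (thousands)
OTHER_TOKENS_K = (47, 140)    # range reported for the RAG, MCP, and NLWeb agents
HTML_RUNTIME_S = 291          # seconds per task for the HTML agent
OTHER_RUNTIME_S = (50, 62)    # range reported for the other agents
HTML_F1 = 0.67
OTHER_F1 = (0.75, 0.77)


def relative_reduction(baseline, value):
    """Fraction by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline


def reduction_range(baseline, values):
    """Smallest and largest relative reduction across `values`."""
    reductions = sorted(relative_reduction(baseline, v) for v in values)
    return reductions[0], reductions[-1]


token_saving = reduction_range(HTML_TOKENS_K, OTHER_TOKENS_K)
runtime_saving = reduction_range(HTML_RUNTIME_S, OTHER_RUNTIME_S)
# Absolute F1 improvement over HTML, rounded to two decimals.
f1_gain = tuple(round(f1 - HTML_F1, 2) for f1 in OTHER_F1)

print(f"token saving:   {token_saving[0]:.0%} to {token_saving[1]:.0%}")
print(f"runtime saving: {runtime_saving[0]:.0%} to {runtime_saving[1]:.0%}")
print(f"F1 gain:        {f1_gain[0]} to {f1_gain[1]}")
```

Under these figures, the non-HTML agents use roughly 42% to 80% fewer tokens and take roughly 79% to 83% less time per task, for an absolute F1 gain of 0.08 to 0.10.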
