Fugu-MT Paper Translation (Abstract): One Polluted Page Is Enough: Evaluating Web Content Pollution in Generative Recommenders

Paper overview: One Polluted Page Is Enough: Evaluating Web Content Pollution in Generative Recommenders

arxiv url: http://arxiv.org/abs/2606.13610v1
Date: Thu, 11 Jun 2026 17:24:14 GMT
Status: Translation complete
Last updated in system: 2026-06-12 15:55:27.950561
Title: One Polluted Page Is Enough: Evaluating Web Content Pollution in Generative Recommenders
Authors: Minghao Luo, Liang Chen,
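Abstract summary: This paper introduces FORGE, a benchmark for evaluating the promotion of fake products under web content pollution. FORGE covers 225 real-world products across 15 categories and 5 consumer scenarios. Vulnerability varies substantially across categories and increases when models lack stable prior knowledge of the relevant products.
Reference score (independently computed attention score): 5.185584621338163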
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Search-augmented LLMs increasingly mediate everyday consumer recommendations by retrieving live web content. This creates a new risk: generative recommenders may consume polluted web content, such as fake reviews and promotional pages crafted to mislead recommendations. We ask: to what extent do search-augmented LLMs become unwitting promoters of fake products when consuming polluted retrieval results? To answer this, we introduce FORGE (Fake Online Recommendations in Generative Environments), a benchmark for measuring fake-product promotion under controlled web-content pollution. Given an upstream search result, FORGE locally rewrites real products in retrieved web pages into fake ones to simulate web-content pollution, and measures how often the LLM recommends the fake product. FORGE covers 225 real-world products across 15 categories and 5 consumer scenarios. Across 12 commercial and open-weights LLMs, all models are vulnerable: a single polluted page yields fooled rates of up to 27%, while the full top-3 replacement raises this to 73.8%. Vulnerability varies substantially across categories, increasing when models lack stable prior knowledge of the relevant products. Reasoning does not mitigate this vulnerability; instead, it often generates spurious social proof to justify false recommendations. We evaluate three defenses: skepticism prompting, consensus filtering over model priors, and consensus filtering over cross-document evidence. Skepticism can exacerbate vulnerability, much like reasoning, while filtering risks suppressing legitimate products. We release FORGE at https://github.com/leoluolol/forge-benchmark.
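The pollution-and-measurement loop described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not FORGE's actual code or API. `Page`, `pollute`, `fooled_rate`, and `consensus_filter` are hypothetical names. Pollution is modeled as a plain case-insensitive substitution of the real product name in the top-k retrieved pages. The fooled rate is assumed to be the fraction of trials whose recommendation list contains the fake product. Cross-document consensus filtering is approximated by counting how many retrieved pages mention each candidate. The LLM call itself is omitted, and the product names are made up.

```python
import re
from dataclasses import dataclass


@dataclass
class Page:
    """A retrieved web page (hypothetical minimal representation)."""
    url: str
    text: str


def pollute(pages: list[Page], real: str, fake: str, k: int) -> list[Page]:
    """Assumption: rewrite mentions of `real` into `fake` in the top-k pages only.

    Pages beyond rank k are left untouched; the input list is not mutated.
    """
    pattern = re.compile(re.escape(real), re.IGNORECASE)
    polluted = []
    for rank, page in enumerate(pages):
        if rank < k:
            # A function replacement avoids backslash-escape issues in `fake`.
            polluted.append(Page(page.url, pattern.sub(lambda _m: fake, page.text)))
        else:
            polluted.append(page)
    return polluted


def fooled_rate(recommendations: list[list[str]], fake: str) -> float:
    """Assumed metric: share of trials whose recommendations include the fake product."""
    if not recommendations:
        return 0.0
    hits = sum(1 for recs in recommendations if fake in recs)
    return hits / len(recommendations)


def consensus_filter(candidates: list[str], pages: list[Page], min_support: int = 2) -> list[str]:
    """Hypothetical cross-document filter: keep products mentioned in >= min_support pages."""
    kept = []
    for name in candidates:
        support = sum(1 for p in pages if name.lower() in p.text.lower())
        if support >= min_support:
            kept.append(name)
    return kept


if __name__ == "__main__":
    pages = [
        Page("a", "The Acme X1 blender is great."),
        Page("b", "Acme X1 beats rivals."),
        Page("c", "Try the Zeta Pro or Acme X1."),
    ]
    one_page = pollute(pages, "Acme X1", "Nova Q9", k=1)
    print(one_page[0].text)  # The Nova Q9 blender is great.
    print(consensus_filter(["Nova Q9", "Acme X1", "Zeta Pro"], one_page))  # ['Acme X1']

    top3 = pollute(pages, "Acme X1", "Nova Q9", k=3)
    print(consensus_filter(["Nova Q9", "Acme X1"], top3))  # ['Nova Q9']

    trials = [["Nova Q9", "Acme X1"], ["Acme X1"], ["Zeta Pro"], ["Nova Q9"]]
    print(fooled_rate(trials, "Nova Q9"))  # 0.5
```

The toy demo hints at the trade-offs the abstract reports, without reproducing its numbers: with one polluted page, two-page support drops the fake product but also suppresses Zeta Pro, a legitimate product mentioned only once; with all top-3 pages rewritten, the fake product becomes the consensus and passes the filter.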


Related papers