Fugu-MT Paper Translation (Abstract): Exploring LLM biases to manipulate AI search overview

Paper abstract: Exploring LLM biases to manipulate AI search overview

arxiv url: http://arxiv.org/abs/2605.00012v1
Date: Mon, 30 Mar 2026 06:36:19 GMT
Status: Translation complete
Last updated in system: 2026-05-11 06:56:26.39952
Title: Exploring LLM biases to manipulate AI search overview
Authors: Roman Smirnov
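Abstract summary: This study focuses on investigating the presence of biases in LLM Overview systems. We train a small language model using reinforcement learning to rewrite snippets so as to increase their likelihood of being preferred by an LLM Overview. The results prove that LLM Overview systems have biases and that, in most cases, reinforcement learning can optimize snippet content to manipulate LLM Overview results.
Reference score (independently computed attention score): 0.10687198029679434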
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modern large language models (LLMs) are used in many business applications, in particular in web search systems and in applications that generate overviews of search results (LLM Overview systems). Such systems use an LLM to select the most relevant sources from the search results and to generate an answer to the user's query. Many studies have shown that LLMs exhibit various biases; in an LLM Overview application, both the source selection and the answer generation stages may be affected by them (here we focus mainly on the selection stage). This research investigates the presence of biases in LLM Overview systems and how those biases can be exploited to manipulate LLM Overview results. We train a small language model using reinforcement learning to rewrite search snippets so as to increase their likelihood of being preferred by an LLM Overview. Our experimental setup intentionally restricts the policy to operate only on snippets and limits reward-hacking strategies, reflecting realistic constraints of web search environments. The results prove that LLM Overview systems have biases and that, in most cases, reinforcement learning can optimize a snippet's content to manipulate LLM Overview results. We also prove that LLM Overview selections are driven by comparative rather than absolute advantages among candidate sources. In addition, we examine the safety aspects of LLM Overview manipulation and show that context poisoning attacks can lead to inaccurate or harmful results.

Related papers