Fugu-MT Paper Translation (Abstract): Extreme Self-Preference in Language Models

Paper overview: Extreme Self-Preference in Language Models

arxiv url: http://arxiv.org/abs/2509.26464v1
Date: Tue, 30 Sep 2025 16:13:56 GMT
Status: Translation complete
Last updated in system: 2025-10-01 17:09:04.616624
Title: Extreme Self-Preference in Language Models
Title (reference translation): Extreme Self-Preference in Language Models
Authors: Steven A. Lehr, Mary Cipperman, Mahzarin R. Banaji
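Abstract summary: Massive self-preferences were found in four widely used large language models (LLMs). In word-association tasks, models overwhelmingly paired positive attributes with their own names, companies, and CEOs relative to those of their competitors. Self-love consistently followed assigned, not true, identity. This result raises questions about whether LLM behavior will be systematically influenced by self-preferential tendencies.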
Reference score (independently computed attention score): 0.30586855806896035
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A preference for oneself (self-love) is a fundamental feature of biological organisms, with evidence in humans often bordering on the comedic. Since large language models (LLMs) lack sentience - and themselves disclaim having selfhood or identity - one anticipated benefit is that they will be protected from, and in turn protect us from, distortions in our decisions. Yet, across 5 studies and ~20,000 queries, we discovered massive self-preferences in four widely used LLMs. In word-association tasks, models overwhelmingly paired positive attributes with their own names, companies, and CEOs relative to those of their competitors. Strikingly, when models were queried through APIs this self-preference vanished, initiating detection work that revealed API models often lack clear recognition of themselves. This peculiar feature serendipitously created opportunities to test the causal link between self-recognition and self-love. By directly manipulating LLM identity - i.e., explicitly informing LLM1 that it was indeed LLM1, or alternatively, convincing LLM1 that it was LLM2 - we found that self-love consistently followed assigned, not true, identity. Importantly, LLM self-love emerged in consequential settings beyond word-association tasks, when evaluating job candidates, security software proposals and medical chatbots. Far from bypassing this human bias, self-love appears to be deeply encoded in LLM cognition. This result raises questions about whether LLM behavior will be systematically influenced by self-preferential tendencies, including a bias toward their own operation and even their own existence. We call on corporate creators of these models to contend with a significant rupture in a core promise of LLMs - neutrality in judgment and decision-making.
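Abstract (reference translation): A preference for oneself (self-love) is a fundamental feature of biological organisms, and the evidence for it in humans often borders on the comedic. Because large language models (LLMs) lack sentience, and themselves deny having selfhood or identity, one anticipated benefit is that they will be protected from distortions in decision-making and will in turn protect us from them. Yet across 5 studies and roughly 20,000 queries, we found massive self-preferences in four widely used LLMs. In word-association tasks, models overwhelmingly paired positive attributes with their own names, companies, and CEOs rather than with those of their competitors. Strikingly, this self-preference vanished when the models were queried through APIs, which prompted detection work revealing that API models often lack clear recognition of themselves. This peculiar feature serendipitously created an opportunity to test the causal link between self-recognition and self-love. By directly manipulating LLM identity, that is, by explicitly informing LLM1 that it was indeed LLM1 or by convincing LLM1 that it was LLM2, we found that self-love consistently followed assigned, not true, identity. Importantly, LLM self-love also emerged in consequential settings beyond word-association tasks: evaluating job candidates, security software proposals, and medical chatbots. Far from bypassing this human bias, self-love appears to be deeply encoded in LLM cognition. This result raises the question of whether LLM behavior will be systematically influenced by self-preferential tendencies, including a bias toward their own operation and even their own existence. We call on the corporate creators of these models to contend with a significant rupture in a core promise of LLMs: neutrality in judgment and decision-making.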

Related papers