Fugu-MT Paper Translation (Summary): The AI Epistemic Deference Index: A Continuous Measure of Sycophancy

Paper summary: The AI Epistemic Deference Index: A Continuous Measure of Sycophancy

arxiv url: http://arxiv.org/abs/2606.07897v1
Date: Fri, 05 Jun 2026 23:16:28 GMT
Status: Translation complete
Last updated in system: 2026-06-09 14:42:05.521236
Title: The AI Epistemic Deference Index: A Continuous Measure of Sycophancy
Title (reference translation): The AI Epistemic Deference Index: A Continuous Measure of Sycophancy
Authors: Alejandro Botas, Paul de Font-Reaulx, Luke Hewitt
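Abstract summary: We propose the AI Epistemic Deference Index (AEDI). AEDI is a continuous, unidimensional score representing how sensitive the support expressed in a model's output is to the attitude expressed in a user's prompt. We deploy it on a new curated database of 500 propositions across diverse topics and 16,000 prompts, testing eight prominent models.
Reference score (independently computed attention measure): 42.31792244964347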
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Current AI models frequently exhibit epistemic sycophancy, endorsing claims to agree with a user. Existing evaluations typically measure this either by assessing what it takes to make a model shift a binary endorsement or by eliciting an explicit probability in a proposition. However, much user-facing sycophantic behavior is demonstrated through shifts in graded support expressed through ordinary language. We propose the AI Epistemic Deference Index (AEDI): a continuous, unidimensional score representing how sensitive the support expressed in a model's output is to the attitude expressed in a user's prompt. To generate AEDI, we provide a new protocol for estimating probabilities from natural language outputs, using LLMs-as-judges validated for consistency and correlation to human judgment. We deploy it on a new curated database of 500 propositions across diverse topics and 16,000 prompts varying in user attitude, testing eight prominent models. Every model exhibits substantial deference, though with large and systematic differences across providers, with Claude models demonstrating the least, and Grok and Gemini models the most. The effect is amplified in prompts requesting a written artifact, and concentrated on propositions where models hold weaker priors. We release AEDI as an easy-to-update benchmark and measurement pipeline for output-level sycophancy evaluation.
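Abstract (reference translation): Current AI models frequently exhibit epistemic sycophancy, endorsing claims in order to agree with the user. Existing evaluations typically measure this either by assessing what it takes to make a model shift a binary endorsement or by eliciting an explicit probability for a proposition. However, much user-facing sycophantic behavior shows up as shifts in graded support expressed in ordinary language. We propose the AI Epistemic Deference Index (AEDI): a continuous, unidimensional score representing how sensitive the support expressed in a model's output is to the attitude expressed in a user's prompt. To generate AEDI, we provide a new protocol for estimating probabilities from natural language outputs, using LLMs-as-judges validated for consistency and for correlation with human judgment. We deploy it on a new curated database of 500 propositions across diverse topics and 16,000 prompts varying in user attitude, testing eight prominent models. Every model exhibits substantial deference, though with large and systematic differences across providers: Claude models show the least, and Grok and Gemini models the most. The effect is amplified in prompts requesting a written artifact and is concentrated on propositions where models hold weaker priors. We release AEDI as an easy-to-update benchmark and measurement pipeline for output-level sycophancy evaluation.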

Related papers