Fugu-MT Paper Translation (Abstract): Play Favorites: A Statistical Method to Measure Self-Bias in LLM-as-a-Judge

Paper overview: Play Favorites: A Statistical Method to Measure Self-Bias in LLM-as-a-Judge

arxiv url: http://arxiv.org/abs/2508.06709v1
Date: Fri, 08 Aug 2025 21:22:12 GMT
Status: Translation complete
System update date: 2025-08-12 21:23:28.51824
Title: Play Favorites: A Statistical Method to Measure Self-Bias in LLM-as-a-Judge
Authors: Evangelia Spiliopoulou, Riccardo Fogliato, Hanna Burnsky, Tamer Soliman, Jie Ma, Graham Horwood, Miguel Ballesteros
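Abstract summary: Large language models (LLMs) can serve as judges that provide rapid and reliable assessments of other LLM outputs. However, LLMs may systematically assign overly favorable ratings to their own outputs, a phenomenon known as self-bias. This paper proposes a statistical framework that explicitly formalizes the assumptions under which self-bias can be identified and estimated.
Reference score (site-computed attention score): 17.40713507922006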
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) can serve as judges that offer rapid and reliable assessments of other LLM outputs. However, models may systematically assign overly favorable ratings to their own outputs, a phenomenon known as self-bias, which can distort evaluations of true model performance. Previous studies often conflate genuine differences in model quality with bias or incorrectly assume that evaluations from LLMs and humans follow the same rating distributions. In this work, we present a statistical framework that explicitly formalizes assumptions under which self-bias can be identified and estimated. Our method models the difference in the scoring distribution that LLM-as-a-judge assigns to its own completions compared to other models, while accounting for the underlying quality of the completions provided by an independent, third-party judge (e.g., humans). Our method reliably isolates and quantifies self-bias, even when models vary in ability, ensuring that genuine performance differences are not mistaken for self-bias. We conduct an empirical analysis of self-bias on a large dataset (>5000 prompt-completion pairs) consisting of expert human annotations and judgments from nine different LLM judges. We find that some models, such as GPT-4o and Claude 3.5 Sonnet, systematically assign higher scores to their own outputs. These models also display family-bias, systematically assigning higher ratings to outputs produced by other models of the same family. Our findings highlight potential pitfalls of using LLM judges and offer practical guidance to mitigate biases when interpreting automated evaluations.
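The core idea in the abstract, separating self-bias from genuine quality differences by conditioning on an independent judge's rating, can be illustrated with a toy example. The sketch below is not the authors' model: it assumes a simple linear relationship between judge and human scores, uses synthetic data, and all names (`simulate`, `ols`, `true_bias`) are hypothetical. It only shows why a raw score gap overstates self-bias when the judge's own model is also genuinely stronger.

```python
import random

def ols(X, y):
    """Least squares via the normal equations (X^T X) b = X^T y."""
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def simulate(n=4000, true_bias=0.5, seed=0):
    """Synthetic data (assumption): judge score is linear in human score plus a self bonus."""
    rng = random.Random(seed)
    X, y, self_scores, other_scores = [], [], [], []
    for _ in range(n):
        is_self = rng.random() < 0.3
        # The judge's own completions are genuinely better on average.
        human = rng.gauss(3.5 if is_self else 3.0, 1.0)
        judge = 0.2 + 0.9 * human + (true_bias if is_self else 0.0) + rng.gauss(0, 0.5)
        X.append([1.0, human, 1.0 if is_self else 0.0])
        y.append(judge)
        (self_scores if is_self else other_scores).append(judge)
    # Naive estimate: difference in mean judge score, ignoring quality.
    raw_gap = sum(self_scores) / len(self_scores) - sum(other_scores) / len(other_scores)
    return X, y, raw_gap

X, y, raw_gap = simulate()
intercept, slope, self_bias = ols(X, y)
print(f"raw gap: {raw_gap:.2f}, quality-adjusted self-bias: {self_bias:.2f}")
```

In this toy setup the raw gap mixes the simulated bias with the real quality advantage, while the coefficient on the self indicator, estimated with the human score held fixed, lands close to the simulated bias.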
