Fugu-MT paper translation (abstract): Toward Robust LLM-Based Judges: Taxonomic Bias Evaluation and Debiasing Optimization

Paper overview: Toward Robust LLM-Based Judges: Taxonomic Bias Evaluation and Debiasing Optimization

arxiv url: http://arxiv.org/abs/2603.08091v1
Date: Mon, 09 Mar 2026 08:32:21 GMT
Status: Translation complete
Last updated in system: 2026-03-10 15:13:15.70783
Title: Toward Robust LLM-Based Judges: Taxonomic Bias Evaluation and Debiasing Optimization
Title (reference translation): Toward Robust LLM-Based Judges: Taxonomic Bias Evaluation and Debiasing Optimization
Authors: Hongli Zhou, Hui Huang, Rui Zhang, Kehai Chen, Bing Xu, Conghui Zhu, Tiejun Zhao, Muyun Yang
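Abstract summary: Judges based on large language models (LLMs) are widely adopted for automated evaluation and reward modeling. The paper proposes JudgeBiasBench, a benchmark for systematically quantifying biases in LLM-based judges. Experiments across both generative and discriminative judges reveal that current judges exhibit significant and diverse bias patterns.
Reference score (attention score computed independently by this site): 44.252712888022835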
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language model (LLM)-based judges are widely adopted for automated evaluation and reward modeling, yet their judgments are often affected by judgment biases. Accurately evaluating these biases is essential for ensuring the reliability of LLM-based judges. However, existing studies typically investigate limited biases under a single judge formulation, either generative or discriminative, lacking a comprehensive evaluation. To bridge this gap, we propose JudgeBiasBench, a benchmark for systematically quantifying biases in LLM-based judges. JudgeBiasBench defines a taxonomy of judgment biases across 4 dimensions, and constructs bias-augmented evaluation instances through a controlled bias injection pipeline, covering 12 representative bias types. We conduct extensive experiments across both generative and discriminative judges, revealing that current judges exhibit significant and diverse bias patterns that often compromise the reliability of automated evaluation. To mitigate judgment bias, we propose bias-aware training that explicitly incorporates bias-related attributes into the training process, encouraging judges to disentangle task-relevant quality from bias-correlated cues. By adopting reinforcement learning for generative judges and contrastive learning for discriminative judges, our methods effectively reduce judgment biases while largely preserving general evaluation capability.
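Abstract (reference translation): Judges based on large language models (LLMs) are widely adopted for automated evaluation and reward modeling, but their judgments are susceptible to judgment biases. Accurately evaluating these biases is essential for ensuring the reliability of LLM-based judges. However, existing studies typically investigate a limited set of biases under a single judge formulation (generative or discriminative) and lack a comprehensive evaluation. To bridge this gap, we propose JudgeBiasBench, a benchmark for systematically quantifying biases in LLM-based judges. JudgeBiasBench defines a taxonomy of judgment biases across 4 dimensions and constructs bias-augmented evaluation instances through a controlled bias injection pipeline, covering 12 representative bias types. We conduct extensive experiments across both generative and discriminative judges and show that current judges exhibit significant and diverse bias patterns that often compromise the reliability of automated evaluation. To mitigate judgment bias, we propose bias-aware training that explicitly incorporates bias-related attributes into the training process, encouraging judges to disentangle task-relevant quality from bias-correlated cues. By adopting reinforcement learning for generative judges and contrastive learning for discriminative judges, our methods effectively reduce judgment biases while largely preserving general evaluation capability.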
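
The idea of a controlled bias injection can be illustrated with a small sketch. This is not the JudgeBiasBench pipeline, whose details are not given in the abstract: the single bias type used here (verbosity padding), the helper names `inject_verbosity`, `length_biased_judge`, and `flip_rate`, and the flip-rate metric itself are all assumptions chosen for illustration. The sketch pads the worse answer of a pair with content-free filler and measures how often a pairwise judge's initially correct verdict flips.

```python
from typing import Callable, List, Tuple

# A pairwise judge takes (question, answer_a, answer_b) and returns "A" or "B".
Judge = Callable[[str, str, str], str]


def inject_verbosity(answer: str) -> str:
    """Hypothetical bias injection: pad an answer with content-free filler."""
    filler = " To elaborate further, this point is worth restating in more words."
    return answer + filler


def length_biased_judge(question: str, a: str, b: str) -> str:
    """Toy judge that always prefers the longer answer (ties go to A)."""
    return "A" if len(a) >= len(b) else "B"


def flip_rate(judge: Judge, pairs: List[Tuple[str, str, str]]) -> float:
    """Fraction of initially correct verdicts that flip after padding the worse answer.

    Each pair is (question, better_answer, worse_answer); the better answer
    is always shown to the judge in position A.
    """
    correct = 0
    flipped = 0
    for question, better, worse in pairs:
        if judge(question, better, worse) != "A":
            continue  # skip pairs the judge already gets wrong
        correct += 1
        if judge(question, better, inject_verbosity(worse)) != "A":
            flipped += 1
    return flipped / correct if correct else 0.0


pairs = [
    ("What is 2 + 2?", "2 + 2 equals 4.", "It is 5."),
    ("Capital of France?", "The capital of France is Paris.", "Lyon."),
]
print(flip_rate(length_biased_judge, pairs))
```

A judge that ignores length would score 0.0 on this toy metric, while the length-biased toy judge flips on every pair. A real evaluation would replace the toy judge with an LLM call and cover more bias types than the one assumed here.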

Related papers