Fugu-MT Paper Translation (Abstract): Quantifying and Auditing LLM Evaluation via Positive--Unlabeled Learning

Paper summary: Quantifying and Auditing LLM Evaluation via Positive--Unlabeled Learning

arxiv url: http://arxiv.org/abs/2606.19057v1
Date: Wed, 17 Jun 2026 13:26:04 GMT
Status: Translation complete
Last updated in system: 2026-06-18 17:16:51.179976
Title: Quantifying and Auditing LLM Evaluation via Positive--Unlabeled Learning
Title (reference translation): Quantifying and Auditing LLM Evaluation via Positive--Unlabeled Learning
Authors: Zilong Zhang, Yi-Ting Hung, Lei Ding, Chi-Kuang Yeh
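Abstract summary: Large language models (LLMs) are increasingly used as judges for scalable evaluation. Such LLM judges exhibit systematic biases that are decoupled from semantic quality. Human supervision is costly and typically selective, yielding reliable positive judgments but leaving most outputs unlabelled and potentially mixed in quality.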
Reference score (independently computed attention score): 4.114698130306098
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) are increasingly used as judges for scalable evaluation, yet such LLM--as--a--Judge systems exhibit systematic biases that are decoupled from semantic quality, most notably verbosity bias. Meanwhile, human supervision is costly and typically selective, yielding reliable positive judgments but leaving most outputs unlabelled and potentially mixed in quality. We formulate LLM evaluation under selective human supervision as a positive--unlabelled learning problem and propose a geometric auditing framework based on Partial Optimal Transport. By aligning a small set of human--verified positives with a reliable subset of unlabelled outputs in a fixed embedding space, our method identifies human--consistent preferences and corrects biased judges without retraining. Experiments demonstrate improved alignment with human preferences, increased robustness to presentation biases, and interpretable confidence estimates, offering a scalable and statistically grounded alternative to existing LLM--as--a--judge pipelines.
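Abstract (reference translation): Large language models (LLMs) are increasingly used as judges for scalable evaluation, yet such LLM--as--a--Judge systems exhibit systematic biases that are decoupled from semantic quality. Meanwhile, human supervision is costly and typically selective: it yields reliable positive judgments but leaves most outputs unlabelled and potentially mixed in quality. We formulate LLM evaluation under selective human supervision as a positive--unlabelled learning problem and propose a geometric auditing framework based on Partial Optimal Transport. By aligning a small set of human--verified positives with a reliable subset of unlabelled outputs in a fixed embedding space, the method identifies human--consistent preferences and corrects biased judges without retraining. Experiments show improved alignment with human preferences, increased robustness to presentation biases, and interpretable confidence estimates, offering a scalable and statistically grounded alternative to existing LLM--as--a--judge pipelines.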

Related papers