Fugu-MT paper translation (summary): What Is Missing: Interpretable Ratings for Large Language Model Outputs

Paper summary: What Is Missing: Interpretable Ratings for Large Language Model Outputs

arxiv url: http://arxiv.org/abs/2603.04429v1
Date: Tue, 17 Feb 2026 14:04:42 GMT
Status: Translation complete
Last updated in system: 2026-03-09 01:20:08.208081
Title: What Is Missing: Interpretable Ratings for Large Language Model Outputs
Title (reference translation): What Is Missing: Interpretable Ratings for the Outputs of Large Language Models
Authors: Nicholas Stranges, Yimin Yang
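Abstract summary: We introduce the What Is Missing (WIM) rating system to produce rankings from natural-language feedback. WIM integrates into existing training pipelines and can be combined with other rating techniques. We empirically observe that, compared to discrete numerical ratings, WIM yields fewer ties and larger rating deltas.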
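Why fewer ties matter for pairwise preference data can be shown with a small sketch of how per-output ratings for one prompt become (chosen, rejected) pairs. This is a minimal illustration, not the authors' pipeline: the helper name `build_preference_pairs`, the example ratings, and the convention that a higher rating means a better output are all assumptions. A tied pair expresses no preference, so it is dropped and contributes no learning signal.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def build_preference_pairs(
    outputs: Sequence[str], ratings: Sequence[float]
) -> List[Tuple[str, str]]:
    """Turn per-output ratings for a single prompt into (chosen, rejected) pairs.

    Hypothetical helper. Assumes a higher rating means a better output.
    Pairs with equal ratings are skipped because a tie does not say which
    output is preferred.
    """
    pairs: List[Tuple[str, str]] = []
    for (out_a, r_a), (out_b, r_b) in combinations(zip(outputs, ratings), 2):
        if r_a == r_b:
            continue  # tie: no preference can be read off this pair
        pairs.append((out_a, out_b) if r_a > r_b else (out_b, out_a))
    return pairs


outputs = ["A", "B", "C", "D"]
discrete = [4, 4, 3, 4]                # hypothetical 1-5 judge scores
continuous = [0.71, 0.68, 0.52, 0.74]  # hypothetical real-valued ratings

print(len(build_preference_pairs(outputs, discrete)))    # 3 (of 6 pairs usable)
print(len(build_preference_pairs(outputs, continuous)))  # 6 (of 6 pairs usable)
```

With four outputs there are six candidate pairs; the coarse discrete scale leaves three of them tied, while the real-valued ratings leave none.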
Reference score (attention score calculated in-house): 4.402604078675521
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Current Large Language Model (LLM) preference learning methods, such as Proximal Policy Optimization and Direct Preference Optimization, learn from direct rankings or numerical ratings of model outputs. These rankings are subjective, and a single numerical rating chosen directly by a judge is a poor proxy for the quality of natural language. We introduce the What Is Missing (WIM) rating system to produce rankings from natural-language feedback. WIM integrates into existing training pipelines, can be combined with other rating techniques, and can be used as input to any preference learning method without changing the learning algorithm. To compute a WIM rating, a human or LLM judge writes feedback describing what the model output is missing; we embed the output and the feedback with a sentence embedding model and compute the cosine similarity between the resulting vectors. We empirically observe that, compared to discrete numerical ratings, WIM yields fewer ties and larger rating deltas, which improves the availability of a learning signal in pairwise preference data. We use "interpretable" in the following limited sense: for each scalar rating, we can inspect the judge's missing-information text that produced it, enabling qualitative debugging of the preference labels.
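Abstract (reference translation): Current large language model (LLM) preference learning methods, such as Proximal Policy Optimization and Direct Preference Optimization, learn from direct rankings or numerical ratings of model outputs. These rankings are subjective, and a single numerical rating chosen directly by a judge is an inadequate indicator of natural-language quality. We introduce the What Is Missing (WIM) rating system, which produces rankings from natural-language feedback; WIM integrates into existing training pipelines and can be used as input to any preference learning method without changing the learning algorithm.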
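The rating step described in the abstract (embed the model output and the judge's "what is missing" feedback, then take the cosine similarity of the two vectors) can be sketched as follows. This is a hedged illustration under stated assumptions: `toy_embed` is a hypothetical bag-of-words stand-in for a real sentence embedding model, and `cosine_similarity` and `wim_similarity` are illustrative names, not the authors' code. The abstract does not say how the similarity is turned into a final score (for example, whether a higher similarity marks a better or a worse output), so the sketch stops at the raw similarity.

```python
import math
import re
from collections import Counter
from typing import Dict


def toy_embed(text: str) -> Dict[str, float]:
    """Hypothetical stand-in for a sentence embedding model:
    a sparse bag-of-words vector of lowercase token counts."""
    return dict(Counter(re.findall(r"[a-z0-9']+", text.lower())))


def cosine_similarity(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all-zero."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def wim_similarity(output: str, missing_feedback: str) -> float:
    """Embed the model output and the judge's 'what is missing' feedback,
    then return the cosine similarity of the two vectors (illustrative only)."""
    return cosine_similarity(toy_embed(output), toy_embed(missing_feedback))


output = "The function sorts the list in place."
feedback = "Missing: the time complexity of the sort and an example call."
score = wim_similarity(output, feedback)
print(round(score, 3))  # 0.37
```

Swapping `toy_embed` for a real sentence embedding model (for example one loaded through the sentence-transformers library) changes only the vectors; the cosine-similarity step stays the same. Because the result is a real number rather than a point on a short discrete scale, exact ties between two outputs tend to be rarer, which is the property the abstract links to a denser pairwise learning signal.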


List of related papers