Fugu-MT Paper Translation (Abstract): SenseJudge: Human-Centric Preference-Driven Judgment Framework

Paper overview: SenseJudge: Human-Centric Preference-Driven Judgment Framework

arxiv url: http://arxiv.org/abs/2606.03189v2
Date: Wed, 03 Jun 2026 18:53:30 GMT
Status: Translation complete
Last updated in system: 2026-06-05 19:21:33.190136
Title: SenseJudge: Human-Centric Preference-Driven Judgment Framework
Authors: Rui Li, Junfeng Liu, Xiangwen Kong, Linhai Xu, Zhifang Sui
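Abstract summary: SenseJudge is a customizable judgment framework driven by human preferences. The study applies SenseJudge to two tasks: (1) LLMs as personalized judges and (2) model ranking. The results show that SenseJudge outperforms other judgment methods and models on the LLMs-as-personalized-judges task and produces model rankings that align with human judgment.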
Reference score (independently computed attention score): 25.891144111334786
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Using Large Language Models (LLMs) as judges across various scenarios, such as assessing model responses, is becoming an increasingly accepted paradigm. However, existing judgment approaches often rely on trained judges using fixed preference data, which tend to overlook diverse user preferences and struggle to adapt to real-world human-AI dialogue scenarios. To address these limitations, we propose SenseJudge, a customizable judgment framework driven by human preferences, and SenseBench, a diverse and challenging instruction-following benchmark derived from real-world multi-turn interactions. We applied the automatic judgment framework and benchmark to two tasks: (1) LLMs as personalized judges, and (2) model ranking. We conducted extensive experiments, and the results demonstrate that the SenseJudge framework surpasses other judgment methods and models in the LLMs-as-personalized-judges task and achieves a model ranking that aligns with real human sense. Additionally, we conducted analyses of position bias and consistency, alongside ablation studies, which affirmed the robustness of SenseJudge.
