Fugu-MT Paper Translation (Summary): Preference-Aware Rubric Learning for Personalized Evaluation

Paper summary: Preference-Aware Rubric Learning for Personalized Evaluation

arxiv url: http://arxiv.org/abs/2605.31545v1
Date: Fri, 29 May 2026 17:00:55 GMT
Status: Translation complete
Last updated (in system): 2026-06-01 20:56:50.763251
Title: Preference-Aware Rubric Learning for Personalized Evaluation
Title (reference translation): Personalized Evaluation via Preference-Aware Rubric Learning
Authors: Yilun Qiu, Xiaoyan Zhao, Yang Zhang, Yuxin Chen, Cilin Yan, Jiayin Cai, Xiaolong Jiang, Yao Hu, Yoko Yamakata, Tat-Seng Chua
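Abstract summary: Existing evaluation methods fail to capture the user-specific preferences embedded in long-term interaction histories. We propose Personalized Evaluation as Learning, a paradigm that formulates personalized evaluation as a learning problem rather than a static judgment. Experiments show that PARL consistently induces high-fidelity rubrics that reliably identify user-aligned responses and generalize across users.
Reference score (attention score computed by this site): 59.539429430690156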
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As Large Language Models (LLMs) evolve from general-purpose assistants to user-centric agents, personalization has become central to aligning model behavior with individual preferences, making the evaluation of personalized alignment a critical bottleneck. Existing evaluation methods, ranging from automatic metrics to LLM-as-a-judge approaches, fail to capture subjective, user-specific preferences embedded in long-term interaction histories. We identify three essential principles for reliable and effective personalized evaluation: Representativeness, User-Consistency, and Discriminativeness. To address these principles, we introduce Personalized Evaluation as Learning, a paradigm that formulates personalized evaluation as a learning problem rather than a static judgment. Under this paradigm, we propose PARL (Preference-Aware Rubric Learning for Personalized Evaluation), a framework that learns to induce preference-aware evaluation rubrics directly from raw user histories and applies a self-validation mechanism to ensure consistency with the user's preferences. PARL integrates rubric induction with a discriminative reinforcement learning objective that contrasts user-authored responses against competitive personalized model outputs, enabling the learned rubrics to capture precise, user-specific decision boundaries. Experiments on real-world personalized text generation tasks show that PARL consistently induces high-fidelity rubrics that reliably identify user-aligned responses and generalize across users and tasks, while capturing stable stylistic preferences and fine-grained evaluative patterns. To ensure reproducibility, our code is available at https://github.com/SnowCharmQ/PARL.
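Abstract (reference translation): As large language models (LLMs) evolve from general-purpose assistants into user-centric agents, personalization has become central to aligning model behavior with individual preferences, and evaluating personalized alignment has become a critical bottleneck. Existing evaluation methods, from automatic metrics to LLM-as-a-judge approaches, fail to capture the subjective, user-specific preferences embedded in long-term interaction histories. We identify three fundamental principles for reliable and effective personalized evaluation. To address these principles, we introduce Personalized Evaluation as Learning, a paradigm that formulates personalized evaluation as a learning problem rather than a static judgment. Under this paradigm, we propose PARL (Preference-Aware Rubric Learning for Personalized Evaluation), which implements a self-validation mechanism to ensure consistency with the user's preferences. PARL integrates rubric induction with a discriminative reinforcement learning objective that contrasts user-authored responses with competing personalized model outputs, enabling the learned rubrics to capture precise, user-specific decision boundaries. In experiments on real-world personalized text generation tasks, PARL consistently induces high-fidelity rubrics that reliably identify user-aligned responses and generalize across users and tasks, while capturing stable stylistic preferences and fine-grained evaluative patterns. To ensure reproducibility, our code is available at https://github.com/SnowCharmQ/PARL.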
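The abstract describes the discriminative objective only at a high level: a learned rubric should score the user-authored response above competing personalized model outputs. The following is a minimal illustrative sketch under stated assumptions, not PARL's implementation. It assumes a rubric is a list of weighted criteria with hand-written scoring functions, and it uses the fraction of rival outputs that the user response strictly out-scores as the reward. The names `Criterion`, `rubric_score`, and `discriminative_reward`, and the toy criteria, are hypothetical. In PARL the rubrics are induced from raw user histories, and the paper defines the actual rubric format and reward.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Criterion:
    """One rubric item (hypothetical structure): a description, a weight,
    and a checker that maps a response to a score in [0, 1]."""
    description: str
    weight: float
    check: Callable[[str], float]


def rubric_score(rubric: Sequence[Criterion], response: str) -> float:
    """Weighted average of the criterion scores, in [0, 1]."""
    total_weight = sum(c.weight for c in rubric)
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    return sum(c.weight * c.check(response) for c in rubric) / total_weight


def discriminative_reward(
    rubric: Sequence[Criterion],
    user_response: str,
    rival_outputs: Sequence[str],
) -> float:
    """Assumed reward: fraction of rival model outputs that the rubric scores
    strictly below the user-authored response (1.0 = perfect separation).
    Ties count as failures, so a rubric that cannot tell them apart is penalized."""
    if not rival_outputs:
        raise ValueError("need at least one rival output")
    user = rubric_score(rubric, user_response)
    wins = sum(1 for r in rival_outputs if rubric_score(rubric, r) < user)
    return wins / len(rival_outputs)


if __name__ == "__main__":
    # Toy rubric for a hypothetical user who writes short, calm reviews
    # that mention the price.
    rubric = [
        Criterion("Short (<= 12 words)", 2.0,
                  lambda r: 1.0 if len(r.split()) <= 12 else 0.0),
        Criterion("No exclamation marks", 1.0,
                  lambda r: 0.0 if "!" in r else 1.0),
        Criterion("Mentions the price", 1.0,
                  lambda r: 1.0 if "price" in r.lower() else 0.0),
    ]
    user = "Decent blender, fair price, a bit loud."
    rivals = [
        "This blender is absolutely amazing and I would happily recommend it to everyone!",
        "Great value for the price!",
        "Solid build quality and it blends smoothly.",
    ]
    print(discriminative_reward(rubric, user, rivals))  # prints 1.0
```

In this toy run the user review scores 1.0 and every rival scores at most 0.75, so the reward is 1.0. A rubric made only of the "No exclamation marks" criterion would tie the user review with the third rival and earn 2/3. This shows how a discriminative signal pushes rubrics toward criteria that separate the user from plausible model imitations.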


Related Papers List