Fugu-MT Paper Translation (Abstract): PREF: Reference-Free Evaluation of Personalised Text Generation in LLMs

Paper abstract: PREF: Reference-Free Evaluation of Personalised Text Generation in LLMs

arxiv url: http://arxiv.org/abs/2508.10028v1
Date: Fri, 08 Aug 2025 14:32:31 GMT
Status: Translation complete
Last updated in system: 2025-08-15 22:24:48.025229
Title: PREF: Reference-Free Evaluation of Personalised Text Generation in LLMs
Authors: Xiao Fu, Hossein A. Rahmani, Bin Wu, Jerome Ramos, Emine Yilmaz, Aldo Lipani
Abstract summary: Personalised text generation is essential for user-centric information systems. We introduce the Personalised Reference-free Evaluation Framework (PREF).
Reference score (attention score computed independently by the site): 32.27940625341602
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Personalised text generation is essential for user-centric information systems, yet most evaluation methods overlook the individuality of users. We introduce PREF, a Personalised Reference-free Evaluation Framework that jointly measures general output quality and user-specific alignment without requiring gold personalised references. PREF operates in a three-step pipeline: (1) a coverage stage uses a large language model (LLM) to generate a comprehensive, query-specific guideline covering universal criteria such as factuality, coherence, and completeness; (2) a preference stage re-ranks and selectively augments these factors using the target user's profile, stated or inferred preferences, and context, producing a personalised evaluation rubric; and (3) a scoring stage applies an LLM judge to rate candidate answers against this rubric, ensuring baseline adequacy while capturing subjective priorities. This separation of coverage from preference improves robustness, transparency, and reusability, and allows smaller models to approximate the personalised quality of larger ones. Experiments on the PrefEval benchmark, including implicit preference-following tasks, show that PREF achieves higher accuracy, better calibration, and closer alignment with human judgments than strong baselines. By enabling scalable, interpretable, and user-aligned evaluation, PREF lays the groundwork for more reliable assessment and development of personalised language generation systems.
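
The three-step pipeline described in the abstract can be pictured roughly as follows. This is only a minimal sketch, not the authors' implementation: the `LLM` callable, the prompt wording, the `name: weight` reply format, the weighted-mean aggregation, and the `stub_llm` stand-in are all assumptions introduced here for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt to a text reply.
LLM = Callable[[str], str]


@dataclass
class Criterion:
    name: str
    weight: float = 1.0


def coverage_stage(llm: LLM, query: str) -> List[Criterion]:
    """Stage 1: ask for universal, query-specific criteria (one per line)."""
    reply = llm(f"List evaluation criteria, one per line, for answers to: {query}")
    return [Criterion(line.strip()) for line in reply.splitlines() if line.strip()]


def preference_stage(llm: LLM, criteria: List[Criterion], profile: str) -> List[Criterion]:
    """Stage 2: re-weight and augment the criteria using the user profile."""
    names = ", ".join(c.name for c in criteria)
    reply = llm(
        f"User profile: {profile}\nCriteria: {names}\n"
        "Return 'name: weight' lines; add user-specific criteria if needed."
    )
    rubric = []
    for line in reply.splitlines():
        name, sep, weight = line.rpartition(":")
        if sep:  # skip lines that do not follow the assumed format
            rubric.append(Criterion(name.strip(), float(weight)))
    # Highest-priority criteria first (the "re-rank" part of the stage).
    return sorted(rubric, key=lambda c: c.weight, reverse=True)


def scoring_stage(llm: LLM, rubric: List[Criterion], query: str, answer: str) -> float:
    """Stage 3: an LLM judge rates each criterion in [0, 1]; assumed weighted mean."""
    total = sum(c.weight for c in rubric)
    score = 0.0
    for c in rubric:
        rating = float(llm(
            f"Rate from 0 to 1 how well the answer satisfies '{c.name}'.\n"
            f"Query: {query}\nAnswer: {answer}"
        ))
        score += c.weight * rating
    return score / total


def pref_score(llm: LLM, query: str, profile: str, answer: str) -> float:
    """Run coverage, preference, and scoring in sequence."""
    criteria = coverage_stage(llm, query)
    rubric = preference_stage(llm, criteria, profile)
    return scoring_stage(llm, rubric, query, answer)


def stub_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, used only to make the sketch runnable."""
    if prompt.startswith("List evaluation criteria"):
        return "factuality\ncoherence\ncompleteness"
    if prompt.startswith("User profile"):
        return "factuality: 1.0\ncoherence: 0.5\ncompleteness: 0.5\nvegetarian-friendly: 2.0"
    # Judge prompt: penalise a steak suggestion on the user-specific criterion.
    if "'vegetarian-friendly'" in prompt and "steak" in prompt:
        return "0.0"
    return "1.0"


if __name__ == "__main__":
    query = "Where should I eat tonight?"
    profile = "The user is vegetarian."
    print(pref_score(stub_llm, query, profile, "Try the steak house."))
    print(pref_score(stub_llm, query, profile, "Try the tofu bistro."))
```

Keeping the coverage output separate from the preference step means the query-level criteria could be computed once and reused across users, which is one way to read the reusability claim in the abstract.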

Related papers