Fugu-MT Paper Translation (Summary): APM: Evaluating Style Personalization in LLMs with Arbitrary Preference Mappings

Paper Summary: APM: Evaluating Style Personalization in LLMs with Arbitrary Preference Mappings

arxiv url: http://arxiv.org/abs/2605.21063v1
Date: Wed, 20 May 2026 11:47:40 GMT
Status: Translation complete
Last updated in system: 2026-05-21 19:19:56.651547
Title: APM: Evaluating Style Personalization in LLMs with Arbitrary Preference Mappings
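Title (reference translation): APM: Evaluating Style Personalization of LLMs Using Arbitrary Preference Mappings
Authors: Philipp Spohn, Leander Girrbach, Zeynep Akata
Abstract summary: We introduce the Arbitrary Preference Mapping benchmark, which uses a hidden, randomized mapping $\mathbf{C}$ to map user attributes to preferences about response traits. Because $\mathbf{C}$ carries no semantic content, models cannot exploit stereotypical associations. We adapt retrieval-augmented, prompt-optimization, and routing personalization methods and apply them to Llama-3.1-8B and Qwen-3.5-27B.
Reference score (site-computed attention score): 43.5967188676583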
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Typical LLM responses tend to follow a default style, even though users often have distinct preferences regarding tone, verbosity, and formality that they do not explicitly state in their prompts. Evaluating whether personalization methods can adapt to these implicit preferences is challenging, since users typically provide prompts rather than reference responses, style preferences are not factually verifiable, and reference-free LLM judges may conflate personalization with general response quality. To address these challenges, we introduce the Arbitrary Preference Mapping (APM) benchmark, which decouples user attributes (e.g. enthusiastic) from response principles (e.g. persuasive) via a hidden, randomized mapping $\mathbf{C}$ that maps user attributes to preferences about response traits. Because $\mathbf{C}$ carries no semantic content and is resampled across runs, models cannot exploit stereotypical associations and must infer preferences from conversation history. Using this unbiased evaluation methodology, we adapt retrieval-augmented, prompt-optimization, and routing personalization methods and evaluate them on Llama-3.1-8B and Qwen-3.5-27B. Our results show that routing is the most reliable approach, while RAG only improves with the stronger base LLM, and soft prompt optimization fails to improve significantly over a non-personalized baseline. Our extensive evaluation reveals that in this realistic setting, personalization remains challenging, but our adapted methods show promise.
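The role of the hidden mapping $\mathbf{C}$ can be illustrated with a small Python sketch. This is not the authors' implementation: the attribute and principle lists beyond "enthusiastic" and "persuasive", the ternary entries (-1 avoid, 0 indifferent, +1 prefer), and the sign-of-sum aggregation in the hypothetical `user_preferences` helper are all assumptions made for illustration. The point it shows is that resampling $\mathbf{C}$ with a new seed yields an arbitrary attribute-to-preference mapping, so a model cannot guess preferences from stereotypes.

```python
import random
from typing import Dict, List

# Hypothetical vocabularies: only "enthusiastic" and "persuasive" appear in the
# paper's abstract; the other entries are illustrative assumptions.
ATTRIBUTES = ["enthusiastic", "skeptical", "busy", "curious"]
PRINCIPLES = ["persuasive", "concise", "formal", "humorous"]


def sample_mapping(seed: int) -> List[List[int]]:
    """Sample a hidden random mapping C with shape (attributes x principles).

    Assumption: each entry is -1 (avoid), 0 (indifferent) or +1 (prefer).
    A different seed gives an unrelated mapping, so C has no semantic content.
    """
    rng = random.Random(seed)
    return [[rng.choice([-1, 0, 1]) for _ in PRINCIPLES] for _ in ATTRIBUTES]


def user_preferences(C: List[List[int]], user_attributes: List[str]) -> Dict[str, int]:
    """Hypothetical aggregation: sum C's rows for the user's attributes."""
    scores = {p: 0 for p in PRINCIPLES}
    for attr in user_attributes:
        row = C[ATTRIBUTES.index(attr)]
        for principle, value in zip(PRINCIPLES, row):
            scores[principle] += value
    # Keep only the sign: +1 prefer, -1 avoid, 0 no preference.
    return {p: (s > 0) - (s < 0) for p, s in scores.items()}


if __name__ == "__main__":
    # Two seeds stand in for two evaluation runs with resampled mappings.
    for seed in (0, 1):
        C = sample_mapping(seed)
        print(seed, user_preferences(C, ["enthusiastic", "busy"]))
```

Because the same seed always reproduces the same $\mathbf{C}$, a run is repeatable, while different seeds give unrelated mappings; a personalization method evaluated this way can only succeed by inferring a user's preferences from their conversation history.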
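Abstract (reference translation): Typical LLM responses tend to follow a default style, yet users often have clear preferences about tone, verbosity, and formality that they do not state explicitly in their prompts. Evaluating whether personalization methods can adapt to these implicit preferences is difficult: users typically provide prompts rather than reference responses, style preferences cannot be factually verified, and reference-free LLM judges may conflate personalization with general response quality. To address these challenges, we introduce the Arbitrary Preference Mapping (APM) benchmark, which separates user attributes (e.g., enthusiastic) from response principles (e.g., persuasive) via a hidden, randomized mapping $\mathbf{C}$ that maps user attributes to preferences about response traits. Because $\mathbf{C}$ has no semantic content and is resampled across runs, models cannot exploit stereotypical associations and must infer preferences from the conversation history. Using this unbiased evaluation methodology, we adapt retrieval-augmented, prompt-optimization, and routing personalization methods and evaluate them on Llama-3.1-8B and Qwen-3.5-27B. The results show that routing is the most reliable approach, while RAG improves only with the stronger base LLM, and soft prompt optimization does not improve significantly over a non-personalized baseline. In this realistic setting, personalization remains difficult, but our adapted methods show promise.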


List of related papers