Fugu-MT Paper Translation (Summary): Many Preferences, Few Policies: Towards Scalable Language Model Personalization

Paper summary: Many Preferences, Few Policies: Towards Scalable Language Model Personalization

arxiv url: http://arxiv.org/abs/2604.04144v2
Date: Fri, 10 Apr 2026 17:55:07 GMT
Status: Translation complete
Last updated in system: 2026-04-13 15:53:49.857479
Title: Many Preferences, Few Policies: Towards Scalable Language Model Personalization
Title (reference translation): Towards Personalization of Language Models
Authors: Cheol Woo Kim, Jai Moondra, Roozbeh Nahavandi, Andrew Perrault, Milind Tambe, Swati Gupta
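Abstract summary: The holy grail of LLM personalization is a single LLM for each user, perfectly aligned with that user's preferences. We develop a principled method for selecting a small portfolio of LLMs that captures representative behaviors across heterogeneous users. We provide empirical results that validate the method's theoretical guarantees and demonstrate greater output diversity over common baselines.
Reference score (independently computed attention score): 26.263947748558824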
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The holy grail of LLM personalization is a single LLM for each user, perfectly aligned with that user's preferences. However, maintaining a separate LLM per user is impractical due to constraints on compute, memory, and system complexity. We address this challenge by developing a principled method for selecting a small portfolio of LLMs that captures representative behaviors across heterogeneous users. We model user preferences across multiple traits (e.g., safety, humor, brevity) through a multi-dimensional weight vector. Given reward functions across these dimensions, our algorithm PALM (Portfolio of Aligned LLMs) generates a small portfolio of LLMs such that, for any weight vector, the portfolio contains a near-optimal LLM for the corresponding scalarized objective. To the best of our knowledge, this is the first result that provides theoretical guarantees on both the size and approximation quality of LLM portfolios for personalization. It characterizes the trade-off between system cost and personalization, as well as the diversity of LLMs required to cover the landscape of user preferences. We provide empirical results that validate these guarantees and demonstrate greater output diversity over common baselines.
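Abstract (reference translation): The holy grail of LLM personalization is one LLM per user, perfectly matched to that user's preferences. However, because of constraints on compute, memory, and system complexity, maintaining a separate LLM for each user is impractical. To address this challenge, we develop a principled method for selecting a small portfolio of LLMs that captures representative behaviors across heterogeneous users. We model user preferences over multiple traits (e.g., safety, humor, brevity) with a multi-dimensional weight vector. Given reward functions across these dimensions, our algorithm PALM (Portfolio of Aligned LLMs) generates a small portfolio of LLMs such that, for any weight vector, it contains a near-optimal LLM for the corresponding scalarized objective. To the best of our knowledge, this is the first result to provide theoretical guarantees on both the size and the approximation quality of LLM portfolios for personalization. It characterizes the trade-off between system cost and personalization, as well as the diversity of LLMs needed to cover the landscape of user preferences. We provide experimental results that validate these guarantees and show greater output diversity than common baselines.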
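
The coverage property described in the abstract (for any weight vector, the portfolio contains a near-optimal LLM for the scalarized objective) can be illustrated with a minimal sketch. This is not the PALM algorithm, which the abstract does not specify. It only checks the property on a finite sample of weight vectors, whereas the paper's guarantee is stated for any weight vector. The assumptions here are ours: each policy is summarized by a vector of nonnegative expected rewards, "near-optimal" is read as being within a multiplicative factor (1 - eps) of the best candidate, and all policy names and reward values are hypothetical.

```python
# Illustrative sketch only: NOT the PALM algorithm from the paper.
# Assumptions (not from the source): each policy is summarized by a vector of
# nonnegative expected rewards, one per trait, and "near-optimal" means within
# a multiplicative factor (1 - eps) of the best candidate policy.

def scalarized_reward(weights, rewards):
    """Weighted sum of per-trait rewards for one policy."""
    return sum(w * r for w, r in zip(weights, rewards))


def best_in(policies, weights):
    """Highest scalarized reward achievable among the given policies."""
    return max(scalarized_reward(weights, r) for r in policies.values())


def covers(portfolio, candidates, weight_vectors, eps):
    """True if, for every sampled weight vector, the portfolio contains a
    policy within a (1 - eps) factor of the best candidate policy."""
    return all(
        best_in(portfolio, w) >= (1 - eps) * best_in(candidates, w)
        for w in weight_vectors
    )


# Hypothetical reward vectors over the traits (safety, humor, brevity).
candidates = {
    "safe": (0.9, 0.2, 0.5),
    "funny": (0.3, 0.9, 0.4),
    "brief": (0.4, 0.3, 0.9),
    "balanced": (0.6, 0.6, 0.6),
}

# A three-policy portfolio that drops the "balanced" candidate.
portfolio = {name: candidates[name] for name in ("safe", "funny", "brief")}

# A small hand-picked sample of user weight vectors.
weight_vectors = [
    (1, 0, 0),
    (0, 1, 0),
    (0, 0, 1),
    (1 / 3, 1 / 3, 1 / 3),
    (0.5, 0.5, 0),
]

print(covers(portfolio, candidates, weight_vectors, eps=0.15))
print(covers(portfolio, candidates, weight_vectors, eps=0.10))
```

In this toy setup, tightening eps makes the same portfolio fail for the uniform weight vector, which is the kind of trade-off between portfolio size and approximation quality that the abstract refers to.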

Related papers