Fugu-MT Paper Translation (Abstract): Reflective Personalization Optimization: A Post-hoc Rewriting Framework for Black-Box Large Language Models

Paper overview: Reflective Personalization Optimization: A Post-hoc Rewriting Framework for Black-Box Large Language Models

arxiv url: http://arxiv.org/abs/2511.05286v1
Date: Fri, 07 Nov 2025 14:48:49 GMT
Status: Translation complete
Last updated in system: 2025-11-10 21:00:44.795959
Title: Reflective Personalization Optimization: A Post-hoc Rewriting Framework for Black-Box Large Language Models
Authors: Teqi Hao, Xiaoyu Tan, Shaojie Shi, Yinghui Xu, Xihe Qiu
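Abstract summary: This paper proposes Reflective Personalization Optimization (RPO), a framework that redefines the personalization paradigm by decoupling content generation from alignment. RPO operates in two distinct stages: first, a base model generates a high-quality, generic response; then an external reflection module explicitly rewrites this output to align with the user's preferences. Comprehensive experiments on the LaMP benchmark show that RPO, by decoupling content generation from personalization, significantly outperforms state-of-the-art baselines.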
Reference score (independently computed attention score): 16.152962349146275
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The personalization of black-box large language models (LLMs) is a critical yet challenging task. Existing approaches predominantly rely on context injection, where user history is embedded into the prompt to directly guide the generation process. However, this single-step paradigm imposes a dual burden on the model: generating accurate content while simultaneously aligning with user-specific styles. This often results in a trade-off that compromises output quality and limits precise control. To address this fundamental tension, we propose Reflective Personalization Optimization (RPO), a novel framework that redefines the personalization paradigm by decoupling content generation from alignment. RPO operates in two distinct stages: first, a base model generates a high-quality, generic response; then, an external reflection module explicitly rewrites this output to align with the user's preferences. This reflection module is trained using a two-stage process. Initially, supervised fine-tuning is employed on structured rewriting trajectories to establish a core personalized reasoning policy that models the transformation from generic to user-aligned responses. Subsequently, reinforcement learning is applied to further refine and enhance the quality of the personalized outputs. Comprehensive experiments on the LaMP benchmark demonstrate that RPO, by decoupling content generation from personalization, significantly outperforms state-of-the-art baselines. These findings underscore the superiority of explicit response shaping over implicit context injection. Moreover, RPO introduces an efficient, model-agnostic personalization layer that can be seamlessly integrated with any underlying base model, paving the way for a new and effective direction in user-centric generation scenarios.
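
The two-stage inference flow described in the abstract (a generic draft from the base model, then an explicit rewrite by an external reflection module) can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the names `personalize`, `RewriteResult`, `toy_base_model`, and `toy_reflector` are hypothetical, and the toy rewrite rule stands in for a trained reflection module, which the abstract says is trained with supervised fine-tuning followed by reinforcement learning.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type aliases; neither name comes from the paper.
BaseModel = Callable[[str], str]                  # query -> generic response
Reflector = Callable[[str, str, List[str]], str]  # (query, draft, history) -> rewrite


@dataclass
class RewriteResult:
    generic: str       # stage 1 output, produced without any user context
    personalized: str  # stage 2 output, after the post-hoc rewrite


def personalize(query: str, history: List[str],
                base_model: BaseModel, reflector: Reflector) -> RewriteResult:
    # Stage 1: the black-box base model sees only the query.
    generic = base_model(query)
    # Stage 2: the external module rewrites the draft using the user history.
    personalized = reflector(query, generic, history)
    return RewriteResult(generic=generic, personalized=personalized)


# Toy stand-ins so the sketch runs without any real model.
def toy_base_model(query: str) -> str:
    return f"Answer to: {query}"


def toy_reflector(query: str, draft: str, history: List[str]) -> str:
    # Illustrative rule only: copy the user's habit of ending with "!".
    if history and all(h.endswith("!") for h in history):
        return draft.rstrip(".") + "!"
    return draft


result = personalize("Suggest a title", ["Great read!", "Loved it!"],
                     toy_base_model, toy_reflector)
print(result.generic)
print(result.personalized)
```

Because the base model is passed in as a plain callable that never receives the user history, it can be swapped for any other model without touching the rewrite step, which is the model-agnostic property the abstract highlights.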

Related papers