Synthetic Interaction Data for Scalable Personalization in Large Language Models
- URL: http://arxiv.org/abs/2602.12394v1
- Date: Thu, 12 Feb 2026 20:41:22 GMT
- Title: Synthetic Interaction Data for Scalable Personalization in Large Language Models
- Authors: Yuchen Ma, Yue Huang, Wenjie Wang, Xiaonan Luo, Xiangliang Zhang, Stefan Feuerriegel,
- Abstract summary: We introduce a high-fidelity synthetic data generation framework called PersonaGym.<n>Unlike prior work that treats personalization as static persona-preference pairs, PersonaGym models a dynamic preference process.<n>We release PersonaAtlas, a large-scale, high-quality, and diverse synthetic dataset of high-fidelity multi-turn personalized interaction trajectories.
- Score: 67.31884245564086
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalized prompting offers large opportunities for deploying large language models (LLMs) to diverse users, yet existing prompt optimization methods primarily focus on task-level optimization while largely overlooking user-specific preferences and latent constraints of individual users. This gap is primarily due to (i) the absence of high-quality, privacy-sensitive data that capture personalized user-LLM interactions at scale, and (ii) the lack of robust reward signals for individual preferences. To overcome existing data limitations, we introduce a high-fidelity synthetic data generation framework called PersonaGym. Unlike prior work that treats personalization as static persona-preference pairs, PersonaGym models a dynamic preference process via an agentic LLM system to simulate realistic preference behaviors and semantic-aware noise in order to generate personalized multi-turn interaction trajectories. Using PersonaGym, we release PersonaAtlas, a large-scale, high-quality, and diverse synthetic dataset of high-fidelity multi-turn personalized interaction trajectories that closely mirror real-world preference expression and noise patterns. We further propose Personalized Prompt Optimization (PPOpt), a scalable and model-agnostic framework that optimizes user prompts based on interaction histories without modifying the deployed LLM. PPOpt adopts a reason-then-optimize paradigm that infers an explicit user profile and conditions prompt rewriting on the user profile to avoid reward hacking. Our training procedure for PPOpt integrates a cold-start supervised prior with outcome-driven multi-objective reinforcement learning. We present extensive experiments to demonstrate consistent improvements over state-of-the-art baselines in terms of task performance, personalization quality, and robustness to noisy as well as to sparse preference signals.
Related papers
- One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment [55.86333374784959]
We argue that addressing these constraints requires a paradigm shift from fitting data to learn user preferences to learn the process of preference adaptation.<n>We propose Meta Reward Modeling (MRM), which reformulates personalized reward modeling as a meta-learning problem.<n>We show that MRM enhances few-shot personalization, improves user robustness, and consistently outperforms baselines.
arXiv Detail & Related papers (2026-01-26T17:55:52Z) - Towards Proactive Personalization through Profile Customization for Individual Users in Dialogues [28.522406727886395]
PersonalAgent is a lifelong agent designed to continuously infer and adapt to user preferences.<n>Experiments show that PersonalAgent achieves superior performance over strong prompt-based and policy optimization baselines.<n>Our findings underscore the importance of lifelong personalization for developing more inclusive and adaptive conversational agents.
arXiv Detail & Related papers (2025-12-17T10:47:06Z) - Towards Effective Model Editing for LLM Personalization [36.236438676571034]
We conceptualize personalization as a model editing task and introduce Personalization Editing.<n>This framework applies localized edits guided by clustered preference representations.<n>It achieves higher editing accuracy and greater computational efficiency than fine-tuning.
arXiv Detail & Related papers (2025-12-15T18:58:15Z) - POPI: Personalizing LLMs via Optimized Natural Language Preference Inference [42.25870704040321]
POPI is a general framework that introduces a preference inference model to distill heterogeneous user signals into concise natural language summaries.<n>These summaries act as transparent, compact, and transferable personalization representations that condition a shared generation model to produce personalized responses.<n>Extensive experiments across four personalization benchmarks demonstrate that POPI consistently improves personalization accuracy while reducing context overhead by a large margin.
arXiv Detail & Related papers (2025-10-17T23:07:57Z) - PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time [87.99027488664282]
PersonaAgent is a framework designed to address versatile personalization tasks.<n>It integrates a personalized memory module and a personalized action module.<n>Test-time user-preference alignment strategy ensures real-time user preference alignment.
arXiv Detail & Related papers (2025-06-06T17:29:49Z) - HyPerAlign: Interpretable Personalized LLM Alignment via Hypothesis Generation [24.67727411391369]
HyPerAlign is an interpretable and sample-efficient hypothesis-driven personalization approach for large language models.<n>We conduct experiments on two different personalization tasks, namely authorship attribution and deliberative alignment.<n>Results demonstrate the superiority of hypothesis-driven personalization compared to preference-based fine-tuning methods.
arXiv Detail & Related papers (2025-04-29T18:01:46Z) - LoRe: Personalizing LLMs via Low-Rank Reward Modeling [47.12507639759984]
We introduce a novel framework that leverages low-rank preference modeling to efficiently learn and generalize user-specific reward functions.<n>We validate our method on multiple preference datasets, demonstrating superior generalization to unseen users and improved accuracy in preference prediction tasks.
arXiv Detail & Related papers (2025-04-20T01:16:24Z) - Personalized Language Models via Privacy-Preserving Evolutionary Model Merging [53.97323896430374]
Personalization in language models aims to tailor model behavior to individual users or user groups.<n>We propose Privacy-Preserving Model Merging via Evolutionary Algorithms (PriME)<n>PriME employs gradient-free methods to directly optimize utility while reducing privacy risks.<n>Experiments on the LaMP benchmark show that PriME consistently outperforms a range of baselines, achieving up to a 45% improvement in task performance.
arXiv Detail & Related papers (2025-03-23T09:46:07Z) - Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications.<n> Ensuring their alignment with the diverse preferences of individual users has become a critical challenge.<n>We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z) - Aligning LLMs with Individual Preferences via Interaction [51.72200436159636]
We train large language models (LLMs) that can ''interact to align''<n>We develop a multi-turn preference dataset containing 3K+ multi-turn conversations in tree structures.<n>For evaluation, we establish the ALOE benchmark, consisting of 100 carefully selected examples and well-designed metrics to measure the customized alignment performance during conversations.
arXiv Detail & Related papers (2024-10-04T17:48:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.