One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment
- URL: http://arxiv.org/abs/2601.18731v1
- Date: Mon, 26 Jan 2026 17:55:52 GMT
- Title: One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment
- Authors: Hongru Cai, Yongqi Li, Tiezheng Yu, Fengbin Zhu, Wenjie Wang, Fuli Feng, Wenjie Li,
- Abstract summary: We argue that addressing these constraints requires a paradigm shift from fitting data to learn user preferences to learn the process of preference adaptation.<n>We propose Meta Reward Modeling (MRM), which reformulates personalized reward modeling as a meta-learning problem.<n>We show that MRM enhances few-shot personalization, improves user robustness, and consistently outperforms baselines.
- Score: 55.86333374784959
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Alignment of Large Language Models (LLMs) aims to align outputs with human preferences, and personalized alignment further adapts models to individual users. This relies on personalized reward models that capture user-specific preferences and automatically provide individualized feedback. However, developing these models faces two critical challenges: the scarcity of feedback from individual users and the need for efficient adaptation to unseen users. We argue that addressing these constraints requires a paradigm shift from fitting data to learn user preferences to learn the process of preference adaptation. To realize this, we propose Meta Reward Modeling (MRM), which reformulates personalized reward modeling as a meta-learning problem. Specifically, we represent each user's reward model as a weighted combination of base reward functions, and optimize the initialization of these weights using a Model-Agnostic Meta-Learning (MAML)-style framework to support fast adaptation under limited feedback. To ensure robustness, we introduce the Robust Personalization Objective (RPO), which places greater emphasis on hard-to-learn users during meta optimization. Extensive experiments on personalized preference datasets validate that MRM enhances few-shot personalization, improves user robustness, and consistently outperforms baselines.
Related papers
- Synthetic Interaction Data for Scalable Personalization in Large Language Models [67.31884245564086]
We introduce a high-fidelity synthetic data generation framework called PersonaGym.<n>Unlike prior work that treats personalization as static persona-preference pairs, PersonaGym models a dynamic preference process.<n>We release PersonaAtlas, a large-scale, high-quality, and diverse synthetic dataset of high-fidelity multi-turn personalized interaction trajectories.
arXiv Detail & Related papers (2026-02-12T20:41:22Z) - Towards Effective Model Editing for LLM Personalization [36.236438676571034]
We conceptualize personalization as a model editing task and introduce Personalization Editing.<n>This framework applies localized edits guided by clustered preference representations.<n>It achieves higher editing accuracy and greater computational efficiency than fine-tuning.
arXiv Detail & Related papers (2025-12-15T18:58:15Z) - NextQuill: Causal Preference Modeling for Enhancing LLM Personalization [82.15961484963256]
We introduce NextQuill, a novel personalization framework grounded in causal preference modeling.<n>Building on this insight, NextQuill introduces two complementary alignment strategies.<n> Experiments across multiple personalization benchmarks demonstrate that NextQuill significantly improves personalization quality.
arXiv Detail & Related papers (2025-06-03T02:08:55Z) - HyPerAlign: Interpretable Personalized LLM Alignment via Hypothesis Generation [24.67727411391369]
HyPerAlign is an interpretable and sample-efficient hypothesis-driven personalization approach for large language models.<n>We conduct experiments on two different personalization tasks, namely authorship attribution and deliberative alignment.<n>Results demonstrate the superiority of hypothesis-driven personalization compared to preference-based fine-tuning methods.
arXiv Detail & Related papers (2025-04-29T18:01:46Z) - LoRe: Personalizing LLMs via Low-Rank Reward Modeling [47.12507639759984]
We introduce a novel framework that leverages low-rank preference modeling to efficiently learn and generalize user-specific reward functions.<n>We validate our method on multiple preference datasets, demonstrating superior generalization to unseen users and improved accuracy in preference prediction tasks.
arXiv Detail & Related papers (2025-04-20T01:16:24Z) - Personalized Language Models via Privacy-Preserving Evolutionary Model Merging [53.97323896430374]
Personalization in language models aims to tailor model behavior to individual users or user groups.<n>We propose Privacy-Preserving Model Merging via Evolutionary Algorithms (PriME)<n>PriME employs gradient-free methods to directly optimize utility while reducing privacy risks.<n>Experiments on the LaMP benchmark show that PriME consistently outperforms a range of baselines, achieving up to a 45% improvement in task performance.
arXiv Detail & Related papers (2025-03-23T09:46:07Z) - Personalize Your LLM: Fake it then Align it [12.436528089142698]
CHAMELEON is a scalable and efficient personalization approach that uses self-generated personal preference data and representation editing.<n>Our experiments show that CHAMELEON efficiently adapts models to personal preferences, improving instruction-tuned models and outperforms two personalization baselines by an average of 40%.
arXiv Detail & Related papers (2025-03-02T22:40:10Z) - FSPO: Few-Shot Preference Optimization of Synthetic Preference Data in LLMs Elicits Effective Personalization to Real Users [111.56469697145519]
We propose Few-Shot Preference Optimization, which reframes reward modeling as a meta-learning problem.<n>Under this framework, an LLM learns to quickly adapt to a user via a few labeled preferences from that user, constructing a personalized reward function for them.<n>We generate over 1M synthetic personalized preferences using publicly available LLMs.<n>We evaluate FSPO on personalized open-ended generation for up to 1,500 synthetic users across three domains: movie reviews, pedagogical adaptation based on educational background, and general question answering, along with a controlled human study.
arXiv Detail & Related papers (2025-02-26T17:08:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.