Persona-judge: Personalized Alignment of Large Language Models via Token-level Self-judgment
- URL: http://arxiv.org/abs/2504.12663v1
- Date: Thu, 17 Apr 2025 05:50:13 GMT
- Title: Persona-judge: Personalized Alignment of Large Language Models via Token-level Self-judgment
- Authors: Xiaotian Zhang, Ruizhe Chen, Yang Feng, Zuozhu Liu
- Abstract summary: Persona-judge is a novel discriminative paradigm that enables training-free personalized alignment with unseen preferences. We show that Persona-judge offers a scalable and computationally efficient solution to personalized alignment.
- Score: 21.677859755364334
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Aligning language models with human preferences presents significant challenges, particularly in achieving personalization without incurring excessive computational costs. Existing methods rely on reward signals and additional annotated data, limiting their scalability and adaptability to diverse human values. To address these challenges, we introduce Persona-judge, a novel discriminative paradigm that enables training-free personalized alignment with unseen preferences. Instead of optimizing policy parameters through external reward feedback, Persona-judge leverages the intrinsic preference judgment capabilities of the model. Specifically, a draft model generates candidate tokens conditioned on a given preference, while a judge model, embodying another preference, cross-validates whether the predicted tokens should be accepted. Experimental results demonstrate that Persona-judge, using the inherent preference evaluation mechanisms of the model, offers a scalable and computationally efficient solution to personalized alignment, paving the way for more adaptive and customized alignment.
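To make the decoding loop described in the abstract concrete, below is a minimal sketch of token-level draft-and-judge generation: two preference-conditioned prompts share one base model, the draft persona proposes each next token, and the judge persona accepts the proposal only if it also appears among the judge's own top-k candidates. The checkpoint name, the top-k acceptance rule, and the fallback to the judge's top choice are illustrative assumptions, not the authors' implementation.

```python
# Sketch of token-level draft-and-judge decoding (illustrative, not the paper's code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2-0.5B-Instruct"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def next_token_logits(prefix_ids: torch.Tensor) -> torch.Tensor:
    """Logits over the vocabulary for the next token given a prefix."""
    with torch.no_grad():
        out = model(prefix_ids)
    return out.logits[0, -1]

def persona_judge_decode(question: str, draft_pref: str, judge_pref: str,
                         max_new_tokens: int = 64, top_k: int = 5) -> str:
    # The draft persona proposes tokens; the judge persona accepts a proposal
    # only if it also ranks highly under the judge's preference.
    draft_ids = tokenizer(f"{draft_pref}\n{question}", return_tensors="pt").input_ids
    judge_ids = tokenizer(f"{judge_pref}\n{question}", return_tensors="pt").input_ids
    generated = []
    for _ in range(max_new_tokens):
        proposal = int(torch.argmax(next_token_logits(draft_ids)))
        judge_topk = torch.topk(next_token_logits(judge_ids), top_k).indices.tolist()
        # Accept the draft token if the judge also finds it plausible; otherwise
        # fall back to the judge's own top choice (one possible acceptance rule).
        token = proposal if proposal in judge_topk else judge_topk[0]
        generated.append(token)
        if token == tokenizer.eos_token_id:
            break
        step = torch.tensor([[token]])
        draft_ids = torch.cat([draft_ids, step], dim=1)
        judge_ids = torch.cat([judge_ids, step], dim=1)
    return tokenizer.decode(generated, skip_special_tokens=True)

# Example: the draft persona targets helpfulness, the judge enforces conciseness.
print(persona_judge_decode("How do I brew green tea?",
                           "You value detailed, helpful answers.",
                           "You value concise answers."))
```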
Related papers
- Latent Embedding Adaptation for Human Preference Alignment in Diffusion Planners [16.863492060519157]
This work addresses the challenge of personalizing trajectories generated in automated decision-making systems.
We propose a resource-efficient approach that enables rapid adaptation to individual users' preferences.
arXiv Detail & Related papers (2025-03-24T05:11:58Z)
- Capturing Individual Human Preferences with Reward Features [47.43999785878563]
We show that individual preferences can be captured as a linear combination of a set of general reward features.
We show how to learn such features and subsequently use them to quickly adapt the reward model to a specific individual.
We present experiments with large language models comparing the proposed architecture with a non-adaptive reward model as well as with adaptive counterparts.
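As a toy illustration of the linear-features idea in this entry, the sketch below fits per-user weights over a fixed, hand-made feature map from pairwise preferences via a Bradley-Terry style logistic fit; the feature extractor and the example data are stand-ins, not the paper's learned features.

```python
# Personalizing a reward model as a linear combination of shared reward features.
# The features below are hand-made stand-ins; the paper learns them, which this toy does not.
import numpy as np
from sklearn.linear_model import LogisticRegression

def features(response: str) -> np.ndarray:
    # Stand-in "general reward features" of a candidate response.
    return np.array([len(response.split()),             # verbosity
                     response.count("!"),               # enthusiasm
                     int("step" in response.lower())],  # procedural style
                    dtype=float)

def fit_user_weights(pairs: list[tuple[str, str]]) -> np.ndarray:
    """Fit one user's weights from (chosen, rejected) response pairs."""
    # Bradley-Terry over feature differences:
    # P(chosen > rejected) = sigmoid(w . (f(chosen) - f(rejected))).
    diffs = [features(c) - features(r) for c, r in pairs]
    X = np.stack(diffs + [-d for d in diffs])  # mirror pairs so both labels appear
    y = np.array([1] * len(diffs) + [0] * len(diffs))
    clf = LogisticRegression(fit_intercept=False).fit(X, y)
    return clf.coef_[0]  # user-specific weights over the shared features

def personalized_reward(weights: np.ndarray, response: str) -> float:
    return float(weights @ features(response))

# Quick adaptation to a new user needs only a handful of labeled pairs.
w = fit_user_weights([("Sure! Step 1: boil water. Step 2: steep 2 min.", "Just steep it."),
                      ("Here is a detailed plan with steps.", "Do it however.")])
print(personalized_reward(w, "Step-by-step: first, boil the water."))
```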
arXiv Detail & Related papers (2025-03-21T17:39:33Z)
- Personalized Preference Fine-tuning of Diffusion Models [75.22218338096316]
We introduce PPD, a multi-reward optimization objective that aligns diffusion models with personalized preferences.
With PPD, a diffusion model learns the individual preferences of a population of users in a few-shot way.
Our approach achieves an average win rate of 76% over Stable Cascade, generating images that more accurately reflect specific user preferences.
arXiv Detail & Related papers (2025-01-11T22:38:41Z)
- Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications.
Ensuring their alignment with the diverse preferences of individual users has become a critical challenge.
We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z)
- Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback [87.37721254914476]
We introduce a routing framework that combines inputs from humans and LMs to achieve better annotation quality.
We train a performance prediction model to predict a reward model's performance on an arbitrary combination of human and LM annotations.
We show that the selected hybrid mixture achieves better reward model performance compared to using either one exclusively.
arXiv Detail & Related papers (2024-10-24T20:04:15Z)
- ComPO: Community Preferences for Language Model Personalization [122.54846260663922]
ComPO is a method to personalize preference optimization in language models.
We collect and release ComPRed, a question answering dataset with community-level preferences from Reddit.
arXiv Detail & Related papers (2024-10-21T14:02:40Z)
- Unsupervised Human Preference Learning [7.959043497459107]
Large language models demonstrate impressive reasoning abilities but struggle to provide personalized content.
Existing methods, such as in-context learning and parameter-efficient fine-tuning, fall short in capturing the complexity of human preferences.
We propose a novel approach utilizing small parameter models as preference agents to generate natural language rules that guide a larger, pre-trained model.
arXiv Detail & Related papers (2024-09-30T17:51:01Z)
- Personality Alignment of Large Language Models [30.710131188931317]
Personality Alignment aims to align large language models with individual user preferences.
The accompanying dataset includes data from over 320,000 real subjects across multiple personality assessments.
We develop an activation intervention optimization method to efficiently align with individual behavioral preferences.
Our work paves the way for future AI systems to make decisions and reason in ways that reflect individual personalities.
arXiv Detail & Related papers (2024-08-21T17:09:00Z)
- Unified Preference Optimization: Language Model Alignment Beyond the Preference Frontier [0.5120567378386615]
We propose a unified approach to aligning large language models (LLMs).
Based on a simple decomposition of preference and auxiliary objectives, we allow for tuning LLMs to optimize user and designer preferences.
arXiv Detail & Related papers (2024-05-28T08:35:48Z)
- Personalized Language Modeling from Personalized Human Feedback [45.16986573937782]
Personalized large language models (LLMs) are designed to tailor responses to individual user preferences.
We propose Personalized-RLHF (P-RLHF), an efficient framework that utilizes a lightweight user model to capture individual user preferences.
We show that personalized LLMs trained using P-RLHF generate responses that are more closely aligned with individual user preferences.
arXiv Detail & Related papers (2024-02-06T04:18:58Z)
- Models of human preference for learning reward functions [80.39289349661364]
We learn the reward function from human-generated preferences between pairs of trajectory segments.
We find the common assumption that preferences arise from each segment's partial return to be flawed and instead propose modeling human preferences as informed by each segment's regret.
Our proposed regret preference model better predicts real human preferences and learns reward functions from those preferences that lead to policies better aligned with humans.
arXiv Detail & Related papers (2022-06-05T17:58:02Z)
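Following the last entry above, here is a minimal numerical sketch contrasting the two preference models it mentions: preferences driven by a segment's partial return versus by its regret. The specific regret expression (optimal value at the segment's start minus partial return plus optimal value at its end) and the logistic link are assumed formulations for illustration, not a verbatim restatement of the paper.

```python
# Toy comparison of partial-return vs. regret preference models (illustrative assumptions).
import math
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    rewards: List[float]   # per-step rewards along the segment
    v_first: float         # optimal state value at the segment's first state
    v_last: float          # optimal state value at the segment's last state

def partial_return(seg: Segment) -> float:
    return sum(seg.rewards)

def regret(seg: Segment) -> float:
    # How much worse the segment is than acting optimally from its start state.
    return seg.v_first - (partial_return(seg) + seg.v_last)

def pref_prob_partial_return(a: Segment, b: Segment) -> float:
    # P(a preferred over b) under a Bradley-Terry model on partial return.
    return 1.0 / (1.0 + math.exp(-(partial_return(a) - partial_return(b))))

def pref_prob_regret(a: Segment, b: Segment) -> float:
    # P(a preferred over b) under a regret model: lower regret is preferred.
    return 1.0 / (1.0 + math.exp(-(regret(b) - regret(a))))

# Two segments with equal partial return but different regret.
a = Segment(rewards=[1.0, 1.0], v_first=2.0, v_last=0.0)  # near-optimal behavior
b = Segment(rewards=[1.0, 1.0], v_first=5.0, v_last=0.0)  # squandered a better start
print(pref_prob_partial_return(a, b))  # 0.5: partial return cannot tell them apart
print(pref_prob_regret(a, b))          # > 0.5: the regret model prefers segment a
```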