Inference-Time Personalized Alignment with a Few User Preference Queries
- URL: http://arxiv.org/abs/2511.02966v1
- Date: Tue, 04 Nov 2025 20:07:03 GMT
- Title: Inference-Time Personalized Alignment with a Few User Preference Queries
- Authors: Victor-Alexandru Pădurean, Parameswaran Kamalaruban, Nachiket Kotalwar, Alkis Gotovos, Adish Singla
- Abstract summary: We study the problem of aligning a generative model's response with a user's preferences. We propose UserAlign, which elicits the user's preferences with a few queries as pairwise response comparisons.
- Score: 24.28598841525897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of aligning a generative model's response with a user's preferences. Recent works have proposed several different formulations for personalized alignment; however, they either require a large amount of user preference queries or require that the preference be explicitly specified as a text input. In this paper, we propose a novel inference-time personalized alignment method, UserAlign, that elicits the user's preferences with a few queries as pairwise response comparisons. In particular, UserAlign builds on the theoretical framework of best-arm identification in logistic bandits and selects a personalized response from a fixed pool of the model's generated responses. The key idea is to consider the user's feedback consistent and noise-free, and incorporate it into the theoretical framework to identify the best response quickly. Experimental results across several tasks, involving personalized text and image generation, showcase the effectiveness of UserAlign in achieving personalized alignment.
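The abstract's key idea, treating pairwise feedback as consistent and noise-free so that each answer rules out candidate preferences, can be illustrated with a small sketch. This is not the UserAlign algorithm and does not use the logistic-bandit machinery of the paper. It is a simplified version-space elimination that assumes a linear utility over hypothetical response feature vectors. The names `select_response`, `ask_user`, `pool`, and the sampled hypothesis set are all assumptions made for illustration.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def best_index(theta, pool):
    """Index of the pool item with the highest linear utility under theta."""
    return max(range(len(pool)), key=lambda i: dot(theta, pool[i]))


def tally(hypotheses, pool):
    """Count how many surviving hypotheses pick each pool item as best."""
    votes = {}
    for theta in hypotheses:
        k = best_index(theta, pool)
        votes[k] = votes.get(k, 0) + 1
    return votes


def select_response(pool, ask_user, n_hypotheses=2000, max_queries=10, seed=0):
    """Pick one item from a fixed pool using a few pairwise queries.

    pool:     list of feature vectors, one per candidate response (assumed).
    ask_user: callable (i, j) -> True if the user prefers pool[i] to pool[j].
    Returns (chosen index, number of queries asked).
    """
    rng = random.Random(seed)
    dim = len(pool[0])
    # Hypothetical version space: random candidate preference vectors.
    hypotheses = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(n_hypotheses)]
    queries = 0
    votes = tally(hypotheses, pool)
    while len(votes) > 1 and queries < max_queries:
        # Compare the two items that most surviving hypotheses consider best.
        i, j = sorted(votes, key=votes.get, reverse=True)[:2]
        prefers_i = ask_user(i, j)
        queries += 1
        diff = [a - b for a, b in zip(pool[i], pool[j])]
        sign = 1.0 if prefers_i else -1.0
        # Noise-free assumption: drop every hypothesis that contradicts the answer.
        survivors = [t for t in hypotheses if sign * dot(t, diff) > 0]
        if not survivors:
            break
        hypotheses = survivors
        votes = tally(hypotheses, pool)
    return max(votes, key=votes.get), queries
```

Because every hypothesis that voted for the losing item contradicts the user's answer, each query removes at least one candidate from contention, so this sketch asks at most one query fewer than the pool size. With noisy feedback this hard elimination would no longer be safe, and a probabilistic model of the answers would be needed instead.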
Related papers
- Synthetic Interaction Data for Scalable Personalization in Large Language Models [67.31884245564086]
We introduce a high-fidelity synthetic data generation framework called PersonaGym. Unlike prior work that treats personalization as static persona-preference pairs, PersonaGym models a dynamic preference process. We release PersonaAtlas, a large-scale, high-quality, and diverse synthetic dataset of high-fidelity multi-turn personalized interaction trajectories.
arXiv Detail & Related papers (2026-02-12T20:41:22Z)
- EXACT: Explicit Attribute-Guided Decoding-Time Personalization [11.035465374731563]
EXACT is a new decoding-time personalization that aligns generation with limited pairwise preference feedback. We show that EXACT consistently outperforms strong baselines, including preference modeling accuracy and personalized generation quality.
arXiv Detail & Related papers (2026-02-06T14:53:37Z)
- Efficient Personalization of Generative Models via Optimal Experimental Design [31.83801602641749]
We formulate the problem of preference query selection as one of maximizing the information about the underlying latent preference model. We show that this problem has a convex optimization formulation, and introduce a statistically and computationally efficient algorithm, ED-PBRL. We empirically evaluate the proposed framework by personalizing a text-to-image generative model to user-specific styles, showing that it requires fewer preference queries than random query selection.
arXiv Detail & Related papers (2025-12-22T05:47:25Z)
- Towards Effective Model Editing for LLM Personalization [36.236438676571034]
We conceptualize personalization as a model editing task and introduce Personalization Editing. This framework applies localized edits guided by clustered preference representations. It achieves higher editing accuracy and greater computational efficiency than fine-tuning.
arXiv Detail & Related papers (2025-12-15T18:58:15Z)
- PreferThinker: Reasoning-based Personalized Image Preference Assessment [83.66114370585976]
We propose a reasoning-based personalized image preference assessment framework. It first predicts a user's preference profile from reference images. It then provides interpretable, multi-dimensional scores and assessments of candidate images.
arXiv Detail & Related papers (2025-11-01T16:19:51Z)
- Personalized Reasoning: Just-In-Time Personalization and Why LLMs Fail At It [81.50711040539566]
Current large language model (LLM) development treats task-solving and preference alignment as separate challenges. We introduce PREFDISCO, an evaluation methodology that transforms static benchmarks into interactive personalization tasks. Our framework creates scenarios where identical questions require different reasoning chains depending on user context.
arXiv Detail & Related papers (2025-09-30T18:55:28Z)
- HyPerAlign: Interpretable Personalized LLM Alignment via Hypothesis Generation [24.67727411391369]
HyPerAlign is an interpretable and sample-efficient hypothesis-driven personalization approach for large language models. We conduct experiments on two different personalization tasks, namely authorship attribution and deliberative alignment. Results demonstrate the superiority of hypothesis-driven personalization compared to preference-based fine-tuning methods.
arXiv Detail & Related papers (2025-04-29T18:01:46Z)
- Search-Based Interaction For Conversation Recommendation via Generative Reward Model Based Simulated User [117.82681846559909]
Conversational recommendation systems (CRSs) use multi-turn interaction to capture user preferences and provide personalized recommendations. We propose a generative reward model based simulated user, named GRSU, for automatic interaction with CRSs.
arXiv Detail & Related papers (2025-04-29T06:37:30Z)
- Beyond the Binary: Capturing Diverse Preferences With Reward Regularization [15.518838657050173]
We argue that this reliance on binary choices does not capture the broader, aggregate preferences of the target user in real-world tasks. We introduce a simple yet effective method that augments existing binary preference datasets with synthetic preference judgments to estimate potential user disagreement.
arXiv Detail & Related papers (2024-12-05T02:35:46Z)
- Comparison-based Active Preference Learning for Multi-dimensional Personalization [7.349038301460469]
Large language models (LLMs) have shown remarkable success, but aligning them with human preferences remains a core challenge. Recent studies have explored multi-dimensional personalization, which aims to enable models to generate responses personalized to explicit preferences. We propose Active Multi-dimensional Preference Learning (AMPLe), designed to capture implicit user preferences from interactively collected comparative feedback.
arXiv Detail & Related papers (2024-11-01T11:49:33Z)
- Personalized Language Modeling from Personalized Human Feedback [45.16986573937782]
Personalized large language models (LLMs) are designed to tailor responses to individual user preferences. We propose Personalized-RLHF, an efficient framework that utilizes a lightweight user model to capture individual user preferences. We show that personalized LLMs trained using P-RLHF generate responses that are more closely aligned with individual user preferences.
arXiv Detail & Related papers (2024-02-06T04:18:58Z)
- Eliciting User Preferences for Personalized Multi-Objective Decision Making through Comparative Feedback [76.7007545844273]
We propose a multi-objective decision making framework that accommodates different user preferences over objectives.
Our model consists of a Markov decision process with a vector-valued reward function, with each user having an unknown preference vector.
We suggest an algorithm that finds a nearly optimal policy for the user using a small number of comparison queries.
arXiv Detail & Related papers (2023-02-07T23:58:19Z)
- Everyone's Preference Changes Differently: Weighted Multi-Interest Retrieval Model [18.109035867113217]
Multi-Interest Preference (MIP) model is an approach that produces multi-interest for users by using the user's sequential engagement more effectively.
Extensive experiments have been done on various industrial-scale datasets to demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-14T04:29:54Z)
- Modeling Dynamic User Preference via Dictionary Learning for Sequential Recommendation [133.8758914874593]
Capturing the dynamics in user preference is crucial to better predict user future behaviors because user preferences often drift over time.
Many existing recommendation algorithms -- including both shallow and deep ones -- often model such dynamics independently.
This paper considers the problem of embedding a user's sequential behavior into the latent space of user preferences.
arXiv Detail & Related papers (2022-04-02T03:23:46Z)
- The Stereotyping Problem in Collaboratively Filtered Recommender Systems [77.56225819389773]
We show that matrix factorization-based collaborative filtering algorithms induce a kind of stereotyping.
If preferences for a set of items are anti-correlated in the general user population, then those items may not be recommended together to a user.
We propose an alternative modelling fix, which is designed to capture the diverse multiple interests of each user.
arXiv Detail & Related papers (2021-06-23T18:37:47Z)