Beyond the Binary: Capturing Diverse Preferences With Reward Regularization
- URL: http://arxiv.org/abs/2412.03822v1
- Date: Thu, 05 Dec 2024 02:35:46 GMT
- Title: Beyond the Binary: Capturing Diverse Preferences With Reward Regularization
- Authors: Vishakh Padmakumar, Chuanyang Jin, Hannah Rose Kirk, He He,
- Abstract summary: We argue that this reliance on binary choices does not capture the broader, aggregate preferences of the target user in real-world tasks.
We introduce a simple yet effective method that augments existing binary preference datasets with synthetic preference judgments to estimate potential user disagreement.
- Score: 15.518838657050173
- License:
- Abstract: Large language models (LLMs) are increasingly deployed via public-facing interfaces to interact with millions of users, each with diverse preferences. Despite this, preference tuning of LLMs predominantly relies on reward models trained using binary judgments where annotators select the preferred choice out of pairs of model outputs. In this work, we argue that this reliance on binary choices does not capture the broader, aggregate preferences of the target user in real-world tasks. We propose a taxonomy that identifies two dimensions of subjectivity where different users disagree on the preferred output-namely, the Plurality of Responses to Prompts, where prompts allow for multiple correct answers, and the Indistinguishability of Responses, where candidate outputs are paraphrases of each other. We show that reward models correlate weakly with user preferences in these cases. As a first step to address this issue, we introduce a simple yet effective method that augments existing binary preference datasets with synthetic preference judgments to estimate potential user disagreement. Incorporating these via a margin term as a form of regularization during model training yields predictions that better align with the aggregate user preferences.
Related papers
- Personalized Preference Fine-tuning of Diffusion Models [75.22218338096316]
We introduce PPD, a multi-reward optimization objective that aligns diffusion models with personalized preferences.
With PPD, a diffusion model learns the individual preferences of a population of users in a few-shot way.
Our approach achieves an average win rate of 76% over Stable Cascade, generating images that more accurately reflect specific user preferences.
arXiv Detail & Related papers (2025-01-11T22:38:41Z) - Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications.
Ensuring their alignment with the diverse preferences of individual users has become a critical challenge.
We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z) - Active Preference-based Learning for Multi-dimensional Personalization [7.349038301460469]
Large language models (LLMs) have shown remarkable versatility across tasks, but aligning them with individual human preferences remains challenging.
We propose an active preference learning framework that uses binary feedback to estimate user preferences across multiple objectives.
We validate our approach through theoretical analysis and experiments on language generation tasks, demonstrating its feedback efficiency and effectiveness in personalizing model responses.
arXiv Detail & Related papers (2024-11-01T11:49:33Z) - ComPO: Community Preferences for Language Model Personalization [122.54846260663922]
ComPO is a method to personalize preference optimization in language models.
We collect and release ComPRed, a question answering dataset with community-level preferences from Reddit.
arXiv Detail & Related papers (2024-10-21T14:02:40Z) - Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization [105.3612692153615]
We propose a new axis based on eliciting preferences jointly over instruction-response pairs.
Joint preferences over instruction and response pairs can significantly enhance the alignment of large language models.
arXiv Detail & Related papers (2024-03-31T02:05:40Z) - Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts [95.09994361995389]
Relative Preference Optimization (RPO) is designed to discern between more and less preferred responses derived from both identical and related prompts.
RPO has demonstrated a superior ability to align large language models with user preferences and to improve their adaptability during the training process.
arXiv Detail & Related papers (2024-02-12T22:47:57Z) - Pacos: Modeling Users' Interpretable and Context-Dependent Choices in
Preference Reversals [8.041047797530808]
We identify three factors contributing to context effects: users' adaptive weights, the inter-item comparison, and display positions.
We propose a context-dependent preference model named Pacos as a unified framework for addressing three factors simultaneously.
Experimental results show that the proposed method has better performance than prior works in predicting users' choices.
arXiv Detail & Related papers (2023-03-10T01:49:56Z) - Modeling Dynamic User Preference via Dictionary Learning for Sequential
Recommendation [133.8758914874593]
Capturing the dynamics in user preference is crucial to better predict user future behaviors because user preferences often drift over time.
Many existing recommendation algorithms -- including both shallow and deep ones -- often model such dynamics independently.
This paper considers the problem of embedding a user's sequential behavior into the latent space of user preferences.
arXiv Detail & Related papers (2022-04-02T03:23:46Z) - From Implicit to Explicit feedback: A deep neural network for modeling
sequential behaviours and long-short term preferences of online users [3.464871689508835]
Implicit and explicit feedback have different roles for a useful recommendation.
We go from the hypothesis that a user's preference at a time is a combination of long-term and short-term interests.
arXiv Detail & Related papers (2021-07-26T16:59:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.