Beyond the Binary: Capturing Diverse Preferences With Reward Regularization
- URL: http://arxiv.org/abs/2412.03822v1
- Date: Thu, 05 Dec 2024 02:35:46 GMT
- Title: Beyond the Binary: Capturing Diverse Preferences With Reward Regularization
- Authors: Vishakh Padmakumar, Chuanyang Jin, Hannah Rose Kirk, He He,
- Abstract summary: We argue that this reliance on binary choices does not capture the broader, aggregate preferences of the target user in real-world tasks.<n>We introduce a simple yet effective method that augments existing binary preference datasets with synthetic preference judgments to estimate potential user disagreement.
- Score: 15.518838657050173
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are increasingly deployed via public-facing interfaces to interact with millions of users, each with diverse preferences. Despite this, preference tuning of LLMs predominantly relies on reward models trained using binary judgments where annotators select the preferred choice out of pairs of model outputs. In this work, we argue that this reliance on binary choices does not capture the broader, aggregate preferences of the target user in real-world tasks. We propose a taxonomy that identifies two dimensions of subjectivity where different users disagree on the preferred output-namely, the Plurality of Responses to Prompts, where prompts allow for multiple correct answers, and the Indistinguishability of Responses, where candidate outputs are paraphrases of each other. We show that reward models correlate weakly with user preferences in these cases. As a first step to address this issue, we introduce a simple yet effective method that augments existing binary preference datasets with synthetic preference judgments to estimate potential user disagreement. Incorporating these via a margin term as a form of regularization during model training yields predictions that better align with the aggregate user preferences.
Related papers
- HyPerAlign: Hypotheses-driven Personalized Alignment [24.67727411391369]
We propose a hypotheses-driven personalization approach (HyPerAlign) for large language models (LLMs)
For deliberative alignment, the helpfulness of LLM models is improved by up to $70%$ on average.
For authorship attribution, results indicate consistently high win-rates (commonly $>90%$) against state-of-the-art preference fine-tuning approaches.
arXiv Detail & Related papers (2025-04-29T18:01:46Z) - AdaptRec: A Self-Adaptive Framework for Sequential Recommendations with Large Language Models [10.52052172996229]
AdaptRec is a self-adaptive fram-ework that leverages Large Language Models for sequential recommendations by incorporating explicit collaborative signals.
We develop a User-Contextualized Recommendation Prompt that translates their behavior sequences into natural language, explicitly integrating this information into the recommendation process.
Experiments demonstrate AdaptRec's superior performance, with significant improvements in HitRatio@1 scores of 7.13%, 18.16%, and 10.41% across real-world datasets.
arXiv Detail & Related papers (2025-04-06T00:30:50Z) - Personalized Preference Fine-tuning of Diffusion Models [75.22218338096316]
We introduce PPD, a multi-reward optimization objective that aligns diffusion models with personalized preferences.
With PPD, a diffusion model learns the individual preferences of a population of users in a few-shot way.
Our approach achieves an average win rate of 76% over Stable Cascade, generating images that more accurately reflect specific user preferences.
arXiv Detail & Related papers (2025-01-11T22:38:41Z) - Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications.
Ensuring their alignment with the diverse preferences of individual users has become a critical challenge.
We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z) - ComPO: Community Preferences for Language Model Personalization [122.54846260663922]
ComPO is a method to personalize preference optimization in language models.
We collect and release ComPRed, a question answering dataset with community-level preferences from Reddit.
arXiv Detail & Related papers (2024-10-21T14:02:40Z) - Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization [105.3612692153615]
A common technique for aligning large language models (LLMs) relies on acquiring human preferences.
We propose a new axis that is based on eliciting preferences jointly over the instruction-response pairs.
We find that joint preferences over instruction and response pairs can significantly enhance the alignment of LLMs.
arXiv Detail & Related papers (2024-03-31T02:05:40Z) - Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts [95.09994361995389]
Relative Preference Optimization (RPO) is designed to discern between more and less preferred responses derived from both identical and related prompts.
RPO has demonstrated a superior ability to align large language models with user preferences and to improve their adaptability during the training process.
arXiv Detail & Related papers (2024-02-12T22:47:57Z) - Pacos: Modeling Users' Interpretable and Context-Dependent Choices in
Preference Reversals [8.041047797530808]
We identify three factors contributing to context effects: users' adaptive weights, the inter-item comparison, and display positions.
We propose a context-dependent preference model named Pacos as a unified framework for addressing three factors simultaneously.
Experimental results show that the proposed method has better performance than prior works in predicting users' choices.
arXiv Detail & Related papers (2023-03-10T01:49:56Z) - Eliciting User Preferences for Personalized Multi-Objective Decision
Making through Comparative Feedback [76.7007545844273]
We propose a multi-objective decision making framework that accommodates different user preferences over objectives.
Our model consists of a Markov decision process with a vector-valued reward function, with each user having an unknown preference vector.
We suggest an algorithm that finds a nearly optimal policy for the user using a small number of comparison queries.
arXiv Detail & Related papers (2023-02-07T23:58:19Z) - Modeling Dynamic User Preference via Dictionary Learning for Sequential
Recommendation [133.8758914874593]
Capturing the dynamics in user preference is crucial to better predict user future behaviors because user preferences often drift over time.
Many existing recommendation algorithms -- including both shallow and deep ones -- often model such dynamics independently.
This paper considers the problem of embedding a user's sequential behavior into the latent space of user preferences.
arXiv Detail & Related papers (2022-04-02T03:23:46Z) - From Implicit to Explicit feedback: A deep neural network for modeling
sequential behaviours and long-short term preferences of online users [3.464871689508835]
Implicit and explicit feedback have different roles for a useful recommendation.
We go from the hypothesis that a user's preference at a time is a combination of long-term and short-term interests.
arXiv Detail & Related papers (2021-07-26T16:59:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.