Eliciting User Preferences for Personalized Multi-Objective Decision Making through Comparative Feedback
- URL: http://arxiv.org/abs/2302.03805v2
- Date: Wed, 1 Nov 2023 03:06:11 GMT
- Authors: Han Shao, Lee Cohen, Avrim Blum, Yishay Mansour, Aadirupa Saha, Matthew R. Walter
- Abstract summary: We propose a multi-objective decision making framework that accommodates different user preferences over objectives.
Our model consists of a Markov decision process with a vector-valued reward function, with each user having an unknown preference vector.
We suggest an algorithm that finds a nearly optimal policy for the user using a small number of comparison queries.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In classic reinforcement learning (RL) and decision making problems, policies
are evaluated with respect to a scalar reward function, and all optimal
policies are the same with regard to their expected return. However, many
real-world problems involve balancing multiple, sometimes conflicting,
objectives whose relative priority will vary according to the preferences of
each user. Consequently, a policy that is optimal for one user might be
sub-optimal for another. In this work, we propose a multi-objective decision
making framework that accommodates different user preferences over objectives,
where preferences are learned via policy comparisons. Our model consists of a
Markov decision process with a vector-valued reward function, with each user
having an unknown preference vector that expresses the relative importance of
each objective. The goal is to efficiently compute a near-optimal policy for a
given user. We consider two user feedback models. We first address the case
where a user is provided with two policies and returns their preferred policy
as feedback. We then move to a different user feedback model, where a user is
instead provided with two small weighted sets of representative trajectories
and selects the preferred one. In both cases, we suggest an algorithm that
finds a nearly optimal policy for the user using a small number of comparison
queries.
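The comparison model described in the abstract can be illustrated with a small sketch. Beyond what the abstract states, it assumes that a user scores a policy by the inner product of the unknown preference vector w with the policy's vector of expected returns (one entry per objective), and that a weighted set of trajectories is scored by the weighted sum of its scalarized trajectory returns. The elicitation routine is not the authors' algorithm. It relies on a hypothetical, idealized oracle that can compare arbitrary value vectors, whereas the abstract's user compares only two policies or two weighted trajectory sets. It recovers each ratio w[j] / w[0] by binary search, assuming w[0] > 0 and that every ratio is at most t_max. All function and parameter names are illustrative.

```python
import numpy as np

# Illustrative sketch under an ASSUMED linear preference model; not the paper's algorithm.


def prefers_first(w, v1, v2):
    """Policy-comparison feedback: True if the user weakly prefers value vector v1 to v2."""
    return float(np.dot(w, v1)) >= float(np.dot(w, v2))


def prefers_first_set(w, set1, set2):
    """Trajectory-set feedback: each set is a pair (returns, weights), where
    returns has shape (m, k) and weights has shape (m,). A set's utility is
    assumed to be the weighted sum of the scalarized trajectory returns."""
    def utility(returns, weights):
        return float(np.dot(weights, np.asarray(returns, dtype=float) @ w))
    return utility(*set1) >= utility(*set2)


def estimate_preference(oracle, k, t_max=10.0, eps=1e-6):
    """Estimate w up to scale with k - 1 binary searches.

    Hypothetical idealized oracle(v1, v2) compares arbitrary value vectors.
    Assumes w[0] > 0 and w[j] / w[0] <= t_max for every j.
    Returns the normalized estimate and the number of queries used.
    """
    ratios = np.ones(k)
    queries = 0
    e = np.eye(k)
    for j in range(1, k):
        lo, hi = 0.0, t_max
        while hi - lo > eps:
            mid = (lo + hi) / 2
            queries += 1
            # The user prefers e_j to mid * e_0 exactly when w[j] >= mid * w[0].
            if oracle(e[j], mid * e[0]):
                lo = mid
            else:
                hi = mid
        ratios[j] = (lo + hi) / 2
    return ratios / ratios.sum(), queries


def best_policy(w_hat, values):
    """Index of the candidate policy with the highest estimated scalarized value."""
    return int(np.argmax(np.asarray(values, dtype=float) @ w_hat))


# Usage: the learner never reads w_true directly, only the comparison answers.
w_true = np.array([0.5, 0.3, 0.2])
w_hat, n_queries = estimate_preference(lambda a, b: prefers_first(w_true, a, b), k=3)
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.4, 0.7, 0.6]]
print(best_policy(w_hat, values))  # → 2
```

In this sketch each binary search uses about log2(t_max / eps) comparisons, so the total query count grows linearly with the number of objectives rather than with the number of candidate policies.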
Related papers
- Self-Improvement Towards Pareto Optimality: Mitigating Preference Conflicts in Multi-Objective Alignment (2025-02-20)
Multi-Objective Alignment (MOA) aims to align responses with multiple human preference objectives.
We find that DPO-based MOA approaches suffer from widespread preference conflicts in the data.
- Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes (2024-12-18)
Large language models (LLMs) are increasingly embedded in everyday applications.
Ensuring their alignment with the diverse preferences of individual users has become a critical challenge.
We present a novel framework for few-shot steerable alignment.
- Beyond the Binary: Capturing Diverse Preferences With Reward Regularization (2024-12-05)
We argue that relying on binary choices does not capture the broader, aggregate preferences of the target user in real-world tasks.
We introduce a simple yet effective method that augments existing binary preference datasets with synthetic preference judgments to estimate potential user disagreement.
- Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts (2024-02-12)
Relative Preference Optimization (RPO) is designed to discern between more and less preferred responses derived from both identical and related prompts.
RPO has demonstrated a superior ability to align large language models with user preferences and to improve their adaptability during the training process.
- Human-in-the-Loop Policy Optimization for Preference-Based Multi-Objective Reinforcement Learning (2024-01-04)
We propose a human-in-the-loop policy optimization framework for preference-based multi-objective reinforcement learning (MORL).
Our method proactively learns the decision maker's (DM's) implicit preference information without requiring any a priori knowledge.
We evaluate our approach against three conventional MORL algorithms and four state-of-the-art preference-based MORL algorithms.
- Pacos: Modeling Users' Interpretable and Context-Dependent Choices in Preference Reversals (2023-03-10)
We identify three factors contributing to context effects: users' adaptive weights, the inter-item comparison, and display positions.
We propose a context-dependent preference model named Pacos as a unified framework for addressing all three factors simultaneously.
Experimental results show that the proposed method predicts users' choices better than prior work.
- Everyone's Preference Changes Differently: Weighted Multi-Interest Retrieval Model (2022-07-14)
The Multi-Interest Preference (MIP) model produces multiple interests for each user by making more effective use of the user's sequential engagement.
Extensive experiments on various industrial-scale datasets demonstrate the effectiveness of the approach.
- Modeling Dynamic User Preference via Dictionary Learning for Sequential Recommendation (2022-04-02)
Capturing the dynamics of user preference is crucial for predicting users' future behavior, because user preferences often drift over time.
Many existing recommendation algorithms, including both shallow and deep ones, often model such dynamics independently.
This paper considers the problem of embedding a user's sequential behavior into the latent space of user preferences.
- Low-Cost Algorithmic Recourse for Users With Uncertain Cost Functions (2021-11-01)
We formalize the notion of user-specific cost functions and introduce a new method for identifying actionable recourses for users.
Our method satisfies up to 25.89 percentage points more users than strong baseline methods.
- Dynamic-K Recommendation with Personalized Decision Boundary (2020-12-25)
We formulate the dynamic-K recommendation task as a joint learning problem with both ranking and classification objectives.
We extend two state-of-the-art ranking-based recommendation methods, BPRMF and HRM, to corresponding dynamic-K versions.
Our experimental results on two datasets show that the dynamic-K models are more effective than the original fixed-N recommendation methods.