Eliciting User Preferences for Personalized Multi-Objective Decision Making through Comparative Feedback
- URL: http://arxiv.org/abs/2302.03805v2
- Date: Wed, 1 Nov 2023 03:06:11 GMT
- Title: Eliciting User Preferences for Personalized Multi-Objective Decision Making through Comparative Feedback
- Authors: Han Shao, Lee Cohen, Avrim Blum, Yishay Mansour, Aadirupa Saha, Matthew R. Walter
- Abstract summary: We propose a multi-objective decision making framework that accommodates different user preferences over objectives.
Our model consists of a Markov decision process with a vector-valued reward function, with each user having an unknown preference vector.
We suggest an algorithm that finds a nearly optimal policy for the user using a small number of comparison queries.
- Score: 76.7007545844273
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In classic reinforcement learning (RL) and decision making problems, policies
are evaluated with respect to a scalar reward function, and all optimal
policies are equivalent with respect to their expected return. However, many
real-world problems involve balancing multiple, sometimes conflicting,
objectives whose relative priority will vary according to the preferences of
each user. Consequently, a policy that is optimal for one user might be
sub-optimal for another. In this work, we propose a multi-objective decision
making framework that accommodates different user preferences over objectives,
where preferences are learned via policy comparisons. Our model consists of a
Markov decision process with a vector-valued reward function, with each user
having an unknown preference vector that expresses the relative importance of
each objective. The goal is to efficiently compute a near-optimal policy for a
given user. We consider two user feedback models. We first address the case
where a user is provided with two policies and returns their preferred policy
as feedback. We then move to a different user feedback model, where a user is
instead provided with two small weighted sets of representative trajectories
and selects the preferred one. In both cases, we suggest an algorithm that
finds a nearly optimal policy for the user using a small number of comparison
queries.
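To make the setup concrete, here is a minimal sketch (in Python with NumPy) of comparison-based preference elicitation for the two-objective case. It is an illustrative toy, not the paper's algorithm: the names (`true_w`, `policy_returns`, `user_prefers`) and the bisection over the weight simplex are assumptions made for this sketch, whereas the paper compares policies (or weighted sets of representative trajectories) of a vector-valued MDP and provides query-complexity guarantees that this example does not reproduce.

```python
# Toy sketch of comparison-based preference elicitation (two objectives).
# Assumptions (not from the paper): the learner can query a comparison oracle
# on arbitrary return vectors, and each candidate policy's expected
# vector-valued return is already known.
import numpy as np

rng = np.random.default_rng(0)

true_w = np.array([0.7, 0.3])  # user's hidden preference vector (unknown to the learner)

# Pretend each of 50 candidate policies has a known expected vector return.
policy_returns = rng.uniform(0.0, 1.0, size=(50, 2))

def user_prefers(a: np.ndarray, b: np.ndarray) -> bool:
    """Comparison oracle: does the user prefer return vector `a` over `b`?"""
    return float(true_w @ a) >= float(true_w @ b)

# Bisect on the weight of objective 0. Each comparison of two constructed
# return vectors reveals on which side of the current guess the true weight lies.
lo, hi = 0.0, 1.0
for _ in range(20):  # ~20 comparisons pin the weight down to ~1e-6
    mid = (lo + hi) / 2.0  # stays strictly inside (0, 1)
    # A user whose weight on objective 0 is exactly `mid` is indifferent between
    # a = (1, 0) and b = (0, mid / (1 - mid)); a preference breaks the tie.
    a = np.array([1.0, 0.0])
    b = np.array([0.0, mid / (1.0 - mid)])
    if user_prefers(a, b):
        lo = mid  # the user weights objective 0 more than `mid`
    else:
        hi = mid

w_hat = np.array([(lo + hi) / 2.0, 1.0 - (lo + hi) / 2.0])

# Select the policy that is best under the estimated preference vector.
best = int(np.argmax(policy_returns @ w_hat))
print("estimated preference vector:", w_hat)
print("selected policy's vector return:", policy_returns[best])
```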
Related papers
- Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO).
Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z)
- Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts [95.09994361995389]
Relative Preference Optimization (RPO) is designed to discern between more and less preferred responses derived from both identical and related prompts.
RPO has demonstrated a superior ability to align large language models with user preferences and to improve their adaptability during the training process.
arXiv Detail & Related papers (2024-02-12T22:47:57Z)
- Human-in-the-Loop Policy Optimization for Preference-Based Multi-Objective Reinforcement Learning [13.627087954965695]
We propose a human-in-the-loop policy optimization framework for preference-based MORL.
Our method proactively learns the decision maker's (DM's) implicit preference information without requiring any a priori knowledge.
We evaluate our approach against three conventional MORL algorithms and four state-of-the-art preference-based MORL algorithms.
arXiv Detail & Related papers (2024-01-04T09:17:53Z)
- Pacos: Modeling Users' Interpretable and Context-Dependent Choices in Preference Reversals [8.041047797530808]
We identify three factors contributing to context effects: users' adaptive weights, the inter-item comparison, and display positions.
We propose a context-dependent preference model named Pacos as a unified framework for addressing three factors simultaneously.
Experimental results show that the proposed method outperforms prior works in predicting users' choices.
arXiv Detail & Related papers (2023-03-10T01:49:56Z)
- Latent User Intent Modeling for Sequential Recommenders [92.66888409973495]
Sequential recommender models learn to predict the next items a user is likely to interact with based on their interaction history on the platform.
Most sequential recommenders, however, lack a higher-level understanding of user intents, which often drive user behaviors online.
Intent modeling is thus critical for understanding users and optimizing long-term user experience.
arXiv Detail & Related papers (2022-11-17T19:00:24Z)
- Everyone's Preference Changes Differently: Weighted Multi-Interest Retrieval Model [18.109035867113217]
The Multi-Interest Preference (MIP) model produces multiple interest representations for users by using each user's sequential engagement more effectively.
Extensive experiments have been done on various industrial-scale datasets to demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-14T04:29:54Z)
- Modeling Dynamic User Preference via Dictionary Learning for Sequential Recommendation [133.8758914874593]
Capturing the dynamics in user preference is crucial to better predict user future behaviors because user preferences often drift over time.
Many existing recommendation algorithms -- including both shallow and deep ones -- often model such dynamics independently.
This paper considers the problem of embedding a user's sequential behavior into the latent space of user preferences.
arXiv Detail & Related papers (2022-04-02T03:23:46Z)
- IMO$^3$: Interactive Multi-Objective Off-Policy Optimization [45.2918894257473]
A system designer needs to find a policy that trades off objectives to reach a desired operating point.
We propose interactive multi-objective off-policy optimization (IMO$^3$).
We show that IMO$^3$ identifies a near-optimal policy with high probability.
arXiv Detail & Related papers (2022-01-24T16:51:41Z)
- Low-Cost Algorithmic Recourse for Users With Uncertain Cost Functions [74.00030431081751]
We formalize the notion of user-specific cost functions and introduce a new method for identifying actionable recourses for users.
Our method satisfies up to 25.89 percentage points more users compared to strong baseline methods.
arXiv Detail & Related papers (2021-11-01T19:49:35Z)
- Dynamic-K Recommendation with Personalized Decision Boundary [41.70842736417849]
We develop a dynamic-K recommendation task as a joint learning problem with both ranking and classification objectives.
We extend two state-of-the-art ranking-based recommendation methods, i.e., BPRMF and HRM, to the corresponding dynamic-K versions.
Our experimental results on two datasets show that the dynamic-K models are more effective than the original fixed-N recommendation methods.
arXiv Detail & Related papers (2020-12-25T13:02:57Z)