Related papers: Eliciting User Preferences for Personalized Multi-Objective Decision Making through Comparative Feedback

URL: http://arxiv.org/abs/2302.03805v2
Date: Wed, 1 Nov 2023 03:06:11 GMT
Title: Eliciting User Preferences for Personalized Multi-Objective Decision Making through Comparative Feedback
Authors: Han Shao, Lee Cohen, Avrim Blum, Yishay Mansour, Aadirupa Saha, Matthew R. Walter
Abstract summary: We propose a multi-objective decision making framework that accommodates different user preferences over objectives. Our model consists of a Markov decision process with a vector-valued reward function, with each user having an unknown preference vector. We suggest an algorithm that finds a nearly optimal policy for the user using a small number of comparison queries.
Score: 76.7007545844273
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In classic reinforcement learning (RL) and decision making problems, policies are evaluated with respect to a scalar reward function, and all optimal policies are the same with regards to their expected return. However, many real-world problems involve balancing multiple, sometimes conflicting, objectives whose relative priority will vary according to the preferences of each user. Consequently, a policy that is optimal for one user might be sub-optimal for another. In this work, we propose a multi-objective decision making framework that accommodates different user preferences over objectives, where preferences are learned via policy comparisons. Our model consists of a Markov decision process with a vector-valued reward function, with each user having an unknown preference vector that expresses the relative importance of each objective. The goal is to efficiently compute a near-optimal policy for a given user. We consider two user feedback models. We first address the case where a user is provided with two policies and returns their preferred policy as feedback. We then move to a different user feedback model, where a user is instead provided with two small weighted sets of representative trajectories and selects the preferred one. In both cases, we suggest an algorithm that finds a nearly optimal policy for the user using a small number of comparison queries.
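The comparison-feedback setting in the abstract can be illustrated with a small sketch. This is not the authors' algorithm. It assumes a user's utility for a policy is the inner product of a hidden preference vector with that policy's expected vector-valued return, and that a finite list of candidate policies with known return vectors is already available. All names here (`utility`, `make_comparison_oracle`, `best_by_comparisons`, `candidate_values`) are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def utility(w: Vector, value: Vector) -> float:
    """Assumed linear scalarization: inner product of preference and return vectors."""
    return sum(wi * vi for wi, vi in zip(w, value))


def make_comparison_oracle(hidden_w: Vector) -> Callable[[Vector, Vector], int]:
    """Simulate a user who only answers 'which of these two policies do I prefer?'.

    The learner never sees hidden_w, only the 0/1 answers.
    """
    def compare(value_a: Vector, value_b: Vector) -> int:
        # Return 0 if the first policy is (weakly) preferred, otherwise 1.
        return 0 if utility(hidden_w, value_a) >= utility(hidden_w, value_b) else 1

    return compare


def best_by_comparisons(
    values: Sequence[Vector], compare: Callable[[Vector, Vector], int]
) -> Tuple[int, int]:
    """Pick the preferred candidate with a sequential tournament.

    Uses len(values) - 1 comparison queries; returns (best index, query count).
    """
    best = 0
    queries = 0
    for i in range(1, len(values)):
        queries += 1
        if compare(values[best], values[i]) == 1:
            best = i
    return best, queries


# Hypothetical expected returns of three policies over two objectives.
candidate_values = [(10.0, 1.0), (6.0, 6.0), (1.0, 10.0)]

# A user who cares mostly about the second objective.
oracle = make_comparison_oracle(hidden_w=(0.2, 0.8))
best_index, num_queries = best_by_comparisons(candidate_values, oracle)

# A user who cares mostly about the first objective prefers a different policy.
other_oracle = make_comparison_oracle(hidden_w=(0.9, 0.1))
other_best, _ = best_by_comparisons(candidate_values, other_oracle)
```

The tournament needs one query per additional candidate, so it only makes sense for a small, pre-enumerated policy set. The abstract's claim of a small number of queries refers to the authors' own method, which this sketch does not reproduce.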

Related papers

Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications. Ensuring their alignment with the diverse preferences of individual users has become a critical challenge. We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z)
Beyond the Binary: Capturing Diverse Preferences With Reward Regularization [15.518838657050173]
We argue that reliance on binary choices does not capture the broader, aggregate preferences of the target user in real-world tasks. We introduce a simple yet effective method that augments existing binary preference datasets with synthetic preference judgments to estimate potential user disagreement.
arXiv Detail & Related papers (2024-12-05T02:35:46Z)
Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values. We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO). Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z)
Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts [95.09994361995389]
Relative Preference Optimization (RPO) is designed to discern between more and less preferred responses derived from both identical and related prompts. RPO has demonstrated a superior ability to align large language models with user preferences and to improve their adaptability during the training process.
arXiv Detail & Related papers (2024-02-12T22:47:57Z)
Human-in-the-Loop Policy Optimization for Preference-Based Multi-Objective Reinforcement Learning [13.627087954965695]
We propose a human-in-the-loop policy optimization framework for preference-based MORL. Our method proactively learns the DM's implicit preference information without requiring any a priori knowledge. We evaluate our approach against three conventional MORL algorithms and four state-of-the-art preference-based MORL algorithms.
arXiv Detail & Related papers (2024-01-04T09:17:53Z)
Vague Preference Policy Learning for Conversational Recommendation [48.868921530958666]
Conversational recommendation systems commonly assume users have clear preferences, leading to potential over-filtering. We introduce the Vague Preference Multi-round Conversational Recommendation (VPMCR) scenario, employing a soft estimation mechanism to accommodate users' vague and dynamic preferences. Our work advances CRS by accommodating users' inherent ambiguity and relative decision-making processes, improving real-world applicability.
arXiv Detail & Related papers (2023-06-07T14:57:21Z)
Pacos: Modeling Users' Interpretable and Context-Dependent Choices in Preference Reversals [8.041047797530808]
We identify three factors contributing to context effects: users' adaptive weights, the inter-item comparison, and display positions. We propose a context-dependent preference model named Pacos as a unified framework for addressing three factors simultaneously. Experimental results show that the proposed method has better performance than prior works in predicting users' choices.
arXiv Detail & Related papers (2023-03-10T01:49:56Z)
Latent User Intent Modeling for Sequential Recommenders [92.66888409973495]
Sequential recommender models learn to predict the next items a user is likely to interact with based on his/her interaction history on the platform. Most sequential recommenders however lack a higher-level understanding of user intents, which often drive user behaviors online. Intent modeling is thus critical for understanding users and optimizing long-term user experience.
arXiv Detail & Related papers (2022-11-17T19:00:24Z)
Everyone's Preference Changes Differently: Weighted Multi-Interest Retrieval Model [18.109035867113217]
The Multi-Interest Preference (MIP) model is an approach that produces multiple interests for users by using the user's sequential engagement more effectively. Extensive experiments have been done on various industrial-scale datasets to demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-14T04:29:54Z)
Modeling Dynamic User Preference via Dictionary Learning for Sequential Recommendation [133.8758914874593]
Capturing the dynamics of user preference is crucial for better predicting users' future behaviors, because user preferences often drift over time. Many existing recommendation algorithms -- including both shallow and deep ones -- often model such dynamics independently. This paper considers the problem of embedding a user's sequential behavior into the latent space of user preferences.
arXiv Detail & Related papers (2022-04-02T03:23:46Z)
IMO$^3$: Interactive Multi-Objective Off-Policy Optimization [45.2918894257473]
A system designer needs to find a policy that trades off objectives to reach a desired operating point. We propose interactive multi-objective off-policy optimization (IMO$^3$). We show that IMO$^3$ identifies a near-optimal policy with high probability.
arXiv Detail & Related papers (2022-01-24T16:51:41Z)
Low-Cost Algorithmic Recourse for Users With Uncertain Cost Functions [74.00030431081751]
We formalize the notion of user-specific cost functions and introduce a new method for identifying actionable recourses for users. Our method satisfies up to 25.89 percentage points more users compared to strong baseline methods.
arXiv Detail & Related papers (2021-11-01T19:49:35Z)
Dynamic-K Recommendation with Personalized Decision Boundary [41.70842736417849]
We develop a dynamic-K recommendation task as a joint learning problem with both ranking and classification objectives. We extend two state-of-the-art ranking-based recommendation methods, i.e., BPRMF and HRM, to the corresponding dynamic-K versions. Our experimental results on two datasets show that the dynamic-K models are more effective than the original fixed-N recommendation methods.
arXiv Detail & Related papers (2020-12-25T13:02:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.