Optimizing Algorithms From Pairwise User Preferences
- URL: http://arxiv.org/abs/2308.04571v1
- Date: Tue, 8 Aug 2023 20:36:59 GMT
- Title: Optimizing Algorithms From Pairwise User Preferences
- Authors: Leonid Keselman, Katherine Shih, Martial Hebert, Aaron Steinfeld
- Abstract summary: We introduce SortCMA to optimize algorithm parameter configurations in high dimensions based on pairwise user preferences.
We apply this method to tuning a commercial depth sensor without ground truth, and to robot social navigation.
- Score: 23.87058308494074
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Typical black-box optimization approaches in robotics focus on learning from
metric scores. However, that is not always possible, as not all developers have
ground truth available. Learning appropriate robot behavior in human-centric
contexts often requires querying users, who typically cannot provide precise
metric scores. Existing approaches leverage human feedback in an attempt to
model an implicit reward function; however, this reward may be difficult or
impossible to effectively capture. In this work, we introduce SortCMA to
optimize algorithm parameter configurations in high dimensions based on
pairwise user preferences. SortCMA efficiently and robustly leverages user
input to find parameter sets without directly modeling a reward. We apply this
method to tuning a commercial depth sensor without ground truth, and to robot
social navigation, which involves highly complex preferences over robot
behavior. We show that our method succeeds in optimizing for the user's goals
and perform a user study to evaluate social navigation results.
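As a rough illustration of the setting above: pairwise preferences alone can drive a black-box parameter search if candidate configurations are ranked with a comparison-based sort in which every comparison is one user query. The sketch below does this with a toy (mu, lambda) evolution strategy; the `ask_user` stand-in and the simple update rule are assumptions made for demonstration only and are not the authors' SortCMA implementation.

```python
import numpy as np
from functools import cmp_to_key

def ask_user(params_a, params_b):
    """Stand-in for one pairwise user query: return -1 if params_a is
    preferred, +1 otherwise. Faked here with a hidden objective purely
    so the example runs end to end."""
    hidden = lambda p: -np.sum((p - 0.7) ** 2)   # unknown to the optimizer
    return -1 if hidden(params_a) > hidden(params_b) else 1

def preference_es(dim=8, popsize=12, elite=4, sigma=0.5, generations=30, seed=0):
    """Toy (mu, lambda) evolution strategy driven only by pairwise preferences:
    candidates are ranked by a comparison-based sort in which every comparison
    is a pairwise query, and the search mean moves toward the preferred ones."""
    rng = np.random.default_rng(seed)
    mean = rng.uniform(0.0, 1.0, size=dim)
    for _ in range(generations):
        pop = [mean + sigma * rng.standard_normal(dim) for _ in range(popsize)]
        ranked = sorted(pop, key=cmp_to_key(ask_user))  # ~O(n log n) queries
        mean = np.mean(ranked[:elite], axis=0)          # recombine preferred candidates
        sigma *= 0.95                                   # crude step-size decay
    return mean

if __name__ == "__main__":
    print(np.round(preference_es(), 2))
```

In this toy version a full sort costs roughly O(n log n) pairwise queries per generation; a practical system would budget user queries more carefully and tolerate noisy or inconsistent answers.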
Related papers
- Improving User Experience in Preference-Based Optimization of Reward Functions for Assistive Robots [5.523009758632668]
We show that CMA-ES-IG prioritizes the user's experience of the preference learning process.
We show that users find our algorithm more intuitive than previous approaches across both physical and social robot tasks.
arXiv Detail & Related papers (2024-11-17T21:52:58Z)
- Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback [87.37721254914476]
We introduce a routing framework that combines inputs from humans and LMs to achieve better annotation quality.
We train a performance prediction model to predict a reward model's performance on an arbitrary combination of human and LM annotations.
We show that the selected hybrid mixture achieves better reward model performance compared to using either one exclusively.
arXiv Detail & Related papers (2024-10-24T20:04:15Z)
- Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO).
Our method is versatile and can be readily adapted to various preference optimization frameworks.
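To make the pairwise-loss setting concrete, here is a minimal sketch of a standard logistic (Bradley-Terry) preference loss with a per-pair scale on the reward margin; the scale is only a placeholder for the adaptive, DRO-derived weighting described in the paper, whose exact form is not reproduced here.

```python
import torch
import torch.nn.functional as F

def scaled_pairwise_preference_loss(r_chosen, r_rejected, scale):
    """Logistic (Bradley-Terry) loss over preference pairs, with a per-pair
    scale on the reward margin. `scale` stands in for an adaptive weight
    (e.g. one chosen by a DRO objective); its form here is an assumption."""
    margin = scale * (r_chosen - r_rejected)
    return -F.logsigmoid(margin).mean()

# Example: rewards for 4 preference pairs, uniform scale of 1.0.
loss = scaled_pairwise_preference_loss(torch.randn(4), torch.randn(4), torch.ones(4))
```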
arXiv Detail & Related papers (2024-06-04T20:33:22Z)
- Real Evaluations Tractability using Continuous Goal-Directed Actions in Smart City Applications [3.1158660854608824]
Continuous Goal-Directed Actions (CGDA) encode actions as changes in any feature that can be extracted from the environment.
Current strategies involve performing evaluations in simulation, then transferring the final joint trajectory to the actual robot.
Two different approaches to reducing the number of evaluations using EA are proposed and compared.
arXiv Detail & Related papers (2024-02-01T15:38:21Z)
- MADA: Meta-Adaptive Optimizers through hyper-gradient Descent [73.1383658672682]
We introduce Meta-Adaptive Optimizers (MADA), a unified framework that can generalize several known optimizers and dynamically learn the most suitable one during training.
We empirically compare MADA to other popular optimizers on vision and language tasks, and find that MADA consistently outperforms Adam and other popular optimizers.
We also propose AVGrad, a modification of AMSGrad that replaces the maximum operator with averaging, which is more suitable for hyper-gradient optimization.
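For context on the "maximum operator" remark, the snippet below contrasts AMSGrad's elementwise running maximum of the second moment with a running average in its place; the exact averaging rule of AVGrad is assumed here and shown only to make the max-vs-averaging distinction concrete.

```python
import numpy as np

def adam_family_step(param, grad, state, lr=1e-3, betas=(0.9, 0.999),
                     eps=1e-8, avgrad=False):
    """One Adam-family update. With avgrad=False this follows AMSGrad
    (running elementwise maximum of the second moment); with avgrad=True
    the max is replaced by a running average (assumed form of AVGrad)."""
    b1, b2 = betas
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2
    if avgrad:
        state["vhat"] += (state["v"] - state["vhat"]) / state["t"]  # running mean
    else:
        state["vhat"] = np.maximum(state["vhat"], state["v"])       # AMSGrad max
    m_hat = state["m"] / (1 - b1 ** state["t"])
    return param - lr * m_hat / (np.sqrt(state["vhat"]) + eps)

# Usage: initialize state = {"t": 0, "m": 0.0, "v": 0.0, "vhat": 0.0} per parameter.
```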
arXiv Detail & Related papers (2024-01-17T00:16:46Z)
- REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient, scalable, and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the space of optimizers into a super-tree.
We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z)
- Meta-Wrapper: Differentiable Wrapping Operator for User Interest Selection in CTR Prediction [97.99938802797377]
Click-through rate (CTR) prediction, whose goal is to predict the probability of the user to click on an item, has become increasingly significant in recommender systems.
Recent deep learning models that automatically extract user interest from user behaviors have achieved great success.
We propose a novel approach under the framework of the wrapper method, which is named Meta-Wrapper.
arXiv Detail & Related papers (2022-06-28T03:28:15Z)
- Active Preference Learning using Maximum Regret [10.317601896290467]
We study active preference learning as a framework for intuitively specifying the behaviour of autonomous robots.
In active preference learning, a user chooses the preferred behaviour from a set of alternatives, from which the robot learns the user's preferences.
arXiv Detail & Related papers (2020-05-08T14:31:31Z)
- Human Strategic Steering Improves Performance of Interactive Optimization [33.54512897507445]
In recommender systems, the action is to choose what to recommend, and the optimization task is to recommend items the user prefers.
We argue that this fundamental assumption can be extensively violated by human users, who are not passive feedback sources.
We designed a function optimization task where a human and an optimization algorithm collaborate to find the maximum of a 1-dimensional function.
arXiv Detail & Related papers (2020-05-04T06:56:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.