Improving User Experience in Preference-Based Optimization of Reward Functions for Assistive Robots
- URL: http://arxiv.org/abs/2411.11182v1
- Date: Sun, 17 Nov 2024 21:52:58 GMT
- Title: Improving User Experience in Preference-Based Optimization of Reward Functions for Assistive Robots
- Authors: Nathaniel Dennler, Zhonghao Shi, Stefanos Nikolaidis, Maja Matarić
- Abstract summary: We show that CMA-ES-IG prioritizes the user's experience of the preference learning process.
We show that users find our algorithm more intuitive than previous approaches across both physical and social robot tasks.
- Score: 5.523009758632668
- Abstract: Assistive robots interact with humans and must adapt to different users' preferences to be effective. An easy and effective technique to learn non-expert users' preferences is through rankings of robot behaviors, for example, robot movement trajectories or gestures. Existing techniques focus on generating trajectories for users to rank that maximize the outcome of the preference learning process. However, the generated trajectories do not appear to reflect the user's preference over repeated interactions. In this work, we design an algorithm to generate trajectories for users to rank that we call Covariance Matrix Adaptation Evolution Strategies with Information Gain (CMA-ES-IG). CMA-ES-IG prioritizes the user's experience of the preference learning process. We show that users find our algorithm more intuitive and easier to use than previous approaches across both physical and social robot tasks. This project's code is hosted at github.com/interaction-lab/CMA-ES-IG
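The core loop described in the abstract, sampling candidate trajectories, having the user rank them, and updating the sampling distribution, can be illustrated with a toy sketch. This is not the authors' implementation (their code is at the linked repository): the information-gain query selection is replaced by plain elite averaging, and the human ranker is simulated by distance to a hidden target parameter vector.

```python
import random

def rank_by_user(candidates, target):
    # Stand-in for a human ranking step: prefers candidates whose
    # parameters are closer to a hidden "ideal" trajectory.
    return sorted(candidates,
                  key=lambda c: sum((a - b) ** 2 for a, b in zip(c, target)))

def preference_es_step(mean, sigma, target, n_samples=8, n_elite=3):
    # Sample candidate trajectory parameters around the current mean.
    candidates = [[m + sigma * random.gauss(0, 1) for m in mean]
                  for _ in range(n_samples)]
    # The user ranks the candidates; keep the top-ranked elite set.
    elite = rank_by_user(candidates, target)[:n_elite]
    # Move the mean toward the elite average (simplified CMA-ES update:
    # no covariance adaptation, no information-gain query selection).
    return [sum(c[i] for c in elite) / n_elite for i in range(len(mean))]

random.seed(0)
mean, target = [0.0, 0.0], [2.0, -1.0]
for _ in range(30):
    mean = preference_es_step(mean, 0.5, target)
```

After a few dozen simulated ranking rounds, the sampling mean drifts toward the user's hidden ideal, which is the behavior the full algorithm optimizes for while also choosing which candidates to show the user.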
Related papers
- Efficiently Generating Expressive Quadruped Behaviors via Language-Guided Preference Learning [25.841585208296998]
Expressive robotic behavior is essential for the widespread acceptance of robots in social environments.
Current methods either rely on natural language input, which is efficient but low-resolution, or learn from human preferences, which, although high-resolution, is sample inefficient.
This paper introduces a novel approach, Language-Guided Preference Learning (LGPL), that leverages priors generated by pre-trained LLMs.
LGPL uses LLMs to generate initial behavior samples, which are then refined through preference-based feedback to learn behaviors that closely align with human expectations.
arXiv Detail & Related papers (2025-02-06T02:07:18Z) - Contrastive Learning from Exploratory Actions: Leveraging Natural Interactions for Preference Elicitation [6.033491390990401]
We propose contrastive learning from exploratory actions (CLEA) to learn trajectory features that are aligned with features that users care about.
CLEA features outperformed self-supervised features when eliciting user preferences over four metrics: completeness, simplicity, minimality, and explainability.
arXiv Detail & Related papers (2025-01-02T17:26:01Z) - Maximizing Alignment with Minimal Feedback: Efficiently Learning Rewards for Visuomotor Robot Policy Alignment [73.14105098897696]
We propose Representation-Aligned Preference-based Learning (RAPL) to learn visual rewards from significantly less human preference feedback.
RAPL focuses on fine-tuning pre-trained vision encoders to align with the end-user's visual representation and then constructs a dense visual reward via feature matching.
We show that RAPL can learn rewards aligned with human preferences, more efficiently uses preference data, and generalizes across robot embodiments.
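The "dense visual reward via feature matching" idea can be sketched in a few lines: score an observation by how close its encoder features are to the goal's features. The aligned vision encoder itself is assumed here; the feature vectors are placeholder lists, not outputs of RAPL's actual fine-tuned encoder.

```python
def feature_matching_reward(phi_obs, phi_goal):
    """Dense reward: negative squared distance between the current
    observation's features and the goal observation's features, both
    from a (hypothetical) user-aligned visual encoder."""
    return -sum((a - b) ** 2 for a, b in zip(phi_obs, phi_goal))

phi_goal = [0.2, -0.5, 1.0]   # features of the desired outcome
phi_near = [0.25, -0.45, 0.9] # observation visually close to the goal
phi_far = [1.0, 1.0, -1.0]    # observation visually far from the goal
```

Because the reward is a distance in feature space rather than a sparse success signal, every step of a trajectory receives feedback, which is what makes the reward "dense."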
arXiv Detail & Related papers (2024-12-06T08:04:02Z) - MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning [99.09906827676748]
We introduce MotionRL, the first approach to utilize Multi-Reward Reinforcement Learning (RL) for optimizing text-to-motion generation tasks.
Our novel approach uses reinforcement learning to fine-tune the motion generator based on human preferences, using prior knowledge from a human perception model.
In addition, MotionRL introduces a novel multi-objective optimization strategy to approximate optimality between text adherence, motion quality, and human preferences.
arXiv Detail & Related papers (2024-10-09T03:27:14Z) - Integrating Human Expertise in Continuous Spaces: A Novel Interactive Bayesian Optimization Framework with Preference Expected Improvement [0.5148939336441986]
Interactive Machine Learning (IML) seeks to integrate human expertise into machine learning processes.
We propose a novel framework based on Bayesian Optimization (BO)
BO enables collaboration between machine learning algorithms and humans.
arXiv Detail & Related papers (2024-01-23T11:14:59Z) - REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and human preferences can lead to catastrophic outcomes in the real world.
Recent methods aim to mitigate misalignment by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z) - What Matters to You? Towards Visual Representation Alignment for Robot Learning [81.30964736676103]
When operating in service of people, robots need to optimize rewards aligned with end-user preferences.
We propose Representation-Aligned Preference-based Learning (RAPL), a method for solving the visual representation alignment problem.
arXiv Detail & Related papers (2023-10-11T23:04:07Z) - Optimizing Algorithms From Pairwise User Preferences [23.87058308494074]
We introduce SortCMA to optimize algorithm parameter configurations in high dimensions based on pairwise user preferences.
We apply this method to tuning a commercial depth sensor without ground truth, and to robot social navigation.
arXiv Detail & Related papers (2023-08-08T20:36:59Z) - Meta-Wrapper: Differentiable Wrapping Operator for User Interest Selection in CTR Prediction [97.99938802797377]
Click-through rate (CTR) prediction, whose goal is to predict the probability of the user to click on an item, has become increasingly significant in recommender systems.
Recent deep learning models with the ability to automatically extract the user interest from his/her behaviors have achieved great success.
We propose a novel approach under the framework of the wrapper method, which is named Meta-Wrapper.
arXiv Detail & Related papers (2022-06-28T03:28:15Z) - Learning Reward Functions from Scale Feedback [11.941038991430837]
A common framework is to iteratively query the user about which of two presented robot trajectories they prefer.
We propose scale feedback, where the user utilizes a slider to give more nuanced information.
We demonstrate the performance benefit of slider feedback in simulations, and validate our approach in two user studies.
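One way to consume slider feedback is a Bradley-Terry-style logistic model in which the slider value is read as the probability the user assigns to trajectory A over trajectory B, and a linear reward's weights are updated by the cross-entropy gradient. This is a sketch under that assumption, not the paper's exact formulation; the simulated user and learning rate are illustrative choices.

```python
import math
import random

def update_weights(w, feat_a, feat_b, psi, lr=0.5):
    # psi in [0, 1]: slider position, read as P(user prefers A over B).
    diff = [fa - fb for fa, fb in zip(feat_a, feat_b)]
    # Model probability that A is preferred under the current weights.
    p = 1.0 / (1.0 + math.exp(-sum(wi * di for wi, di in zip(w, diff))))
    # Cross-entropy gradient step pulling the model toward the slider label.
    return [wi + lr * (psi - p) * di for wi, di in zip(w, diff)]

# Simulated user with hidden true reward weights answers each query
# with a noiseless logistic slider response.
random.seed(1)
true_w = [1.0, -2.0]
w = [0.0, 0.0]
for _ in range(200):
    fa = [random.uniform(-1, 1) for _ in true_w]
    fb = [random.uniform(-1, 1) for _ in true_w]
    ra = sum(t * f for t, f in zip(true_w, fa))
    rb = sum(t * f for t, f in zip(true_w, fb))
    psi = 1.0 / (1.0 + math.exp(-(ra - rb)))
    w = update_weights(w, fa, fb, psi)
```

Because the slider carries the *strength* of the preference rather than a binary choice, each query conveys more information, which is the performance benefit the paper demonstrates.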
arXiv Detail & Related papers (2021-10-01T09:45:18Z) - Optimizing Interactive Systems via Data-Driven Objectives [70.3578528542663]
We propose an approach that infers the objective directly from observed user interactions.
These inferences can be made regardless of prior knowledge and across different types of user behavior.
We introduce the Interactive System Optimizer (ISO), a novel algorithm that uses these inferred objectives for optimization.
arXiv Detail & Related papers (2020-06-19T20:49:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.