Improving User Experience in Preference-Based Optimization of Reward Functions for Assistive Robots
- URL: http://arxiv.org/abs/2411.11182v1
- Date: Sun, 17 Nov 2024 21:52:58 GMT
- Title: Improving User Experience in Preference-Based Optimization of Reward Functions for Assistive Robots
- Authors: Nathaniel Dennler, Zhonghao Shi, Stefanos Nikolaidis, Maja Matarić
- Abstract summary: We show that CMA-ES-IG prioritizes the user's experience of the preference learning process.
We show that users find our algorithm more intuitive than previous approaches across both physical and social robot tasks.
- Score: 5.523009758632668
- Abstract: Assistive robots interact with humans and must adapt to different users' preferences to be effective. An easy and effective technique to learn non-expert users' preferences is through rankings of robot behaviors, for example, robot movement trajectories or gestures. Existing techniques focus on generating trajectories for users to rank that maximize the outcome of the preference learning process. However, the generated trajectories do not appear to reflect the user's preference over repeated interactions. In this work, we design an algorithm to generate trajectories for users to rank that we call Covariance Matrix Adaptation Evolution Strategies with Information Gain (CMA-ES-IG). CMA-ES-IG prioritizes the user's experience of the preference learning process. We show that users find our algorithm more intuitive and easier to use than previous approaches across both physical and social robot tasks. This project's code is hosted at github.com/interaction-lab/CMA-ES-IG
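As a rough illustration of how a single ranking of robot behaviors can inform an estimate of a user's reward function, the sketch below reweights a set of candidate reward vectors after one ranking query. It is not the CMA-ES-IG algorithm from the paper. It assumes a linear reward over hand-chosen trajectory features and a Plackett-Luce ranking likelihood, and every function and variable name in it is hypothetical.

```python
import math


def plackett_luce_log_likelihood(weights, ranked_features):
    """Log-probability of a ranking (best first) under an assumed linear reward."""
    rewards = [sum(w * f for w, f in zip(weights, feats)) for feats in ranked_features]
    log_lik = 0.0
    # Each position is a softmax choice among the items not yet ranked.
    for i in range(len(rewards) - 1):
        remaining = rewards[i:]
        m = max(remaining)
        log_norm = m + math.log(sum(math.exp(r - m) for r in remaining))
        log_lik += rewards[i] - log_norm
    return log_lik


def update_belief(samples, log_weights, ranked_features):
    """Reweight candidate reward vectors after one ranking and renormalize."""
    updated = [
        lw + plackett_luce_log_likelihood(w, ranked_features)
        for w, lw in zip(samples, log_weights)
    ]
    m = max(updated)
    log_z = m + math.log(sum(math.exp(v - m) for v in updated))
    return [v - log_z for v in updated]


def posterior_mean(samples, log_weights):
    """Belief-weighted average of the candidate reward vectors."""
    dim = len(samples[0])
    return [
        sum(math.exp(lw) * w[d] for w, lw in zip(samples, log_weights))
        for d in range(dim)
    ]


if __name__ == "__main__":
    # Two hypothetical candidates: the user likes or dislikes feature 0.
    candidates = [[1.0, 0.0], [-1.0, 0.0]]
    prior = [math.log(0.5), math.log(0.5)]
    # The user ranked the trajectory with the larger feature 0 first.
    ranking = [[1.0, 0.0], [0.0, 0.0]]
    posterior = update_belief(candidates, prior, ranking)
    print(posterior_mean(candidates, posterior))
```

In this toy setup, the belief shifts toward the candidate whose reward agrees with the observed ranking. How the next set of trajectories is chosen for the user to rank, which is the focus of the paper, is left out of the sketch.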
Related papers
- MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning [99.09906827676748]
We introduce MotionRL, the first approach to utilize Multi-Reward Reinforcement Learning (RL) for optimizing text-to-motion generation tasks.
Our novel approach uses reinforcement learning to fine-tune the motion generator based on prior knowledge of human preferences from a human perception model.
In addition, MotionRL introduces a novel multi-objective optimization strategy to approximate optimality between text adherence, motion quality, and human preferences.
arXiv Detail & Related papers (2024-10-09T03:27:14Z)
- Integrating Human Expertise in Continuous Spaces: A Novel Interactive Bayesian Optimization Framework with Preference Expected Improvement [0.5148939336441986]
Interactive Machine Learning (IML) seeks to integrate human expertise into machine learning processes.
We propose a novel framework based on Bayesian Optimization (BO).
BO enables collaboration between machine learning algorithms and humans.
arXiv Detail & Related papers (2024-01-23T11:14:59Z)
- What Matters to You? Towards Visual Representation Alignment for Robot Learning [81.30964736676103]
When operating in service of people, robots need to optimize rewards aligned with end-user preferences.
We propose Representation-Aligned Preference-based Learning (RAPL), a method for solving the visual representation alignment problem.
arXiv Detail & Related papers (2023-10-11T23:04:07Z)
- Optimizing Algorithms From Pairwise User Preferences [23.87058308494074]
We introduce SortCMA to optimize algorithm parameter configurations in high dimensions based on pairwise user preferences.
We apply this method to tuning a commercial depth sensor without ground truth, and to robot social navigation.
arXiv Detail & Related papers (2023-08-08T20:36:59Z)
- Meta-Wrapper: Differentiable Wrapping Operator for User Interest Selection in CTR Prediction [97.99938802797377]
Click-through rate (CTR) prediction, whose goal is to predict the probability that a user clicks on an item, has become increasingly significant in recommender systems.
Recent deep learning models that automatically extract user interest from the user's behavior have achieved great success.
We propose a novel approach under the framework of the wrapper method, which is named Meta-Wrapper.
arXiv Detail & Related papers (2022-06-28T03:28:15Z)
- Model Predictive Control for Fluid Human-to-Robot Handovers [50.72520769938633]
Planning motions that take human comfort into account is not a part of the human-robot handover process.
We propose to generate smooth motions via an efficient model-predictive control framework.
We conduct human-to-robot handover experiments on a diverse set of objects with several users.
arXiv Detail & Related papers (2022-03-31T23:08:20Z)
- Learning Reward Functions from Scale Feedback [11.941038991430837]
A common framework is to iteratively query the user about which of two presented robot trajectories they prefer.
We propose scale feedback, where the user utilizes a slider to give more nuanced information.
We demonstrate the performance benefit of slider feedback in simulations, and validate our approach in two user studies.
arXiv Detail & Related papers (2021-10-01T09:45:18Z)
- Training Learned Optimizers with Randomly Initialized Learned Optimizers [49.67678615506608]
We show that a population of randomly initialized learned optimizers can be used to train themselves from scratch in an online fashion.
A form of population based training is used to orchestrate this self-training.
We believe feedback loops of this type will be important and powerful in the future of machine learning.
arXiv Detail & Related papers (2021-01-14T19:07:17Z)
- Optimizing Interactive Systems via Data-Driven Objectives [70.3578528542663]
We propose an approach that infers the objective directly from observed user interactions.
These inferences can be made regardless of prior knowledge and across different types of user behavior.
We introduce Interactive System Optimizer (ISO), a novel algorithm that uses these inferred objectives for optimization.
arXiv Detail & Related papers (2020-06-19T20:49:14Z)
- Active Preference Learning using Maximum Regret [10.317601896290467]
We study active preference learning as a framework for intuitively specifying the behaviour of autonomous robots.
In active preference learning, a user chooses the preferred behaviour from a set of alternatives, from which the robot learns the user's preferences.
arXiv Detail & Related papers (2020-05-08T14:31:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.