Improving User Experience in Preference-Based Optimization of Reward Functions for Assistive Robots
- URL: http://arxiv.org/abs/2411.11182v1
- Date: Sun, 17 Nov 2024 21:52:58 GMT
- Title: Improving User Experience in Preference-Based Optimization of Reward Functions for Assistive Robots
- Authors: Nathaniel Dennler, Zhonghao Shi, Stefanos Nikolaidis, Maja Matarić
- Abstract summary: We show that CMA-ES-IG prioritizes the user's experience of the preference learning process.
We show that users find our algorithm more intuitive than previous approaches across both physical and social robot tasks.
- Score: 5.523009758632668
- Abstract: Assistive robots interact with humans and must adapt to different users' preferences to be effective. An easy and effective technique to learn non-expert users' preferences is through rankings of robot behaviors, for example, robot movement trajectories or gestures. Existing techniques focus on generating trajectories for users to rank that maximize the outcome of the preference learning process. However, the generated trajectories do not appear to reflect the user's preference over repeated interactions. In this work, we design an algorithm to generate trajectories for users to rank that we call Covariance Matrix Adaptation Evolution Strategies with Information Gain (CMA-ES-IG). CMA-ES-IG prioritizes the user's experience of the preference learning process. We show that users find our algorithm more intuitive and easier to use than previous approaches across both physical and social robot tasks. This project's code is hosted at github.com/interaction-lab/CMA-ES-IG
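The core loop described in the abstract, sampling candidate trajectories, having the user rank them, and updating the sampling distribution, can be illustrated with a toy sketch. This is not the authors' implementation (their code is at the linked repository): the information-gain query selection is replaced by plain elite averaging, and the human ranker is simulated by distance to a hidden target parameter vector.

```python
import random

def rank_by_user(candidates, target):
    # Stand-in for a human ranking step: prefers candidates whose
    # parameters are closer to a hidden "ideal" trajectory.
    return sorted(candidates,
                  key=lambda c: sum((a - b) ** 2 for a, b in zip(c, target)))

def preference_es_step(mean, sigma, target, n_samples=8, n_elite=3):
    # Sample candidate trajectory parameters around the current mean.
    candidates = [[m + sigma * random.gauss(0, 1) for m in mean]
                  for _ in range(n_samples)]
    # The user ranks the candidates; keep the top-ranked elite set.
    elite = rank_by_user(candidates, target)[:n_elite]
    # Move the mean toward the elite average (simplified CMA-ES update:
    # no covariance adaptation, no information-gain query selection).
    return [sum(c[i] for c in elite) / n_elite for i in range(len(mean))]

random.seed(0)
mean, target = [0.0, 0.0], [2.0, -1.0]
for _ in range(30):
    mean = preference_es_step(mean, 0.5, target)
```

After a few dozen simulated ranking rounds, the sampling mean drifts toward the user's hidden ideal, which is the behavior the full algorithm optimizes for while also choosing which candidates to show the user.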
Related papers
- Efficiently Generating Expressive Quadruped Behaviors via Language-Guided Preference Learning [25.841585208296998]
Expressive robotic behavior is essential for the widespread acceptance of robots in social environments.
Current methods either rely on natural language input, which is efficient but low-resolution, or learn from human preferences, which, although high-resolution, is sample inefficient.
This paper introduces a novel approach, Language-Guided Preference Learning (LGPL), that leverages priors generated by pre-trained LLMs.
LGPL uses LLMs to generate initial behavior samples, which are then refined through preference-based feedback to learn behaviors that closely align with human expectations.
arXiv Detail & Related papers (2025-02-06T02:07:18Z) - Contrastive Learning from Exploratory Actions: Leveraging Natural Interactions for Preference Elicitation [6.033491390990401]
We propose contrastive learning from exploratory actions (CLEA) to learn trajectory features that are aligned with features that users care about.
CLEA features outperformed self-supervised features when eliciting user preferences over four metrics: completeness, simplicity, minimality, and explainability.
arXiv Detail & Related papers (2025-01-02T17:26:01Z) - Maximizing Alignment with Minimal Feedback: Efficiently Learning Rewards for Visuomotor Robot Policy Alignment [73.14105098897696]
We propose Representation-Aligned Preference-based Learning (RAPL) to learn visual rewards from significantly less human preference feedback.
RAPL focuses on fine-tuning pre-trained vision encoders to align with the end-user's visual representation and then constructs a dense visual reward via feature matching.
We show that RAPL can learn rewards aligned with human preferences, more efficiently uses preference data, and generalizes across robot embodiments.
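The "dense visual reward via feature matching" idea can be sketched in a few lines: score an observation by how close its encoder features are to the goal's features. The aligned vision encoder itself is assumed here; the feature vectors are placeholder lists, not outputs of RAPL's actual fine-tuned encoder.

```python
def feature_matching_reward(phi_obs, phi_goal):
    """Dense reward: negative squared distance between the current
    observation's features and the goal observation's features, both
    from a (hypothetical) user-aligned visual encoder."""
    return -sum((a - b) ** 2 for a, b in zip(phi_obs, phi_goal))

phi_goal = [0.2, -0.5, 1.0]   # features of the desired outcome
phi_near = [0.25, -0.45, 0.9] # observation visually close to the goal
phi_far = [1.0, 1.0, -1.0]    # observation visually far from the goal
```

Because the reward is a distance in feature space rather than a sparse success signal, every step of a trajectory receives feedback, which is what makes the reward "dense."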
arXiv Detail & Related papers (2024-12-06T08:04:02Z) - MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning [99.09906827676748]
We introduce MotionRL, the first approach to utilize Multi-Reward Reinforcement Learning (RL) for optimizing text-to-motion generation tasks.
Our novel approach uses reinforcement learning to fine-tune the motion generator based on human preferences, using prior knowledge from a human perception model.
In addition, MotionRL introduces a novel multi-objective optimization strategy to approximate optimality between text adherence, motion quality, and human preferences.
arXiv Detail & Related papers (2024-10-09T03:27:14Z) - Integrating Human Expertise in Continuous Spaces: A Novel Interactive Bayesian Optimization Framework with Preference Expected Improvement [0.5148939336441986]
Interactive Machine Learning (IML) seeks to integrate human expertise into machine learning processes.
We propose a novel framework based on Bayesian Optimization (BO)
BO enables collaboration between machine learning algorithms and humans.
arXiv Detail & Related papers (2024-01-23T11:14:59Z) - REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and human preferences can lead to catastrophic outcomes in the real world.
Recent methods aim to mitigate misalignment by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z) - What Matters to You? Towards Visual Representation Alignment for Robot Learning [81.30964736676103]
When operating in service of people, robots need to optimize rewards aligned with end-user preferences.
We propose Representation-Aligned Preference-based Learning (RAPL), a method for solving the visual representation alignment problem.
arXiv Detail & Related papers (2023-10-11T23:04:07Z) - Optimizing Algorithms From Pairwise User Preferences [23.87058308494074]
We introduce SortCMA to optimize algorithm parameter configurations in high dimensions based on pairwise user preferences.
We apply this method to tuning a commercial depth sensor without ground truth, and to robot social navigation.
arXiv Detail & Related papers (2023-08-08T20:36:59Z) - Meta-Wrapper: Differentiable Wrapping Operator for User Interest Selection in CTR Prediction [97.99938802797377]
Click-through rate (CTR) prediction, whose goal is to predict the probability of the user to click on an item, has become increasingly significant in recommender systems.
Recent deep learning models with the ability to automatically extract the user interest from his/her behaviors have achieved great success.
We propose a novel approach under the framework of the wrapper method, which is named Meta-Wrapper.
arXiv Detail & Related papers (2022-06-28T03:28:15Z) - Learning Reward Functions from Scale Feedback [11.941038991430837]
A common framework is to iteratively query the user about which of two presented robot trajectories they prefer.
We propose scale feedback, where the user utilizes a slider to give more nuanced information.
We demonstrate the performance benefit of slider feedback in simulations, and validate our approach in two user studies.
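One way to consume slider feedback is a Bradley-Terry-style logistic model in which the slider value is read as the probability the user assigns to trajectory A over trajectory B, and a linear reward's weights are updated by the cross-entropy gradient. This is a sketch under that assumption, not the paper's exact formulation; the simulated user and learning rate are illustrative choices.

```python
import math
import random

def update_weights(w, feat_a, feat_b, psi, lr=0.5):
    # psi in [0, 1]: slider position, read as P(user prefers A over B).
    diff = [fa - fb for fa, fb in zip(feat_a, feat_b)]
    # Model probability that A is preferred under the current weights.
    p = 1.0 / (1.0 + math.exp(-sum(wi * di for wi, di in zip(w, diff))))
    # Cross-entropy gradient step pulling the model toward the slider label.
    return [wi + lr * (psi - p) * di for wi, di in zip(w, diff)]

# Simulated user with hidden true reward weights answers each query
# with a noiseless logistic slider response.
random.seed(1)
true_w = [1.0, -2.0]
w = [0.0, 0.0]
for _ in range(200):
    fa = [random.uniform(-1, 1) for _ in true_w]
    fb = [random.uniform(-1, 1) for _ in true_w]
    ra = sum(t * f for t, f in zip(true_w, fa))
    rb = sum(t * f for t, f in zip(true_w, fb))
    psi = 1.0 / (1.0 + math.exp(-(ra - rb)))
    w = update_weights(w, fa, fb, psi)
```

Because the slider carries the *strength* of the preference rather than a binary choice, each query conveys more information, which is the performance benefit the paper demonstrates.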
arXiv Detail & Related papers (2021-10-01T09:45:18Z) - Optimizing Interactive Systems via Data-Driven Objectives [70.3578528542663]
We propose an approach that infers the objective directly from observed user interactions.
These inferences can be made regardless of prior knowledge and across different types of user behavior.
We introduce the Interactive System Optimizer (ISO), a novel algorithm that uses these inferred objectives for optimization.
arXiv Detail & Related papers (2020-06-19T20:49:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.