Learning Reward Functions from Scale Feedback
- URL: http://arxiv.org/abs/2110.00284v1
- Date: Fri, 1 Oct 2021 09:45:18 GMT
- Title: Learning Reward Functions from Scale Feedback
- Authors: Nils Wilde, Erdem Bıyık, Dorsa Sadigh, Stephen L. Smith
- Abstract summary: A common framework is to iteratively query the user about which of two presented robot trajectories they prefer.
We propose scale feedback, where the user utilizes a slider to give more nuanced information.
We demonstrate the performance benefit of slider feedback in simulations, and validate our approach in two user studies.
- Score: 11.941038991430837
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Today's robots are increasingly interacting with people and need to
efficiently learn inexperienced users' preferences. A common framework is to
iteratively query the user about which of two presented robot trajectories they
prefer. While this minimizes the user's effort, a strict choice does not yield
any information on how much one trajectory is preferred. We propose scale
feedback, where the user utilizes a slider to give more nuanced information. We
introduce a probabilistic model of how users would provide feedback and derive
a learning framework for the robot. We demonstrate the performance benefit of
slider feedback in simulations, and validate our approach in two user studies,
suggesting that scale feedback enables more effective learning in practice.
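The core loop the abstract describes — modeling the slider value as noisy evidence about a reward difference and updating a belief over reward weights — can be sketched as follows. This is a minimal illustration with an invented Gaussian noise model and invented names (`scale_feedback_likelihood`, `update_belief`); the paper's actual probabilistic model may differ.

```python
import numpy as np

def scale_feedback_likelihood(w, xi_a, xi_b, s, sigma=0.1):
    """Likelihood of an observed slider value s in [-1, 1] given reward
    weights w: the slider is modeled as a noisy report of the reward
    difference between trajectory feature vectors xi_a and xi_b.
    (Hypothetical noise model, for illustration only.)"""
    diff = np.dot(w, xi_a - xi_b)   # reward difference under hypothesis w
    mean = np.tanh(diff)            # squash into the slider's range
    return np.exp(-0.5 * ((s - mean) / sigma) ** 2)

def update_belief(particles, weights, xi_a, xi_b, s):
    """Bayesian update of a particle approximation of the belief over w."""
    like = np.array([scale_feedback_likelihood(w, xi_a, xi_b, s)
                     for w in particles])
    weights = weights * like
    return weights / weights.sum()

# Toy example: the user moves the slider strongly toward trajectory A.
rng = np.random.default_rng(0)
particles = rng.normal(size=(500, 2))       # samples of candidate weights w
weights = np.ones(500) / 500                # uniform prior over particles
xi_a, xi_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
weights = update_belief(particles, weights, xi_a, xi_b, s=0.8)
w_est = weights @ particles                 # posterior mean estimate of w
```

A strict choice would only tell us the sign of the reward difference; the slider value additionally constrains its magnitude, which is what concentrates the posterior faster.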
Related papers
- Vocal Sandbox: Continual Learning and Adaptation for Situated Human-Robot Collaboration [64.6107798750142]
Vocal Sandbox is a framework for enabling seamless human-robot collaboration in situated environments.
We design lightweight and interpretable learning algorithms that allow users to build an understanding and co-adapt to a robot's capabilities in real-time.
We evaluate Vocal Sandbox in two settings: collaborative gift bag assembly and LEGO stop-motion animation.
arXiv Detail & Related papers (2024-11-04T20:44:40Z) - Rethinking Annotator Simulation: Realistic Evaluation of Whole-Body PET Lesion Interactive Segmentation Methods [26.55942230051388]
We evaluate interactive segmentation models through either real user studies or simulated annotators.
Real user studies are expensive and often limited in scale, while simulated annotators, also known as robot users, tend to overestimate model performance.
We propose a more realistic robot user that reduces the user shift by incorporating human factors such as click variation and inter-annotator disagreement.
arXiv Detail & Related papers (2024-04-02T10:19:17Z) - PREDILECT: Preferences Delineated with Zero-Shot Language-based
Reasoning in Reinforcement Learning [2.7387720378113554]
Preference-based reinforcement learning (RL) has emerged as a new field in robot learning.
We use the zero-shot capabilities of a large language model (LLM) to reason from the text provided by humans.
In both a simulated scenario and a user study, we reveal the effectiveness of our work by analyzing the feedback and its implications.
arXiv Detail & Related papers (2024-02-23T16:30:05Z) - Latent User Intent Modeling for Sequential Recommenders [92.66888409973495]
Sequential recommender models learn to predict the next items a user is likely to interact with based on his/her interaction history on the platform.
Most sequential recommenders however lack a higher-level understanding of user intents, which often drive user behaviors online.
Intent modeling is thus critical for understanding users and optimizing long-term user experience.
arXiv Detail & Related papers (2022-11-17T19:00:24Z) - RESUS: Warm-Up Cold Users via Meta-Learning Residual User Preferences in
CTR Prediction [14.807495564177252]
Click-Through Rate (CTR) prediction on cold users is a challenging task in recommender systems.
We propose a novel and efficient approach named RESUS, which decouples the learning of global preference knowledge contributed by collective users from the learning of residual preferences for individual users.
Our approach is efficient and effective in improving CTR prediction accuracy on cold users, compared with various state-of-the-art methods.
arXiv Detail & Related papers (2022-10-28T11:57:58Z) - Meta-Wrapper: Differentiable Wrapping Operator for User Interest
Selection in CTR Prediction [97.99938802797377]
Click-through rate (CTR) prediction, whose goal is to predict the probability of the user to click on an item, has become increasingly significant in recommender systems.
Recent deep learning models with the ability to automatically extract the user interest from his/her behaviors have achieved great success.
We propose a novel approach under the framework of the wrapper method, which is named Meta-Wrapper.
arXiv Detail & Related papers (2022-06-28T03:28:15Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via
Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z) - Generative Adversarial Reward Learning for Generalized Behavior Tendency
Inference [71.11416263370823]
We propose a generative inverse reinforcement learning for user behavioral preference modelling.
Our model can automatically learn the rewards from user's actions based on discriminative actor-critic network and Wasserstein GAN.
arXiv Detail & Related papers (2021-05-03T13:14:25Z) - Empowering Active Learning to Jointly Optimize System and User Demands [70.66168547821019]
We propose a new active learning approach that jointly optimizes the active learning system (training efficiently) and the user (receiving useful instances).
We study our approach in an educational application, which particularly benefits from this technique as the system needs to rapidly learn to predict the appropriateness of an exercise to a particular user.
We evaluate multiple learning strategies and user types with data from real users and find that our joint approach better satisfies both objectives when alternative methods lead to many unsuitable exercises for end users.
arXiv Detail & Related papers (2020-05-09T16:02:52Z) - Active Preference Learning using Maximum Regret [10.317601896290467]
We study active preference learning as a framework for intuitively specifying the behaviour of autonomous robots.
In active preference learning, a user chooses the preferred behaviour from a set of alternatives, from which the robot learns the user's preferences.
arXiv Detail & Related papers (2020-05-08T14:31:31Z)
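The query-selection idea behind maximum-regret active preference learning can be sketched roughly as follows: among candidate reward hypotheses, present the pair of trajectories whose induced optima disagree most, measured by the regret each hypothesis assigns to the other's optimum. This is a simplified stand-in with an invented function name (`pick_max_regret_query`); the paper's actual criterion may differ.

```python
import numpy as np
from itertools import combinations

def pick_max_regret_query(candidate_ws, features):
    """Pick the trajectory pair over which two reward hypotheses disagree
    most. features is an (N, d) array of trajectory feature vectors;
    candidate_ws is a list of d-dimensional weight hypotheses."""
    best_pair, best_regret = None, -np.inf
    for wi, wj in combinations(candidate_ws, 2):
        xi_i = features[np.argmax(features @ wi)]  # best trajectory under wi
        xi_j = features[np.argmax(features @ wj)]  # best trajectory under wj
        # Symmetric regret: loss under wj of executing wi's optimum,
        # plus loss under wi of executing wj's optimum (both nonnegative).
        regret = (xi_j - xi_i) @ wj + (xi_i - xi_j) @ wi
        if regret > best_regret:
            best_pair, best_regret = (xi_i, xi_j), regret
    return best_pair, best_regret

# Toy example with three weight hypotheses and random trajectory features.
rng = np.random.default_rng(1)
ws = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.7, 0.7])]
feats = rng.uniform(size=(20, 2))
(xa, xb), r = pick_max_regret_query(ws, feats)
```

The user's answer to such a query is maximally informative in the sense that whichever trajectory they pick, a high-regret hypothesis is strongly penalized.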
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.