Promptable Behaviors: Personalizing Multi-Objective Rewards from Human
Preferences
- URL: http://arxiv.org/abs/2312.09337v1
- Date: Thu, 14 Dec 2023 21:00:56 GMT
- Title: Promptable Behaviors: Personalizing Multi-Objective Rewards from Human
Preferences
- Authors: Minyoung Hwang, Luca Weihs, Chanwoo Park, Kimin Lee, Aniruddha
Kembhavi, Kiana Ehsani
- Abstract summary: We present Promptable Behaviors, a novel framework that facilitates efficient personalization of robotic agents to diverse human preferences.
We introduce three distinct methods to infer human preferences by leveraging different types of interactions.
We evaluate the proposed method in personalized object-goal navigation and flee navigation tasks in ProcTHOR and RoboTHOR.
- Score: 53.353022588751585
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Customizing robotic behaviors to be aligned with diverse human preferences is
an underexplored challenge in the field of embodied AI. In this paper, we
present Promptable Behaviors, a novel framework that facilitates efficient
personalization of robotic agents to diverse human preferences in complex
environments. We use multi-objective reinforcement learning to train a single
policy adaptable to a broad spectrum of preferences. We introduce three
distinct methods to infer human preferences by leveraging different types of
interactions: (1) human demonstrations, (2) preference feedback on trajectory
comparisons, and (3) language instructions. We evaluate the proposed method in
personalized object-goal navigation and flee navigation tasks in ProcTHOR and
RoboTHOR, demonstrating the ability to prompt agent behaviors to satisfy human
preferences in various scenarios. Project page:
https://promptable-behaviors.github.io
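The abstract describes two technical pieces: a single policy trained with multi-objective reinforcement learning and conditioned on a preference weight vector over reward objectives, and a way to infer that weight vector from human interactions such as pairwise trajectory comparisons. The sketch below illustrates this general pattern in Python; the objective names, the Bradley-Terry-style weight inference, and all function names are illustrative assumptions, not the paper's actual implementation (see the project page above for that).

```python
import numpy as np

# Hypothetical objective names for a navigation agent; the paper's actual
# reward objectives may differ.
OBJECTIVES = ["success", "time_efficiency", "safety", "exploration"]


def scalarize(reward_vec: np.ndarray, weights: np.ndarray) -> float:
    """Collapse a per-objective reward vector into a scalar using preference weights."""
    return float(np.dot(weights, reward_vec))


def infer_weights_from_comparisons(returns_a, returns_b, labels,
                                   lr: float = 0.5, steps: int = 2000) -> np.ndarray:
    """Fit preference weights from pairwise trajectory comparisons.

    returns_a, returns_b: (N, K) per-objective returns of the two trajectories
        shown in each comparison.
    labels: (N,) 1.0 if trajectory A was preferred, else 0.0.
    Assumes a Bradley-Terry model, P(A > B) = sigmoid(w . (R_A - R_B)),
    with w kept on the probability simplex via a softmax parameterization.
    """
    returns_a = np.asarray(returns_a, dtype=float)
    returns_b = np.asarray(returns_b, dtype=float)
    labels = np.asarray(labels, dtype=float)
    logits = np.zeros(returns_a.shape[1])              # unconstrained parameters
    for _ in range(steps):
        w = np.exp(logits) / np.exp(logits).sum()      # softmax -> weights on simplex
        diff = returns_a - returns_b                   # (N, K)
        p = 1.0 / (1.0 + np.exp(-diff @ w))            # predicted P(A preferred)
        grad_w = diff.T @ (labels - p) / len(labels)   # d(log-likelihood)/dw
        jac = np.diag(w) - np.outer(w, w)              # softmax Jacobian dw/dlogits
        logits += lr * (jac @ grad_w)                  # gradient ascent step
    return np.exp(logits) / np.exp(logits).sum()


if __name__ == "__main__":
    # Synthetic check: recover a known preference from noisy comparisons.
    rng = np.random.default_rng(0)
    true_w = np.array([0.5, 0.1, 0.3, 0.1])
    ra, rb = rng.normal(size=(500, 4)), rng.normal(size=(500, 4))
    labels = rng.random(500) < 1.0 / (1.0 + np.exp(-(ra - rb) @ true_w))
    w_hat = infer_weights_from_comparisons(ra, rb, labels)
    print("inferred weights:", np.round(w_hat, 2))
    # A trained multi-objective policy would then be conditioned on w_hat,
    # e.g. policy.act(observation, w_hat) -- a hypothetical interface.
```

Because the weights only condition an already-trained policy at inference time, adapting to a new user in this style requires no retraining; that is the property the final comment in the sketch gestures at.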
Related papers
- PREDICT: Preference Reasoning by Evaluating Decomposed preferences Inferred from Candidate Trajectories [3.0102456679931944]
This paper introduces PREDICT, a method designed to enhance the precision and adaptability of inferring preferences.
We evaluate PREDICT on two distinct environments: a gridworld setting and a new text-domain environment.
arXiv Detail & Related papers (2024-10-08T18:16:41Z)
- NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration [57.15811390835294]
This paper describes how we can train a single unified diffusion policy to handle both goal-directed navigation and goal-agnostic exploration.
We show that this unified policy results in better overall performance when navigating to visually indicated goals in novel environments.
Our experiments, conducted on a real-world mobile robot platform, show effective navigation in unseen environments in comparison with five alternative methods.
arXiv Detail & Related papers (2023-10-11T21:07:14Z)
- AlignDiff: Aligning Diverse Human Preferences via Behavior-Customisable Diffusion Model [69.12623428463573]
AlignDiff is a novel framework that quantifies human preferences, covering their abstractness, and uses them to guide diffusion planning.
It can accurately match user-customized behaviors and efficiently switch from one to another.
We demonstrate its superior performance on preference matching, switching, and covering compared to other baselines.
arXiv Detail & Related papers (2023-10-03T13:53:08Z)
- Everyone Deserves A Reward: Learning Customized Human Preferences [25.28261194665836]
Reward models (RMs) are essential for aligning large language models with human preferences to improve interaction quality.
We propose a three-stage customized RM learning scheme, then empirically verify its effectiveness on both general preference datasets and our Domain-Specific Preference (DSP) set.
We find several ways to better preserve the general preference-modeling ability while training the customized RMs.
arXiv Detail & Related papers (2023-09-06T16:03:59Z)
- SACSoN: Scalable Autonomous Control for Social Navigation [62.59274275261392]
We develop methods for training policies for socially unobtrusive navigation.
By minimizing this counterfactual perturbation (the change in how nearby humans would have behaved had the robot not been present), we can induce robots to behave in ways that do not alter the natural behavior of humans in the shared space.
We collect a large dataset where an indoor mobile robot interacts with human bystanders.
arXiv Detail & Related papers (2023-06-02T19:07:52Z)
- Preference Transformer: Modeling Human Preferences using Transformers for RL [165.33887165572128]
Preference Transformer is a neural architecture that models human preferences using transformers.
We show that Preference Transformer can solve a variety of control tasks using real human preferences, while prior approaches fail to work.
arXiv Detail & Related papers (2023-03-02T04:24:29Z)
- iCub! Do you recognize what I am doing?: multimodal human action recognition on multisensory-enabled iCub robot [0.0]
We show that the proposed multimodal ensemble learning leverages complementary characteristics of three color cameras and one depth sensor.
The results indicate that the proposed models can be deployed on the iCub robot that requires multimodal action recognition.
arXiv Detail & Related papers (2022-12-17T12:40:54Z)
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
- Relative Behavioral Attributes: Filling the Gap between Symbolic Goal Specification and Reward Learning from Human Preferences [19.70421486855437]
Non-expert users can convey complex objectives by expressing preferences over short clips of agent behaviors.
Relative Behavioral Attributes acts as a middle ground between exact goal specification and reward learning purely from preference labels.
We propose two different parametric methods that can potentially encode any kind of behavioral attributes from ordered behavior clips.
arXiv Detail & Related papers (2022-10-28T05:25:23Z)
- Learning from Physical Human Feedback: An Object-Centric One-Shot Adaptation Method [5.906020149230538]
Object Preference Adaptation (OPA) is composed of two key stages: 1) pre-training a base policy to produce a wide variety of behaviors, and 2) online-updating according to human feedback.
Our adaptation occurs online, requires only one human intervention (one-shot), and produces new behaviors never seen during training.
Despite being trained on cheap synthetic data instead of expensive human demonstrations, our policy correctly adapts to human perturbations on realistic tasks on a physical 7DOF robot.
arXiv Detail & Related papers (2022-03-09T18:52:33Z)