Relative Behavioral Attributes: Filling the Gap between Symbolic Goal Specification and Reward Learning from Human Preferences
- URL: http://arxiv.org/abs/2210.15906v1
- Date: Fri, 28 Oct 2022 05:25:23 GMT
- Title: Relative Behavioral Attributes: Filling the Gap between Symbolic Goal Specification and Reward Learning from Human Preferences
- Authors: Lin Guan, Karthik Valmeekam, Subbarao Kambhampati
- Abstract summary: Non-expert users can convey complex objectives by expressing preferences over short clips of agent behaviors.
Relative Behavioral Attributes acts as a middle ground between exact goal specification and reward learning purely from preference labels.
We propose two different parametric methods that can potentially encode any kind of behavioral attributes from ordered behavior clips.
- Score: 19.70421486855437
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating complex behaviors from goals specified by non-expert users is a
crucial aspect of intelligent agents. Interactive reward learning from
trajectory comparisons is one way to allow non-expert users to convey complex
objectives by expressing preferences over short clips of agent behaviors. Even
though this method can encode complex tacit knowledge present in the underlying
tasks, it implicitly assumes that the human is unable to provide richer forms of
feedback than binary preference labels, leading to extremely high
feedback complexity and poor user experience. While providing a detailed
symbolic specification of the objectives might be tempting, it is not always
feasible even for an expert user. However, in most cases, humans are aware of
how the agent should change its behavior along meaningful axes to fulfill the
underlying purpose, even if they are not able to fully specify task objectives
symbolically. Using this as motivation, we introduce the notion of Relative Behavioral Attributes, which acts as a middle ground between exact goal specification and reward learning purely from preference labels by enabling users to tweak the agent's behavior through nameable concepts (e.g., increasing the softness of the movement of a two-legged "sneaky" agent). We propose two different parametric methods that can potentially encode any kind of behavioral attribute from ordered behavior clips. We demonstrate the effectiveness of our methods on 4 tasks with 9 different behavioral attributes and show that, once the attributes are learned, end users can effortlessly produce desirable agent behaviors with only around 10 feedback queries. The feedback complexity of our approach is over 10 times lower than that of the learning-from-human-preferences baseline, demonstrating that our approach is readily applicable in real-world settings.
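The abstract names two parametric attribute-encoding methods without detailing them. As a hypothetical illustration of the general recipe (learning a scalar attribute strength from pairs of ordered clips with a Bradley-Terry-style ranking loss), here is a minimal PyTorch sketch; the network, the frame-feature representation, and all sizes are assumptions, not the authors' architectures.

import torch
import torch.nn as nn

class AttributeScorer(nn.Module):
    """Maps a behavior clip, given as (time, feat_dim) frame features,
    to a scalar strength for one attribute (e.g., movement softness)."""
    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, feat_dim) -> per-frame scores, mean-pooled over time.
        return self.net(clips).mean(dim=1).squeeze(-1)

def ranking_loss(scorer, clip_less, clip_more):
    # Bradley-Terry-style objective: clip_more is labeled as exhibiting
    # the attribute more strongly than clip_less.
    return -torch.log(torch.sigmoid(scorer(clip_more) - scorer(clip_less))).mean()

# Toy usage on random stand-in features.
scorer = AttributeScorer(feat_dim=16)
opt = torch.optim.Adam(scorer.parameters(), lr=1e-3)
clip_less, clip_more = torch.randn(8, 20, 16), torch.randn(8, 20, 16)
loss = ranking_loss(scorer, clip_less, clip_more)
opt.zero_grad(); loss.backward(); opt.step()

Once such a scorer is trained, a user's "increase attribute X" request can be turned into a target score for reward shaping, which is consistent with the handful of feedback queries reported above.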
Related papers
- Pragmatic Feature Preferences: Learning Reward-Relevant Preferences from Human Input [17.131441665935128]
We study how to extract fine-grained data about why an example is preferred, which is useful for learning more accurate reward models.
Our findings suggest that incorporating pragmatic feature preferences is a promising approach for more efficient user-aligned reward learning.
arXiv Detail & Related papers (2024-05-23T16:36:16Z)
- Exploring the Individuality and Collectivity of Intents behind Interactions for Graph Collaborative Filtering [9.740376003100437]
We propose a novel recommendation framework designated as Bilateral Intent-guided Graph Collaborative Filtering (BIGCF)
Specifically, we take a closer look at user-item interactions from a causal perspective and put forth the concepts of individual and collective intents.
To counter the sparsity of implicit feedback, the feature distributions of users and items are encoded via a Gaussian-based graph generation strategy.
arXiv Detail & Related papers (2024-05-15T02:31:26Z)
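The BIGCF summary above names a Gaussian-based graph generation strategy for countering sparse implicit feedback. A generic sketch of one standard realization (the reparameterization trick over per-node Gaussians; dimensions and names are assumptions, not the paper's exact design):

import torch

def sample_node_embeddings(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    # Treat each user/item embedding as a Gaussian and draw a differentiable
    # sample, so sparse feedback yields a distribution rather than a point.
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)

# Toy usage: 100 nodes with 64-dimensional embeddings.
mu, logvar = torch.randn(100, 64), torch.zeros(100, 64)
z = sample_node_embeddings(mu, logvar)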
- Select to Perfect: Imitating desired behavior from large multi-agent data [28.145889065013687]
Desired characteristics for AI agents can be expressed by assigning desirability scores.
We first assess the effect of each individual agent's behavior on the collective desirability score.
We propose the concept of an agent's Exchange Value, which quantifies an individual agent's contribution to the collective desirability score.
arXiv Detail & Related papers (2024-05-06T15:48:24Z)
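The summary above leaves Exchange Value undefined; one plausible reading, sketched here purely as an assumption, is a Monte Carlo estimate of how the collective desirability score changes when an agent replaces a randomly chosen team member.

import random
from statistics import mean

def exchange_value(agent, teams, team_score, n_samples=200):
    # Hypothetical estimate: average change in the collective score when
    # `agent` is swapped in for a random member of a random team.
    deltas = []
    for _ in range(n_samples):
        team = list(random.choice(teams))
        i = random.randrange(len(team))
        swapped = team[:i] + [agent] + team[i + 1:]
        deltas.append(team_score(swapped) - team_score(team))
    return mean(deltas)

# Toy usage: agents are skill levels; the collective score is their sum.
random.seed(0)
teams = [[random.random() for _ in range(4)] for _ in range(50)]
print(exchange_value(0.9, teams, sum))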
- Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents [110.25679611755962]
Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions.
We introduce Intention-in-Interaction (IN3), a novel benchmark designed to inspect users' implicit intentions through explicit queries.
We empirically train Mistral-Interact, a powerful model that proactively assesses task vagueness, inquires user intentions, and refines them into actionable goals.
arXiv Detail & Related papers (2024-02-14T14:36:30Z)
- Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences [53.353022588751585]
We present Promptable Behaviors, a novel framework that facilitates efficient personalization of robotic agents to diverse human preferences.
We introduce three distinct methods to infer human preferences by leveraging different types of interactions.
We evaluate the proposed method in personalized object-goal navigation and flee navigation tasks in ProcTHOR and RoboTHOR.
arXiv Detail & Related papers (2023-12-14T21:00:56Z)
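Promptable Behaviors, summarized above, personalizes a multi-objective reward to inferred human preferences. A common scalarization, shown here as an illustrative assumption rather than the paper's exact formulation, conditions the reward on a per-user weight vector over objectives.

import numpy as np

def promptable_reward(objective_rewards: np.ndarray, weights: np.ndarray) -> float:
    # Combine a per-step reward vector (one entry per objective) under
    # user-specific preference weights, normalized to sum to one.
    w = weights / weights.sum()
    return float(objective_rewards @ w)

# The same transition scored for a safety-focused vs. a speed-focused user.
r = np.array([0.8, 0.1, 0.3])  # hypothetical [speed, safety, exploration]
print(promptable_reward(r, np.array([0.1, 0.8, 0.1])))
print(promptable_reward(r, np.array([0.8, 0.1, 0.1])))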
- AlignDiff: Aligning Diverse Human Preferences via Behavior-Customisable Diffusion Model [69.12623428463573]
AlignDiff is a novel framework that quantifies human preferences, including abstract ones, and uses them to guide diffusion planning.
It can accurately match user-customized behaviors and efficiently switch from one to another.
We demonstrate its superior performance on preference matching, switching, and covering compared to other baselines.
arXiv Detail & Related papers (2023-10-03T13:53:08Z)
- Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation [20.266695694005943]
Policies often fail due to distribution shift -- changes in the state and reward that occur when a policy is deployed in new environments.
Data augmentation can increase robustness by making the model invariant to task-irrelevant changes in the agent's observation.
We propose an interactive framework to leverage feedback directly from the user to identify personalized task-irrelevant concepts.
arXiv Detail & Related papers (2023-07-12T17:55:08Z)
- Towards customizable reinforcement learning agents: Enabling preference specification through online vocabulary expansion [25.053927377536905]
We propose PRESCA, a system that allows users to specify their preferences in terms of concepts that they understand.
We evaluate PRESCA by using it on a Minecraft environment and show that it can be effectively used to make the agent align with the user's preference.
arXiv Detail & Related papers (2022-10-27T00:54:14Z)
- Hyper Meta-Path Contrastive Learning for Multi-Behavior Recommendation [61.114580368455236]
User purchasing prediction with multi-behavior information remains a challenging problem for current recommendation systems.
We propose the concept of the hyper meta-path and construct hyper meta-paths or hyper meta-graphs to explicitly illustrate the dependencies among a user's different behaviors.
Thanks to the recent success of graph contrastive learning, we leverage it to learn embeddings of user behavior patterns adaptively instead of assigning a fixed scheme to understand the dependencies among different behaviors.
arXiv Detail & Related papers (2021-09-07T04:28:09Z)
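The hyper meta-path summary above leans on graph contrastive learning to embed behavior patterns. A generic InfoNCE-style objective, a sketch of the family of losses used in such work rather than the paper's exact loss:

import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.2):
    # Two views of the same user's behavior pattern are positives;
    # every other row in the batch serves as a negative.
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels)

loss = info_nce(torch.randn(32, 64), torch.randn(32, 64))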
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
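PEBBLE's off-policy compatibility, per the summary above, comes from relabeling stored experience as the learned reward evolves with new human feedback. A minimal sketch of that relabeling step; the buffer layout and reward-model interface are assumptions.

import torch

def relabel_replay_buffer(buffer, reward_model):
    # Recompute stored rewards with the current learned reward model so old
    # off-policy transitions stay consistent with the latest human feedback.
    with torch.no_grad():
        for transition in buffer:
            transition["reward"] = reward_model(
                transition["obs"], transition["action"]
            ).item()
    return buffer

# Toy usage with a stand-in reward model.
reward_model = lambda obs, act: obs.sum() + act.sum()
buffer = [{"obs": torch.randn(4), "action": torch.randn(2)} for _ in range(3)]
relabel_replay_buffer(buffer, reward_model)
print([round(t["reward"], 3) for t in buffer])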
- ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection [101.56529337489417]
We consider the problem of Human-Object Interaction (HOI) Detection, which aims to locate and recognize HOI instances in the form of <human, action, object> in images.
We argue that multi-level consistencies among objects, actions and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs.
Our model takes visual features of candidate human-object pairs and word embeddings of HOI labels as inputs, maps them into visual-semantic joint embedding space and obtains detection results by measuring their similarities.
arXiv Detail & Related papers (2020-08-14T09:11:18Z)
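The ConsNet summary above scores candidate human-object pairs against HOI labels in a shared visual-semantic space. A generic sketch of that matching step (the projection heads, feature sizes, and cosine-similarity choice are assumptions):

import torch
import torch.nn.functional as F

def hoi_scores(visual_feats, label_embeds, vis_proj, txt_proj):
    # Project both modalities into a joint space and score every
    # candidate pair against every HOI label by cosine similarity.
    v = F.normalize(vis_proj(visual_feats), dim=-1)  # (pairs, d)
    t = F.normalize(txt_proj(label_embeds), dim=-1)  # (labels, d)
    return v @ t.T                                   # (pairs, labels)

# Toy usage: 5 candidate pairs, 12 HOI labels, joint dimension 32.
vis_proj = torch.nn.Linear(2048, 32)  # assumed visual feature size
txt_proj = torch.nn.Linear(300, 32)   # assumed word-embedding size
scores = hoi_scores(torch.randn(5, 2048), torch.randn(12, 300), vis_proj, txt_proj)
print(scores.argmax(dim=1))  # best-matching label per pair

Because labels enter through their word embeddings, previously unseen HOI combinations can still be ranked, which is what makes the zero-shot setting workable.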