Learning Intuitive Policies Using Action Features
- URL: http://arxiv.org/abs/2201.12658v2
- Date: Tue, 6 Jun 2023 01:35:13 GMT
- Title: Learning Intuitive Policies Using Action Features
- Authors: Mingwei Ma, Jizhou Liu, Samuel Sokota, Max Kleiman-Weiner, Jakob
Foerster
- Abstract summary: We investigate the effect of network architecture on the propensity of learning algorithms to exploit semantic relationships.
We find that attention-based architectures that jointly process a featurized representation of observations and actions have a better inductive bias for learning intuitive policies.
- Score: 7.260481131198059
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An unaddressed challenge in multi-agent coordination is to enable AI agents
to exploit the semantic relationships between the features of actions and the
features of observations. Humans take advantage of these relationships in
highly intuitive ways. For instance, in the absence of a shared language, we
might point to the object we desire or hold up our fingers to indicate how many
objects we want. To address this challenge, we investigate the effect of
network architecture on the propensity of learning algorithms to exploit these
semantic relationships. Across a procedurally generated coordination task, we
find that attention-based architectures that jointly process a featurized
representation of observations and actions have a better inductive bias for
learning intuitive policies. Through fine-grained evaluation and scenario
analysis, we show that the resulting policies are human-interpretable.
Moreover, such agents coordinate with people without training on any human
data.
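To make the architectural idea concrete, below is a minimal sketch of a policy that jointly processes featurized observations and actions with dot-product attention, so that each action is scored by the similarity of its features to the observation. This is an illustration of the inductive bias described in the abstract, not the paper's exact model; layer sizes, the scoring rule, and all names here are assumptions.

```python
import torch
import torch.nn as nn

class ActionFeatureAttentionPolicy(nn.Module):
    """Scores actions by attending from the observation to featurized actions.

    Hypothetical sketch of the abstract's idea; the paper's architecture
    (depth, heads, dimensions) may differ.
    """

    def __init__(self, obs_dim: int, act_feat_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.obs_proj = nn.Linear(obs_dim, hidden_dim)       # query from observation
        self.act_proj = nn.Linear(act_feat_dim, hidden_dim)  # keys from action features

    def forward(self, obs: torch.Tensor, action_feats: torch.Tensor) -> torch.Tensor:
        # obs: (batch, obs_dim); action_feats: (batch, num_actions, act_feat_dim)
        q = self.obs_proj(obs).unsqueeze(1)            # (batch, 1, hidden)
        k = self.act_proj(action_feats)                # (batch, num_actions, hidden)
        logits = (q * k).sum(-1) / k.shape[-1] ** 0.5  # scaled dot-product scores
        return torch.log_softmax(logits, dim=-1)       # log-policy over actions

# Usage: 5 candidate actions, each described by a 4-dim feature vector.
policy = ActionFeatureAttentionPolicy(obs_dim=8, act_feat_dim=4)
log_probs = policy(torch.randn(2, 8), torch.randn(2, 5, 4))
print(log_probs.shape)  # torch.Size([2, 5])
```

Because actions enter through their features rather than through a fixed output head, semantically similar actions receive similar scores by construction, which is the hypothesized source of the intuitive coordination behavior.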
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms representative models on both objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
- Compositional Learning in Transformer-Based Human-Object Interaction Detection [6.630793383852106]
Long-tailed distribution of labeled instances is a primary challenge in HOI detection.
Inspired by the nature of HOI triplets, some existing approaches adopt the idea of compositional learning.
We propose a transformer-based framework for compositional HOI learning.
arXiv Detail & Related papers (2023-08-11T06:41:20Z)
- Mining Conditional Part Semantics with Occluded Extrapolation for Human-Object Interaction Detection [16.9278983497498]
Human-Object Interaction Detection is a crucial aspect of human-centric scene understanding.
Existing methods try to use human-related clues to alleviate the difficulty, but rely heavily on external annotations or knowledge.
We propose a novel Part Semantic Network (PSN) to solve this problem.
arXiv Detail & Related papers (2023-07-19T23:55:15Z)
- Knowledge Guided Bidirectional Attention Network for Human-Object Interaction Detection [3.0915392100355192]
We argue that the independent use of the bottom-up parsing strategy in HOI is counter-intuitive and could lead to the diffusion of attention.
We introduce a novel knowledge-guided top-down attention into HOI, and propose to model the relation parsing as a "look and search" process.
We implement the process via unifying the bottom-up and top-down attention in a single encoder-decoder based model.
arXiv Detail & Related papers (2022-07-16T16:42:49Z)
- On Neural Architecture Inductive Biases for Relational Tasks [76.18938462270503]
We introduce a simple architecture based on similarity-distribution scores, which we name Compositional Relational Network (CoRelNet).
We find that simple architectural choices can outperform existing models in out-of-distribution generalization.
arXiv Detail & Related papers (2022-06-09T16:24:01Z)
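The CoRelNet entry above hinges on one mechanism: classifying from a softmax-normalized pairwise similarity matrix of object embeddings. Below is a hypothetical re-implementation sketch; the readout layer and all dimensions are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class CoRelNetSketch(nn.Module):
    """Relational scoring from a similarity matrix, in the spirit of CoRelNet.

    Sketch under assumptions; not the authors' exact model.
    """

    def __init__(self, num_objects: int, num_classes: int):
        super().__init__()
        # Readout maps the flattened similarity matrix to task logits.
        self.readout = nn.Linear(num_objects * num_objects, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_objects, feat_dim) object embeddings.
        sim = torch.bmm(x, x.transpose(1, 2))  # pairwise similarities
        sim = torch.softmax(sim, dim=-1)       # similarity *distribution* per object
        return self.readout(sim.flatten(start_dim=1))

logits = CoRelNetSketch(num_objects=4, num_classes=2)(torch.randn(8, 4, 16))
print(logits.shape)  # torch.Size([8, 2])
```

Discarding the raw embeddings and keeping only their relations is what gives the model its relational inductive bias.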
- Domain-Robust Visual Imitation Learning with Mutual Information Constraints [0.0]
We introduce a new algorithm called Disentangling Generative Adversarial Imitation Learning (DisentanGAIL).
Our algorithm enables autonomous agents to learn directly from high dimensional observations of an expert performing a task.
arXiv Detail & Related papers (2021-03-08T21:18:58Z)
- Behavior Priors for Efficient Reinforcement Learning [97.81587970962232]
We consider how information and architectural constraints can be combined with ideas from the probabilistic modeling literature to learn behavior priors.
We discuss how such latent variable formulations connect to related work on hierarchical reinforcement learning (HRL) and mutual information and curiosity based objectives.
We demonstrate the effectiveness of our framework by applying it to a range of simulated continuous control domains.
arXiv Detail & Related papers (2020-10-27T13:17:18Z)
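The behavior-prior idea in the entry above belongs to the standard KL-regularized reinforcement learning family: maximize return while penalizing divergence from a learned prior policy pi_0. The display below states that family in hedged form; the paper's exact variant and notation may differ.

```latex
\max_{\pi}\ \mathbb{E}_{\pi}\Big[\textstyle\sum_{t}\gamma^{t}\, r(s_t, a_t)\Big]
\;-\; \alpha\,\mathbb{E}_{\pi}\Big[\textstyle\sum_{t}\gamma^{t}\,
\mathrm{KL}\big(\pi(\cdot \mid s_t)\,\big\|\,\pi_0(\cdot \mid s_t)\big)\Big]
```

Here the temperature alpha trades off reward against staying close to the prior, and restricting the information available to pi_0 is what forces it to capture reusable default behavior.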
- DRG: Dual Relation Graph for Human-Object Interaction Detection [65.50707710054141]
We tackle the challenging problem of human-object interaction (HOI) detection.
Existing methods either recognize the interaction of each human-object pair in isolation or perform joint inference based on complex appearance-based features.
In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph.
arXiv Detail & Related papers (2020-08-26T17:59:40Z)
- ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection [101.56529337489417]
We consider the problem of Human-Object Interaction (HOI) Detection, which aims to locate and recognize HOI instances in the form of ⟨human, action, object⟩ in images.
We argue that multi-level consistencies among objects, actions and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs.
Our model takes visual features of candidate human-object pairs and word embeddings of HOI labels as inputs, maps them into visual-semantic joint embedding space and obtains detection results by measuring their similarities.
arXiv Detail & Related papers (2020-08-14T09:11:18Z)
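The ConsNet entry describes matching candidate human-object pairs against HOI label embeddings in a shared space. The sketch below illustrates only that visual-semantic scoring step; the projection layers, dimensions, and names are hypothetical, not ConsNet's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbeddingScorer(nn.Module):
    """Scores human-object pair features against HOI label embeddings.

    Hedged sketch of a visual-semantic joint embedding; assumptions throughout.
    """

    def __init__(self, visual_dim: int, word_dim: int, joint_dim: int = 128):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, joint_dim)  # pair features -> joint space
        self.label_proj = nn.Linear(word_dim, joint_dim)     # label embeddings -> joint space

    def forward(self, pair_feats: torch.Tensor, label_embs: torch.Tensor) -> torch.Tensor:
        # pair_feats: (num_pairs, visual_dim); label_embs: (num_labels, word_dim)
        v = F.normalize(self.visual_proj(pair_feats), dim=-1)
        s = F.normalize(self.label_proj(label_embs), dim=-1)
        return v @ s.T  # cosine similarities: (num_pairs, num_labels)

scores = JointEmbeddingScorer(visual_dim=256, word_dim=300)(
    torch.randn(10, 256), torch.randn(600, 300))
print(scores.shape)  # torch.Size([10, 600])
```

Because labels enter through word embeddings rather than a fixed classifier head, unseen HOI labels can be scored at test time, which is what enables the zero-shot setting.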