Visual-Semantic Graph Attention Networks for Human-Object Interaction
Detection
- URL: http://arxiv.org/abs/2001.02302v6
- Date: Sat, 6 Mar 2021 05:42:22 GMT
- Title: Visual-Semantic Graph Attention Networks for Human-Object Interaction
Detection
- Authors: Zhijun Liang, Juan Rojas, Junfa Liu, Yisheng Guan
- Abstract summary: Human-Object Interaction (HOI) Detection infers the action predicate on a <human, predicate, object> triplet.
We study the disambiguating contribution of subsidiary relations made available via graph networks.
We contribute a dual-graph attention network that effectively aggregates contextual visual, spatial, and semantic information.
- Score: 6.161066669674775
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In scene understanding, robotics benefit from not only detecting individual
scene instances but also from learning their possible interactions.
Human-Object Interaction (HOI) Detection infers the action predicate on a
<human, predicate, object> triplet. Contextual information has been found
critical in inferring interactions. However, most works use only local features
from a single human-object pair for inference. Few works have studied the
disambiguating contribution of subsidiary relations made available via graph
networks. Similarly, few have learned to effectively leverage visual cues along
with the intrinsic semantic regularities contained in HOIs. We contribute a
dual-graph attention network that effectively aggregates contextual visual,
spatial, and semantic information dynamically from primary human-object
relations as well as subsidiary relations through attention mechanisms for
strong disambiguating power. We achieve comparable results on two benchmarks:
V-COCO and HICO-DET. Code is available at
\url{https://github.com/birlrobotics/vs-gats}.
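The abstract describes aggregating contextual information from primary and subsidiary relations through attention. As an illustration only, the following is a minimal NumPy sketch of one attention-aggregation step over a fully connected instance graph; the function and parameter names (`graph_attention`, `W`, `a`) are assumptions for exposition, not the authors' released code, which lives at the repository linked above.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def graph_attention(node_feats, W, a):
    """One attention-weighted aggregation step over all node pairs,
    in the spirit of graph attention networks (illustrative sketch)."""
    h = node_feats @ W  # project node features
    n = h.shape[0]
    # Pairwise attention logits from concatenated projected pairs.
    logits = np.array([[a @ np.concatenate([h[i], h[j]])
                        for j in range(n)] for i in range(n)])
    alpha = softmax(logits)  # each node's weights over its neighbors
    return alpha @ h         # weighted aggregation of neighbor features

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))  # e.g. 1 human node + 3 object nodes
W = rng.normal(size=(8, 8))
a = rng.normal(size=(16,))
out = graph_attention(feats, W, a)
print(out.shape)  # (4, 8)
```

In the paper's setting, such attention weights would let each human-object relation draw on subsidiary relations for disambiguation, rather than relying on a single pair's local features.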
Related papers
- Knowledge Guided Bidirectional Attention Network for Human-Object
Interaction Detection [3.0915392100355192]
We argue that the independent use of the bottom-up parsing strategy in HOI is counter-intuitive and could lead to the diffusion of attention.
We introduce a novel knowledge-guided top-down attention into HOI, and propose to model the relation parsing as a "look and search" process.
We implement the process via unifying the bottom-up and top-down attention in a single encoder-decoder based model.
arXiv Detail & Related papers (2022-07-16T16:42:49Z)
- Interactiveness Field in Human-Object Interactions [89.13149887013905]
We introduce a previously overlooked interactiveness bimodal prior: given an object in an image, after pairing it with the humans, the generated pairs are either mostly non-interactive, or mostly interactive.
We propose new energy constraints based on the cardinality and difference in the inherent "interactiveness field" underlying interactive versus non-interactive pairs.
Our method can detect more precise pairs and thus significantly boost HOI detection performance.
arXiv Detail & Related papers (2022-04-16T05:09:25Z)
- Detecting Human-to-Human-or-Object (H2O) Interactions with DIABOLO [29.0200561485714]
We propose a new interaction dataset to deal with both types of human interactions: Human-to-Human-or-Object (H2O).
In addition, we introduce a novel taxonomy of verbs, intended to be closer to a description of human body attitude in relation to the surrounding targets of interaction.
We propose DIABOLO, an efficient subject-centric single-shot method to detect all interactions in one forward pass.
arXiv Detail & Related papers (2022-01-07T11:00:11Z)
- Exploiting Scene Graphs for Human-Object Interaction Detection [81.49184987430333]
Human-Object Interaction (HOI) detection is a fundamental visual task aiming at localizing and recognizing interactions between humans and objects.
We propose a novel method to exploit this information, through the scene graph, for the Human-Object Interaction (SG2HOI) detection task.
Our method, SG2HOI, incorporates the SG information in two ways: (1) we embed a scene graph into a global context clue, serving as the scene-specific environmental context; and (2) we build a relation-aware message-passing module to gather relationships from objects' neighborhood and transfer them into interactions.
arXiv Detail & Related papers (2021-08-19T09:40:50Z)
- DRG: Dual Relation Graph for Human-Object Interaction Detection [65.50707710054141]
We tackle the challenging problem of human-object interaction (HOI) detection.
Existing methods either recognize the interaction of each human-object pair in isolation or perform joint inference based on complex appearance-based features.
In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph.
arXiv Detail & Related papers (2020-08-26T17:59:40Z)
- ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection [101.56529337489417]
We consider the problem of Human-Object Interaction (HOI) Detection, which aims to locate and recognize HOI instances in the form of <human, action, object> in images.
We argue that multi-level consistencies among objects, actions and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs.
Our model takes visual features of candidate human-object pairs and word embeddings of HOI labels as inputs, maps them into visual-semantic joint embedding space and obtains detection results by measuring their similarities.
arXiv Detail & Related papers (2020-08-14T09:11:18Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
- Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the interaction.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)
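The interaction-point formulation above detects interactions as keypoints rather than classifying box pairs. A common target in such methods is the midpoint between the human and object box centers; the following is a simplified sketch of that geometry (illustrative only, not the paper's exact code).

```python
import numpy as np

def interaction_point(human_box, object_box):
    """Midpoint between the human and object box centers, a typical
    regression target for interaction-point detectors (sketch only).
    Boxes are (x1, y1, x2, y2)."""
    def center(b):
        x1, y1, x2, y2 = b
        return np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])
    return (center(human_box) + center(object_box)) / 2.0

# Human centered at (30, 50), object centered at (90, 50).
pt = interaction_point((10, 10, 50, 90), (70, 30, 110, 70))
print(pt)  # [60. 50.]
```

At inference, peaks in a predicted point heatmap near such midpoints are paired back to the detected human and object boxes to recover the full triplet.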
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.