Exploiting Scene Graphs for Human-Object Interaction Detection
- URL: http://arxiv.org/abs/2108.08584v1
- Date: Thu, 19 Aug 2021 09:40:50 GMT
- Title: Exploiting Scene Graphs for Human-Object Interaction Detection
- Authors: Tao He, Lianli Gao, Jingkuan Song, Yuan-Fang Li
- Abstract summary: Human-Object Interaction (HOI) detection is a fundamental visual task aiming at localizing and recognizing interactions between humans and objects.
We propose SG2HOI, a novel method that exploits the relational information captured in a scene graph for the Human-Object Interaction detection task.
Our method, SG2HOI, incorporates the SG information in two ways: (1) we embed a scene graph into a global context clue, serving as the scene-specific environmental context; and (2) we build a relation-aware message-passing module to gather relationships from objects' neighborhood and transfer them into interactions.
- Score: 81.49184987430333
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human-Object Interaction (HOI) detection is a fundamental visual task aiming
at localizing and recognizing interactions between humans and objects. Existing
works focus on the visual and linguistic features of humans and objects.
However, they do not capitalise on the high-level and semantic relationships
present in the image, which provide crucial contextual and detailed relational
knowledge for HOI inference. We propose SG2HOI, a novel method that exploits
this information through the scene graph for the Human-Object Interaction
detection task. Our method incorporates the SG information in two
ways: (1) we embed a scene graph into a global context clue, serving as the
scene-specific environmental context; and (2) we build a relation-aware
message-passing module to gather relationships from objects' neighborhood and
transfer them into interactions. Empirical evaluation shows that our SG2HOI
method outperforms the state-of-the-art methods on two benchmark HOI datasets:
V-COCO and HICO-DET. Code will be available at https://github.com/ht014/SG2HOI.
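To make the two mechanisms concrete, here is a minimal PyTorch-style sketch, not the authors' released code: the module names (SceneGraphContext, RelationMessagePassing), shapes, and the mean-pooled context are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SceneGraphContext(nn.Module):
    """Idea (1): pool scene-graph relation embeddings into one global
    context vector serving as a scene-specific environmental clue.
    NOTE: illustrative sketch, not the SG2HOI implementation."""
    def __init__(self, rel_dim, ctx_dim):
        super().__init__()
        self.proj = nn.Linear(rel_dim, ctx_dim)

    def forward(self, rel_embs):                 # rel_embs: (num_relations, rel_dim)
        return self.proj(rel_embs).mean(dim=0)   # (ctx_dim,) global context clue

class RelationMessagePassing(nn.Module):
    """Idea (2): for each relation (s, o), node s receives a message built
    from neighbor o's feature and the relation embedding, so node features
    become relation-aware before interaction prediction."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)       # neighbor feature + relation embedding
        self.update = nn.GRUCell(dim, dim)

    def forward(self, node_feats, edges, rel_embs):
        # node_feats: (N, dim); edges: (R, 2) subject/object indices; rel_embs: (R, dim)
        agg = torch.zeros_like(node_feats)
        for (s, o), r in zip(edges.tolist(), rel_embs):
            agg[s] = agg[s] + self.msg(torch.cat([node_feats[o], r]))
        return self.update(agg, node_feats)      # relation-aware node features
```

In this sketch the global clue would be concatenated with each human-object pair's features before interaction classification; the actual fusion in SG2HOI may differ.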
Related papers
- Detecting Human-to-Human-or-Object (H2O) Interactions with DIABOLO [29.0200561485714]
We propose a new interaction dataset to deal with both types of human interactions: Human-to-Human-or-Object (H2O).
In addition, we introduce a novel taxonomy of verbs, intended to be closer to a description of human body attitude in relation to the surrounding targets of interaction.
We propose DIABOLO, an efficient subject-centric single-shot method to detect all interactions in one forward pass.
arXiv Detail & Related papers (2022-01-07T11:00:11Z)
- GTNet: Guided Transformer Network for Detecting Human-Object Interactions [10.809778265707916]
The human-object interaction (HOI) detection task refers to localizing humans, localizing objects, and predicting the interactions between each human-object pair.
For detecting HOI, it is important to utilize relative spatial configurations and object semantics to find salient spatial regions of images.
GTNet, a novel self-attention-based guided transformer network, addresses this issue.
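As a rough illustration of such guidance, a cross-attention sketch in PyTorch might look as follows; GuidedAttention, its signature, and the concatenated query are assumptions for exposition, not GTNet's actual architecture.

```python
import torch
import torch.nn as nn

class GuidedAttention(nn.Module):
    """Hypothetical cross-attention: a query built from human/object
    appearance plus their relative spatial configuration attends over
    image context features to pick out salient regions."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.query = nn.Linear(3 * dim, dim)   # human + object + spatial encoding
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, h_feat, o_feat, sp_feat, context):
        # h_feat, o_feat, sp_feat: (B, dim); context: (B, num_regions, dim)
        q = self.query(torch.cat([h_feat, o_feat, sp_feat], dim=-1)).unsqueeze(1)
        out, _ = self.attn(q, context, context)   # attend over image regions
        return out.squeeze(1)                     # (B, dim) interaction feature
```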
arXiv Detail & Related papers (2021-08-02T02:06:33Z)
- DRG: Dual Relation Graph for Human-Object Interaction Detection [65.50707710054141]
We tackle the challenging problem of human-object interaction (HOI) detection.
Existing methods either recognize the interaction of each human-object pair in isolation or perform joint inference based on complex appearance-based features.
In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph.
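For intuition only, a toy version of dual-graph aggregation could look like the sketch below; the mean pooling and function name are assumptions, not DRG's actual formulation.

```python
import torch

def dual_relation_aggregate(pair_feats, human_ids, object_ids):
    """Toy dual-graph aggregation: refine each human-object pair feature
    with the mean of pairs sharing its human (human-centric graph) and of
    pairs sharing its object (object-centric graph).
    pair_feats: (num_pairs, dim); human_ids, object_ids: (num_pairs,) long tensors."""
    refined = []
    for i in range(len(pair_feats)):
        h_ctx = pair_feats[human_ids == human_ids[i]].mean(dim=0)
        o_ctx = pair_feats[object_ids == object_ids[i]].mean(dim=0)
        refined.append((h_ctx + o_ctx) / 2)
    return torch.stack(refined)    # (num_pairs, dim) context-refined features
```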
arXiv Detail & Related papers (2020-08-26T17:59:40Z)
- ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection [101.56529337489417]
We consider the problem of Human-Object Interaction (HOI) Detection, which aims to locate and recognize HOI instances in the form of <human, action, object> triplets in images.
We argue that multi-level consistencies among objects, actions and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs.
Our model takes visual features of candidate human-object pairs and word embeddings of HOI labels as inputs, maps them into visual-semantic joint embedding space and obtains detection results by measuring their similarities.
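That matching step might be sketched as follows; the projection heads, dimensions, and cosine scoring are assumed for illustration rather than taken from ConsNet.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical projection heads into a joint visual-semantic space (sizes assumed).
vis_proj = nn.Linear(1024, 512)   # visual pair feature -> joint space
txt_proj = nn.Linear(300, 512)    # HOI label word embedding -> joint space

def score_hoi(pair_visual, label_word_embs):
    """Score candidate pairs against every HOI label by cosine similarity
    in the joint embedding space, as the description above suggests."""
    v = F.normalize(vis_proj(pair_visual), dim=-1)       # (B, 512)
    t = F.normalize(txt_proj(label_word_embs), dim=-1)   # (num_labels, 512)
    return v @ t.t()                                     # (B, num_labels) similarities
```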
arXiv Detail & Related papers (2020-08-14T09:11:18Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework, in-GraphNet, that assembles in-Graph models for detecting HOIs.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
- Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the interaction.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
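A hedged sketch of such an interaction-point head, assuming a standard fully-convolutional setup (the layer sizes and per-action heatmap design are illustrative, not the paper's exact network):

```python
import torch.nn as nn

class InteractionPointHead(nn.Module):
    """Hypothetical fully-convolutional head: one heatmap channel per action
    class, so a peak both localizes the interaction point and classifies it."""
    def __init__(self, in_ch, num_actions):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(in_ch, num_actions, kernel_size=1),
        )

    def forward(self, fmap):                 # fmap: (B, in_ch, H, W)
        return self.head(fmap).sigmoid()     # (B, num_actions, H, W) heatmaps
```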
arXiv Detail & Related papers (2020-03-31T08:42:06Z)
- GID-Net: Detecting Human-Object Interaction with Global and Instance Dependency [67.95192190179975]
We introduce a two-stage trainable reasoning mechanism, referred to as GID block.
GID-Net is a human-object interaction detection framework consisting of a human branch, an object branch and an interaction branch.
We have compared our proposed GID-Net with existing state-of-the-art methods on two public benchmarks, including V-COCO and HICO-DET.
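For illustration, a minimal three-branch scorer consistent with that description might look like this; the fusion by summed logits and all names are assumptions, not GID-Net's actual GID block.

```python
import torch
import torch.nn as nn

class ThreeBranchScorer(nn.Module):
    """Hypothetical three-branch fusion: human, object, and interaction
    branches each score the actions, and their logits are summed before
    a multi-label sigmoid."""
    def __init__(self, dim, num_actions):
        super().__init__()
        self.human_branch = nn.Linear(dim, num_actions)
        self.object_branch = nn.Linear(dim, num_actions)
        self.pair_branch = nn.Linear(2 * dim, num_actions)

    def forward(self, h_feat, o_feat):       # each (B, dim)
        logits = (self.human_branch(h_feat)
                  + self.object_branch(o_feat)
                  + self.pair_branch(torch.cat([h_feat, o_feat], dim=-1)))
        return logits.sigmoid()              # (B, num_actions) action probabilities
```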
arXiv Detail & Related papers (2020-03-11T11:58:43Z)
- Visual-Semantic Graph Attention Networks for Human-Object Interaction Detection [6.161066669674775]
Human-Object Interaction (HOI) Detection infers the action predicate on a <human, predicate, object> triplet.
We study the disambiguating contribution of subsidiary relations made available via graph networks.
We contribute a dual-graph attention network that effectively aggregates contextual visual, spatial, and semantic information.
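As a loose sketch of such aggregation, a single attention step over fused entity features might look as follows; this simplification (one graph instead of two, and the EntityAttentionPool name) is an assumption for exposition.

```python
import torch
import torch.nn as nn

class EntityAttentionPool(nn.Module):
    """Hypothetical attention step standing in for the dual-graph network:
    a pair feature attends over fused per-entity features (visual +
    spatial + semantic) and absorbs the weighted context."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, pair_feat, entity_feats):
        # pair_feat: (dim,); entity_feats: (N, dim) fused entity features
        n = entity_feats.size(0)
        joint = torch.cat([pair_feat.expand(n, -1), entity_feats], dim=-1)
        w = torch.softmax(self.score(joint), dim=0)   # attention over entities
        return pair_feat + (w * entity_feats).sum(0)  # context-enriched feature
```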
arXiv Detail & Related papers (2020-01-07T22:22:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.