ConsNet: Learning Consistency Graph for Zero-Shot Human-Object
Interaction Detection
- URL: http://arxiv.org/abs/2008.06254v4
- Date: Sun, 27 Mar 2022 07:49:43 GMT
- Title: ConsNet: Learning Consistency Graph for Zero-Shot Human-Object
Interaction Detection
- Authors: Ye Liu, Junsong Yuan, Chang Wen Chen
- Abstract summary: We consider the problem of Human-Object Interaction (HOI) Detection, which aims to locate and recognize HOI instances in the form of <human, action, object> in images.
We argue that multi-level consistencies among objects, actions and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs.
Our model takes visual features of candidate human-object pairs and word embeddings of HOI labels as inputs, maps them into a visual-semantic joint embedding space, and obtains detection results by measuring their similarities.
- Score: 101.56529337489417
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of Human-Object Interaction (HOI) Detection, which
aims to locate and recognize HOI instances in the form of <human, action,
object> in images. Most existing works treat HOIs as individual interaction
categories and thus cannot handle the long-tail distribution and polysemy of
action labels. We argue that multi-level consistencies among
objects, actions and interactions are strong cues for generating semantic
representations of rare or previously unseen HOIs. Leveraging the compositional
and relational peculiarities of HOI labels, we propose ConsNet, a
knowledge-aware framework that explicitly encodes the relations among objects,
actions and interactions into an undirected graph called consistency graph, and
exploits Graph Attention Networks (GATs) to propagate knowledge among HOI
categories as well as their constituents. Our model takes visual features of
candidate human-object pairs and word embeddings of HOI labels as inputs, maps
them into a visual-semantic joint embedding space, and obtains detection results
by measuring their similarities. We extensively evaluate our model on the
challenging V-COCO and HICO-DET datasets, and results validate that our
approach outperforms state-of-the-art methods under both fully-supervised and
zero-shot settings. Code is available at https://github.com/yeliudev/ConsNet.
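The abstract describes the pipeline only at a high level. Below is a minimal, hypothetical PyTorch sketch of that idea (it is not the released implementation linked above): word embeddings of object/action/interaction nodes are propagated over an undirected consistency graph by a single-head graph attention layer, a candidate human-object pair is projected into the same embedding space, and detection scores are the similarities between the pair embedding and the propagated label embeddings. The node layout, feature dimensions, single attention head, and the fusion of human and object features are all illustrative assumptions.

# Minimal sketch (not the authors' implementation): consistency-graph nodes,
# GAT propagation over label embeddings, and similarity-based scoring of a
# candidate human-object pair. Dimensions and fusion are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single-head graph attention layer."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):
        h = self.proj(x)                                   # (N, out_dim)
        n = h.size(0)
        # Attention logits e_ij = a([h_i || h_j]) for every node pair.
        e = self.attn(torch.cat(
            [h.unsqueeze(1).expand(n, n, -1),
             h.unsqueeze(0).expand(n, n, -1)], dim=-1)).squeeze(-1)
        e = F.leaky_relu(e, 0.2)
        e = e.masked_fill(adj == 0, float('-inf'))         # keep only graph edges
        alpha = torch.softmax(e, dim=-1)
        return F.elu(alpha @ h)

class ConsNetSketch(nn.Module):
    """Hypothetical joint-embedding head: propagate label embeddings over the
    consistency graph, embed a human-object pair, score by cosine similarity."""
    def __init__(self, word_dim=300, vis_dim=1024, embed_dim=512):
        super().__init__()
        self.gat = GATLayer(word_dim, embed_dim)
        self.vis_proj = nn.Sequential(
            nn.Linear(2 * vis_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim))

    def forward(self, node_emb, adj, human_feat, object_feat):
        # node_emb: (N, word_dim) word embeddings of object/action/HOI nodes
        # adj:      (N, N) adjacency of the undirected consistency graph
        label_emb = F.normalize(self.gat(node_emb), dim=-1)
        pair_emb = F.normalize(
            self.vis_proj(torch.cat([human_feat, object_feat], dim=-1)), dim=-1)
        return pair_emb @ label_emb.t()                    # similarity scores

# Toy usage: 5 nodes with self-loops plus a few edges, one candidate pair.
if __name__ == "__main__":
    adj = torch.eye(5)
    adj[0, 1] = adj[1, 0] = adj[1, 2] = adj[2, 1] = 1.0
    model = ConsNetSketch()
    scores = model(torch.randn(5, 300), adj,
                   torch.randn(1, 1024), torch.randn(1, 1024))
    print(scores.shape)  # torch.Size([1, 5])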
Related papers
- Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z)
- Exploiting Scene Graphs for Human-Object Interaction Detection [81.49184987430333]
Human-Object Interaction (HOI) detection is a fundamental visual task aiming at localizing and recognizing interactions between humans and objects.
We propose a novel method to exploit this information, through the scene graph, for the Human-Object Interaction (SG2HOI) detection task.
Our method, SG2HOI, incorporates the SG information in two ways: (1) we embed a scene graph into a global context clue, serving as the scene-specific environmental context; and (2) we build a relation-aware message-passing module to gather relationships from objects' neighborhood and transfer them into interactions.
arXiv Detail & Related papers (2021-08-19T09:40:50Z)
- Zero-Shot Human-Object Interaction Recognition via Affordance Graphs [3.867143522757309]
We propose a new approach for Zero-Shot Human-Object Interaction Recognition.
Our approach makes use of knowledge external to the image content in the form of a graph.
We evaluate our approach on several datasets and show that it outperforms the current state of the art.
arXiv Detail & Related papers (2020-09-02T13:14:44Z)
- DRG: Dual Relation Graph for Human-Object Interaction Detection [65.50707710054141]
We tackle the challenging problem of human-object interaction (HOI) detection.
Existing methods either recognize the interaction of each human-object pair in isolation or perform joint inference based on complex appearance-based features.
In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph.
arXiv Detail & Related papers (2020-08-26T17:59:40Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
- Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the interaction.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)
- Visual-Semantic Graph Attention Networks for Human-Object Interaction Detection [6.161066669674775]
Human-Object Interaction (HOI) Detection infers the action predicate on a <human, predicate, object> triplet.
We study the disambiguating contribution of subsidiary relations made available via graph networks.
We contribute a dual-graph attention network that effectively aggregates contextual visual, spatial, and semantic information.
arXiv Detail & Related papers (2020-01-07T22:22:46Z)