A Skeleton-aware Graph Convolutional Network for Human-Object
Interaction Detection
- URL: http://arxiv.org/abs/2207.05733v1
- Date: Mon, 11 Jul 2022 15:20:18 GMT
- Title: A Skeleton-aware Graph Convolutional Network for Human-Object
Interaction Detection
- Authors: Manli Zhu, Edmond S. L. Ho and Hubert P. H. Shum
- Abstract summary: We propose a skeleton-aware graph convolutional network for human-object interaction detection, named SGCN4HOI.
Our network exploits the spatial connections between human keypoints and object keypoints to capture their fine-grained structural interactions via graph convolutions.
It fuses such geometric features with visual features and spatial configuration features obtained from human-object pairs.
- Score: 14.900704382194013
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting human-object interactions is essential for comprehensive
understanding of visual scenes. In particular, spatial connections between
humans and objects are important cues for reasoning about interactions. To this end,
we propose a skeleton-aware graph convolutional network for human-object
interaction detection, named SGCN4HOI. Our network exploits the spatial
connections between human keypoints and object keypoints to capture their
fine-grained structural interactions via graph convolutions. It fuses such
geometric features with visual features and spatial configuration features
obtained from human-object pairs. Furthermore, to better preserve the object
structural information and facilitate human-object interaction detection, we
propose a novel skeleton-based object keypoints representation. The performance
of SGCN4HOI is evaluated on the public V-COCO benchmark dataset. Experimental
results show that the proposed approach outperforms the state-of-the-art
pose-based models and achieves competitive performance against other models.
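
For intuition, here is a minimal sketch of the paper's core idea: a graph convolution applied over a joint human-object keypoint graph, whose pooled output is fused with visual and spatial-configuration features. The layer sizes, edge list, normalization, and concatenation-based fusion below are illustrative assumptions, not the authors' exact architecture.

import torch
import torch.nn as nn

class KeypointGCNLayer(nn.Module):
    """One graph-convolution layer over a joint human-object keypoint graph.

    Sketch only: SGCN4HOI's actual depth, adjacency, and fusion may differ.
    """
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, in_dim) keypoint features; adj: normalized adjacency
        return torch.relu(adj @ self.weight(x))

def normalized_adjacency(edges, num_nodes):
    """Build D^{-1/2} (A + I) D^{-1/2} from an undirected edge list."""
    a = torch.eye(num_nodes)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = torch.diag(a.sum(dim=1).pow(-0.5))
    return d_inv_sqrt @ a @ d_inv_sqrt

# Toy graph: 17 COCO human keypoints plus 4 assumed object keypoints.
human_edges = [(0, 1), (0, 2), (1, 3), (2, 4)]            # skeleton subset
object_edges = [(17, 18), (18, 19), (19, 20), (20, 17)]   # object "skeleton"
cross_edges = [(9, 17), (10, 18)]                         # wrists touch object
adj = normalized_adjacency(human_edges + object_edges + cross_edges, 21)

coords = torch.rand(21, 2)                 # 2-D keypoint positions
geometric = KeypointGCNLayer(2, 64)(coords, adj).mean(dim=0)

# Fuse with visual and spatial-configuration features (dimensions assumed).
visual, spatial = torch.rand(512), torch.rand(64)
fused = torch.cat([geometric, visual, spatial])  # input to an HOI classifier

Connecting wrist keypoints to object keypoints lets the convolution propagate information across the human-object boundary, which is the fine-grained structural cue the paper exploits.
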
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms representative models in both objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z) - From Category to Scenery: An End-to-End Framework for Multi-Person Human-Object Interaction Recognition in Videos [9.159660801125812]
Video-based Human-Object Interaction (HOI) recognition explores the intricate dynamics between humans and objects.
In this work, we propose a novel end-to-end category to scenery framework, CATS.
We construct a scenery interactive graph with enhanced geometric-visual features as nodes to learn the relationships among human and object categories.
arXiv Detail & Related papers (2024-07-01T02:42:55Z) - HOKEM: Human and Object Keypoint-based Extension Module for Human-Object
Interaction Detection [1.2183405753834557]
This paper presents the human and object keypoint-based extension module (HOKEM), an easy-to-use add-on that improves the accuracy of conventional HOI detection models.
Experiments on the V-COCO HOI dataset show that HOKEM boosts the accuracy of an appearance-based model by a large margin.
arXiv Detail & Related papers (2023-06-25T14:40:26Z) - Learn to Predict How Humans Manipulate Large-sized Objects from
Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show that the proposed network, which consumes dynamic descriptors, achieves state-of-the-art prediction results and generalizes better to unseen objects.
arXiv Detail & Related papers (2022-06-25T09:55:39Z) - Spatio-Temporal Interaction Graph Parsing Networks for Human-Object
Interaction Recognition [55.7731053128204]
In a given video-based Human-Object Interaction scene, modeling the spatio-temporal relationship between humans and objects is an important cue for understanding the contextual information presented in the video.
With effective spatio-temporal relationship modeling, it is possible not only to uncover the contextual information in each frame but also to directly capture inter-time dependencies.
Making full use of appearance features, spatial locations, and semantic information is also key to improving video-based Human-Object Interaction recognition performance.
arXiv Detail & Related papers (2021-08-19T11:57:27Z) - DRG: Dual Relation Graph for Human-Object Interaction Detection [65.50707710054141]
We tackle the challenging problem of human-object interaction (HOI) detection.
Existing methods either recognize the interaction of each human-object pair in isolation or perform joint inference based on complex appearance-based features.
In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph.
arXiv Detail & Related papers (2020-08-26T17:59:40Z) - A Graph-based Interactive Reasoning for Human-Object Interaction
Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z) - Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the interaction; a minimal sketch of this idea appears after this list.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z) - VSGNet: Spatial Attention Network for Detecting Human Object
Interactions Using Graph Convolutions [13.83595180218225]
Relative spatial reasoning and structural connections between objects are essential cues for analyzing interactions.
The proposed Visual-Spatial-Graph Network (VSGNet) architecture extracts visual features from human-object pairs.
VSGNet outperforms state-of-the-art solutions by 8% (4 mAP) on V-COCO and 16% (3 mAP) on HICO-DET.
arXiv Detail & Related papers (2020-03-11T22:23:51Z)
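
And for the interaction-point entry referenced earlier in the list, a minimal sketch of that idea: a fully convolutional head emits one heatmap per interaction class, and local maxima are decoded as interaction points. The backbone feature shape, class count, and peak decoding below are assumptions, not the paper's exact method.

import torch
import torch.nn as nn
import torch.nn.functional as F

class InteractionPointHead(nn.Module):
    """1x1 conv mapping backbone features to per-class interaction heatmaps.

    Sketch only: feat_dim and num_classes are illustrative assumptions.
    """
    def __init__(self, feat_dim: int = 256, num_classes: int = 26):
        super().__init__()
        self.head = nn.Conv2d(feat_dim, num_classes, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, feat_dim, H, W) -> (B, num_classes, H, W) heatmaps
        return torch.sigmoid(self.head(feats))

def decode_points(heatmaps: torch.Tensor, thresh: float = 0.5) -> torch.Tensor:
    """Keep local maxima above a threshold as candidate interaction points."""
    pooled = F.max_pool2d(heatmaps, kernel_size=3, stride=1, padding=1)
    peaks = (heatmaps == pooled) & (heatmaps > thresh)
    return peaks.nonzero()  # rows of (batch, class, y, x) indices

# Toy usage with assumed shapes.
head = InteractionPointHead()
points = decode_points(head(torch.rand(1, 256, 32, 32)))
print(points.shape)  # (num_detected_points, 4)
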