Geometric Features Informed Multi-person Human-object Interaction
Recognition in Videos
- URL: http://arxiv.org/abs/2207.09425v1
- Date: Tue, 19 Jul 2022 17:36:55 GMT
- Title: Geometric Features Informed Multi-person Human-object Interaction
Recognition in Videos
- Authors: Tanqiu Qiao and Qianhui Men and Frederick W. B. Li and Yoshiki
Kubotani and Shigeo Morishima and Hubert P. H. Shum
- Abstract summary: We argue to combine the benefits of both visual and geometric features in HOI recognition.
We propose a novel Two-level Geometric feature-informed Graph Convolutional Network (2G-GCN)
To demonstrate the novelty and effectiveness of our method in challenging scenarios, we propose a new multi-person HOI dataset (MPHOI-72)
- Score: 19.64072251418535
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human-Object Interaction (HOI) recognition in videos is important for
analyzing human activity. Most existing work focusing on visual features
usually suffer from occlusion in the real-world scenarios. Such a problem will
be further complicated when multiple people and objects are involved in HOIs.
Consider that geometric features such as human pose and object position provide
meaningful information to understand HOIs, we argue to combine the benefits of
both visual and geometric features in HOI recognition, and propose a novel
Two-level Geometric feature-informed Graph Convolutional Network (2G-GCN). The
geometric-level graph models the interdependency between geometric features of
humans and objects, while the fusion-level graph further fuses them with visual
features of humans and objects. To demonstrate the novelty and effectiveness of
our method in challenging scenarios, we propose a new multi-person HOI dataset
(MPHOI-72). Extensive experiments on MPHOI-72 (multi-person HOI), CAD-120
(single-human HOI) and Bimanual Actions (two-hand HOI) datasets demonstrate our
superior performance compared to state-of-the-arts.
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z) - From Category to Scenery: An End-to-End Framework for Multi-Person Human-Object Interaction Recognition in Videos [9.159660801125812]
Video-based Human-Object Interaction (HOI) recognition explores the intricate dynamics between humans and objects.
In this work, we propose a novel end-to-end category to scenery framework, CATS.
We construct a scenery interactive graph with these enhanced geometric-visual features as nodes to learn the relationships among human and object categories.
arXiv Detail & Related papers (2024-07-01T02:42:55Z) - Full-Body Articulated Human-Object Interaction [61.01135739641217]
CHAIRS is a large-scale motion-captured f-AHOI dataset consisting of 16.2 hours of versatile interactions.
CHAIRS provides 3D meshes of both humans and articulated objects during the entire interactive process.
By learning the geometrical relationships in HOI, we devise the very first model that leverage human pose estimation.
arXiv Detail & Related papers (2022-12-20T19:50:54Z) - A Skeleton-aware Graph Convolutional Network for Human-Object
Interaction Detection [14.900704382194013]
We propose a skeleton-aware graph convolutional network for human-object interaction detection, named SGCN4HOI.
Our network exploits the spatial connections between human keypoints and object keypoints to capture their fine-grained structural interactions via graph convolutions.
It fuses such geometric features with visual features and spatial configuration features obtained from human-object pairs.
arXiv Detail & Related papers (2022-07-11T15:20:18Z) - Learn to Predict How Humans Manipulate Large-sized Objects from
Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show the proposed network that consumes dynamic descriptors can achieve state-of-the-art prediction results and help the network better generalize to unseen objects.
arXiv Detail & Related papers (2022-06-25T09:55:39Z) - Spatio-Temporal Interaction Graph Parsing Networks for Human-Object
Interaction Recognition [55.7731053128204]
In given video-based Human-Object Interaction scene, modeling thetemporal relationship between humans and objects are the important cue to understand the contextual information presented in the video.
With the effective-temporal relationship modeling, it is possible not only to uncover contextual information in each frame but also directly capture inter-time dependencies.
The full use of appearance features, spatial location and the semantic information are also the key to improve the video-based Human-Object Interaction recognition performance.
arXiv Detail & Related papers (2021-08-19T11:57:27Z) - LIGHTEN: Learning Interactions with Graph and Hierarchical TEmporal
Networks for HOI in videos [13.25502885135043]
Analyzing the interactions between humans and objects from a video includes identification of relationships between humans and the objects present in the video.
We present a hierarchical approach, LIGHTEN, to learn visual features to effectively capture truth at multiple granularities in a video.
We achieve state-of-the-art results in human-object interaction detection (88.9% and 92.6%) and anticipation tasks of CAD-120 and competitive results on image based HOI detection in V-COCO.
arXiv Detail & Related papers (2020-12-17T05:44:07Z) - DRG: Dual Relation Graph for Human-Object Interaction Detection [65.50707710054141]
We tackle the challenging problem of human-object interaction (HOI) detection.
Existing methods either recognize the interaction of each human-object pair in isolation or perform joint inference based on complex appearance-based features.
In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph.
arXiv Detail & Related papers (2020-08-26T17:59:40Z) - Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the inter-action.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.