IGFormer: Interaction Graph Transformer for Skeleton-based Human
Interaction Recognition
- URL: http://arxiv.org/abs/2207.12100v1
- Date: Mon, 25 Jul 2022 12:11:15 GMT
- Title: IGFormer: Interaction Graph Transformer for Skeleton-based Human
Interaction Recognition
- Authors: Yunsheng Pang, Qiuhong Ke, Hossein Rahmani, James Bailey, Jun Liu
- Abstract summary: We propose a novel Interaction Graph Transformer (IGFormer) network for skeleton-based interaction recognition.
IGFormer constructs interaction graphs according to the semantic and distance correlations between the interactive body parts.
We also propose a Semantic Partition Module to transform each human skeleton sequence into a Body-Part-Time sequence.
- Score: 26.05948629634753
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human interaction recognition is very important in many applications. One
crucial cue in recognizing an interaction is the interactive body parts. In
this work, we propose a novel Interaction Graph Transformer (IGFormer) network
for skeleton-based interaction recognition via modeling the interactive body
parts as graphs. More specifically, the proposed IGFormer constructs
interaction graphs according to the semantic and distance correlations between
the interactive body parts, and enhances the representation of each person by
aggregating the information of the interactive body parts based on the learned
graphs. Furthermore, we propose a Semantic Partition Module to transform each
human skeleton sequence into a Body-Part-Time sequence to better capture the
spatial and temporal information of the skeleton sequence for learning the
graphs. Extensive experiments on three benchmark datasets demonstrate that our
model outperforms the state-of-the-art by a significant margin.
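The abstract's core idea, weighting pairs of interactive body parts by their distance and aggregating features over the learned graph, can be sketched as below. All names, shapes, and the softmax-over-negative-distance weighting are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def build_interaction_graph(parts_a, parts_b, temperature=1.0):
    """Hypothetical distance-based interaction graph between two persons.

    parts_a, parts_b: (P, 3) arrays of mean 3D positions of P body parts.
    Returns a (P, P) row-stochastic adjacency matrix where closer
    cross-person parts receive higher weight.
    """
    diff = parts_a[:, None, :] - parts_b[None, :, :]   # (P, P, 3) pairwise offsets
    dist = np.linalg.norm(diff, axis=-1)               # (P, P) Euclidean distances
    logits = -dist / temperature                       # nearer parts -> larger logit
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def aggregate(features_b, adj):
    """Enhance person A's part features with person B's, weighted by the graph.

    features_b: (P, D) part features of the other person.
    """
    return adj @ features_b                            # (P, D)
```

Per row, the nearest opposing body part dominates the aggregation, which mirrors the intuition that interactions are carried by the closest interacting parts; the temperature controls how sharply the weighting concentrates on them.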
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- Understanding Spatio-Temporal Relations in Human-Object Interaction using Pyramid Graph Convolutional Network [2.223052975765005]
We propose a novel Pyramid Graph Convolutional Network (PGCN) to automatically recognize human-object interaction.
The system represents the 2D or 3D spatial relation of human and objects from the detection results in video data as a graph.
We evaluate our model on two challenging datasets in the field of human-object interaction recognition.
arXiv Detail & Related papers (2024-10-10T13:39:17Z)
- Learning Mutual Excitation for Hand-to-Hand and Human-to-Human Interaction Recognition [22.538114033191313]
We propose a mutual excitation graph convolutional network (me-GCN) by stacking mutual excitation graph convolution layers.
Me-GC learns mutual information in each layer and each stage of graph convolution operations.
Our proposed me-GC outperforms state-of-the-art GCN-based and Transformer-based methods.
arXiv Detail & Related papers (2024-02-04T10:00:00Z)
- Inter-X: Towards Versatile Human-Human Interaction Analysis [100.254438708001]
We propose Inter-X, a dataset with accurate body movements and diverse interaction patterns.
The dataset includes 11K interaction sequences and more than 8.1M frames.
We also equip Inter-X with versatile annotations of more than 34K fine-grained human part-level textual descriptions.
arXiv Detail & Related papers (2023-12-26T13:36:05Z)
- Towards a Unified Transformer-based Framework for Scene Graph Generation and Human-object Interaction Detection [116.21529970404653]
We introduce SG2HOI+, a unified one-step model based on the Transformer architecture.
Our approach employs two interactive hierarchical Transformers to seamlessly unify the tasks of SGG and HOI detection.
Our approach achieves competitive performance when compared to state-of-the-art HOI methods.
arXiv Detail & Related papers (2023-11-03T07:25:57Z)
- Cross-Skeleton Interaction Graph Aggregation Network for Representation Learning of Mouse Social Behaviour [24.716092330419123]
Social behaviour analysis of mice has become an increasingly popular research area in behavioural neuroscience.
It is challenging to model complex social interactions between mice due to highly deformable body shapes and ambiguous movement patterns.
We propose a Cross-Skeleton Interaction Graph Aggregation Network (CS-IGANet) to learn abundant dynamics of freely interacting mice.
arXiv Detail & Related papers (2022-08-07T21:06:42Z)
- A Skeleton-aware Graph Convolutional Network for Human-Object Interaction Detection [14.900704382194013]
We propose a skeleton-aware graph convolutional network for human-object interaction detection, named SGCN4HOI.
Our network exploits the spatial connections between human keypoints and object keypoints to capture their fine-grained structural interactions via graph convolutions.
It fuses such geometric features with visual features and spatial configuration features obtained from human-object pairs.
arXiv Detail & Related papers (2022-07-11T15:20:18Z)
- Spatio-Temporal Interaction Graph Parsing Networks for Human-Object Interaction Recognition [55.7731053128204]
In a given video-based Human-Object Interaction scene, modeling the spatio-temporal relationship between humans and objects is an important cue to understanding the contextual information presented in the video.
With effective spatio-temporal relationship modeling, it is possible not only to uncover the contextual information in each frame but also to directly capture inter-time dependencies.
Making full use of appearance features, spatial locations, and semantic information is also key to improving video-based Human-Object Interaction recognition performance.
arXiv Detail & Related papers (2021-08-19T11:57:27Z)
- Multi-Level Graph Encoding with Structural-Collaborative Relation Learning for Skeleton-Based Person Re-Identification [11.303008512400893]
Skeleton-based person re-identification (Re-ID) is an emerging open topic providing great value for safety-critical applications.
Existing methods typically extract hand-crafted features or model skeleton dynamics from the trajectory of body joints.
We propose a Multi-level Graph encoding approach with Structural-Collaborative Relation learning (MG-SCR) to encode discriminative graph features for person Re-ID.
arXiv Detail & Related papers (2021-06-06T09:09:57Z)
- DRG: Dual Relation Graph for Human-Object Interaction Detection [65.50707710054141]
We tackle the challenging problem of human-object interaction (HOI) detection.
Existing methods either recognize the interaction of each human-object pair in isolation or perform joint inference based on complex appearance-based features.
In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph.
arXiv Detail & Related papers (2020-08-26T17:59:40Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.