Learning Mutual Excitation for Hand-to-Hand and Human-to-Human
Interaction Recognition
- URL: http://arxiv.org/abs/2402.02431v1
- Date: Sun, 4 Feb 2024 10:00:00 GMT
- Title: Learning Mutual Excitation for Hand-to-Hand and Human-to-Human
Interaction Recognition
- Authors: Mengyuan Liu, Chen Chen, Songtao Wu, Fanyang Meng, Hong Liu
- Abstract summary: We propose a mutual excitation graph convolutional network (me-GCN) by stacking mutual excitation graph convolution layers.
Me-GC learns mutual information in each layer and each stage of graph convolution operations.
Our proposed me-GC outperforms state-of-the-art GCN-based and Transformer-based methods.
- Score: 22.538114033191313
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recognizing interactive actions, including hand-to-hand interaction and
human-to-human interaction, has attracted increasing attention for various
applications in the field of video analysis and human-robot interaction.
Considering the success of graph convolution in modeling topology-aware
features from skeleton data, recent methods commonly operate graph convolution
on separate entities and use late fusion for interactive action recognition,
which can barely model the mutual semantic relationships between pairwise
entities. To this end, we propose a mutual excitation graph convolutional
network (me-GCN) by stacking mutual excitation graph convolution (me-GC)
layers. Specifically, me-GC uses a mutual topology excitation module to first
extract adjacency matrices from individual entities and then adaptively model
the mutual constraints between them. Moreover, me-GC extends the above idea and
further uses a mutual feature excitation module to extract and merge deep
features from pairwise entities. Compared with graph convolution, our proposed
me-GC gradually learns mutual information in each layer and each stage of graph
convolution operations. Extensive experiments on a challenging hand-to-hand
interaction dataset, i.e., the Assembly101 dataset, and two large-scale
human-to-human interaction datasets, i.e., NTU60-Interaction and
NTU120-Interaction, consistently verify the superiority of our proposed method,
which outperforms the state-of-the-art GCN-based and Transformer-based methods.
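The abstract gives no formulas, so the following is only a minimal sketch of the two ideas it names, not the authors' implementation. It assumes each entity is a small set of joints with a feature vector per joint. All function names, the sigmoid gating, the similarity-based adjacency, and the shared weight matrix `w` are hypothetical choices made for illustration.

```python
import math

def matmul(a, b):
    # Plain nested-list matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def row_softmax(m):
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def feature_adjacency(x):
    # Assumption: adjacency is a row-softmax of joint-to-joint feature similarity.
    xt = [list(col) for col in zip(*x)]
    return row_softmax(matmul(x, xt))

def excite_topology(a_self, a_other):
    # Hypothetical gate: scale each edge by a sigmoid of the partner's edge,
    # then renormalise rows so every joint still averages over its neighbours.
    gated = [[s * sigmoid(o) for s, o in zip(rs, ro)] for rs, ro in zip(a_self, a_other)]
    return [[v / sum(row) for v in row] for row in gated]

def excite_features(h_self, h_other):
    # Hypothetical gate: scale each channel by a sigmoid of the partner's
    # joint-averaged value in that channel.
    pooled = [sum(col) / len(h_other) for col in zip(*h_other)]
    return [[v * sigmoid(p) for v, p in zip(row, pooled)] for row in h_self]

def me_gc_layer(x_a, x_b, w):
    # 1) per-entity adjacency, 2) mutual topology gating,
    # 3) graph convolution A @ X @ W, 4) mutual feature gating.
    a_a, a_b = feature_adjacency(x_a), feature_adjacency(x_b)
    m_a, m_b = excite_topology(a_a, a_b), excite_topology(a_b, a_a)
    h_a = matmul(matmul(m_a, x_a), w)
    h_b = matmul(matmul(m_b, x_b), w)
    return excite_features(h_a, h_b), excite_features(h_b, h_a)

# Toy input: two entities, three joints each, two channels per joint.
x_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x_b = [[0.5, 0.2], [0.1, 0.9], [0.3, 0.3]]
w = [[1.0, 0.0], [0.0, 1.0]]
out_a, out_b = me_gc_layer(x_a, x_b, w)
```

In this sketch the output for one entity changes whenever the other entity's input changes, which is the contrast with late fusion, where the two streams only meet after all graph convolution layers.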
Related papers
- Towards a Unified Transformer-based Framework for Scene Graph Generation
and Human-object Interaction Detection [116.21529970404653]
We introduce SG2HOI+, a unified one-step model based on the Transformer architecture.
Our approach employs two interactive hierarchical Transformers to seamlessly unify the tasks of SGG and HOI detection.
Our approach achieves competitive performance when compared to state-of-the-art HOI methods.
arXiv Detail & Related papers (2023-11-03T07:25:57Z)
- Human-to-Human Interaction Detection [3.00604614803979]
We introduce a new task named human-to-human interaction detection (HID).
HID is devoted to detecting subjects, recognizing person-wise actions, and grouping people according to their interactive relations, in one model.
First, based on the popular AVA dataset created for action detection, we establish a new HID benchmark, termed AVA-Interaction (AVA-I).
arXiv Detail & Related papers (2023-07-02T03:24:58Z)
- HIORE: Leveraging High-order Interactions for Unified Entity Relation
Extraction [85.80317530027212]
We propose HIORE, a new method for unified entity relation extraction.
The key insight is to leverage the complex association among word pairs, which contains richer information than the first-order word-by-word interactions.
Experiments show that HIORE achieves state-of-the-art performance on relation extraction and an improvement of 1.1~1.8 F1 points over the prior best unified model.
arXiv Detail & Related papers (2023-05-07T14:57:42Z)
- Two-person Graph Convolutional Network for Skeleton-based Human
Interaction Recognition [11.650290790796323]
Graph Convolutional Networks (GCNs) outperform previous methods in the skeleton-based human action recognition area.
We introduce a novel unified two-person graph representing spatial interaction correlations between joints.
Experiments show accuracy improvements in both interactions and individual actions when utilizing the proposed two-person graph topology.
arXiv Detail & Related papers (2022-08-12T08:50:15Z)
- IGFormer: Interaction Graph Transformer for Skeleton-based Human
Interaction Recognition [26.05948629634753]
We propose a novel Interaction Graph Transformer (IGFormer) network for skeleton-based interaction recognition.
IGFormer constructs interaction graphs according to the semantic and distance correlations between the interactive body parts.
We also propose a Semantic Partition Module to transform each human skeleton sequence into a Body-Part-Time sequence.
arXiv Detail & Related papers (2022-07-25T12:11:15Z)
- Pose And Joint-Aware Action Recognition [87.4780883700755]
We present a new model for joint-based action recognition, which first extracts motion features from each joint separately through a shared motion encoder.
Our joint selector module re-weights the joint information to select the most discriminative joints for the task.
We show large improvements over the current state-of-the-art joint-based approaches on the JHMDB, HMDB, Charades, and AVA action recognition datasets.
arXiv Detail & Related papers (2020-10-16T04:43:34Z)
- DRG: Dual Relation Graph for Human-Object Interaction Detection [65.50707710054141]
We tackle the challenging problem of human-object interaction (HOI) detection.
Existing methods either recognize the interaction of each human-object pair in isolation or perform joint inference based on complex appearance-based features.
In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph.
arXiv Detail & Related papers (2020-08-26T17:59:40Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction
Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
- Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.