Learning Mutual Excitation for Hand-to-Hand and Human-to-Human
Interaction Recognition
- URL: http://arxiv.org/abs/2402.02431v1
- Date: Sun, 4 Feb 2024 10:00:00 GMT
- Title: Learning Mutual Excitation for Hand-to-Hand and Human-to-Human
Interaction Recognition
- Authors: Mengyuan Liu, Chen Chen, Songtao Wu, Fanyang Meng, Hong Liu
- Abstract summary: We propose a mutual excitation graph convolutional network (me-GCN) by stacking mutual excitation graph convolution (me-GC) layers.
me-GC learns mutual information in each layer and each stage of the graph convolution operations.
Our proposed me-GCN outperforms state-of-the-art GCN-based and Transformer-based methods.
- Score: 22.538114033191313
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recognizing interactive actions, including hand-to-hand interaction and
human-to-human interaction, has attracted increasing attention for various
applications in the field of video analysis and human-robot interaction.
Considering the success of graph convolution in modeling topology-aware
features from skeleton data, recent methods commonly apply graph convolution
to each entity separately and use late fusion for interactive action recognition,
which can barely model the mutual semantic relationships between pairwise
entities. To this end, we propose a mutual excitation graph convolutional
network (me-GCN) by stacking mutual excitation graph convolution (me-GC)
layers. Specifically, me-GC uses a mutual topology excitation module to first
extract adjacency matrices from individual entities and then adaptively model
the mutual constraints between them. Moreover, me-GC extends the above idea and
further uses a mutual feature excitation module to extract and merge deep
features from pairwise entities. Compared with graph convolution, our proposed
me-GC gradually learns mutual information in each layer and each stage of graph
convolution operations. Extensive experiments on a challenging hand-to-hand
interaction dataset, i.e., the Assembly101 dataset, and two large-scale
human-to-human interaction datasets, i.e., NTU60-Interaction and
NTU120-Interaction, consistently verify the superiority of our proposed method,
which outperforms the state-of-the-art GCN-based and Transformer-based methods.
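To make the abstract's two modules concrete, here is a minimal PyTorch sketch of one me-GC layer. It is an illustration under stated assumptions, not the authors' implementation: the class name MeGCLayer, the similarity-based adjacency, the sigmoid gates, and all tensor shapes are inferred from the abstract rather than taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MeGCLayer(nn.Module):
    """Hypothetical sketch of one mutual excitation graph convolution (me-GC)
    layer; module names, gates, and shapes are assumptions, not the paper's code."""

    def __init__(self, in_ch, out_ch, embed_ch=16):
        super().__init__()
        # Embeddings used to infer a data-dependent adjacency per entity.
        self.theta = nn.Conv2d(in_ch, embed_ch, kernel_size=1)
        self.phi = nn.Conv2d(in_ch, embed_ch, kernel_size=1)
        self.proj = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        # SE-style channel gate used here for mutual feature excitation.
        self.gate = nn.Sequential(nn.Linear(in_ch, out_ch), nn.Sigmoid())

    def adjacency(self, x):
        # x: (N, C, T, V) -> (N, V, V) soft adjacency from joint similarity.
        q = self.theta(x).mean(dim=2)   # (N, E, V), pooled over time
        k = self.phi(x).mean(dim=2)     # (N, E, V)
        return F.softmax(torch.einsum('nev,new->nvw', q, k), dim=-1)

    def forward(self, x_a, x_b):
        # x_a, x_b: (N, C, T, V) skeleton features of the two entities.
        A_a, A_b = self.adjacency(x_a), self.adjacency(x_b)

        def branch(x_self, A_self, A_other, x_other):
            # Mutual topology excitation: gate this entity's adjacency with
            # its partner's (elementwise modulation is an assumption).
            A = A_self * torch.sigmoid(A_other)
            y = self.proj(torch.einsum('nctv,nvw->nctw', x_self, A))
            # Mutual feature excitation: channel weights from the partner.
            s = self.gate(x_other.mean(dim=(2, 3)))      # (N, out_ch)
            return y * s[:, :, None, None]

        return branch(x_a, A_a, A_b, x_b), branch(x_b, A_b, A_a, x_a)

# Toy usage: batch of 2, 3-channel joints, 32 frames, 25 joints per entity.
layer = MeGCLayer(in_ch=3, out_ch=64)
x_a, x_b = torch.randn(2, 3, 32, 25), torch.randn(2, 3, 32, 25)
y_a, y_b = layer(x_a, x_b)                               # each (2, 64, 32, 25)
```

Both branches share weights here; whether the paper ties parameters across the two entities is not stated in the abstract.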
Related papers
- Relation Learning and Aggregate-attention for Multi-person Motion Prediction [13.052342503276936]
Multi-person motion prediction considers not just the skeleton structures or human trajectories but also the interactions between individuals.
Previous methods often overlook that the joint relations within an individual (intra-relations) and the interactions among groups (inter-relations) are distinct types of representations.
We introduce a new collaborative framework for multi-person motion prediction that explicitly models these relations.
arXiv Detail & Related papers (2024-11-06T07:48:30Z)
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose an affordance learning network with visual-geometric collaborative guidance, which incorporates both visual and geometric cues.
Our method outperforms representative models in both objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- Towards a Unified Transformer-based Framework for Scene Graph Generation and Human-object Interaction Detection [116.21529970404653]
We introduce SG2HOI+, a unified one-step model based on the Transformer architecture.
Our approach employs two interactive hierarchical Transformers to seamlessly unify the tasks of SGG and HOI detection.
Our approach achieves competitive performance when compared to state-of-the-art HOI methods.
arXiv Detail & Related papers (2023-11-03T07:25:57Z)
- Human-to-Human Interaction Detection [3.00604614803979]
We introduce a new task named human-to-human interaction detection (HID).
HID aims to detect subjects, recognize person-wise actions, and group people according to their interactive relations, all within one model.
First, based on the popular AVA dataset created for action detection, we establish a new HID benchmark, termed AVA-Interaction (AVA-I).
arXiv Detail & Related papers (2023-07-02T03:24:58Z)
- Two-person Graph Convolutional Network for Skeleton-based Human Interaction Recognition [11.650290790796323]
Graph Convolutional Networks (GCNs) outperform previous methods in skeleton-based human action recognition.
We introduce a novel unified two-person graph representing spatial interaction correlations between joints.
Experiments show accuracy improvements on both interactive and individual actions when the proposed two-person graph topology is used (see the sketch below).
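As an illustration of what a unified two-person graph can look like, the sketch below builds a single adjacency matrix over the joints of both people: two copies of the single-person skeleton plus explicit cross-person edges. The helper name and the choice of inter-body links are assumptions; the paper's actual topology may differ.

```python
import numpy as np

def two_person_adjacency(intra_edges, num_joints, inter_pairs):
    """Toy unified two-person graph: 2 * num_joints nodes, skeletal (intra)
    edges copied for both people, plus assumed cross-person (inter) edges."""
    V = num_joints
    A = np.zeros((2 * V, 2 * V))
    for i, j in intra_edges:              # skeletal bones, both people
        for off in (0, V):
            A[i + off, j + off] = A[j + off, i + off] = 1
    for i, j in inter_pairs:              # cross-person links, e.g. hand-to-hand
        A[i, j + V] = A[j + V, i] = 1
    np.fill_diagonal(A, 1)                # self-loops
    return A

# Example: a 3-joint toy skeleton, linking joint 2 (a "hand") across people.
A = two_person_adjacency([(0, 1), (1, 2)], num_joints=3, inter_pairs=[(2, 2)])
```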
arXiv Detail & Related papers (2022-08-12T08:50:15Z)
- IGFormer: Interaction Graph Transformer for Skeleton-based Human Interaction Recognition [26.05948629634753]
We propose a novel Interaction Graph Transformer (IGFormer) network for skeleton-based interaction recognition.
IGFormer constructs interaction graphs according to the semantic and distance correlations between the interactive body parts.
We also propose a Semantic Partition Module to transform each human skeleton sequence into a Body-Part-Time sequence.
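The two correlation cues can be pictured with a short sketch: edge weights between the body parts of two people computed from feature (semantic) similarity and spatial (distance) proximity. The function name, the softmax similarity, and the Gaussian distance kernel are assumptions, not IGFormer's exact formulation.

```python
import torch
import torch.nn.functional as F

def interaction_graph(feat_a, feat_b, pos_a, pos_b, sigma=1.0):
    # feat_*: (P, C) body-part features; pos_*: (P, 3) mean part positions.
    sem = F.softmax(feat_a @ feat_b.t() / feat_a.size(1) ** 0.5, dim=-1)
    d2 = torch.cdist(pos_a, pos_b).pow(2)   # (P, P) squared part distances
    dist = torch.exp(-d2 / sigma)           # closer parts -> stronger edges
    return sem * dist                       # (P, P) interaction graph

# e.g. interaction_graph(torch.randn(5, 64), torch.randn(5, 64),
#                        torch.randn(5, 3), torch.randn(5, 3))
```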
arXiv Detail & Related papers (2022-07-25T12:11:15Z)
- Pose And Joint-Aware Action Recognition [87.4780883700755]
We present a new model for joint-based action recognition, which first extracts motion features from each joint separately through a shared motion encoder.
Our joint selector module re-weights the joint information to select the most discriminative joints for the task.
We show large improvements over the current state-of-the-art joint-based approaches on the JHMDB, HMDB, Charades, and AVA action recognition datasets.
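The shared motion encoder and joint selector described above might look like the following sketch; the module name, shapes, and the softmax-based re-weighting are assumptions based on this summary, not the paper's exact design.

```python
import torch
import torch.nn as nn

class JointSelector(nn.Module):
    """Toy sketch: a motion encoder shared across joints, followed by a
    selector that re-weights joints by learned importance scores."""

    def __init__(self, in_ch, hid_ch):
        super().__init__()
        self.encoder = nn.Sequential(       # shared by all joints
            nn.Linear(in_ch, hid_ch), nn.ReLU(), nn.Linear(hid_ch, hid_ch)
        )
        self.score = nn.Linear(hid_ch, 1)   # per-joint importance logit

    def forward(self, motion):
        # motion: (N, V, C) per-joint motion descriptors (e.g. pooled over time).
        h = self.encoder(motion)                  # (N, V, H) per-joint features
        w = torch.softmax(self.score(h), dim=1)   # (N, V, 1) joint weights
        return (w * h).sum(dim=1)                 # (N, H) selected/pooled feature
```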
arXiv Detail & Related papers (2020-10-16T04:43:34Z)
- DRG: Dual Relation Graph for Human-Object Interaction Detection [65.50707710054141]
We tackle the challenging problem of human-object interaction (HOI) detection.
Existing methods either recognize the interaction of each human-object pair in isolation or perform joint inference based on complex appearance-based features.
In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph.
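One way to read the dual relation graph is as two aggregation passes over human-object pair features: one grouping pairs that share a human and one grouping pairs that share an object. The toy sketch below uses plain mean aggregation; the paper's actual propagation rule may differ.

```python
import torch

def dual_relation_update(pair_feats, human_ids, object_ids):
    """Toy dual-graph step: each pair absorbs context from pairs sharing its
    human (human-centric) and from pairs sharing its object (object-centric)."""
    def aggregate(ids):
        out = torch.zeros_like(pair_feats)
        for i in range(pair_feats.size(0)):
            out[i] = pair_feats[ids == ids[i]].mean(dim=0)
        return out
    return pair_feats + aggregate(human_ids) + aggregate(object_ids)

# e.g. dual_relation_update(torch.randn(4, 8),
#      torch.tensor([0, 0, 1, 1]), torch.tensor([0, 1, 0, 1]))
```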
arXiv Detail & Related papers (2020-08-26T17:59:40Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework, named in-GraphNet, that assembles in-Graph models for detecting HOIs.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
- Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
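The stage-wise loop reduces to a few lines; localizer and recognizer below stand in for the instance localization and interaction recognition networks, and the whole sketch is an assumed simplification of the cascade rather than the paper's code.

```python
def cascaded_hoi(image_feats, proposals, localizer, recognizer, num_stages=3):
    """Toy coarse-to-fine cascade: each stage refines HOI proposals, then
    scores them; later stages therefore see progressively sharper proposals."""
    predictions = []
    for _ in range(num_stages):
        proposals = localizer(image_feats, proposals)            # refine
        predictions.append(recognizer(image_feats, proposals))   # recognize
    return proposals, predictions
```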
arXiv Detail & Related papers (2020-03-09T17:05:04Z)