Related papers: Learning Mutual Excitation for Hand-to-Hand and Human-to-Human Interaction Recognition

URL: http://arxiv.org/abs/2402.02431v1
Date: Sun, 4 Feb 2024 10:00:00 GMT
Title: Learning Mutual Excitation for Hand-to-Hand and Human-to-Human Interaction Recognition
Authors: Mengyuan Liu, Chen Chen, Songtao Wu, Fanyang Meng, Hong Liu
Abstract summary: We propose a mutual excitation graph convolutional network (me-GCN) by stacking mutual excitation graph convolution (me-GC) layers. me-GC learns mutual information in each layer and each stage of graph convolution operations. The proposed method outperforms state-of-the-art GCN-based and Transformer-based methods.
Score: 22.538114033191313
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recognizing interactive actions, including hand-to-hand interaction and human-to-human interaction, has attracted increasing attention for various applications in the field of video analysis and human-robot interaction. Considering the success of graph convolution in modeling topology-aware features from skeleton data, recent methods commonly operate graph convolution on separate entities and use late fusion for interactive action recognition, which can barely model the mutual semantic relationships between pairwise entities. To this end, we propose a mutual excitation graph convolutional network (me-GCN) by stacking mutual excitation graph convolution (me-GC) layers. Specifically, me-GC uses a mutual topology excitation module to first extract adjacency matrices from individual entities and then adaptively model the mutual constraints between them. Moreover, me-GC extends the above idea and further uses a mutual feature excitation module to extract and merge deep features from pairwise entities. Compared with graph convolution, our proposed me-GC gradually learns mutual information in each layer and each stage of graph convolution operations. Extensive experiments on a challenging hand-to-hand interaction dataset, i.e., the Assembly101 dataset, and two large-scale human-to-human interaction datasets, i.e., NTU60-Interaction and NTU120-Interaction, consistently verify the superiority of our proposed method, which outperforms the state-of-the-art GCN-based and Transformer-based methods.

Related papers

Text-Derived Relational Graph-Enhanced Network for Skeleton-Based Action Segmentation [14.707224594220264]
We propose a Text-Derived Graph Network (TRG-Net) to enhance both modeling and supervision. For modeling, the Dynamic Spatio-Temporal Fusion Modeling (D) method incorporates Text-Derived Joint Graphs (JGT) with channel adaptation. For supervision, the Absolute-Relative Inter-Class Supervision (ARIS) method employs contrastive learning between action features and text embeddings to regularize the absolute class.
arXiv Detail & Related papers (2025-03-19T11:38:14Z)
Relation Learning and Aggregate-attention for Multi-person Motion Prediction [13.052342503276936]
Multi-person motion prediction considers not just skeleton structures or human trajectories but also the interactions with others. Previous methods often overlook that the joint relations within an individual (intra-relation) and the interactions among groups (inter-relation) are distinct types of representations. We introduce a new collaborative framework for multi-person motion prediction that explicitly models these relations.
arXiv Detail & Related papers (2024-11-06T07:48:30Z)
Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues. Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
Towards a Unified Transformer-based Framework for Scene Graph Generation and Human-object Interaction Detection [116.21529970404653]
We introduce SG2HOI+, a unified one-step model based on the Transformer architecture. Our approach employs two interactive hierarchical Transformers to seamlessly unify the tasks of SGG and HOI detection. Our approach achieves competitive performance when compared to state-of-the-art HOI methods.
arXiv Detail & Related papers (2023-11-03T07:25:57Z)
Human-to-Human Interaction Detection [3.00604614803979]
We introduce a new task named human-to-human interaction detection (HID). HID aims to detect subjects, recognize person-wise actions, and group people according to their interactive relations in one model. First, based on the popular AVA dataset created for action detection, we establish a new HID benchmark, termed AVA-Interaction (AVA-I).
arXiv Detail & Related papers (2023-07-02T03:24:58Z)
Two-person Graph Convolutional Network for Skeleton-based Human Interaction Recognition [11.650290790796323]
Graph Convolutional Networks (GCNs) outperform previous methods in skeleton-based human action recognition. We introduce a novel unified two-person graph representing spatial interaction correlations between joints. Experiments show accuracy improvements in both interactions and individual actions when utilizing the proposed two-person graph topology.
arXiv Detail & Related papers (2022-08-12T08:50:15Z)
IGFormer: Interaction Graph Transformer for Skeleton-based Human Interaction Recognition [26.05948629634753]
We propose a novel Interaction Graph Transformer (IGFormer) network for skeleton-based interaction recognition. IGFormer constructs interaction graphs according to the semantic and distance correlations between the interactive body parts. We also propose a Semantic Partition Module to transform each human skeleton sequence into a Body-Part-Time sequence.
arXiv Detail & Related papers (2022-07-25T12:11:15Z)
Pose And Joint-Aware Action Recognition [87.4780883700755]
We present a new model for joint-based action recognition, which first extracts motion features from each joint separately through a shared motion encoder. Our joint selector module re-weights the joint information to select the most discriminative joints for the task. We show large improvements over the current state-of-the-art joint-based approaches on the JHMDB, HMDB, Charades, and AVA action recognition datasets.
arXiv Detail & Related papers (2020-10-16T04:43:34Z)
DRG: Dual Relation Graph for Human-Object Interaction Detection [65.50707710054141]
We tackle the challenging problem of human-object interaction (HOI) detection. Existing methods either recognize the interaction of each human-object pair in isolation or perform joint inference based on complex appearance-based features. In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph.
arXiv Detail & Related papers (2020-08-26T17:59:40Z)
A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs. We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet. Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for multi-stage, coarse-to-fine HOI understanding. At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network. With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.