Related papers: Learning Adaptive Node Selection with External Attention for Human Interaction Recognition

URL: http://arxiv.org/abs/2507.03936v1
Date: Sat, 05 Jul 2025 07:47:00 GMT
Title: Learning Adaptive Node Selection with External Attention for Human Interaction Recognition
Authors: Chen Pang, Xuequan Lu, Qianyu Zhou, Lei Lyu
Abstract summary: Most GCN-based methods model interacting individuals as independent graphs, neglecting their inherent inter-dependencies. We propose the Active Node Selection with External Attention Network (ASEA), an innovative approach that dynamically captures interaction relationships without predefined assumptions. Our method models each participant individually using a GCN to capture intra-personal relationships, facilitating a detailed representation of their actions.
Score: 11.88304209222785
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Most GCN-based methods model interacting individuals as independent graphs, neglecting their inherent inter-dependencies. Although recent approaches utilize predefined interaction adjacency matrices to integrate participants, these matrices fail to adaptively capture the dynamic and context-specific joint interactions across different actions. In this paper, we propose the Active Node Selection with External Attention Network (ASEA), an innovative approach that dynamically captures interaction relationships without predefined assumptions. Our method models each participant individually using a GCN to capture intra-personal relationships, facilitating a detailed representation of their actions. To identify the most relevant nodes for interaction modeling, we introduce the Adaptive Temporal Node Amplitude Calculation (AT-NAC) module, which estimates global node activity by combining spatial motion magnitude with adaptive temporal weighting, thereby highlighting salient motion patterns while reducing irrelevant or redundant information. A learnable threshold, regularized to prevent extreme variations, is defined to selectively identify the most informative nodes for interaction modeling. To capture interactions, we design the External Attention (EA) module to operate on active nodes, effectively modeling the interaction dynamics and semantic relationships between individuals. Extensive evaluations show that our method captures interaction relationships more effectively and flexibly, achieving state-of-the-art performance.
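
The node-selection idea in the abstract (per-joint motion magnitude, combined over time with adaptive weights, then thresholded) can be illustrated with a minimal sketch. This is not the authors' AT-NAC implementation: the function names, the softmax form of the temporal weighting, and the fixed threshold are all assumptions made for illustration. In the paper the threshold is learnable and regularized, and the temporal weights would be learned rather than supplied by hand.

```python
import math


def temporal_weights(logits):
    """Softmax over per-transition scores (assumed form of the temporal weighting)."""
    peak = max(logits)
    exps = [math.exp(s - peak) for s in logits]
    total = sum(exps)
    return [e / total for e in exps]


def node_amplitude(frames, logits):
    """Weighted average displacement of each joint across consecutive frames.

    frames: list of T frames, each a list of V joints given as (x, y, z) tuples.
    logits: list of T-1 scores, one per frame transition (hypothetical stand-in
            for learned temporal weights).
    """
    weights = temporal_weights(logits)
    num_joints = len(frames[0])
    amplitude = [0.0] * num_joints
    for t, w in enumerate(weights):
        for v in range(num_joints):
            # Spatial motion magnitude: Euclidean distance moved between frames.
            amplitude[v] += w * math.dist(frames[t + 1][v], frames[t][v])
    return amplitude


def select_active_nodes(amplitude, threshold):
    """Indices of joints whose activity exceeds the threshold (fixed here, learnable in the paper)."""
    return [v for v, a in enumerate(amplitude) if a > threshold]


# Toy skeleton: joint 0 is static, joint 1 moves 1.0 per frame, joint 2 moves 0.1 per frame.
frames = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.1, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.2, 0.0, 0.0)],
]
amplitude = node_amplitude(frames, logits=[0.0, 0.0])
active = select_active_nodes(amplitude, threshold=0.5)
print(active)
```

With equal logits the two frame transitions are weighted equally, so only the fast-moving joint passes the threshold. An interaction module such as the External Attention described above would then operate only on the selected joints.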

Related papers

Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues. Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
Learning Mutual Excitation for Hand-to-Hand and Human-to-Human Interaction Recognition [21.007782102151282]
We propose a mutual excitation graph convolutional network (me-GCN) by stacking mutual excitation graph convolution layers. Me-GC learns mutual information in each layer and each stage of graph convolution operations. Our proposed me-GC outperforms state-of-the-art GCN-based and Transformer-based methods.
arXiv Detail & Related papers (2024-02-04T10:00:00Z)
Interactive Spatiotemporal Token Attention Network for Skeleton-based General Interactive Action Recognition [8.513434732050749]
We propose an Interactive Spatiotemporal Token Attention Network (ISTA-Net), which simultaneously models spatial, temporal, and interactive relations. Our network contains a tokenizer to partition Interactive Spatiotemporal Tokens (ISTs), which are a unified way to represent the motions of multiple diverse entities. To jointly learn along three dimensions in ISTs, multi-head self-attention blocks integrated with 3D convolutions are designed to capture inter-token correlations.
arXiv Detail & Related papers (2023-07-14T16:51:25Z)
Boundary-aware Supervoxel-level Iteratively Refined Interactive 3D Image Segmentation with Multi-agent Reinforcement Learning [33.181732857907384]
We propose to model interactive image segmentation with a Markov decision process (MDP) and solve it with reinforcement learning (RL). Considering the large exploration space for voxel-wise prediction, multi-agent reinforcement learning is adopted, where the voxel-level policy is shared among agents. Experimental results on four benchmark datasets show that the proposed method significantly outperforms the state of the art.
arXiv Detail & Related papers (2023-03-19T15:52:56Z)
Learning Self-Modulating Attention in Continuous Time Space with Applications to Sequential Recommendation [102.24108167002252]
We propose a novel attention network, named self-modulating attention, that models the complex and non-linearly evolving dynamic user preferences. We empirically demonstrate the effectiveness of our method on top-N sequential recommendation tasks, and the results on three large-scale real-world datasets show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2022-03-30T03:54:11Z)
ConTIG: Continuous Representation Learning on Temporal Interaction Graphs [32.25218861788686]
ConTIG is a continuous representation method that captures the continuous dynamic evolution of node embedding trajectories. Our model exploits three factors in dynamic networks: the latest interaction, neighbor features, and inherent characteristics. Experimental results demonstrate the superiority of ConTIG on temporal link prediction, temporal node recommendation, and dynamic node classification tasks.
arXiv Detail & Related papers (2021-09-27T12:11:24Z)
Spatio-Temporal Dynamic Inference Network for Group Activity Recognition [7.007702816885332]
Group activity recognition aims to understand the activity performed by a group of people. Previous methods are limited to reasoning on a predefined graph, which ignores the person-specific context. We propose the Dynamic Inference Network (DIN), which is composed of a Dynamic Relation (DR) module and a Dynamic Walk (DW) module.
arXiv Detail & Related papers (2021-08-26T12:40:20Z)
Modeling long-term interactions to enhance action recognition [81.09859029964323]
We propose a new approach to understand actions in egocentric videos that exploits the semantics of object interactions at both frame and temporal levels. We use a region-based approach that takes as input a primary region roughly corresponding to the user's hands and a set of secondary regions potentially corresponding to the interacting objects. The proposed approach outperforms the state of the art in terms of action recognition on standard benchmarks.
arXiv Detail & Related papers (2021-04-23T10:08:15Z)
Learning Asynchronous and Sparse Human-Object Interaction in Videos [56.73059840294019]
Asynchronous-Sparse Interaction Graph Networks (ASSIGN) is able to automatically detect the structure of interaction events associated with entities in a video scene. ASSIGN is tested on human-object interaction recognition and shows superior performance in segmenting and labeling of human sub-activities and object affordances from raw videos.
arXiv Detail & Related papers (2021-03-03T23:43:55Z)
Pose And Joint-Aware Action Recognition [87.4780883700755]
We present a new model for joint-based action recognition, which first extracts motion features from each joint separately through a shared motion encoder. Our joint selector module re-weights the joint information to select the most discriminative joints for the task. We show large improvements over the current state-of-the-art joint-based approaches on JHMDB, HMDB, Charades, AVA action recognition datasets.
arXiv Detail & Related papers (2020-10-16T04:43:34Z)
Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding. At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network. With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.