Action Recognition with Multi-stream Motion Modeling and Mutual
Information Maximization
- URL: http://arxiv.org/abs/2306.07576v1
- Date: Tue, 13 Jun 2023 06:56:09 GMT
- Title: Action Recognition with Multi-stream Motion Modeling and Mutual
Information Maximization
- Authors: Yuheng Yang, Haipeng Chen, Zhenguang Liu, Yingda Lyu, Beibei Zhang,
Shuang Wu, Zhibo Wang, Kui Ren
- Abstract summary: Action recognition is a fundamental and intriguing problem in artificial intelligence.
We introduce a novel Stream-GCN network equipped with multi-stream components and channel attention.
Our approach sets the new state-of-the-art performance on three benchmark datasets.
- Score: 44.73161606369333
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Action recognition has long been a fundamental and intriguing problem in
artificial intelligence. The task is challenging due to the high-dimensional
nature of an action, as well as the subtle motion details to be considered.
Current state-of-the-art approaches typically learn from articulated motion
sequences in the straightforward 3D Euclidean space. However, the vanilla
Euclidean space is not efficient for modeling important motion characteristics
such as the joint-wise angular acceleration, which reveals the driving force
behind the motion. Moreover, current methods typically attend to each channel
equally and lack theoretical constraints on extracting task-relevant features
from the input.
In this paper, we seek to tackle these challenges from three aspects: (1) We
propose to incorporate an acceleration representation, explicitly modeling the
higher-order variations in motion. (2) We introduce a novel Stream-GCN network
equipped with multi-stream components and channel attention, where different
representations (i.e., streams) supplement each other towards a more precise
action recognition while attention capitalizes on those important channels. (3)
We explore feature-level supervision for maximizing the extraction of
task-relevant information and formulate this into a mutual information loss.
Empirically, our approach sets the new state-of-the-art performance on three
benchmark datasets, NTU RGB+D, NTU RGB+D 120, and NW-UCLA. Our code is
anonymously released at https://github.com/ActionR-Group/Stream-GCN, hoping to
inspire the community.
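For point (1), the following is a minimal sketch of how an acceleration stream can be derived from raw joint coordinates, assuming a simple finite-difference scheme; the function name, array layout, and differencing are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def motion_streams(joints: np.ndarray):
    """Split a skeleton sequence into position, velocity, and acceleration streams.

    joints: array of shape (T, V, C): T frames, V joints, C coordinates.
    First-order temporal differences approximate joint velocity; second-order
    differences approximate acceleration, i.e. the higher-order variation of motion.
    """
    velocity = np.zeros_like(joints)
    velocity[1:] = joints[1:] - joints[:-1]          # v_t ~ x_t - x_{t-1}

    acceleration = np.zeros_like(joints)
    acceleration[1:] = velocity[1:] - velocity[:-1]  # a_t ~ v_t - v_{t-1}

    return joints, velocity, acceleration
```

Each stream could then be fed to its own graph-convolution branch and the branch outputs fused for classification, in line with the multi-stream design described above.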
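For the channel attention in point (2), one common way to picture a channel gate is a squeeze-and-excitation style module over feature channels. The PyTorch sketch below illustrates that general mechanism only; the class name, reduction ratio, and tensor layout are assumptions, not the exact Stream-GCN module:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style gate over feature channels (illustrative)."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, V) -- batch, channels, frames, joints.
        weights = self.fc(x.mean(dim=(2, 3)))   # (N, C) per-channel descriptors
        return x * weights[:, :, None, None]    # up-weight informative channels
```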
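For point (3), mutual information between feature views is typically maximized through a tractable lower bound such as InfoNCE. The loss below is a hedged sketch of that general idea for assumed (N, D) feature matrices from two streams of the same samples; it is not necessarily the paper's exact mutual information loss:

```python
import torch
import torch.nn.functional as F

def infonce_mi_loss(features_a: torch.Tensor,
                    features_b: torch.Tensor,
                    temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style lower bound on mutual information between two feature views.

    features_a, features_b: (N, D) matrices; row i of each forms a positive
    pair, all other rows act as negatives. Minimizing this cross-entropy
    maximizes the InfoNCE bound on I(A; B).
    """
    a = F.normalize(features_a, dim=1)
    b = F.normalize(features_b, dim=1)
    logits = a @ b.t() / temperature                    # (N, N) similarities
    targets = torch.arange(a.size(0), device=a.device)  # matching indices
    return F.cross_entropy(logits, targets)
```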
Related papers
- Learning Scene Flow With Skeleton Guidance For 3D Action Recognition [1.5954459915735735]
This work demonstrates the use of 3D flow sequences by a deep temporal model for 3D action recognition.
An extended deep skeleton is also introduced to learn the most discriminative action motion dynamics.
A late fusion scheme is adopted between the two models for learning the high-level cross-modal correlations.
arXiv Detail & Related papers (2023-06-23T04:14:25Z)
- Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based Action Recognition [49.163326827954656]
We propose a novel multi-granular spatio-temporal graph network for skeleton-based action classification.
We develop a dual-head graph network consisting of two interleaved branches, which enables us to extract features at two spatio-temporal resolutions.
We conduct extensive experiments on three large-scale datasets.
arXiv Detail & Related papers (2021-08-10T09:25:07Z)
- EAN: Event Adaptive Network for Enhanced Action Recognition [66.81780707955852]
We propose a unified action recognition framework to investigate the dynamic nature of video content.
First, when extracting local cues, we generate dynamic-scale spatio-temporal kernels to adaptively fit the diverse events.
Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions only among a few selected foreground objects by a Transformer.
arXiv Detail & Related papers (2021-07-22T15:57:18Z)
- Learning Comprehensive Motion Representation for Action Recognition [124.65403098534266]
2D CNN-based methods are efficient but may yield redundant features because they apply the same 2D convolution kernel to each frame.
Recent efforts attempt to capture motion information by establishing inter-frame connections, while still suffering from a limited temporal receptive field or high latency.
We propose a Channel-wise Motion Enhancement (CME) module to adaptively emphasize the channels related to dynamic information with a channel-wise gate vector.
We also propose a Spatial-wise Motion Enhancement (SME) module to focus on the regions with the critical target in motion, according to the point-to-point similarity between adjacent feature maps.
arXiv Detail & Related papers (2021-03-23T03:06:26Z)
- Knowing What, Where and When to Look: Efficient Video Action Modeling with Attention [84.83632045374155]
Attentive video modeling is essential for action recognition in unconstrained videos.
The What-Where-When (W3) video attention module models all three facets of video attention jointly.
Experiments show that our attention model brings significant improvements to existing action recognition models.
arXiv Detail & Related papers (2020-04-02T21:48:11Z)
- Motion-Attentive Transition for Zero-Shot Video Object Segmentation [99.44383412488703]
We present a Motion-Attentive Transition Network (MATNet) for zero-shot video object segmentation.
An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder.
In this way, the encoder becomes deeply interleaved, allowing for close hierarchical interactions between object motion and appearance.
arXiv Detail & Related papers (2020-03-09T16:58:42Z)
- Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition [19.93779132095822]
We argue that jointly learning intertwined features from these two information channels is beneficial.
We propose a single stream architecture able to do so, thanks to the addition of a self-supervised motion prediction block.
Experiments on several publicly available databases show the power of our approach.
arXiv Detail & Related papers (2020-02-10T17:51:13Z)