DeepActsNet: Spatial and Motion features from Face, Hands, and Body
Combined with Convolutional and Graph Networks for Improved Action
Recognition
- URL: http://arxiv.org/abs/2009.09818v3
- Date: Fri, 4 Jun 2021 04:09:54 GMT
- Authors: Umar Asif, Deval Mehta, Stefan von Cavallar, Jianbin Tang, and Stefan
Harrer
- Abstract summary: We present "Deep Action Stamps (DeepActs)", a novel data representation to encode actions from video sequences.
We also present "DeepActsNet", a deep learning based ensemble model which learns convolutional and structural features from Deep Action Stamps for highly accurate action recognition.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing action recognition methods mainly focus on joint and bone
information in human body skeleton data due to its robustness to complex
backgrounds and dynamic characteristics of the environments. In this paper, we
combine body skeleton data with spatial and motion features from face and two
hands, and present "Deep Action Stamps (DeepActs)", a novel data representation
to encode actions from video sequences. We also present "DeepActsNet", a deep
learning based ensemble model which learns convolutional and structural
features from Deep Action Stamps for highly accurate action recognition.
Experiments on three challenging action recognition datasets (NTU60, NTU120,
and SYSU) show that the proposed model trained using Deep Action Stamps produces
considerable improvements in action recognition accuracy at lower
computational cost than state-of-the-art methods.
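As a rough illustration of the two ideas in the abstract (combining body, face, and hand keypoints with motion features, and ensembling a convolutional model with a graph model), here is a minimal NumPy sketch. It is not the authors' implementation: the array layout, the use of frame-to-frame differences as "motion", the function names `build_feature_tensor` and `ensemble_scores`, and the weighted averaging of class probabilities are all assumptions made for illustration only.

```python
import numpy as np


def build_feature_tensor(body, face, hands):
    """Stack keypoints from several body parts and append simple motion features.

    Each input is an array of shape (T, K_part, C): T frames, K_part keypoints,
    C coordinates per keypoint. This layout is an assumption, not the paper's.
    """
    # Spatial features: all keypoints side by side, shape (T, K, C).
    joints = np.concatenate([body, face, hands], axis=1)
    # Motion features: displacement from the previous frame (zero for frame 0).
    motion = np.zeros_like(joints)
    motion[1:] = joints[1:] - joints[:-1]
    # Final tensor holds position and motion per keypoint, shape (T, K, 2C).
    return np.concatenate([joints, motion], axis=-1)


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def ensemble_scores(conv_logits, graph_logits, weight=0.5):
    """Late fusion (assumed): weighted average of the two models' probabilities."""
    return weight * softmax(conv_logits) + (1.0 - weight) * softmax(graph_logits)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical keypoint counts, chosen only to keep the example small.
    body = rng.normal(size=(4, 25, 3))
    face = rng.normal(size=(4, 5, 3))
    hands = rng.normal(size=(4, 6, 3))
    features = build_feature_tensor(body, face, hands)
    print(features.shape)
    probs = ensemble_scores(rng.normal(size=(1, 10)), rng.normal(size=(1, 10)))
    print(probs.argmax(axis=-1))
```

In this sketch the feature tensor could be fed to a convolutional model as an image-like array and to a graph model as per-keypoint node features; the actual networks are left out because the abstract does not describe them.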
Related papers
- Improving Video Violence Recognition with Human Interaction Learning on
3D Skeleton Point Clouds [88.87985219999764]
We develop a method for video violence recognition from a new perspective of skeleton points.
We first formulate 3D skeleton point clouds from human sequences extracted from videos.
We then perform interaction learning on these 3D skeleton point clouds.
(2023-08-26T12:55:18Z)
- Learning Scene Flow With Skeleton Guidance For 3D Action Recognition [1.5954459915735735]
This work demonstrates the use of a 3D flow sequence by a deep spatiotemporal model for 3D action recognition.
An extended deep skeleton is also introduced to learn the most discriminant action motion dynamics.
A late fusion scheme is adopted between the two models for learning the high level cross-modal correlations.
(2023-06-23T04:14:25Z)
- Multi-dataset Training of Transformers for Robust Action Recognition [75.5695991766902]
We study the task of learning robust feature representations that generalize well across multiple datasets for action recognition.
Here, we propose a novel multi-dataset training paradigm, MultiTrain, with the design of two new loss terms, namely informative loss and projection loss.
We verify the effectiveness of our method on five challenging datasets: Kinetics-400, Kinetics-700, Moments-in-Time, ActivityNet, and Something-Something-v2.
(2022-09-26T01:30:43Z)
- Joint-bone Fusion Graph Convolutional Network for Semi-supervised
Skeleton Action Recognition [65.78703941973183]
We propose a novel correlation-driven joint-bone fusion graph convolutional network (CD-JBF-GCN) as an encoder and use a pose prediction head as a decoder.
Specifically, the CD-JBF-GC can explore the motion transmission between the joint stream and the bone stream.
The pose prediction based auto-encoder in the self-supervised training stage allows the network to learn motion representation from unlabeled data.
(2022-02-08T16:03:15Z)
- Skeleton-Based Mutually Assisted Interacted Object Localization and
Human Action Recognition [111.87412719773889]
We propose a joint learning framework for "interacted object localization" and "human action recognition" based on skeleton data.
Our method achieves the best or competitive performance with the state-of-the-art methods for human action recognition.
(2021-10-28T10:09:34Z)
- Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based
Action Recognition [49.163326827954656]
We propose a novel multi-granular spatio-temporal graph network for skeleton-based action classification.
We develop a dual-head graph network consisting of two interleaved branches, which enables us to extract features at two or more spatio-temporal resolutions.
We conduct extensive experiments on three large-scale datasets.
(2021-08-10T09:25:07Z)
- Spatial-Temporal Alignment Network for Action Recognition and Detection [80.19235282200697]
This paper studies how to introduce viewpoint-invariant feature representations that can help action recognition and detection.
We propose a novel Spatial-Temporal Alignment Network (STAN) that aims to learn geometric invariant representations for action recognition and action detection.
We test our STAN model extensively on AVA, Kinetics-400, AVA-Kinetics, Charades, and Charades-Ego datasets.
(2020-12-04T06:23:40Z)