Real-time Human Action Recognition Using Locally Aggregated
Kinematic-Guided Skeletonlet and Supervised Hashing-by-Analysis Model
- URL: http://arxiv.org/abs/2105.11312v1
- Date: Mon, 24 May 2021 14:46:40 GMT
- Title: Real-time Human Action Recognition Using Locally Aggregated
Kinematic-Guided Skeletonlet and Supervised Hashing-by-Analysis Model
- Authors: Bin Sun, Dehui Kong, Shaofan Wang, Lichun Wang, Baocai Yin
- Abstract summary: 3D action recognition suffers from three problems: highly complicated articulation, a great amount of noise, and a low implementation efficiency.
We propose a real-time 3D action recognition framework by integrating the locally aggregated kinematic-guided skeletonlet (LAKS) with a supervised hashing-by-analysis (SHA) model.
Experimental results on MSRAction3D, UTKinectAction3D and Florence3DAction datasets demonstrate that the proposed method outperforms state-of-the-art methods in both recognition accuracy and implementation efficiency.
- Score: 30.435850177921086
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D action recognition is referred to as the classification of action
sequences which consist of 3D skeleton joints. While many research work are
devoted to 3D action recognition, it mainly suffers from three problems: highly
complicated articulation, a great amount of noise, and a low implementation
efficiency. To tackle all these problems, we propose a real-time 3D action
recognition framework by integrating the locally aggregated kinematic-guided
skeletonlet (LAKS) with a supervised hashing-by-analysis (SHA) model. We first
define the skeletonlet as a few combinations of joint offsets grouped in terms
of kinematic principle, and then represent an action sequence using LAKS, which
consists of a denoising phase and a locally aggregating phase. The denoising
phase detects the noisy action data and adjust it by replacing all the features
within it with the features of the corresponding previous frame, while the
locally aggregating phase sums the difference between an offset feature of the
skeletonlet and its cluster center together over all the offset features of the
sequence. Finally, the SHA model which combines sparse representation with a
hashing model, aiming at promoting the recognition accuracy while maintaining a
high efficiency. Experimental results on MSRAction3D, UTKinectAction3D and
Florence3DAction datasets demonstrate that the proposed method outperforms
state-of-the-art methods in both recognition accuracy and implementation
efficiency.
Related papers
- An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z) - SkeleTR: Towrads Skeleton-based Action Recognition in the Wild [86.03082891242698]
SkeleTR is a new framework for skeleton-based action recognition.
It first models the intra-person skeleton dynamics for each skeleton sequence with graph convolutions.
It then uses stacked Transformer encoders to capture person interactions that are important for action recognition in general scenarios.
arXiv Detail & Related papers (2023-09-20T16:22:33Z) - One-Shot Action Recognition via Multi-Scale Spatial-Temporal Skeleton
Matching [77.6989219290789]
One-shot skeleton action recognition aims to learn a skeleton action recognition model with a single training sample.
This paper presents a novel one-shot skeleton action recognition technique that handles skeleton action recognition via multi-scale spatial-temporal feature matching.
arXiv Detail & Related papers (2023-07-14T11:52:10Z) - Learning Scene Flow With Skeleton Guidance For 3D Action Recognition [1.5954459915735735]
This work demonstrates the use of 3D flow sequence by a deeptemporal model for 3D action recognition.
An extended deep skeleton is also introduced to learn the most discriminant action motion dynamics.
A late fusion scheme is adopted between the two models for learning the high level cross-modal correlations.
arXiv Detail & Related papers (2023-06-23T04:14:25Z) - Unified Keypoint-based Action Recognition Framework via Structured
Keypoint Pooling [3.255030588361124]
This paper simultaneously addresses three limitations associated with conventional skeleton-based action recognition.
A point cloud deep-learning paradigm is introduced to the action recognition.
A novel deep neural network architecture called Structured Keypoint Pooling is proposed.
arXiv Detail & Related papers (2023-03-27T14:59:08Z) - Adaptive Local-Component-aware Graph Convolutional Network for One-shot
Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art.
arXiv Detail & Related papers (2022-09-21T02:33:07Z) - LocATe: End-to-end Localization of Actions in 3D with Transformers [91.28982770522329]
LocATe is an end-to-end approach that jointly localizes and recognizes actions in a 3D sequence.
Unlike transformer-based object-detection and classification models which consider image or patch features as input, LocATe's transformer model is capable of capturing long-term correlations between actions in a sequence.
We introduce a new, challenging, and more realistic benchmark dataset, BABEL-TAL-20 (BT20), where the performance of state-of-the-art methods is significantly worse.
arXiv Detail & Related papers (2022-03-21T03:35:32Z) - Revisiting Skeleton-based Action Recognition [107.08112310075114]
PoseC3D is a new approach to skeleton-based action recognition, which relies on a 3D heatmap instead stack a graph sequence as the base representation of human skeletons.
On four challenging datasets, PoseC3D consistently obtains superior performance, when used alone on skeletons and in combination with the RGB modality.
arXiv Detail & Related papers (2021-04-28T06:32:17Z) - Skeleton Based Action Recognition using a Stacked Denoising Autoencoder
with Constraints of Privileged Information [5.67220249825603]
We propose a new method to study the skeletal representation in a view of skeleton reconstruction.
Based on the concept of learning under privileged information, we integrate action categories and temporal coordinates into a stacked denoising autoencoder.
In order to mitigate the variation resulting from temporary misalignment, a new method of temporal registration is proposed.
arXiv Detail & Related papers (2020-03-12T09:56:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.