Skeleton Based Action Recognition using a Stacked Denoising Autoencoder
with Constraints of Privileged Information
- URL: http://arxiv.org/abs/2003.05684v1
- Date: Thu, 12 Mar 2020 09:56:22 GMT
- Title: Skeleton Based Action Recognition using a Stacked Denoising Autoencoder
with Constraints of Privileged Information
- Authors: Zhize Wu, Thomas Weise, Le Zou, Fei Sun, Ming Tan
- Abstract summary: We propose a new method that studies the skeletal representation from the viewpoint of skeleton reconstruction.
Based on the concept of learning under privileged information, we integrate action categories and temporal coordinates into a stacked denoising autoencoder.
To mitigate the variation resulting from temporal misalignment, a new method of temporal registration is proposed.
- Score: 5.67220249825603
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, with the availability of cost-effective depth cameras coupled with
real-time skeleton estimation, the interest in skeleton-based human action
recognition is renewed. Most of the existing skeletal representation approaches
use either the joint location or the dynamics model. Differing from the
previous studies, we propose a new method called Denoising Autoencoder with
Temporal and Categorical Constraints (DAE_CTC) to study the skeletal
representation from the viewpoint of skeleton reconstruction. Based on the concept of
learning under privileged information, we integrate action categories and
temporal coordinates into a stacked denoising autoencoder in the training
phase, to preserve category and temporal features, while learning the hidden
representation from a skeleton. Thus, we are able to improve the discriminative
validity of the hidden representation. In order to mitigate the variation
resulting from temporal misalignment, a new method of temporal registration,
called Locally-Warped Sequence Registration (LWSR), is proposed for registering
the sequences of inter- and intra-class actions. We finally represent the
sequences using a Fourier Temporal Pyramid (FTP) representation and perform
classification using a combination of LWSR registration, FTP representation,
and a linear Support Vector Machine (SVM). The experimental results on three
action data sets, namely MSR-Action3D, UTKinect-Action, and Florence3D-Action,
show that our proposal performs better than many existing methods and
comparably to the state of the art.
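As a rough illustration of the Fourier Temporal Pyramid (FTP) step named in the abstract, the sketch below follows the commonly described form of the technique: split a sequence into progressively finer temporal segments, take a discrete Fourier transform of each segment, and keep only the magnitudes of the low-frequency coefficients. It is not the authors' implementation. The function names, the number of pyramid levels, and the number of retained coefficients are all assumptions chosen for illustration, and the input is a single 1-D feature trajectory rather than the hidden representations the paper works with.

```python
import cmath
import math


def low_freq_magnitudes(signal, n_coeffs):
    """Magnitudes of the first n_coeffs DFT coefficients of a 1-D signal."""
    n = len(signal)
    mags = []
    for k in range(n_coeffs):
        if k < n:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(signal))
            mags.append(abs(coeff))
        else:
            # Pad with zeros when the segment is shorter than n_coeffs.
            mags.append(0.0)
    return mags


def fourier_temporal_pyramid(signal, levels=3, n_coeffs=4):
    """Concatenate low-frequency DFT magnitudes over a temporal pyramid.

    Level l splits the signal into 2**l contiguous segments. The defaults
    (3 levels, 4 coefficients per segment) are hypothetical values.
    """
    n = len(signal)
    features = []
    for level in range(levels):
        parts = 2 ** level
        for p in range(parts):
            start = p * n // parts
            end = (p + 1) * n // parts
            features.extend(low_freq_magnitudes(signal[start:end], n_coeffs))
    return features


# Usage: one feature trajectory of 32 frames.
trajectory = [math.sin(2 * math.pi * t / 16) for t in range(32)]
descriptor = fourier_temporal_pyramid(trajectory)
print(len(descriptor))  # (1 + 2 + 4) segments * 4 coefficients each
```

Because every sequence yields a descriptor of the same length regardless of its number of frames, the descriptors of all feature dimensions can be concatenated and passed to a linear classifier such as the SVM mentioned in the abstract.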
Related papers
- An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z)
- Skeleton2vec: A Self-supervised Learning Framework with Contextualized
Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance.
Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework.
Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
arXiv Detail & Related papers (2024-01-01T12:08:35Z)
- LAC: Latent Action Composition for Skeleton-based Action Segmentation [21.797658771678066]
Skeleton-based action segmentation requires recognizing composable actions in untrimmed videos.
Current approaches decouple this problem by first extracting local visual features from skeleton sequences and then processing them by a temporal model to classify frame-wise actions.
We propose Latent Action Composition (LAC), a novel self-supervised framework aiming at learning from synthesized composable motions for skeleton-based action segmentation.
arXiv Detail & Related papers (2023-08-28T11:20:48Z)
- Temporal-Viewpoint Transportation Plan for Skeletal Few-shot Action
Recognition [38.27785891922479]
Few-shot learning pipeline for 3D skeleton-based action recognition by Joint tEmporal and cAmera viewpoiNt alIgnmEnt.
arXiv Detail & Related papers (2022-10-30T11:46:38Z)
- Adaptive Local-Component-aware Graph Convolutional Network for One-shot
Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art.
arXiv Detail & Related papers (2022-09-21T02:33:07Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based
Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- SimMC: Simple Masked Contrastive Learning of Skeleton Representations
for Unsupervised Person Re-Identification [63.903237777588316]
We present a generic Simple Masked Contrastive learning (SimMC) framework to learn effective representations from unlabeled 3D skeletons for person re-ID.
Specifically, to fully exploit skeleton features within each skeleton sequence, we first devise a masked prototype contrastive learning (MPC) scheme.
Then, we propose the masked intra-sequence contrastive learning (MIC) to capture intra-sequence pattern consistency between subsequences.
arXiv Detail & Related papers (2022-04-21T00:19:38Z)
- 3D Skeleton-based Few-shot Action Recognition with JEANIE is not so
Naïve [28.720272938306692]
We propose a Few-shot Learning pipeline for 3D skeleton-based action recognition by Joint tEmporal and cAmera viewpoiNt alIgnmEnt.
arXiv Detail & Related papers (2021-12-23T16:09:23Z)
- Real-time Human Action Recognition Using Locally Aggregated
Kinematic-Guided Skeletonlet and Supervised Hashing-by-Analysis Model [30.435850177921086]
3D action recognition suffers from three problems: highly complicated articulation, a great amount of noise, and a low implementation efficiency.
We propose a real-time 3D action recognition framework by integrating the locally aggregated kinematic-guided skeletonlet (LAKS) with a supervised hashing-by-analysis (SHA) model.
Experimental results on MSRAction3D, UTKinectAction3D and Florence3DAction datasets demonstrate that the proposed method outperforms state-of-the-art methods in both recognition accuracy and implementation efficiency.
arXiv Detail & Related papers (2021-05-24T14:46:40Z)
- A Self-Supervised Gait Encoding Approach with Locality-Awareness for 3D
Skeleton Based Person Re-Identification [65.18004601366066]
Person re-identification (Re-ID) via gait features within 3D skeleton sequences is a newly-emerging topic with several advantages.
This paper proposes a self-supervised gait encoding approach that can leverage unlabeled skeleton data to learn gait representations for person Re-ID.
arXiv Detail & Related papers (2020-09-05T16:06:04Z)