Revisiting Skeleton-based Action Recognition
- URL: http://arxiv.org/abs/2104.13586v1
- Date: Wed, 28 Apr 2021 06:32:17 GMT
- Title: Revisiting Skeleton-based Action Recognition
- Authors: Haodong Duan, Yue Zhao, Kai Chen, Dian Shao, Dahua Lin, Bo Dai
- Abstract summary: PoseC3D is a new approach to skeleton-based action recognition, which relies on a 3D heatmap stack instead of a graph sequence as the base representation of human skeletons.
On four challenging datasets, PoseC3D consistently obtains superior performance, both when used alone on skeletons and in combination with the RGB modality.
- Score: 107.08112310075114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human skeleton, as a compact representation of human action, has received
increasing attention in recent years. Many skeleton-based action recognition
methods adopt graph convolutional networks (GCN) to extract features on top of
human skeletons. Despite the positive results shown in previous works,
GCN-based methods are subject to limitations in robustness, interoperability,
and scalability. In this work, we propose PoseC3D, a new approach to
skeleton-based action recognition, which relies on a 3D heatmap stack instead
of a graph sequence as the base representation of human skeletons. Compared to
GCN-based methods, PoseC3D is more effective in learning spatiotemporal
features, more robust against pose estimation noises, and generalizes better in
cross-dataset settings. Also, PoseC3D can handle multiple-person scenarios
without additional computation cost, and its features can be easily integrated
with other modalities at early fusion stages, which provides a great design
space to further boost performance. On four challenging datasets, PoseC3D
consistently obtains superior performance, both when used alone on skeletons
and in combination with the RGB modality.
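The 3D heatmap stack described in the abstract can be illustrated concretely: each estimated 2D keypoint is rendered as a Gaussian heatmap per frame, and the per-joint heatmaps are stacked along time into a joints x frames x height x width volume that a 3D-CNN can consume. A minimal sketch (the function name, resolution, and sigma are illustrative choices, not values from the paper):

```python
import numpy as np

def keypoints_to_heatmap_volume(keypoints, scores, H=56, W=56, sigma=0.6):
    """Render per-frame 2D keypoints as a stack of Gaussian pseudo-heatmaps.

    keypoints: (T, V, 2) array of (x, y) per joint per frame
    scores:    (T, V) array of pose-estimation confidences
    Returns a (V, T, H, W) volume: one heatmap channel per joint,
    stacked along the temporal axis.
    """
    T, V, _ = keypoints.shape
    ys, xs = np.mgrid[0:H, 0:W]
    volume = np.zeros((V, T, H, W), dtype=np.float32)
    for t in range(T):
        for v in range(V):
            x, y = keypoints[t, v]
            g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
            volume[v, t] = scores[t, v] * g  # weight by keypoint confidence
    return volume
```

Weighting each Gaussian by the estimator's confidence is one way such a representation can stay robust to pose-estimation noise: uncertain keypoints contribute weaker activations instead of hard, possibly wrong coordinates.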
Related papers
- One-Shot Action Recognition via Multi-Scale Spatial-Temporal Skeleton Matching [77.6989219290789]
One-shot skeleton action recognition aims to learn a skeleton action recognition model with a single training sample.
This paper presents a novel one-shot skeleton action recognition technique that handles skeleton action recognition via multi-scale spatial-temporal feature matching.
arXiv Detail & Related papers (2023-07-14T11:52:10Z)
- Pose-Guided Graph Convolutional Networks for Skeleton-Based Action Recognition [32.07659338674024]
Graph convolutional networks (GCNs) can model the human body skeletons as spatial and temporal graphs.
In this work, we propose pose-guided GCN (PG-GCN), a multi-modal framework for high-performance human action recognition.
The core idea of this module is to utilize a trainable graph to aggregate features from the skeleton stream with that of the pose stream, which leads to a network with more robust feature representation ability.
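The trainable-graph idea above can be illustrated with a single graph-convolution step: features from the skeleton and pose streams are concatenated per joint, then propagated over a learnable joint-to-joint adjacency matrix. A minimal NumPy sketch (the sizes, initialization, and variable names are illustrative assumptions, not the paper's actual module):

```python
import numpy as np

rng = np.random.default_rng(0)
V, C = 17, 64  # number of joints and feature channels (illustrative sizes)

# Per-joint features from the two streams (stand-ins for network outputs)
skeleton_feat = rng.standard_normal((V, C))
pose_feat = rng.standard_normal((V, C))

# Trainable graph: a learnable V x V adjacency, here initialized near the
# identity and held fixed for the sketch
A = np.eye(V) + 0.01 * rng.standard_normal((V, V))
W = rng.standard_normal((2 * C, C)) / np.sqrt(2 * C)  # projection weight

# Aggregate: concatenate the two streams per joint, then propagate the
# fused features over the learned graph (one graph-convolution step)
fused = np.concatenate([skeleton_feat, pose_feat], axis=1)  # (V, 2C)
out = A @ fused @ W  # (V, C) fused per-joint representation
```

Because the adjacency is learned rather than fixed to the anatomical skeleton, the network can discover cross-joint dependencies that help the fused representation.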
arXiv Detail & Related papers (2022-10-10T02:08:49Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are available only on the source dataset, not on the target dataset, during the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
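The segment-permutation pretext task can be sketched as follows: cut the sequence into temporal segments, shuffle them, and ask a classifier to recover which permutation was applied. A minimal sketch (the function name and segment count are illustrative, not from the paper):

```python
import numpy as np

def make_permutation_task(sequence, n_segments=3, rng=None):
    """Split a skeleton sequence into temporal segments, shuffle them, and
    return the shuffled sequence plus the permutation as a pretext label.

    sequence: (T, V, C) array -- frames x joints x coordinates
    A classifier trained to predict the permutation must learn temporal
    structure, without needing any action labels.
    """
    if rng is None:
        rng = np.random.default_rng()
    segments = np.array_split(sequence, n_segments, axis=0)
    perm = rng.permutation(n_segments)
    shuffled = np.concatenate([segments[i] for i in perm], axis=0)
    return shuffled, perm
```

The same recipe applies spatially by permuting human body parts instead of temporal segments, which is the second pretext task the summary mentions.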
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- Skeleton Cloud Colorization for Unsupervised 3D Action Representation Learning [65.88887113157627]
Skeleton-based human action recognition has attracted increasing attention in recent years.
We design a novel skeleton cloud colorization technique that is capable of learning skeleton representations from unlabeled skeleton sequence data.
We show that the proposed method outperforms existing unsupervised and semi-supervised 3D action recognition methods by large margins.
arXiv Detail & Related papers (2021-08-04T10:55:39Z)
- UNIK: A Unified Framework for Real-world Skeleton-based Action Recognition [11.81043814295441]
We introduce UNIK, a novel skeleton-based action recognition method that is able to generalize across datasets.
To study the cross-domain generalizability of action recognition in real-world videos, we re-evaluate state-of-the-art approaches as well as the proposed UNIK.
Results show that the proposed UNIK, with pre-training on Posetics, generalizes well and outperforms state-of-the-art when transferred onto four target action classification datasets.
arXiv Detail & Related papers (2021-07-19T02:00:28Z)
- Real-time Human Action Recognition Using Locally Aggregated Kinematic-Guided Skeletonlet and Supervised Hashing-by-Analysis Model [30.435850177921086]
3D action recognition suffers from three problems: highly complicated articulation, a great amount of noise, and low implementation efficiency.
We propose a real-time 3D action recognition framework by integrating the locally aggregated kinematic-guided skeletonlet (LAKS) with a supervised hashing-by-analysis (SHA) model.
Experimental results on MSRAction3D, UTKinectAction3D and Florence3DAction datasets demonstrate that the proposed method outperforms state-of-the-art methods in both recognition accuracy and implementation efficiency.
arXiv Detail & Related papers (2021-05-24T14:46:40Z)
- Group-Skeleton-Based Human Action Recognition in Complex Events [15.649778891665468]
We propose a novel group-skeleton-based human action recognition method in complex events.
This method first utilizes multi-scale spatial-temporal graph convolutional networks (MS-G3Ds) to extract skeleton features from multiple persons.
Results on the HiEve dataset show that our method can give superior performance compared to other state-of-the-art methods.
arXiv Detail & Related papers (2020-11-26T13:19:14Z)
- HMOR: Hierarchical Multi-Person Ordinal Relations for Monocular Multi-Person 3D Pose Estimation [54.23770284299979]
This paper introduces a novel form of supervision: Hierarchical Multi-person Ordinal Relations (HMOR).
HMOR encodes interaction information as the ordinal relations of depths and angles hierarchically.
An integrated top-down model is designed to leverage these ordinal relations in the learning process.
The proposed method significantly outperforms state-of-the-art methods on publicly available multi-person 3D pose datasets.
arXiv Detail & Related papers (2020-08-01T07:53:27Z)
- Anatomy-aware 3D Human Pose Estimation with Bone-based Pose Decomposition [92.99291528676021]
Instead of directly regressing the 3D joint locations, we decompose the task into bone direction prediction and bone length prediction.
Our motivation is the fact that the bone lengths of a human skeleton remain consistent across time.
Our full model outperforms the previous best results on Human3.6M and MPI-INF-3DHP datasets.
arXiv Detail & Related papers (2020-02-24T15:49:37Z)
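The bone-based decomposition in the last entry implies a simple forward-kinematics step at inference: joint positions are recovered by walking the kinematic tree and adding each length-scaled bone direction to its parent joint. A hypothetical sketch (the function name and parent-index convention are assumptions, not from the paper):

```python
import numpy as np

def compose_skeleton(root, bone_dirs, bone_lens, parents):
    """Recover 3D joint positions from bone directions and bone lengths.

    root:      (3,) position of the root joint
    bone_dirs: (J, 3) unit direction of the bone ending at each joint
    bone_lens: (J,) length of that bone
    parents:   length-J list; parents[j] is joint j's parent index, with
               -1 marking the root. Parents must precede children, so a
               single forward pass walks the kinematic tree.
    """
    J = len(parents)
    joints = np.zeros((J, 3))
    for j in range(J):
        if parents[j] == -1:
            joints[j] = root
        else:
            joints[j] = joints[parents[j]] + bone_lens[j] * bone_dirs[j]
    return joints
```

Splitting the prediction this way lets the temporally consistent quantity (bone length) be estimated from the whole sequence while only the directions vary per frame, which matches the motivation stated in the summary.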
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.