Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based
Action Recognition
- URL: http://arxiv.org/abs/2207.08095v1
- Date: Sun, 17 Jul 2022 07:05:39 GMT
- Title: Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based
Action Recognition
- Authors: Yansong Tang, Xingyu Liu, Xumin Yu, Danyang Zhang, Jiwen Lu, Jie Zhou
- Abstract summary: Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
- Score: 88.34182299496074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rapid progress and superior performance have been achieved for skeleton-based
action recognition recently. In this article, we investigate this problem under
a cross-dataset setting, which is a new, pragmatic, and challenging task in
real-world scenarios. Following the unsupervised domain adaptation (UDA)
paradigm, the action labels are only available on a source dataset, but
unavailable on a target dataset in the training stage. Different from the
conventional adversarial learning-based approaches for UDA, we utilize a
self-supervision scheme to reduce the domain shift between two skeleton-based
action datasets. Our inspiration is drawn from Cubism, an art genre from the
early 20th century, which breaks and reassembles the objects to convey a
greater context. By segmenting and permuting temporal segments or human body
parts, we design two self-supervised learning classification tasks to explore
the temporal and spatial dependency of a skeleton-based action and improve the
generalization ability of the model. We conduct experiments on six datasets for
skeleton-based action recognition, including three large-scale datasets (NTU
RGB+D, PKU-MMD, and Kinetics) where new cross-dataset settings and benchmarks
are established. Extensive results demonstrate that our method outperforms
state-of-the-art approaches. The source codes of our model and all the compared
methods are available at https://github.com/shanice-l/st-cubism.
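For illustration, here is a minimal sketch of how such permutation-prediction pretext tasks can be set up. The function names, the (T, V, C) tensor layout, and the body-part grouping below are assumptions made for this sketch, not the authors' implementation:

    import itertools
    import numpy as np

    # Sketch of the two "Cubism" pretext tasks: break a skeleton sequence
    # into pieces, shuffle them, and predict which permutation was applied.

    def temporal_cubism(seq, num_segments=3):
        """Permute temporal segments of a (T, V, C) sequence.
        Returns the permuted sequence and the permutation index,
        which serves as the self-supervised classification label."""
        perms = list(itertools.permutations(range(num_segments)))
        label = np.random.randint(len(perms))
        segments = np.array_split(seq, num_segments, axis=0)  # split frame axis
        permuted = np.concatenate([segments[i] for i in perms[label]], axis=0)
        return permuted, label

    def spatial_cubism(seq, body_parts):
        """Permute groups of joints (e.g. arms, legs, torso); the label
        is again the index of the permutation that was applied."""
        perms = list(itertools.permutations(range(len(body_parts))))
        label = np.random.randint(len(perms))
        order = [j for part in perms[label] for j in body_parts[part]]
        return seq[:, order, :], label

    # Toy example: 30 frames, 5 joints, 3 channels, three joint groups.
    seq = np.random.randn(30, 5, 3)
    x_t, y_t = temporal_cubism(seq)
    x_s, y_s = spatial_cubism(seq, body_parts=[[0, 1], [2], [3, 4]])

Because the permutation labels come for free, a small classifier trained on them can be optimized on both source and target skeletons, which is how this kind of self-supervision can reduce domain shift without target action labels.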
Related papers
- iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning [22.14627083675405]
We propose incremental neural mesh models that can be extended with new meshes over time.
We demonstrate the effectiveness of our method through extensive experiments on the Pascal3D and ObjectNet3D datasets.
Our work also presents the first incremental learning approach for pose estimation.
arXiv Detail & Related papers (2024-07-12T13:57:49Z)
- Skeleton2vec: A Self-supervised Learning Framework with Contextualized Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance.
Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework.
Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
arXiv Detail & Related papers (2024-01-01T12:08:35Z)
- Navigating Open Set Scenarios for Skeleton-based Action Recognition [45.488649741347]
We tackle the unexplored Open-Set Skeleton-based Action Recognition (OS-SAR) task.
We propose a distance-based cross-modality method that leverages the cross-modal alignment of skeleton joints, bones, and velocities.
arXiv Detail & Related papers (2023-12-11T12:29:32Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP)
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
- Joint-bone Fusion Graph Convolutional Network for Semi-supervised Skeleton Action Recognition [65.78703941973183]
We propose a novel correlation-driven joint-bone fusion graph convolutional network (CD-JBF-GCN) as an encoder and use a pose prediction head as a decoder.
Specifically, the CD-JBF-GCN can explore the motion transmission between the joint stream and the bone stream.
The pose prediction based auto-encoder in the self-supervised training stage allows the network to learn motion representation from unlabeled data.
arXiv Detail & Related papers (2022-02-08T16:03:15Z)
- Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based Action Recognition [49.163326827954656]
We propose a novel multi-granular spatio-temporal graph network for skeleton-based action classification.
We develop a dual-head graph network consisting of two interleaved branches, which enables us to extract features at two spatio-temporal resolutions.
We conduct extensive experiments on three large-scale datasets.
arXiv Detail & Related papers (2021-08-10T09:25:07Z)
- Skeleton Split Strategies for Spatial Temporal Graph Convolution Networks [2.132096006921048]
A skeleton representation of the human body has been proven to be effective for this task.
A new set of methods to perform the convolution operation upon the skeleton graph is presented.
arXiv Detail & Related papers (2021-08-03T05:57:52Z)
- UNIK: A Unified Framework for Real-world Skeleton-based Action Recognition [11.81043814295441]
We introduce UNIK, a novel skeleton-based action recognition method that is able to generalize across datasets.
To study the cross-domain generalizability of action recognition in real-world videos, we re-evaluate state-of-the-art approaches as well as the proposed UNIK.
Results show that the proposed UNIK, with pre-training on Posetics, generalizes well and outperforms the state of the art when transferred to four target action classification datasets.
arXiv Detail & Related papers (2021-07-19T02:00:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.