A Self-Supervised Gait Encoding Approach with Locality-Awareness for 3D
Skeleton Based Person Re-Identification
- URL: http://arxiv.org/abs/2009.03671v3
- Date: Mon, 5 Jul 2021 02:37:09 GMT
- Title: A Self-Supervised Gait Encoding Approach with Locality-Awareness for 3D
Skeleton Based Person Re-Identification
- Authors: Haocong Rao, Siqi Wang, Xiping Hu, Mingkui Tan, Yi Guo, Jun Cheng,
Xinwang Liu, and Bin Hu
- Abstract summary: Person re-identification (Re-ID) via gait features within 3D skeleton sequences is a newly-emerging topic with several advantages.
This paper proposes a self-supervised gait encoding approach that can leverage unlabeled skeleton data to learn gait representations for person Re-ID.
- Score: 65.18004601366066
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Person re-identification (Re-ID) via gait features within 3D skeleton
sequences is a newly-emerging topic with several advantages. Existing solutions
either rely on hand-crafted descriptors or supervised gait representation
learning. This paper proposes a self-supervised gait encoding approach that can
leverage unlabeled skeleton data to learn gait representations for person
Re-ID. Specifically, we first create self-supervision by learning to
reconstruct unlabeled skeleton sequences in reverse order, which involves richer
high-level semantics to obtain better gait representations. Other pretext tasks
are also explored to further improve self-supervised learning. Second, inspired
by the fact that motion's continuity endows adjacent skeletons in one skeleton
sequence and temporally consecutive skeleton sequences with higher correlations
(referred to as locality in 3D skeleton data), we propose a locality-aware
attention mechanism and a locality-aware contrastive learning scheme, which aim
to preserve locality-awareness at the intra-sequence and inter-sequence levels,
respectively, during self-supervised learning. Last, with context vectors
learned by our locality-aware attention mechanism and contrastive learning
scheme, a novel feature named Contrastive Attention-based Gait Encodings
(CAGEs) is designed to represent gait effectively. Empirical evaluations show
that our approach significantly outperforms skeleton-based counterparts by
15-40% Rank-1 accuracy, and it even achieves superior performance to numerous
multi-modal methods with extra RGB or depth information. Our code is
available at https://github.com/Kali-Hac/Locality-Awareness-SGE.
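As a rough illustration of two ideas named in the abstract, the sketch below builds a reverse-reconstruction target (the same skeleton frames in reversed temporal order) and computes Rank-1 accuracy as the fraction of probe samples whose nearest gallery sample has the same identity. It is a minimal sketch, not the authors' implementation: the function names `reverse_target` and `rank1_accuracy`, the toy data, and the use of Euclidean distance are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def reverse_target(sequence):
    """Return the skeleton frames in reversed temporal order.

    Hypothetical helper: a model trained on this pretext task would take
    `sequence` as input and be asked to reconstruct this reversed copy.
    """
    return list(reversed(sequence))


def rank1_accuracy(probe_feats, probe_ids, gallery_feats, gallery_ids):
    """Fraction of probes whose nearest gallery feature shares their identity.

    Assumes Euclidean distance between feature vectors (an illustrative choice).
    """
    hits = 0
    for feat, pid in zip(probe_feats, probe_ids):
        nearest = min(
            range(len(gallery_feats)),
            key=lambda i: dist(feat, gallery_feats[i]),
        )
        hits += gallery_ids[nearest] == pid
    return hits / len(probe_feats)


# Toy sequence: 3 frames, 2 joints per frame, (x, y, z) per joint.
seq = [
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.1, 0.0, 0.0), (0.1, 1.0, 0.0)],
    [(0.2, 0.0, 0.0), (0.2, 1.0, 0.0)],
]
target = reverse_target(seq)

# Toy gait features (made-up 2D vectors) with identity labels.
gallery_feats = [(0.0, 0.0), (1.0, 1.0)]
gallery_ids = ["A", "B"]
probe_feats = [(0.1, 0.0), (0.9, 1.2), (0.6, 0.6)]
probe_ids = ["A", "B", "A"]
acc = rank1_accuracy(probe_feats, probe_ids, gallery_feats, gallery_ids)
```

In this toy setup the third probe lies closer to the gallery sample of identity "B" than to its own identity "A", so two of the three probes are matched correctly.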
Related papers
- Skeleton2vec: A Self-supervised Learning Framework with Contextualized
Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance.
Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework.
Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
arXiv Detail & Related papers (2024-01-01T12:08:35Z)
- Part Aware Contrastive Learning for Self-Supervised Action Recognition [18.423841093299135]
This paper proposes an attention-based contrastive learning framework for skeleton representation learning, called SkeAttnCLR.
Our proposed SkeAttnCLR outperforms state-of-the-art methods on NTURGB+D, NTU120-RGB+D, and PKU-MMD datasets.
arXiv Detail & Related papers (2023-05-01T05:31:48Z)
- Self-Supervised 3D Action Representation Learning with Skeleton Cloud
Colorization [75.0912240667375]
3D Skeleton-based human action recognition has attracted increasing attention in recent years.
Most of the existing work focuses on supervised learning which requires a large number of labeled action sequences.
In this paper, we address self-supervised 3D action representation learning for skeleton-based action recognition.
arXiv Detail & Related papers (2023-04-18T08:03:26Z)
- Contrastive Self-Supervised Learning for Skeleton Representations [2.528877542605869]
We use a contrastive self-supervised learning method, SimCLR, to learn representations that capture the semantics of skeleton point clouds.
To pre-train the representations, we normalise six existing datasets to obtain more than 40 million skeleton frames.
We evaluate the quality of the learned representations with three downstream tasks: skeleton reconstruction, motion prediction, and activity classification.
arXiv Detail & Related papers (2022-11-10T02:45:36Z)
- SimMC: Simple Masked Contrastive Learning of Skeleton Representations
for Unsupervised Person Re-Identification [63.903237777588316]
We present a generic Simple Masked Contrastive learning (SimMC) framework to learn effective representations from unlabeled 3D skeletons for person re-ID.
Specifically, to fully exploit skeleton features within each skeleton sequence, we first devise a masked prototype contrastive learning (MPC) scheme.
Then, we propose the masked intra-sequence contrastive learning (MIC) to capture intra-sequence pattern consistency between subsequences.
arXiv Detail & Related papers (2022-04-21T00:19:38Z)
- Skeleton-Contrastive 3D Action Representation Learning [35.06361753065124]
This paper strives for self-supervised learning of a feature space suitable for skeleton-based action recognition.
Our approach achieves state-of-the-art performance for self-supervised learning from skeleton data on the challenging PKU and NTU datasets.
arXiv Detail & Related papers (2021-08-08T14:44:59Z)
- Skeleton Cloud Colorization for Unsupervised 3D Action Representation
Learning [65.88887113157627]
Skeleton-based human action recognition has attracted increasing attention in recent years.
We design a novel skeleton cloud colorization technique that is capable of learning skeleton representations from unlabeled skeleton sequence data.
We show that the proposed method outperforms existing unsupervised and semi-supervised 3D action recognition methods by large margins.
arXiv Detail & Related papers (2021-08-04T10:55:39Z)
- Self-Supervised Gait Encoding with Locality-Aware Attention for Person
Re-Identification [46.28501210524173]
Gait-based person re-identification (Re-ID) is valuable for safety-critical applications.
We propose a generic gait encoding approach that can utilize unlabeled skeleton data to learn gait representations in a self-supervised manner.
Our approach typically improves existing skeleton-based methods by 10-20% Rank-1 accuracy.
arXiv Detail & Related papers (2020-08-21T12:03:17Z)
- Skeleton Based Action Recognition using a Stacked Denoising Autoencoder
with Constraints of Privileged Information [5.67220249825603]
We propose a new method to study the skeletal representation in a view of skeleton reconstruction.
Based on the concept of learning under privileged information, we integrate action categories and temporal coordinates into a stacked denoising autoencoder.
In order to mitigate the variation resulting from temporary misalignment, a new method of temporal registration is proposed.
arXiv Detail & Related papers (2020-03-12T09:56:22Z)