A Light-Weight Contrastive Approach for Aligning Human Pose Sequences
- URL: http://arxiv.org/abs/2303.04244v1
- Date: Tue, 7 Mar 2023 21:35:02 GMT
- Title: A Light-Weight Contrastive Approach for Aligning Human Pose Sequences
- Authors: Robert T. Collins
- Abstract summary: Training samples consist of temporal windows of frames containing 3D body points such as mocap markers or skeleton joints.
A light-weight, 3-layer encoder is trained using a contrastive loss function that encourages embedding vectors of augmented sample pairs to have cosine similarity 1, and similarity 0 with all other samples in a minibatch.
In addition to being simple, the proposed method is fast to train, making it easy to adapt to new data using different marker sets or skeletal joint layouts.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a simple unsupervised method for learning an encoder mapping short
3D pose sequences into embedding vectors suitable for sequence-to-sequence
alignment by dynamic time warping. Training samples consist of temporal windows
of frames containing 3D body points such as mocap markers or skeleton joints. A
light-weight, 3-layer encoder is trained using a contrastive loss function that
encourages embedding vectors of augmented sample pairs to have cosine
similarity 1, and similarity 0 with all other samples in a minibatch. When
multiple scripted training sequences are available, temporal alignments
inferred from an initial round of training are harvested to extract additional,
cross-performance match pairs for a second phase of training to refine the
encoder. In addition to being simple, the proposed method is fast to train,
making it easy to adapt to new data using different marker sets or skeletal
joint layouts. Experimental results illustrate ease of use, transferability,
and utility of the learned embeddings for comparing and analyzing human
behavior sequences.
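A minimal PyTorch sketch of the idea described above (not the authors' released code): a light-weight 3-layer encoder over flattened windows of 3D body points, trained with a contrastive loss that drives the cosine similarity of each augmented pair toward 1 and every other pair in the minibatch toward 0, followed by a toy dynamic time warping routine over the resulting embeddings. The window length, layer widths, jitter augmentation, and the squared-error form of the loss are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseWindowEncoder(nn.Module):
    """Light-weight, 3-layer encoder mapping a temporal window of 3D body
    points (mocap markers or skeleton joints) to a unit-length embedding."""
    def __init__(self, num_joints=31, window=16, embed_dim=128):
        super().__init__()
        in_dim = num_joints * 3 * window          # flattened (x, y, z) per window
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, x):                          # x: (batch, window, num_joints, 3)
        z = self.net(x.flatten(1))
        return F.normalize(z, dim=-1)              # unit norm: dot product = cosine similarity

def contrastive_loss(z_a, z_b):
    """Augmented pairs (i, i) target cosine similarity 1; all other pairs
    (i, j) in the minibatch target similarity 0."""
    sim = z_a @ z_b.t()                            # (batch, batch) cosine similarities
    target = torch.eye(sim.size(0), device=sim.device)
    return F.mse_loss(sim, target)

def dtw_cost(emb_a, emb_b):
    """Classic dynamic time warping over a cosine-distance matrix between
    two sequences of window embeddings; returns the accumulated cost."""
    cost = (1.0 - emb_a @ emb_b.t()).tolist()
    n, m = len(cost), len(cost[0])
    acc = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = cost[i - 1][j - 1] + min(
                acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]

# One contrastive training step on two jittered views of the same pose windows.
encoder = PoseWindowEncoder()
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
windows = torch.randn(64, 16, 31, 3)               # stand-in minibatch of pose windows
view_a = windows + 0.01 * torch.randn_like(windows)
view_b = windows + 0.01 * torch.randn_like(windows)
loss = contrastive_loss(encoder(view_a), encoder(view_b))
opt.zero_grad(); loss.backward(); opt.step()

# Embed two performances window-by-window and align them with DTW.
with torch.no_grad():
    seq1 = encoder(torch.randn(40, 16, 31, 3))
    seq2 = encoder(torch.randn(55, 16, 31, 3))
print(dtw_cost(seq1, seq2))
```

In the two-phase scheme the abstract describes, cross-performance match pairs harvested from such DTW alignments of scripted sequences would then be fed back through the same contrastive loss to refine the encoder.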
Related papers
- Skeleton2vec: A Self-supervised Learning Framework with Contextualized Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance.
Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework.
Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
arXiv Detail & Related papers (2024-01-01T12:08:35Z)
- Adaptive Siamese Tracking with a Compact Latent Network [219.38172719948048]
We present an intuitive view that simplifies Siamese-based trackers by converting the tracking task into a classification problem.
Under this view, we perform an in-depth analysis of these trackers through visual simulations and real tracking examples.
We apply it to adjust three classical Siamese-based trackers, namely SiamRPN++, SiamFC, and SiamBAN.
arXiv Detail & Related papers (2023-02-02T08:06:02Z)
- Generative-Contrastive Learning for Self-Supervised Latent Representations of 3D Shapes from Multi-Modal Euclidean Input [44.10761155817833]
We propose a combined generative and contrastive neural architecture for learning latent representations of 3D shapes.
The architecture uses two encoder branches for voxel grids and multi-view images from the same underlying shape.
arXiv Detail & Related papers (2023-01-11T18:14:24Z)
- Temporal-Viewpoint Transportation Plan for Skeletal Few-shot Action Recognition [38.27785891922479]
We propose a few-shot learning pipeline for 3D skeleton-based action recognition by Joint tEmporal and cAmera viewpoiNt alIgnmEnt (JEANIE).
arXiv Detail & Related papers (2022-10-30T11:46:38Z)
- Differentiable Point-Based Radiance Fields for Efficient View Synthesis [57.56579501055479]
We propose a differentiable rendering algorithm for efficient novel view synthesis.
Our method is up to 300x faster than NeRF in both training and inference.
For dynamic scenes, our method trains two orders of magnitude faster than STNeRF and renders at near interactive rate.
arXiv Detail & Related papers (2022-05-28T04:36:13Z)
- 3D Skeleton-based Few-shot Action Recognition with JEANIE is not so Naïve [28.720272938306692]
We propose a Few-shot Learning pipeline for 3D skeleton-based action recognition by Joint tEmporal and cAmera viewpoiNt alIgnmEnt.
arXiv Detail & Related papers (2021-12-23T16:09:23Z)
- Pose Adaptive Dual Mixup for Few-Shot Single-View 3D Reconstruction [35.30827580375749]
We present a pose adaptive few-shot learning procedure and a two-stage data regularization, termed PADMix, for single-image 3D reconstruction.
PADMix significantly outperforms previous literature on few-shot settings over the ShapeNet dataset and sets new benchmarks on the more challenging real-world Pix3D dataset.
arXiv Detail & Related papers (2021-12-23T12:22:08Z)
- Representation Learning via Global Temporal Alignment and Cycle-Consistency [20.715813546383178]
We introduce a weakly supervised method for representation learning based on aligning temporal sequences.
We report significant performance increases over previous methods.
In addition, we report two applications of our temporal alignment framework, namely 3D pose reconstruction and fine-grained audio/visual retrieval.
arXiv Detail & Related papers (2021-05-11T17:34:04Z)
- A Self-Supervised Gait Encoding Approach with Locality-Awareness for 3D Skeleton Based Person Re-Identification [65.18004601366066]
Person re-identification (Re-ID) via gait features within 3D skeleton sequences is a newly-emerging topic with several advantages.
This paper proposes a self-supervised gait encoding approach that can leverage unlabeled skeleton data to learn gait representations for person Re-ID.
arXiv Detail & Related papers (2020-09-05T16:06:04Z)
- SeCo: Exploring Sequence Supervision for Unsupervised Representation Learning [114.58986229852489]
In this paper, we explore the basic and generic supervision in the sequence from spatial, sequential and temporal perspectives.
We derive a particular form of contrastive learning from this supervision, named SeCo.
SeCo shows superior results under the linear protocol on action recognition, untrimmed activity recognition and object tracking.
arXiv Detail & Related papers (2020-08-03T15:51:35Z)
- CSI: Novelty Detection via Contrastive Learning on Distributionally Shifted Instances [77.28192419848901]
We propose a simple yet effective method named contrasting shifted instances (CSI).
In addition to contrasting a given sample with other instances as in conventional contrastive learning methods, our training scheme contrasts the sample with distributionally-shifted augmentations of itself.
Our experiments demonstrate the superiority of our method under various novelty detection scenarios.
arXiv Detail & Related papers (2020-07-16T08:32:56Z)