3D Human Action Representation Learning via Cross-View Consistency Pursuit
- URL: http://arxiv.org/abs/2104.14466v2
- Date: Sat, 1 May 2021 15:30:07 GMT
- Title: 3D Human Action Representation Learning via Cross-View Consistency Pursuit
- Authors: Linguo Li, Minsi Wang, Bingbing Ni, Hang Wang, Jiancheng Yang, Wenjun Zhang
- Abstract summary: We propose a Cross-view Contrastive Learning framework for unsupervised 3D skeleton-based action Representation (CrosSCLR).
CrosSCLR consists of both single-view contrastive learning (SkeletonCLR) and cross-view consistent knowledge mining (CVC-KM) modules, integrated in a collaborative learning manner.
- Score: 52.19199260960558
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we propose a Cross-view Contrastive Learning framework for unsupervised 3D skeleton-based action Representation (CrosSCLR), which leverages multi-view complementary supervision signals. CrosSCLR consists of single-view contrastive learning (SkeletonCLR) and cross-view consistent knowledge mining (CVC-KM) modules, integrated in a collaborative learning manner. CVC-KM exchanges high-confidence positive/negative samples and their distributions among views according to their embedding similarity, enforcing cross-view consistency of the contrastive context, i.e., similar distributions. Extensive experiments show that CrosSCLR achieves remarkable action recognition results on the NTU-60 and NTU-120 datasets under unsupervised settings and yields higher-quality action representations. Our code is available at https://github.com/LinguoLi/CrosSCLR.
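Reading aid: the single-view SkeletonCLR module described above is, in spirit, a MoCo-style InfoNCE objective computed independently per skeleton view (e.g., joint, bone, motion). Below is a minimal sketch under that assumption; the function name, tensor shapes, memory-bank size, and temperature are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def skeleton_clr_loss(q, k, queue, temperature=0.07):
    """MoCo-style InfoNCE for one skeleton view (a sketch, not the official code).

    q:     (N, D) query embeddings of augmented clips
    k:     (N, D) key embeddings of the same clips (the positives)
    queue: (K, D) memory bank holding negatives from past batches
    """
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    queue = F.normalize(queue, dim=1)
    l_pos = (q * k).sum(dim=1, keepdim=True)   # (N, 1) positive logits
    l_neg = q @ queue.t()                      # (N, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    # The positive always sits at index 0 of the logits.
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)
```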
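The cross-view module (CVC-KM) can then be sketched on top of this: neighbours that view A ranks most similar in its memory bank are handed to view B as extra positives, so the two views converge toward a similar contrastive context. The top-k mining rule and the assumption that the two memory banks are index-aligned (they store the same clips in the same order) are mine, not taken from the paper.

```python
def cvc_km_loss(q_a, queue_a, q_b, k_b, queue_b, topk=5, temperature=0.07):
    """Cross-view consistent knowledge mining, sketched for views A and B.

    Indices of the top-k most similar memory-bank entries in view A are
    treated as additional positives for the aligned entries in view B.
    """
    q_a = F.normalize(q_a, dim=1)
    queue_a = F.normalize(queue_a, dim=1)
    q_b = F.normalize(q_b, dim=1)
    k_b = F.normalize(k_b, dim=1)
    queue_b = F.normalize(queue_b, dim=1)

    # Mine high-confidence neighbours in view A (no gradient through mining).
    with torch.no_grad():
        nn_idx = (q_a @ queue_a.t()).topk(topk, dim=1).indices  # (N, topk)

    # View B logits: slot 0 is its own positive key, slots 1..K its memory bank.
    logits = torch.cat([(q_b * k_b).sum(1, keepdim=True), q_b @ queue_b.t()], dim=1)
    log_prob = F.log_softmax(logits / temperature, dim=1)

    # Positives for view B: its own key plus the neighbours mined in view A
    # (mined indices shifted by one to skip the key slot).
    pos_idx = torch.cat([torch.zeros_like(nn_idx[:, :1]), nn_idx + 1], dim=1)
    return -log_prob.gather(1, pos_idx).mean()
```

In training, one would presumably sum skeleton_clr_loss per view with cvc_km_loss in both directions (A to B and B to A), after a warm-up with SkeletonCLR alone so that the mined neighbours are trustworthy.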
Related papers
- SMC-NCA: Semantic-guided Multi-level Contrast for Semi-supervised Temporal Action Segmentation [53.010417880335424]
Semi-supervised temporal action segmentation (SS-TAS) aims to perform frame-wise classification in long untrimmed videos.
Recent studies have shown the potential of contrastive learning in unsupervised representation learning using unlabelled data.
We propose a novel Semantic-guided Multi-level Contrast scheme with a Neighbourhood-Consistency-Aware unit (SMC-NCA) to extract strong frame-wise representations.
arXiv Detail & Related papers (2023-12-19T17:26:44Z) - Cross-Model Cross-Stream Learning for Self-Supervised Human Action Recognition [19.86316311525552]
This paper first applies the contrastive learning method BYOL to skeleton data, yielding SkeletonBYOL.
Inspired by SkeletonBYOL, the paper further presents a Cross-Model and Cross-Stream framework (a minimal BYOL sketch appears after this list).
arXiv Detail & Related papers (2023-07-15T12:37:18Z) - Semantic Positive Pairs for Enhancing Visual Representation Learning of Instance Discrimination methods [4.680881326162484]
Self-supervised learning algorithms (SSL) based on instance discrimination have shown promising results.
We propose an approach to identify those images with similar semantic content and treat them as positive instances.
We run experiments on three benchmark datasets: ImageNet, STL-10 and CIFAR-10 with different instance discrimination SSL approaches.
arXiv Detail & Related papers (2023-06-28T11:47:08Z) - Cross-Stream Contrastive Learning for Self-Supervised Skeleton-Based Action Recognition [22.067143671631303]
Self-supervised skeleton-based action recognition enjoys a rapid growth along with the development of contrastive learning.
We propose a Cross-Stream Contrastive Learning framework for skeleton-based action Representation learning (CSCLR).
Specifically, CSCLR not only utilizes intra-stream contrast pairs but also introduces inter-stream contrast pairs as hard samples to improve representation learning.
arXiv Detail & Related papers (2023-05-03T10:31:35Z) - Contrastive Instruction-Trajectory Learning for Vision-Language Navigation [66.16980504844233]
A vision-language navigation (VLN) task requires an agent to reach a target with the guidance of natural language instruction.
Previous works fail to discriminate the similarities and discrepancies across instruction-trajectory pairs and ignore the temporal continuity of sub-instructions.
We propose a Contrastive Instruction-Trajectory Learning framework that explores invariance across similar data samples and variance across different ones to learn distinctive representations for robust navigation.
arXiv Detail & Related papers (2021-12-08T06:32:52Z) - CoCon: Cooperative-Contrastive Learning [52.342936645996765]
Self-supervised visual representation learning is key for efficient video analysis.
Recent success in learning image representations suggests contrastive learning is a promising framework to tackle this challenge.
We introduce a cooperative variant of contrastive learning to utilize complementary information across views.
arXiv Detail & Related papers (2021-04-30T05:46:02Z) - Mutual Contrastive Learning for Visual Representation Learning [1.9355744690301404]
We present a collaborative learning method called Mutual Contrastive Learning (MCL) for general visual representation learning.
Benefiting from MCL, each model can learn extra contrastive knowledge from others, leading to more meaningful feature representations.
Experimental results on supervised and self-supervised image classification, transfer learning and few-shot learning show that MCL can lead to consistent performance gains.
arXiv Detail & Related papers (2021-04-26T13:32:33Z) - SeCo: Exploring Sequence Supervision for Unsupervised Representation Learning [114.58986229852489]
In this paper, we explore the basic and generic supervision in the sequence from spatial, sequential and temporal perspectives.
We derive a particular form named Sequence Contrastive Learning (SeCo).
SeCo shows superior results under the linear protocol on action recognition, untrimmed activity recognition and object tracking.
arXiv Detail & Related papers (2020-08-03T15:51:35Z) - Spatial-Temporal Multi-Cue Network for Continuous Sign Language Recognition [141.24314054768922]
We propose a spatial-temporal multi-cue (STMC) network to solve the vision-based sequence learning problem.
To validate the effectiveness, we perform experiments on three large-scale CSLR benchmarks.
arXiv Detail & Related papers (2020-02-08T15:38:44Z)
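For the Cross-Model Cross-Stream entry above, here is a minimal sketch of the BYOL-style objective it starts from, applied to skeleton sequences. The projector is folded into the encoder for brevity, and the embedding dimension and momentum are illustrative assumptions, not values from that paper.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class BYOL(nn.Module):
    """BYOL sketch: an online network predicts the target network's view."""

    def __init__(self, encoder, dim=256, momentum=0.996):
        super().__init__()
        self.online = encoder                 # e.g., a skeleton GCN (placeholder)
        self.target = copy.deepcopy(encoder)  # EMA copy, never back-propagated
        for p in self.target.parameters():
            p.requires_grad = False
        self.predictor = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )
        self.momentum = momentum

    @torch.no_grad()
    def update_target(self):
        # Exponential moving average of the online weights.
        for po, pt in zip(self.online.parameters(), self.target.parameters()):
            pt.mul_(self.momentum).add_(po, alpha=1 - self.momentum)

    def forward(self, x1, x2):
        # Symmetric regression loss: the online prediction of one augmented
        # view must match the target projection of the other view.
        def regress(a, b):
            p = F.normalize(self.predictor(self.online(a)), dim=1)
            with torch.no_grad():
                z = F.normalize(self.target(b), dim=1)
            return 2 - 2 * (p * z).sum(dim=1).mean()
        return regress(x1, x2) + regress(x2, x1)
```

Calling model(x1, x2) with two augmented skeleton sequences and invoking update_target() after each optimizer step reproduces the usual BYOL training loop; no negative samples or memory bank are needed, which is what distinguishes it from the InfoNCE sketches above.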