Related papers: Self-Supervised Skeleton Action Representation Learning: A Benchmark and Beyond

Self-Supervised Skeleton Action Representation Learning: A Benchmark and Beyond

URL: http://arxiv.org/abs/2406.02978v1
Date: Wed, 5 Jun 2024 06:21:54 GMT
Title: Self-Supervised Skeleton Action Representation Learning: A Benchmark and Beyond
Authors: Jiahang Zhang, Lilang Lin, Shuai Yang, Jiaying Liu,
Abstract summary: Self-supervised learning (SSL) has been proven effective for label-efficient skeleton-based action understanding. Recent endeavors have been made for skeleton-based SSL and remarkable progress has been achieved. In this paper, we conduct a comprehensive survey on self-supervised skeleton-based action representation learning.
Score: 19.074841631219233
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Self-supervised learning (SSL), which aims to learn meaningful prior representations from unlabeled data, has been proven effective for label-efficient skeleton-based action understanding. Different from the image domain, skeleton data possesses sparser spatial structures and diverse representation forms, with the absence of background clues and the additional temporal dimension. This presents the new challenges for the pretext task design of spatial-temporal motion representation learning. Recently, many endeavors have been made for skeleton-based SSL and remarkable progress has been achieved. However, a systematic and thorough review is still lacking. In this paper, we conduct, for the first time, a comprehensive survey on self-supervised skeleton-based action representation learning, where various literature is organized according to their pre-training pretext task methodologies. Following the taxonomy of context-based, generative learning, and contrastive learning approaches, we make a thorough review and benchmark of existing works and shed light on the future possible directions. Our investigation demonstrates that most SSL works rely on the single paradigm, learning representations of a single level, and are evaluated on the action recognition task solely, which leaves the generalization power of skeleton SSL models under-explored. To this end, a novel and effective SSL method for skeleton is further proposed, which integrates multiple pretext tasks to jointly learn versatile representations of different granularity, substantially boosting the generalization capacity for different downstream tasks. Extensive experiments under three large-scale datasets demonstrate that the proposed method achieves the superior generalization performance on various downstream tasks, including recognition, retrieval, detection, and few-shot learning.

Related papers

3D Skeleton-Based Action Recognition: A Review [60.0580120274659]
3D skeleton-based action recognition has become a prominent topic in the field of computer vision.<n>Previous reviews have predominantly adopted a model-oriented perspective, often neglecting the fundamental steps involved in skeleton-based action recognition.<n>This review aims to address these limitations by presenting a comprehensive, task-oriented framework for understanding skeleton-based action recognition.
arXiv Detail & Related papers (2025-06-01T09:04:12Z)
Language Guided Concept Bottleneck Models for Interpretable Continual Learning [62.09201360376577]
Continual learning aims to enable learning systems to acquire new knowledge constantly without forgetting previously learned information. Most existing CL methods focus primarily on preserving learned knowledge to improve model performance. We introduce a novel framework that integrates language-guided Concept Bottleneck Models to address both challenges.
arXiv Detail & Related papers (2025-03-30T02:41:55Z)
USDRL: Unified Skeleton-Based Dense Representation Learning with Multi-Grained Feature Decorrelation [24.90512145836643]
We introduce a Unified Skeleton-based Dense Representation Learning framework based on feature decorrelation. We show that our approach significantly outperforms the current state-of-the-art (SOTA) approaches.
arXiv Detail & Related papers (2024-12-12T12:20:27Z)
Exploring the Precise Dynamics of Single-Layer GAN Models: Leveraging Multi-Feature Discriminators for High-Dimensional Subspace Learning [0.0]
We study the training dynamics of a single-layer GAN model from the perspective of subspace learning. By bridging our analysis to the realm of subspace learning, we systematically compare the efficacy of GAN-based methods against conventional approaches.
arXiv Detail & Related papers (2024-11-01T10:21:12Z)
Vision-Language Meets the Skeleton: Progressively Distillation with Cross-Modal Knowledge for 3D Action Representation Learning [20.34477942813382]
Skeleton-based action representation learning aims to interpret and understand human behaviors by encoding the skeleton sequences. We introduce a novel skeleton-based training framework based on Cross-modal Contrastive learning. Our method outperforms the previous methods and achieves state-of-the-art results.
arXiv Detail & Related papers (2024-05-31T03:40:15Z)
Scalable Language Model with Generalized Continual Learning [58.700439919096155]
The Joint Adaptive Re-ization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks. Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z)
A Probabilistic Model Behind Self-Supervised Learning [53.64989127914936]
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels. We present a generative latent variable model for self-supervised learning. We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
arXiv Detail & Related papers (2024-02-02T13:31:17Z)
Semi-Supervised and Unsupervised Deep Visual Learning: A Survey [76.2650734930974]
Semi-supervised learning and unsupervised learning offer promising paradigms to learn from an abundance of unlabeled visual data. We review the recent advanced deep learning algorithms on semi-supervised learning (SSL) and unsupervised learning (UL) for visual recognition from a unified perspective.
arXiv Detail & Related papers (2022-08-24T04:26:21Z)
Hierarchically Self-Supervised Transformer for Human Skeleton Representation Learning [45.13060970066485]
We propose a self-supervised hierarchical pre-training scheme incorporated into a hierarchical Transformer-based skeleton sequence encoder (Hi-TRS) Under both supervised and semi-supervised evaluation protocols, our method achieves the state-of-the-art performance.
arXiv Detail & Related papers (2022-07-20T04:21:05Z)
Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage. We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets. By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
Self-supervised on Graphs: Contrastive, Generative,or Predictive [25.679620842010422]
Self-supervised learning (SSL) is emerging as a new paradigm for extracting informative knowledge through well-designed pretext tasks. We divide existing graph SSL methods into three categories: contrastive, generative, and predictive. We also summarize the commonly used datasets, evaluation metrics, downstream tasks, and open-source implementations of various algorithms.
arXiv Detail & Related papers (2021-05-16T03:30:03Z)
Reinforcement Learning with Prototypical Representations [114.35801511501639]
Proto-RL is a self-supervised framework that ties representation learning with exploration through prototypical representations. These prototypes simultaneously serve as a summarization of the exploratory experience of an agent as well as a basis for representing observations. This enables state-of-the-art downstream policy learning on a set of difficult continuous control tasks.
arXiv Detail & Related papers (2021-02-22T18:56:34Z)
MS$^2$L: Multi-Task Self-Supervised Learning for Skeleton Based Action Recognition [36.74293548921099]
We integrate motion prediction, jigsaw puzzle recognition, and contrastive learning to learn skeleton features from different aspects. Our experiments on the NW-UCLA, NTU RGB+D, and PKUMMD datasets show remarkable performance for action recognition.
arXiv Detail & Related papers (2020-10-12T11:09:44Z)
Concept Learners for Few-Shot Learning [76.08585517480807]
We propose COMET, a meta-learning method that improves generalization ability by learning to learn along human-interpretable concept dimensions. We evaluate our model on few-shot tasks from diverse domains, including fine-grained image classification, document categorization and cell type annotation.
arXiv Detail & Related papers (2020-07-14T22:04:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.