Spatiotemporal Decouple-and-Squeeze Contrastive Learning for
Semi-Supervised Skeleton-based Action Recognition
- URL: http://arxiv.org/abs/2302.02316v1
- Date: Sun, 5 Feb 2023 06:52:25 GMT
- Title: Spatiotemporal Decouple-and-Squeeze Contrastive Learning for
Semi-Supervised Skeleton-based Action Recognition
- Authors: Binqian Xu, Xiangbo Shu
- Abstract summary: We propose a novel Spatiotemporal Decouple-and-Squeeze Contrastive Learning (SDS-CL) framework to learn more abundant representations of skeleton-based actions.
We present a new Spatial-squeezing Temporal-contrasting Loss (STL), a new Temporal-squeezing Spatial-contrasting Loss (TSL), and a Global-contrasting Loss (GL) to contrast spatial-squeezing joint and motion features at the frame level, temporal-squeezing joint and motion features at the joint level, and global joint and motion features at the skeleton level.
- Score: 12.601122522537459
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastive learning has been successfully leveraged to learn action
representations for addressing the problem of semi-supervised skeleton-based
action recognition. However, most contrastive learning-based methods only
contrast global features mixing spatiotemporal information, which confuses the
spatial- and temporal-specific information reflecting different semantics at the
frame level and joint level. Thus, we propose a novel Spatiotemporal
Decouple-and-Squeeze Contrastive Learning (SDS-CL) framework to comprehensively
learn more abundant representations of skeleton-based actions by jointly
contrasting spatial-squeezing features, temporal-squeezing features, and global
features. In SDS-CL, we design a new Spatiotemporal-decoupling Intra-Inter
Attention (SIIA) mechanism to obtain the spatiotemporal-decoupling attentive
features for capturing spatiotemporal specific information by calculating
spatial- and temporal-decoupling intra-attention maps among joint/motion
features, as well as spatial- and temporal-decoupling inter-attention maps
between joint and motion features. Moreover, we present a new Spatial-squeezing
Temporal-contrasting Loss (STL), a new Temporal-squeezing Spatial-contrasting
Loss (TSL), and the Global-contrasting Loss (GL) to contrast the
spatial-squeezing joint and motion features at the frame level,
temporal-squeezing joint and motion features at the joint level, as well as
global joint and motion features at the skeleton level. Extensive experimental
results on four public datasets show that the proposed SDS-CL achieves
performance gains compared with other competitive methods.
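The abstract describes squeezing a skeleton feature map along its spatial or temporal axis and then contrasting the joint and motion streams at three granularities (frame, joint, skeleton) via the STL, TSL, and GL losses. The following is a minimal sketch of that decouple-and-squeeze-then-contrast idea, not the paper's implementation: the (C, T, V) feature layout, mean pooling as the "squeeze", and a symmetric InfoNCE-style contrast are all assumptions made for illustration.

```python
import numpy as np

def squeeze_features(x):
    """Decouple a skeleton feature map x of shape (C, T, V) into
    spatial-squeezed (frame-level), temporal-squeezed (joint-level),
    and global representations. Mean pooling is an assumption; the
    paper only states that features are squeezed along one axis."""
    spatial = x.mean(axis=2)         # (C, T): one feature per frame
    temporal = x.mean(axis=1)        # (C, V): one feature per joint
    glob = x.mean(axis=(1, 2))       # (C,):   skeleton-level feature
    return spatial, temporal, glob

def info_nce(a, b, tau=0.1):
    """InfoNCE-style contrast between feature sets a, b of shape (N, C);
    pairs (a[i], b[i]) are treated as positives."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / tau                           # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
joint = rng.standard_normal((64, 16, 25))    # (C, T, V) joint stream
motion = rng.standard_normal((64, 16, 25))   # (C, T, V) motion stream

sj, tj, gj = squeeze_features(joint)
sm, tm, gm = squeeze_features(motion)

stl = info_nce(sj.T, sm.T)                   # frame-level contrast (T pairs)
tsl = info_nce(tj.T, tm.T)                   # joint-level contrast (V pairs)
gl = info_nce(gj[None, :], gm[None, :])      # skeleton-level contrast
total = stl + tsl + gl
```

In this reading, STL contrasts per-frame features after the spatial axis is squeezed away, TSL contrasts per-joint features after the temporal axis is squeezed away, and GL contrasts the fully pooled skeleton-level features, so the total loss supervises all three granularities at once.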
Related papers
- Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition [64.56321246196859]
We propose a novel dyNamically Evolving dUal skeleton-semantic syneRgistic framework.
We first construct the spatial-temporal evolving micro-prototypes and integrate dynamic context-aware side information.
We introduce the spatial compression and temporal memory mechanisms to guide the growth of spatial-temporal micro-prototypes.
arXiv Detail & Related papers (2024-11-18T05:16:11Z) - Spatial-Temporal Decoupling Contrastive Learning for Skeleton-based
Human Action Recognition [10.403751563214113]
STD-CL is a framework to obtain discriminative and semantically distinct representations from the sequences.
STD-CL achieves solid improvements on NTU60, NTU120, and NW-UCLA benchmarks.
arXiv Detail & Related papers (2023-12-23T02:54:41Z) - SMC-NCA: Semantic-guided Multi-level Contrast for Semi-supervised Temporal Action Segmentation [53.010417880335424]
Semi-supervised temporal action segmentation (SS-TAS) aims to perform frame-wise classification in long untrimmed videos.
Recent studies have shown the potential of contrastive learning in unsupervised representation learning using unlabelled data.
We propose a novel Semantic-guided Multi-level Contrast scheme with a Neighbourhood-Consistency-Aware unit (SMC-NCA) to extract strong frame-wise representations.
arXiv Detail & Related papers (2023-12-19T17:26:44Z) - A Decoupled Spatio-Temporal Framework for Skeleton-based Action
Segmentation [89.86345494602642]
Existing methods are limited by weak temporal modeling capability.
We propose a Decoupled Spatio-Temporal Framework (DeST) to address these issues.
DeST significantly outperforms current state-of-the-art methods with less computational complexity.
arXiv Detail & Related papers (2023-12-10T09:11:39Z) - SCD-Net: Spatiotemporal Clues Disentanglement Network for
Self-supervised Skeleton-based Action Recognition [39.99711066167837]
This paper introduces a contrastive learning framework, namely the Spatiotemporal Clues Disentanglement Network (SCD-Net).
Specifically, we integrate the sequences with a feature extractor to derive explicit clues from spatial and temporal domains respectively.
We conduct evaluations on the NTU-RGB+D (60 & 120) and PKU-MMD (I & II) datasets, covering downstream tasks such as action recognition, action retrieval, and transfer learning.
arXiv Detail & Related papers (2023-09-11T21:32:13Z) - Spatio-Temporal Branching for Motion Prediction using Motion Increments [55.68088298632865]
Human motion prediction (HMP) has emerged as a popular research topic due to its diverse applications.
Traditional methods rely on hand-crafted features and machine learning techniques.
We propose a novel spatio-temporal branching network using incremental information for HMP.
arXiv Detail & Related papers (2023-08-02T12:04:28Z) - Spatio-temporal Diffusion Point Processes [23.74522530140201]
A spatio-temporal point process (STPP) is a collection of events accompanied by time and space information.
The failure to model the joint distribution leads to limited capacity in characterizing the spatio-temporal interactions given past events.
We propose a novel parameterization framework, which learns complex spatial-temporal joint distributions.
Our framework outperforms the state-of-the-art baselines remarkably, with an average improvement over 50%.
arXiv Detail & Related papers (2023-05-21T08:53:00Z) - Decoupled Spatial-Temporal Attention Network for Skeleton-Based Action
Recognition [46.836815779215456]
We present a novel decoupled spatial-temporal attention network(DSTA-Net) for skeleton-based action recognition.
Three techniques are proposed for building attention blocks, namely, spatial-temporal attention decoupling, decoupled position encoding and spatial global regularization.
To test the effectiveness of the proposed method, extensive experiments are conducted on four challenging datasets for skeleton-based gesture and action recognition.
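DSTA-Net's central idea is to decouple attention along the spatial and temporal axes: joints attend to joints within a frame, and frames attend to frames for a given joint. The sketch below illustrates that decoupling in the simplest possible form; the single-head design, shared query/key projections, omission of position encoding and spatial global regularization, and the additive fusion of the two branches are all simplifying assumptions, not the paper's architecture.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def decoupled_attention(x, wq, wk):
    """Decoupled spatial-temporal self-attention over skeleton features
    x of shape (T, V, C): the spatial branch mixes joints within each
    frame, the temporal branch mixes frames for each joint."""
    scale = np.sqrt(x.shape[-1])
    q, k = x @ wq, x @ wk                                    # (T, V, C)

    # Spatial branch: per frame, joints attend to joints.
    a_s = softmax(q @ k.transpose(0, 2, 1) / scale, axis=-1)  # (T, V, V)
    x_s = a_s @ x                                             # (T, V, C)

    # Temporal branch: per joint, frames attend to frames.
    qt, kt, xt = (t.transpose(1, 0, 2) for t in (q, k, x))    # (V, T, C)
    a_t = softmax(qt @ kt.transpose(0, 2, 1) / scale, axis=-1)  # (V, T, T)
    x_t = (a_t @ xt).transpose(1, 0, 2)                       # (T, V, C)

    return x_s + x_t  # additive fusion of the two decoupled branches

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 25, 8))   # T=16 frames, V=25 joints, C=8
w = rng.standard_normal((8, 8)) * 0.1
out = decoupled_attention(x, w, w)
```

Because each branch's attention map is only (V, V) or (T, T) rather than (TV, TV), decoupling avoids the quadratic cost of full spatiotemporal attention while still letting information flow along both axes.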
arXiv Detail & Related papers (2020-07-07T07:58:56Z) - A Spatial-Temporal Attentive Network with Spatial Continuity for
Trajectory Prediction [74.00750936752418]
We propose a novel model named Spatial-Temporal Attentive Network with Spatial Continuity (STAN-SC).
First, a spatial-temporal attention mechanism is presented to explore the most useful and important information.
Second, we construct a joint feature sequence based on the sequence and instant state information so that the generated trajectories maintain spatial continuity.
arXiv Detail & Related papers (2020-03-13T04:35:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.