Spatio-Temporal Tuples Transformer for Skeleton-Based Action Recognition
- URL: http://arxiv.org/abs/2201.02849v1
- Date: Sat, 8 Jan 2022 16:03:01 GMT
- Title: Spatio-Temporal Tuples Transformer for Skeleton-Based Action Recognition
- Authors: Helei Qiu, Biao Hou, Bo Ren and Xiaohua Zhang
- Abstract summary: Transformer shows great potential to model the correlation of important joints.
Existing Transformer-based methods cannot capture the correlation of different joints between frames.
A spatio-temporal tuples self-attention module is proposed to capture the relationship of different joints in consecutive frames.
- Score: 8.905895607185135
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Capturing the dependencies between joints is critical in the
skeleton-based action recognition task. The Transformer shows great potential
for modeling the correlation of important joints. However, existing
Transformer-based methods cannot capture the correlation of different joints
across frames, even though this correlation is very useful, since different
body parts (such as the arms and legs in a "long jump") move together across
adjacent frames. Focusing on this problem, a novel spatio-temporal tuples
Transformer (STTFormer) method is proposed. The skeleton sequence is divided
into several parts, and the consecutive frames contained in each part are
encoded. A spatio-temporal tuples self-attention module is then proposed to
capture the relationship of different joints in consecutive frames. In
addition, a feature aggregation module is introduced between non-adjacent
frames to enhance the ability to distinguish similar actions. Compared with
state-of-the-art methods, our method achieves better performance on two
large-scale datasets.
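
The partition-then-attend pipeline sketched in the abstract can be illustrated in code. The following is a minimal PyTorch sketch, written for illustration only: the module names (TupleSelfAttention, STTFormerSketch), the layer sizes, and the simple 1D convolution standing in for the inter-frame feature aggregation module are assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class TupleSelfAttention(nn.Module):
    """Sketch of spatio-temporal tuple self-attention.

    Joints from n consecutive frames are treated as one token set, so
    attention relates joints of different frames within a tuple
    (hyper-parameters here are illustrative, not the paper's).
    """
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (batch, tuples, n_frames * joints, dim)
        b, p, l, d = x.shape
        x = x.reshape(b * p, l, d)          # attend inside each tuple
        out, _ = self.attn(x, x, x)
        x = self.norm(x + out)              # residual + norm
        return x.reshape(b, p, l, d)

class STTFormerSketch(nn.Module):
    def __init__(self, joints=25, channels=3, n_frames_per_tuple=4,
                 dim=64, num_classes=60):
        super().__init__()
        self.n = n_frames_per_tuple
        self.embed = nn.Linear(channels, dim)
        self.tuple_attn = TupleSelfAttention(dim)
        # Inter-tuple (non-adjacent frame) aggregation: a plain temporal
        # convolution over tuple-level features stands in for the paper's
        # feature aggregation module (assumption for illustration).
        self.aggregate = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        # x: (batch, frames, joints, channels); frames divisible by n
        b, t, v, c = x.shape
        x = self.embed(x)                              # (b, t, v, dim)
        x = x.reshape(b, t // self.n, self.n * v, -1)  # split into tuples
        x = self.tuple_attn(x)                         # intra-tuple attention
        x = x.mean(dim=2)                              # (b, tuples, dim)
        x = self.aggregate(x.transpose(1, 2)).transpose(1, 2)
        return self.head(x.mean(dim=1))                # sequence-level logits

# Usage: a random NTU-style clip of 64 frames with 25 joints.
clip = torch.randn(2, 64, 25, 3)
logits = STTFormerSketch()(clip)
print(logits.shape)  # torch.Size([2, 60])
```

The key design point the sketch tries to capture is that all joints of several consecutive frames form a single token set, so self-attention can relate a joint in one frame to a different joint in a neighboring frame, which is exactly the cross-frame joint correlation the abstract argues existing Transformer-based methods miss.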
Related papers
- Thin-Plate Spline-based Interpolation for Animation Line Inbetweening [54.69811179222127]
Chamfer Distance (CD) is commonly adopted for evaluating inbetweening performance.
We propose a simple yet effective method for animation line inbetweening that adopts thin-plate spline-based transformation.
Our method outperforms existing approaches by delivering high-quality results with enhanced fluidity.
arXiv Detail & Related papers (2024-08-17T08:05:31Z) - Technical Report: Masked Skeleton Sequence Modeling for Learning Larval Zebrafish Behavior Latent Embeddings [5.922172844641853]
We introduce a novel self-supervised learning method for extracting latent embeddings from behaviors of larval zebrafish.
For the skeletal sequences of swimming zebrafish, we propose a pioneering Transformer-CNN architecture, the Sequence Spatial-Temporal Transformer (SSTFormer).
arXiv Detail & Related papers (2024-03-23T02:58:10Z) - SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition [25.341177384559174]
We propose a novel approach called Skeletal-Temporal Transformer (SkateFormer)
SkateFormer partitions joints and frames based on different types of skeletal-temporal relation.
It can selectively focus on key joints and frames crucial for action recognition in an action-adaptive manner.
arXiv Detail & Related papers (2024-03-14T15:55:53Z) - A Decoupled Spatio-Temporal Framework for Skeleton-based Action Segmentation [89.86345494602642]
Existing methods are limited by weak temporal modeling capability.
We propose a Decoupled Spatio-Temporal Framework (DeST) to address these issues.
DeST significantly outperforms current state-of-the-art methods with less computational complexity.
arXiv Detail & Related papers (2023-12-10T09:11:39Z) - SkeleTR: Towards Skeleton-based Action Recognition in the Wild [86.03082891242698]
SkeleTR is a new framework for skeleton-based action recognition.
It first models the intra-person skeleton dynamics for each skeleton sequence with graph convolutions.
It then uses stacked Transformer encoders to capture person interactions that are important for action recognition in general scenarios.
arXiv Detail & Related papers (2023-09-20T16:22:33Z) - Shuffled Autoregression For Motion Interpolation [53.61556200049156]
This work aims to provide a deep-learning solution for the motion interpolation task.
We propose a novel framework, referred to as Shuffled AutoRegression, which expands autoregression to generate in an arbitrary (shuffled) order.
We also propose an approach to constructing a particular kind of dependency graph, with three stages assembled into an end-to-end spatial-temporal motion Transformer.
arXiv Detail & Related papers (2023-06-10T07:14:59Z) - FECANet: Boosting Few-Shot Semantic Segmentation with Feature-Enhanced Context-Aware Network [48.912196729711624]
Few-shot semantic segmentation is the task of learning to locate each pixel of a novel class in a query image with only a few annotated support images.
We propose a Feature-Enhanced Context-Aware Network (FECANet) to suppress the matching noise caused by inter-class local similarity.
In addition, we propose a novel correlation reconstruction module that encodes extra correspondence relations between foreground and background and multi-scale context semantic features.
arXiv Detail & Related papers (2023-01-19T16:31:13Z) - Alignment-guided Temporal Attention for Video Action Recognition [18.5171795689609]
We show that frame-by-frame alignments have the potential to increase the mutual information between frame representations.
We propose Alignment-guided Temporal Attention (ATA) to extend 1-dimensional temporal attention with parameter-free patch-level alignments between neighboring frames.
arXiv Detail & Related papers (2022-09-30T23:10:47Z) - Skeleton-Aware Networks for Deep Motion Retargeting [83.65593033474384]
We introduce a novel deep learning framework for data-driven motion retargeting between skeletons.
Our approach learns how to retarget without requiring any explicit pairing between the motions in the training set.
arXiv Detail & Related papers (2020-05-12T12:51:40Z)