Isolated Sign Language Recognition based on Tree Structure Skeleton Images
- URL: http://arxiv.org/abs/2304.05403v1
- Date: Mon, 10 Apr 2023 01:58:50 GMT
- Title: Isolated Sign Language Recognition based on Tree Structure Skeleton Images
- Authors: David Laines, Gissella Bejarano, Miguel Gonzalez-Mendoza, Gilberto Ochoa-Ruiz
- Abstract summary: We use the Tree Structure Skeleton Image (TSSI) as an alternative input to improve the accuracy of skeleton-based models for sign recognition.
We trained a DenseNet-121 using this type of input and compared it with other skeleton-based deep learning methods.
Our model (SL-TSSI-DenseNet) outperforms the state of the art among skeleton-based models.
- Score: 2.179313476241343
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sign Language Recognition (SLR) systems aim to be embedded in video stream
platforms to recognize the sign performed in front of a camera. SLR research
has taken advantage of recent advances in pose estimation models to use
skeleton sequences estimated from videos instead of RGB information to predict
signs. This approach can make HAR-related tasks less complex and more robust to
diverse backgrounds, lighting conditions, and physical appearances. In this
work, we explore the use of a spatio-temporal skeleton representation such as
Tree Structure Skeleton Image (TSSI) as an alternative input to improve the
accuracy of skeleton-based models for SLR. TSSI converts a skeleton sequence
into an RGB image where the columns represent the joints of the skeleton in a
depth-first tree traversal order, the rows represent the temporal evolution of
the joints, and the three channels represent the (x, y, z) coordinates of the
joints. We trained a DenseNet-121 using this type of input and compared it with
other skeleton-based deep learning methods using a large-scale American Sign
Language (ASL) dataset, WLASL. Our model (SL-TSSI-DenseNet) outperforms the
state of the art among skeleton-based models. Moreover, when data
augmentation is included, our proposal achieves better results than both skeleton-based and
RGB-based models. We evaluated the effectiveness of our model on the Ankara
University Turkish Sign Language (TSL) dataset, AUTSL, and a Mexican Sign
Language (LSM) dataset. On the AUTSL dataset, the model achieves results
similar to the state of the art among skeleton-based models. On the LSM
dataset, the model outperforms the baseline. Code has been
made available at: https://github.com/davidlainesv/SL-TSSI-DenseNet.
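The abstract describes the TSSI encoding precisely enough to sketch it: rows index frames, columns index joints in a depth-first tree-traversal order (joints may repeat as the traversal backtracks), and the three channels hold the (x, y, z) coordinates. The NumPy sketch below illustrates this; the traversal order, joint count, and min-max normalization to [0, 1] are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the TSSI construction described in the abstract, assuming
# pose keypoints are given as a (T, J, 3) array of (x, y, z) coordinates.
import numpy as np

def skeleton_to_tssi(keypoints: np.ndarray, dfs_order: list[int]) -> np.ndarray:
    """Convert a skeleton sequence into a TSSI 'image'.

    keypoints: (T, J, 3) array; T frames, J joints, (x, y, z) per joint.
    dfs_order: joint indices in depth-first tree-traversal order
               (joints may repeat when the traversal backtracks).
    Returns an array of shape (T, len(dfs_order), 3): rows are time,
    columns are joints in traversal order, channels are coordinates.
    """
    tssi = keypoints[:, dfs_order, :]            # reorder joints column-wise
    # Min-max normalize so the result can be treated as an RGB image
    # (an assumption; the paper does not specify the normalization).
    mins = tssi.min(axis=(0, 1), keepdims=True)
    maxs = tssi.max(axis=(0, 1), keepdims=True)
    return (tssi - mins) / np.maximum(maxs - mins, 1e-8)

# Example with dummy data: 64 frames, 27 joints, a made-up traversal order.
rng = np.random.default_rng(0)
frames = rng.random((64, 27, 3))
order = [0, 1, 2, 1, 3, 1, 0, 4, 5, 4, 0]        # hypothetical DFS with backtracking
image = skeleton_to_tssi(frames, order)           # shape (64, 11, 3)
print(image.shape)
```

A second, equally hedged sketch shows how such an image could be fed to a DenseNet-121 classifier. The abstract only names the architecture; using PyTorch/torchvision is an assumption about the implementation, and the number of sign classes is a placeholder.

```python
import torch
import torch.nn as nn
from torchvision.models import densenet121

num_signs = 100                                   # placeholder; set to the number of glosses
model = densenet121(weights=None)                 # train from scratch on TSSI inputs
model.classifier = nn.Linear(model.classifier.in_features, num_signs)

# After resizing, a batch of TSSI arrays behaves like ordinary RGB images.
batch = torch.rand(8, 3, 224, 224)                # dummy (N, C, H, W) batch
logits = model(batch)                             # shape (8, num_signs)
print(logits.shape)
```

In practice, the (T, K, 3) TSSI array would be resized to the network's expected input resolution before batching; how the authors handle this is not stated in the abstract.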
Related papers
- SkelCap: Automated Generation of Descriptive Text from Skeleton Keypoint Sequences [2.0257616108612373]
We structured this dataset around AUTSL, a comprehensive isolated Turkish sign language dataset.
We also developed a baseline model, SkelCap, which can generate textual descriptions of body movements.
The model achieved promising results, with a ROUGE-L score of 0.98 and a BLEU-4 score of 0.94 in the signer-agnostic evaluation.
arXiv Detail & Related papers (2024-05-05T15:50:02Z) - SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence
Pre-training [110.55093254677638]
We propose an efficient skeleton sequence learning framework, named Skeleton Sequence Learning (SSL).
In this paper, we build an asymmetric graph-based encoder-decoder pre-training architecture named SkeletonMAE.
Our SSL generalizes well across different datasets and outperforms the state-of-the-art self-supervised skeleton-based action recognition methods.
arXiv Detail & Related papers (2023-07-17T13:33:11Z) - Learning Discriminative Representations for Skeleton Based Action
Recognition [49.45405879193866]
We propose an auxiliary feature refinement head (FR Head) to obtain discriminative representations of skeletons.
Our proposed models obtain competitive results from state-of-the-art methods and can help to discriminate those ambiguous samples.
arXiv Detail & Related papers (2023-03-07T08:37:48Z) - Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based
Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z) - Multi-Scale Semantics-Guided Neural Networks for Efficient
Skeleton-Based Human Action Recognition [140.18376685167857]
A simple yet effective multi-scale semantics-guided neural network is proposed for skeleton-based action recognition.
MS-SGN achieves the state-of-the-art performance on the NTU60, NTU120, and SYSU datasets.
arXiv Detail & Related papers (2021-11-07T03:50:50Z) - Sign Language Recognition via Skeleton-Aware Multi-Model Ensemble [71.97020373520922]
Sign language is commonly used by deaf or mute people to communicate.
We propose a novel Multi-modal Framework with a Global Ensemble Model (GEM) for isolated Sign Language Recognition (SLR).
Our proposed SAM-SLR-v2 framework is exceedingly effective and achieves state-of-the-art performance by significant margins.
arXiv Detail & Related papers (2021-10-12T16:57:18Z) - Skeleton-Contrastive 3D Action Representation Learning [35.06361753065124]
This paper strives for self-supervised learning of a feature space suitable for skeleton-based action recognition.
Our approach achieves state-of-the-art performance for self-supervised learning from skeleton data on the challenging PKU and NTU datasets.
arXiv Detail & Related papers (2021-08-08T14:44:59Z) - Skeleton-based Action Recognition via Spatial and Temporal Transformer
Networks [12.06555892772049]
We propose a novel Spatial-Temporal Transformer network (ST-TR) which models dependencies between joints using the Transformer self-attention operator.
The proposed ST-TR achieves state-of-the-art performance on all datasets when using joints' coordinates as input, and results on-par with state-of-the-art when adding bones information.
arXiv Detail & Related papers (2020-08-17T15:25:40Z) - SkeletonNet: A Topology-Preserving Solution for Learning Mesh
Reconstruction of Object Surfaces from RGB Images [85.66560542483286]
This paper focuses on the challenging task of learning 3D object surface reconstructions from RGB images.
We propose two models, the Skeleton-Based Graph Convolutional Neural Network (SkeGCNN) and the Skeleton-Regularized Deep Implicit Surface Network (SkeDISN).
We conduct thorough experiments that verify the efficacy of our proposed SkeletonNet.
arXiv Detail & Related papers (2020-08-13T07:59:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.