Isolated Sign Language Recognition based on Tree Structure Skeleton Images
- URL: http://arxiv.org/abs/2304.05403v1
- Date: Mon, 10 Apr 2023 01:58:50 GMT
- Title: Isolated Sign Language Recognition based on Tree Structure Skeleton Images
- Authors: David Laines, Gissella Bejarano, Miguel Gonzalez-Mendoza, Gilberto Ochoa-Ruiz
- Abstract summary: We use the Tree Structure Skeleton Image (TSSI) as an alternative input to improve the accuracy of skeleton-based models for sign recognition.
We trained a DenseNet-121 using this type of input and compared it with other skeleton-based deep learning methods.
Our model (SL-TSSI-DenseNet) outperforms the state of the art among skeleton-based models.
- Score: 2.179313476241343
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sign Language Recognition (SLR) systems aim to be embedded in video stream
platforms to recognize the sign performed in front of a camera. SLR research
has taken advantage of recent advances in pose estimation models to use
skeleton sequences estimated from videos instead of RGB information to predict
signs. This approach can make HAR-related tasks less complex and more robust to
diverse backgrounds, lighting conditions, and physical appearances. In this
work, we explore the use of a spatio-temporal skeleton representation such as
Tree Structure Skeleton Image (TSSI) as an alternative input to improve the
accuracy of skeleton-based models for SLR. TSSI converts a skeleton sequence
into an RGB image where the columns represent the joints of the skeleton in a
depth-first tree traversal order, the rows represent the temporal evolution of
the joints, and the three channels represent the (x, y, z) coordinates of the
joints. We trained a DenseNet-121 using this type of input and compared it with
other skeleton-based deep learning methods using a large-scale American Sign
Language (ASL) dataset, WLASL. Our model (SL-TSSI-DenseNet) outperforms the
state of the art among skeleton-based models. Moreover, when data
augmentation is included, our proposal achieves better results than both skeleton-based and
RGB-based models. We evaluated the effectiveness of our model on the Ankara
University Turkish Sign Language (TSL) dataset, AUTSL, and a Mexican Sign
Language (LSM) dataset. On the AUTSL dataset, the model achieves results
similar to the state of the art among skeleton-based models. On the LSM
dataset, the model outperforms the baseline. Code has been
made available at: https://github.com/davidlainesv/SL-TSSI-DenseNet.
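The abstract describes the TSSI encoding precisely enough to sketch it: rows index frames, columns index joints in a depth-first tree-traversal order (joints may repeat as the traversal backtracks), and the three channels hold the (x, y, z) coordinates. The NumPy sketch below illustrates this; the traversal order, joint count, and min-max normalization to [0, 1] are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the TSSI construction described in the abstract, assuming
# pose keypoints are given as a (T, J, 3) array of (x, y, z) coordinates.
import numpy as np

def skeleton_to_tssi(keypoints: np.ndarray, dfs_order: list[int]) -> np.ndarray:
    """Convert a skeleton sequence into a TSSI 'image'.

    keypoints: (T, J, 3) array; T frames, J joints, (x, y, z) per joint.
    dfs_order: joint indices in depth-first tree-traversal order
               (joints may repeat when the traversal backtracks).
    Returns an array of shape (T, len(dfs_order), 3): rows are time,
    columns are joints in traversal order, channels are coordinates.
    """
    tssi = keypoints[:, dfs_order, :]            # reorder joints column-wise
    # Min-max normalize so the result can be treated as an RGB image
    # (an assumption; the paper does not specify the normalization).
    mins = tssi.min(axis=(0, 1), keepdims=True)
    maxs = tssi.max(axis=(0, 1), keepdims=True)
    return (tssi - mins) / np.maximum(maxs - mins, 1e-8)

# Example with dummy data: 64 frames, 27 joints, a made-up traversal order.
rng = np.random.default_rng(0)
frames = rng.random((64, 27, 3))
order = [0, 1, 2, 1, 3, 1, 0, 4, 5, 4, 0]        # hypothetical DFS with backtracking
image = skeleton_to_tssi(frames, order)           # shape (64, 11, 3)
print(image.shape)
```

A second, equally hedged sketch shows how such an image could be fed to a DenseNet-121 classifier. The abstract only names the architecture; using PyTorch/torchvision is an assumption about the implementation, and the number of sign classes is a placeholder.

```python
import torch
import torch.nn as nn
from torchvision.models import densenet121

num_signs = 100                                   # placeholder; set to the number of glosses
model = densenet121(weights=None)                 # train from scratch on TSSI inputs
model.classifier = nn.Linear(model.classifier.in_features, num_signs)

# After resizing, a batch of TSSI arrays behaves like ordinary RGB images.
batch = torch.rand(8, 3, 224, 224)                # dummy (N, C, H, W) batch
logits = model(batch)                             # shape (8, num_signs)
print(logits.shape)
```

In practice, the (T, K, 3) TSSI array would be resized to the network's expected input resolution before batching; how the authors handle this is not stated in the abstract.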
Related papers
- SkelCap: Automated Generation of Descriptive Text from Skeleton Keypoint Sequences [2.0257616108612373]
We structured this dataset around AUTSL, a comprehensive isolated Turkish sign language dataset.
We also developed a baseline model, SkelCap, which can generate textual descriptions of body movements.
The model achieved promising results, with a ROUGE-L score of 0.98 and a BLEU-4 score of 0.94 in the signer-agnostic evaluation.
arXiv Detail & Related papers (2024-05-05T15:50:02Z) - SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence
Pre-training [110.55093254677638]
We propose an efficient skeleton sequence learning framework, named Skeleton Sequence Learning (SSL).
In this paper, we build an asymmetric graph-based encoder-decoder pre-training architecture named SkeletonMAE.
Our SSL generalizes well across different datasets and outperforms the state-of-the-art self-supervised skeleton-based action recognition methods.
arXiv Detail & Related papers (2023-07-17T13:33:11Z) - Learning Discriminative Representations for Skeleton Based Action
Recognition [49.45405879193866]
We propose an auxiliary feature refinement head (FR Head) to obtain discriminative representations of skeletons.
Our proposed models obtain competitive results from state-of-the-art methods and can help to discriminate those ambiguous samples.
arXiv Detail & Related papers (2023-03-07T08:37:48Z) - Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based
Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z) - Multi-Scale Semantics-Guided Neural Networks for Efficient
Skeleton-Based Human Action Recognition [140.18376685167857]
A simple yet effective multi-scale semantics-guided neural network is proposed for skeleton-based action recognition.
MS-SGN achieves the state-of-the-art performance on the NTU60, NTU120, and SYSU datasets.
arXiv Detail & Related papers (2021-11-07T03:50:50Z) - Sign Language Recognition via Skeleton-Aware Multi-Model Ensemble [71.97020373520922]
Sign language is commonly used by deaf or mute people to communicate.
We propose a novel Multi-modal Framework with a Global Ensemble Model (GEM) for isolated Sign Language Recognition (SLR).
Our proposed SAM-SLR-v2 framework is exceedingly effective and achieves state-of-the-art performance by significant margins.
arXiv Detail & Related papers (2021-10-12T16:57:18Z) - Skeleton-Contrastive 3D Action Representation Learning [35.06361753065124]
This paper strives for self-supervised learning of a feature space suitable for skeleton-based action recognition.
Our approach achieves state-of-the-art performance for self-supervised learning from skeleton data on the challenging PKU and NTU datasets.
arXiv Detail & Related papers (2021-08-08T14:44:59Z) - Skeleton-based Action Recognition via Spatial and Temporal Transformer
Networks [12.06555892772049]
We propose a novel Spatial-Temporal Transformer network (ST-TR) which models dependencies between joints using the Transformer self-attention operator.
The proposed ST-TR achieves state-of-the-art performance on all datasets when using joints' coordinates as input, and results on-par with state-of-the-art when adding bones information.
arXiv Detail & Related papers (2020-08-17T15:25:40Z) - SkeletonNet: A Topology-Preserving Solution for Learning Mesh
Reconstruction of Object Surfaces from RGB Images [85.66560542483286]
This paper focuses on the challenging task of learning 3D object surface reconstructions from RGB images.
We propose two models, the Skeleton-Based Graph Convolutional Neural Network (SkeGCNN) and the Skeleton-Regularized Deep Implicit Surface Network (SkeDISN).
We conduct thorough experiments that verify the efficacy of our proposed SkeletonNet.
arXiv Detail & Related papers (2020-08-13T07:59:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.