Multi-View Spatial-Temporal Network for Continuous Sign Language Recognition
- URL: http://arxiv.org/abs/2204.08747v1
- Date: Tue, 19 Apr 2022 08:43:03 GMT
- Title: Multi-View Spatial-Temporal Network for Continuous Sign Language Recognition
- Authors: Ronghui Li and Lu Meng
- Abstract summary: This paper proposes a multi-view spatial-temporal continuous sign language recognition network.
It is tested on two public sign language datasets: SLR-100 and RWTH-PHOENIX-Weather 2014T.
- Score: 0.76146285961466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sign language is a beautiful visual language and is also the primary language used by speech- and hearing-impaired people. However, sign language has many complex expressions, which are difficult for the public to understand and master. Sign language recognition algorithms can therefore significantly facilitate communication between hearing-impaired people and hearing people. Traditional continuous sign language recognition often uses sequence learning methods based on Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks (LSTMs). These methods learn spatial and temporal features separately and thus cannot capture the complex spatial-temporal features of sign language; LSTMs also struggle to learn long-term dependencies. To alleviate these problems, this paper proposes a multi-view spatial-temporal continuous sign language recognition network. The network consists of three parts. The first is a Multi-View Spatial-Temporal Feature Extractor Network (MSTN), which directly extracts spatial-temporal features from RGB and skeleton data; the second is a Transformer-based sign language encoder network, which learns long-term dependencies; the third is a Connectionist Temporal Classification (CTC) decoder network, which predicts the meaning of the whole continuous sign language sequence. Our algorithm is tested on two public sign language datasets: SLR-100 and RWTH-PHOENIX-Weather 2014T. It achieves excellent performance on both, with a word error rate of 1.9% on SLR-100 and 22.8% on RWTH-PHOENIX-Weather.
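The abstract specifies only the high-level, three-part design: an MSTN feature extractor over RGB and skeleton views, a Transformer encoder for long-range context, and a CTC decoder. Below is a minimal PyTorch sketch of such a pipeline; `MSTNBackbone`, the joint count, and every layer size are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of the three-stage design described in the abstract.
# All module names and sizes are illustrative, not the paper's exact layers.
import torch
import torch.nn as nn

class MSTNBackbone(nn.Module):
    """Stand-in for the Multi-View Spatial-Temporal Feature Extractor:
    a 3-D CNN branch for RGB frames fused with a linear branch for skeletons."""
    def __init__(self, d_model=512, n_joints=27):
        super().__init__()
        self.rgb = nn.Sequential(      # (B, 3, T, H, W) -> (B, 64, T, 1, 1)
            nn.Conv3d(3, 64, kernel_size=(3, 7, 7), stride=(1, 2, 2),
                      padding=(1, 3, 3)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),
        )
        self.skel = nn.Linear(n_joints * 2, 64)   # (B, T, J*2) -> (B, T, 64)
        self.proj = nn.Linear(64 + 64, d_model)

    def forward(self, frames, skeletons):
        r = self.rgb(frames).squeeze(-1).squeeze(-1).transpose(1, 2)  # (B, T, 64)
        s = self.skel(skeletons)                                      # (B, T, 64)
        return self.proj(torch.cat([r, s], dim=-1))                   # (B, T, d)

class SignRecognizer(nn.Module):
    def __init__(self, vocab_size, d_model=512):
        super().__init__()
        self.backbone = MSTNBackbone(d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # long-range
        self.head = nn.Linear(d_model, vocab_size + 1)             # +1 CTC blank
        # Positional encoding omitted for brevity.

    def forward(self, frames, skeletons):
        x = self.encoder(self.backbone(frames, skeletons))
        return self.head(x).log_softmax(-1)                        # (B, T, V+1)

# One CTC training step on dummy data: 2 clips of 16 frames, 100 glosses.
model = SignRecognizer(vocab_size=100)
frames = torch.randn(2, 3, 16, 112, 112)
skeletons = torch.randn(2, 16, 27 * 2)
log_probs = model(frames, skeletons).transpose(0, 1)  # CTCLoss wants (T, B, V+1)
targets = torch.randint(1, 101, (2, 5))               # 5 glosses per clip
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           input_lengths=torch.full((2,), 16),
                           target_lengths=torch.full((2,), 5))
loss.backward()
```

CTC is what allows training on continuous signing without frame-level labels: the loss marginalizes over every monotonic alignment between the frame-wise outputs and the sentence-level gloss sequence.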
Related papers
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers.
It is common to instead use proxy tasks that are similar in only an informal sense.
We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings.
arXiv Detail & Related papers (2024-11-11T16:33:25Z)
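A toy, purely illustrative reading of "training a network directly as a binary classifier of strings"; the language (even parity of 1s), model, and training setup here are invented, not the paper's benchmark.

```python
# Toy "network as formal-language recognizer": an LSTM trained as a binary
# classifier of bit strings. Entirely illustrative, not the paper's setup.
import torch
import torch.nn as nn

class Recognizer(nn.Module):
    def __init__(self, vocab=2, hidden=32):
        super().__init__()
        self.emb = nn.Embedding(vocab, hidden)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)          # single accept/reject logit

    def forward(self, x):                        # x: (B, T) token ids
        h, _ = self.rnn(self.emb(x))
        return self.out(h[:, -1]).squeeze(-1)    # read final state

model = Recognizer()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(200):
    x = torch.randint(0, 2, (64, 12))            # random bit strings
    y = (x.sum(1) % 2 == 0).float()              # label: string in language?
    loss = nn.functional.binary_cross_entropy_with_logits(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```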
- Enhancing Brazilian Sign Language Recognition through Skeleton Image Representation [2.6311088262657907]
This work proposes an Isolated Sign Language Recognition (ISLR) approach where body, hands, and facial landmarks are extracted throughout time and encoded as 2-D images.
We show that our method surpassed the state-of-the-art in terms of performance metrics on two widely recognized datasets in Brazilian Sign Language (LIBRAS).
In addition to being more accurate, our method is more time-efficient and easier to train due to its reliance on a simpler network architecture and solely RGB data as input.
arXiv Detail & Related papers (2024-04-29T23:21:17Z)
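A minimal sketch of the landmark-encoding idea in the entry above: keypoint trajectories arranged as an image-like tensor so a plain 2-D CNN can classify isolated signs. The frame, joint, and class counts are assumptions, not the paper's values.

```python
# Encode (x, y) keypoint trajectories as a 2-channel T x K "image" and
# classify it with an ordinary 2-D CNN. Shapes are illustrative.
import torch
import torch.nn as nn

T, K = 64, 75                       # frames, keypoints (body + hands + face)
landmarks = torch.rand(T, K, 2)     # normalized (x, y) per keypoint per frame

# x/y become 2 channels; time runs along one axis, joints along the other.
image = landmarks.permute(2, 0, 1).unsqueeze(0)    # (1, 2, T, K)

classifier = nn.Sequential(
    nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 226),             # e.g. 226 isolated sign classes
)
logits = classifier(image)          # (1, 226)
```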
- Improving Continuous Sign Language Recognition with Cross-Lingual Signs [29.077175863743484]
We study the feasibility of utilizing multilingual sign language corpora to facilitate continuous sign language recognition.
We first build two sign language dictionaries containing isolated signs that appear in two datasets.
Then we identify the sign-to-sign mappings between two sign languages via a well-optimized isolated sign language recognition model.
arXiv Detail & Related papers (2023-08-21T15:58:47Z)
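The sign-to-sign mapping step described above can be pictured as nearest-neighbor matching in a shared embedding space. In this hedged sketch, `embed` is a hypothetical stand-in for the well-optimized isolated recognition model, and both dictionaries are random features.

```python
# Nearest-neighbor sign-to-sign mapping between two sign languages via a
# shared embedding space. `embed` is a placeholder for a real ISLR model.
import torch
import torch.nn.functional as F

def embed(clips: torch.Tensor) -> torch.Tensor:
    return clips.flatten(1)          # placeholder: real code runs an ISLR net

dict_a = torch.randn(500, 64)        # 500 isolated signs from language A
dict_b = torch.randn(300, 64)        # 300 isolated signs from language B

ea = F.normalize(embed(dict_a), dim=1)
eb = F.normalize(embed(dict_b), dim=1)
sim = ea @ eb.T                      # cosine similarities, (500, 300)
mapping = sim.argmax(dim=1)          # sign-to-sign map A -> B
```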
- CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning [38.83062453145388]
Sign language retrieval consists of two sub-tasks: text-to-sign-video (T2V) retrieval and sign-video-to-text (V2T) retrieval.
We take into account the linguistic properties of both sign languages and natural languages, and simultaneously identify the fine-grained cross-lingual mappings.
Our framework outperforms the pioneering method by large margins on various datasets.
arXiv Detail & Related papers (2023-03-22T17:59:59Z)
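The two retrieval directions above can be illustrated with a generic CLIP-style contrastive objective. This is a common formulation for cross-modal retrieval, not necessarily CiCo's exact loss; the temperature and embedding sizes are assumptions.

```python
# CLIP-style symmetric contrastive loss over matched text / sign-video pairs.
import torch
import torch.nn.functional as F

text_emb = F.normalize(torch.randn(8, 256), dim=1)   # 8 captions
video_emb = F.normalize(torch.randn(8, 256), dim=1)  # their sign videos
logits = text_emb @ video_emb.T / 0.07               # temperature-scaled
labels = torch.arange(8)                             # diagonal = true pairs
loss = (F.cross_entropy(logits, labels)              # text -> video (T2V)
        + F.cross_entropy(logits.T, labels)) / 2     # video -> text (V2T)
```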
- Multi-scale temporal network for continuous sign language recognition [10.920363368754721]
Continuous Sign Language Recognition is a challenging research task due to the lack of accurate annotation on the temporal sequence of sign language data.
This paper proposes a multi-scale temporal network (MSTNet) to extract more accurate temporal features.
Experimental results on two publicly available datasets demonstrate that our method can effectively extract sign language features in an end-to-end manner without any prior knowledge.
arXiv Detail & Related papers (2022-04-08T06:14:22Z)
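As a rough illustration of the multi-scale idea summarized above (not MSTNet's published architecture), parallel 1-D convolutions with different kernel sizes extract temporal features at several receptive-field widths; channel counts and scales here are assumptions.

```python
# Multi-scale temporal block: parallel 1-D convolutions over frame features,
# each branch seeing a different temporal extent, outputs concatenated.
import torch
import torch.nn as nn

class MultiScaleTemporal(nn.Module):
    def __init__(self, c_in=512, c_branch=128, scales=(3, 5, 7, 9)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(c_in, c_branch, k, padding=k // 2) for k in scales
        )

    def forward(self, x):              # x: (B, C, T) frame-level features
        return torch.cat([b(x) for b in self.branches], dim=1)

feats = torch.randn(2, 512, 100)       # 100 frames of 512-dim features
out = MultiScaleTemporal()(feats)      # (2, 512, 100): 4 branches x 128 ch
```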
- Sign Language Recognition via Skeleton-Aware Multi-Model Ensemble [71.97020373520922]
Sign language is commonly used by deaf or mute people to communicate.
We propose a novel Multi-modal Framework with a Global Ensemble Model (GEM) for isolated Sign Language Recognition (SLR).
Our proposed SAM-SLR-v2 framework is exceedingly effective and achieves state-of-the-art performance with significant margins.
arXiv Detail & Related papers (2021-10-12T16:57:18Z)
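A simple late-fusion reading of the ensemble idea above: combine per-modality class scores with fixed weights. The modalities and weights here are invented; SAM-SLR-v2's actual global ensemble is more elaborate.

```python
# Weighted late fusion of per-modality class scores for isolated SLR.
import torch

rgb_logits = torch.randn(4, 226)       # predictions for 4 clips, 226 signs
skeleton_logits = torch.randn(4, 226)
depth_logits = torch.randn(4, 226)

weights = torch.tensor([0.5, 0.3, 0.2])                              # made up
stacked = torch.stack([rgb_logits, skeleton_logits, depth_logits])   # (3,4,226)
ensembled = (weights[:, None, None] * stacked).sum(0)                # (4, 226)
pred = ensembled.argmax(dim=1)         # final class per clip
```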
- Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech-impaired people to communicate.
Skeleton-based recognition is becoming popular because it can be ensembled with RGB-D based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation (Jin et al., 2020), we propose recognizing sign language based on whole-body keypoints and features.
arXiv Detail & Related papers (2021-03-16T03:38:17Z)
- Application of Transfer Learning to Sign Language Recognition using an Inflated 3D Deep Convolutional Neural Network [0.0]
Transfer learning is a technique to utilize a related task with an abundance of data available to help solve a target task lacking sufficient data.
This paper investigates how effectively transfer learning can be applied to isolated sign language recognition.
arXiv Detail & Related papers (2021-02-25T13:37:39Z)
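The transfer-learning recipe in the entry above, sketched with torchvision's r3d_18 standing in for the paper's Inflated 3D ConvNet: reuse weights from a data-rich source task (action recognition) and fine-tune only a fresh classification head. The sign vocabulary size is an assumption.

```python
# Transfer learning for isolated SLR: freeze a pretrained video backbone,
# train only a new head. r3d_18 is a stand-in for the paper's I3D model.
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

model = r3d_18(weights="KINETICS400_V1")   # source task: action recognition
for p in model.parameters():
    p.requires_grad = False                # freeze transferred features
model.fc = nn.Linear(model.fc.in_features, 100)   # new head: 100 signs

clips = torch.randn(2, 3, 16, 112, 112)    # (B, C, T, H, W)
logits = model(clips)                      # (2, 100); only fc gets gradients
```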
- TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation [101.6042317204022]
Sign language translation (SLT) aims to interpret sign video sequences into text-based natural language sentences.
Existing SLT models usually represent sign visual features in a frame-wise manner.
We develop a novel hierarchical sign video feature learning method via a temporal semantic pyramid network, called TSPNet.
arXiv Detail & Related papers (2020-10-12T05:58:09Z)
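One way to picture the temporal-pyramid idea above (not TSPNet's actual architecture): pool frame-level features over sliding windows of several lengths, yielding tokens at multiple temporal granularities for a downstream translation model. Window sizes are assumptions.

```python
# Multi-granularity temporal pyramid over frame features.
import torch
import torch.nn.functional as F

feats = torch.randn(1, 512, 120)                   # (B, C, T) frame features
pyramid = [F.avg_pool1d(feats, kernel_size=k, stride=k // 2)
           for k in (8, 12, 16)]                   # three window granularities
tokens = torch.cat([p.transpose(1, 2) for p in pyramid], dim=1)  # (B, N, C)
```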
- Global-local Enhancement Network for NMFs-aware Sign Language Recognition [135.30357113518127]
We propose a simple yet effective architecture called the Global-local Enhancement Network (GLE-Net).
Of the two streams, one captures the global contextual relationship, while the other stream captures the discriminative fine-grained cues.
We introduce the first non-manual-features-aware isolated Chinese sign language dataset with a total vocabulary size of 1,067 sign words in daily life.
arXiv Detail & Related papers (2020-08-24T13:28:55Z)
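A two-stream sketch in the spirit of the entry above: one branch encodes the whole frame for global context, the other a detail crop (e.g., hands or face) for fine-grained cues, with the features concatenated. Crop coordinates and layer sizes are placeholders.

```python
# Global + local two-stream feature extraction with simple concatenation.
import torch
import torch.nn as nn

def branch():
    return nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())

global_stream, local_stream = branch(), branch()
frame = torch.randn(1, 3, 224, 224)
hands = frame[:, :, 96:160, 96:160]    # hypothetical fine-grained detail crop
fused = torch.cat([global_stream(frame), local_stream(hands)], dim=1)  # (1, 64)
```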
- Fully Convolutional Networks for Continuous Sign Language Recognition [83.85895472824221]
Continuous sign language recognition is a challenging task that requires learning on both spatial and temporal dimensions.
We propose a fully convolutional network (FCN) for online SLR to concurrently learn spatial and temporal features from weakly annotated video sequences.
arXiv Detail & Related papers (2020-07-24T08:16:37Z)
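To close the list, a minimal fully convolutional CSLR sketch echoing this last entry: per-frame spatial convolutions followed by temporal 1-D convolutions that emit frame-wise gloss scores for CTC, with no recurrent layers. All sizes are illustrative, not the paper's FCN.

```python
# Fully convolutional CSLR: spatial convs per frame, temporal convs across
# frames, frame-wise gloss log-probabilities for a CTC loss. No recurrence.
import torch
import torch.nn as nn

spatial = nn.Sequential(nn.Conv2d(3, 64, 7, stride=4, padding=3), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())   # frame -> 64-d
temporal = nn.Sequential(nn.Conv1d(64, 128, 5, padding=2), nn.ReLU(),
                         nn.Conv1d(128, 101, 1))    # 100 glosses + CTC blank

video = torch.randn(30, 3, 112, 112)                # 30 frames, one clip
f = spatial(video).T.unsqueeze(0)                   # (1, 64, 30)
log_probs = temporal(f).log_softmax(1).permute(2, 0, 1)  # (T, B, C) for CTCLoss
```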