Context Matters: Self-Attention for Sign Language Recognition
- URL: http://arxiv.org/abs/2101.04632v1
- Date: Tue, 12 Jan 2021 17:40:19 GMT
- Title: Context Matters: Self-Attention for Sign Language Recognition
- Authors: Fares Ben Slimane and Mohamed Bouguessa
- Abstract summary: This paper proposes an attentional network for the task of Continuous Sign Language Recognition.
We exploit co-independent streams of data to model the sign language modalities.
We find that the model is able to identify the essential Sign Language components that revolve around the dominant hand and the face areas.
- Score: 1.005130974691351
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes an attentional network for the task of Continuous Sign
Language Recognition. The proposed approach exploits co-independent streams of
data to model the sign language modalities. These different channels of
information can share a complex temporal structure with one another. For that
reason, we apply attention to synchronize and help capture entangled
dependencies between the different sign language components. Even though Sign
Language is multi-channel, handshapes represent the central entities in sign
interpretation. Seeing handshapes in their correct context defines the meaning
of a sign. Taking that into account, we utilize the attention mechanism to
efficiently aggregate the hand features with their appropriate spatio-temporal
context for better sign recognition. We found that by doing so the model is
able to identify the essential Sign Language components that revolve around the
dominant hand and the face areas. We test our model on the benchmark dataset
RWTH-PHOENIX-Weather 2014, yielding competitive results.
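To make the aggregation idea concrete, below is a minimal, hypothetical PyTorch sketch of cross-attention in which per-frame hand-crop features query full-frame features for their spatio-temporal context. The module name, feature dimensions, and the use of nn.MultiheadAttention are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch (not the authors' exact design): cross-attention that lets
# per-frame hand-crop features attend to full-frame spatio-temporal context.
import torch
import torch.nn as nn


class HandContextAttention(nn.Module):
    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        # Hand features act as queries; full-frame features act as keys/values.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, hand_feats: torch.Tensor, frame_feats: torch.Tensor) -> torch.Tensor:
        # hand_feats:  (batch, T, dim) features from cropped dominant-hand patches
        # frame_feats: (batch, T, dim) features from the full frames (face, body, context)
        context, _ = self.attn(query=hand_feats, key=frame_feats, value=frame_feats)
        # Residual connection keeps the hand identity while injecting its context.
        return self.norm(hand_feats + context)


if __name__ == "__main__":
    B, T, D = 2, 16, 512  # toy batch: 2 clips, 16 frames each
    fused = HandContextAttention(D)(torch.randn(B, T, D), torch.randn(B, T, D))
    print(fused.shape)  # torch.Size([2, 16, 512])
```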
Related papers
- MS2SL: Multimodal Spoken Data-Driven Continuous Sign Language Production [93.32354378820648]
We propose a unified framework for continuous sign language production, easing communication between sign and non-sign language users.
A sequence diffusion model, utilizing embeddings extracted from text or speech, is crafted to generate sign predictions step by step.
Experiments on How2Sign and PHOENIX14T datasets demonstrate that our model achieves competitive performance in sign language production.
arXiv Detail & Related papers (2024-07-04T13:53:50Z)
- Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition [96.62264528407863]
We propose a self-supervised contrastive learning framework to excavate rich context via spatial-temporal consistency.
Inspired by the complementary property of motion and joint modalities, we first introduce first-order motion information into sign language modeling.
Our method is evaluated with extensive experiments on four public benchmarks, and achieves new state-of-the-art performance with a notable margin.
arXiv Detail & Related papers (2024-06-15T04:50:19Z)
- Linguistically Motivated Sign Language Segmentation [51.06873383204105]
We consider two kinds of segmentation: segmentation into individual signs and segmentation into phrases.
Our method is motivated by linguistic cues observed in sign language corpora.
We replace the predominant IO tagging scheme with BIO tagging to account for continuous signing; a generic tagging illustration appears after this list.
arXiv Detail & Related papers (2023-10-21T10:09:34Z)
- Improving Continuous Sign Language Recognition with Cross-Lingual Signs [29.077175863743484]
We study the feasibility of utilizing multilingual sign language corpora to facilitate continuous sign language recognition.
We first build two sign language dictionaries containing isolated signs that appear in two datasets.
Then we identify the sign-to-sign mappings between two sign languages via a well-optimized isolated sign language recognition model.
arXiv Detail & Related papers (2023-08-21T15:58:47Z)
- SignBERT+: Hand-model-aware Self-supervised Pre-training for Sign Language Understanding [132.78015553111234]
Hand gestures play a crucial role in the expression of sign language.
Current deep learning based methods for sign language understanding (SLU) are prone to over-fitting due to insufficient sign data resources.
We propose the first self-supervised pre-trainable SignBERT+ framework with model-aware hand prior incorporated.
arXiv Detail & Related papers (2023-05-08T17:16:38Z)
- Classification of Phonological Parameters in Sign Languages [0.0]
Linguistic research often breaks down signs into constituent parts to study sign languages.
We show how a single model can be used to recognise the individual phonological parameters within sign languages.
arXiv Detail & Related papers (2022-05-24T13:40:45Z)
- Pose-based Sign Language Recognition using GCN and BERT [0.0]
Word-level sign language recognition (WSLR) is the first important step towards understanding and interpreting sign language.
Recognizing signs from videos is a challenging task, as the meaning of a word depends on a combination of subtle body motions, hand configurations, and other movements.
Recent pose-based architectures for WSLR either model the spatial and temporal dependencies among the poses in different frames simultaneously, or model only the temporal information without fully utilizing the spatial information.
We tackle the problem of WSLR using a novel pose-based approach, which captures spatial and temporal information separately and performs late fusion.
arXiv Detail & Related papers (2020-12-01T19:10:50Z)
- TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation [101.6042317204022]
Sign language translation (SLT) aims to interpret sign video sequences into text-based natural language sentences.
Existing SLT models usually represent sign visual features in a frame-wise manner.
We develop a novel hierarchical sign video feature learning method via a temporal semantic pyramid network, called TSPNet.
arXiv Detail & Related papers (2020-10-12T05:58:09Z)
- Global-local Enhancement Network for NMFs-aware Sign Language Recognition [135.30357113518127]
We propose a simple yet effective two-stream architecture called the Global-local Enhancement Network (GLE-Net).
One stream captures the global contextual relationship, while the other captures discriminative fine-grained cues.
We introduce the first non-manual-features-aware isolated Chinese sign language dataset with a total vocabulary size of 1,067 sign words in daily life.
arXiv Detail & Related papers (2020-08-24T13:28:55Z)
- Temporal Accumulative Features for Sign Language Recognition [2.3204178451683264]
We have devised an efficient and fast SLR method for recognizing isolated sign language gestures.
We also incorporate hand shape information and, using a small-scale sequential neural network, demonstrate that modeling accumulative features for linguistic subunits improves upon baseline classification results.
arXiv Detail & Related papers (2020-04-02T19:03:40Z)
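To illustrate the tagging schemes mentioned in the Linguistically Motivated Sign Language Segmentation entry above, here is a small, generic Python sketch of IO versus BIO frame labels. The toy frame sequence and sign boundaries are invented for illustration and are not taken from that paper; the functions show the standard scheme only, not the paper's implementation.

```python
# Generic illustration of IO vs. BIO frame tagging for continuous signing.
# The toy sequence and sign boundaries below are invented, not from the paper.
from typing import List, Tuple


def io_tags(num_frames: int, signs: List[Tuple[int, int]]) -> List[str]:
    """IO: every frame inside any sign is 'I', the rest are 'O'."""
    tags = ["O"] * num_frames
    for start, end in signs:
        for t in range(start, end):
            tags[t] = "I"
    return tags


def bio_tags(num_frames: int, signs: List[Tuple[int, int]]) -> List[str]:
    """BIO: the first frame of each sign is 'B', so back-to-back signs stay separable."""
    tags = ["O"] * num_frames
    for start, end in signs:
        tags[start] = "B"
        for t in range(start + 1, end):
            tags[t] = "I"
    return tags


if __name__ == "__main__":
    # Two adjacent signs with no pause: frames 2-4 and 5-7 of a 10-frame clip.
    signs = [(2, 5), (5, 8)]
    print(io_tags(10, signs))   # ['O', 'O', 'I', 'I', 'I', 'I', 'I', 'I', 'O', 'O']
    print(bio_tags(10, signs))  # ['O', 'O', 'B', 'I', 'I', 'B', 'I', 'I', 'O', 'O']
```

With IO tagging the two adjacent signs merge into a single 'I' span, whereas BIO keeps the boundary between them, which is why it suits continuous signing without pauses.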