Sentence-Level Sign Language Recognition Framework
- URL: http://arxiv.org/abs/2211.14447v1
- Date: Sun, 13 Nov 2022 01:45:41 GMT
- Title: Sentence-Level Sign Language Recognition Framework
- Authors: Atra Akandeh
- Abstract summary: Sentence-level SLR requires mapping videos of sign language sentences to sequences of gloss labels.
CTC is used to avoid pre-segmenting the sentences into individual words.
We evaluate the performance of the proposed models on RWTH-PHOENIX-Weather.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present two solutions to sentence-level SLR. Sentence-level SLR requires
mapping videos of sign language sentences to sequences of gloss labels.
Connectionist Temporal Classification (CTC) is used as the classification layer
of both models; CTC avoids having to pre-segment the sentences into individual
words. The first model is an LRCN-based model, and the second is a Multi-Cue
Network. LRCN is a model in which a CNN, acting as a feature extractor, is
applied to each frame before the frames are fed into an LSTM. In the first
approach, no prior knowledge is leveraged: raw frames are fed into an 18-layer
LRCN with a CTC layer on top. In the second approach, three main
characteristics associated with each sign (hand shape, hand position, and hand
movement) are extracted using Mediapipe. 2D hand-shape landmarks are used to
build a skeleton of the hands, which is then fed to a Conv-LSTM model; hand
movements and hand positions, expressed as distances relative to the head, are
fed to separate LSTMs. All three sources of information are then integrated
into a Multi-Cue network with a CTC classification layer. We evaluated the
performance of the proposed models on RWTH-PHOENIX-Weather. After an extensive
search over model hyper-parameters such as the number of feature maps, input
size, batch size, sequence length, LSTM memory cells, regularization, and
dropout, we achieved a word error rate (WER) of 35.
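Two pieces of the pipeline above can be made concrete with a short sketch: CTC best-path (greedy) decoding, which collapses per-frame predictions into a gloss sequence without pre-segmented word boundaries, and the word error rate used to report the 35 WER result. This is not the authors' code; the gloss vocabulary, per-frame labels, and blank index below are hypothetical illustrations.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse repeated labels, then drop blanks (standard CTC best-path rule)."""
    decoded = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            decoded.append(lab)
        prev = lab
    return decoded

def word_error_rate(reference, hypothesis):
    """Levenshtein distance over words, normalised by reference length (in %)."""
    m, n = len(reference), len(hypothesis)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return 100.0 * d[m][n] / m

# Hypothetical per-frame argmax output over a tiny gloss vocabulary.
glosses = {1: "MORGEN", 2: "REGEN", 3: "NORD"}
frames = [0, 1, 1, 0, 0, 2, 2, 2, 0, 3]        # 0 is the CTC blank
hypothesis = [glosses[i] for i in ctc_greedy_decode(frames)]
reference = ["MORGEN", "REGEN", "SUED"]
print(hypothesis)                               # ['MORGEN', 'REGEN', 'NORD']
print(word_error_rate(reference, hypothesis))   # one substitution over 3 words
```

In the actual models, the per-frame labels would come from the softmax over gloss classes emitted by the LRCN or Multi-Cue network at each time step; training uses the full CTC loss (summing over all alignments) rather than this greedy rule, which is only a decoding shortcut.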
Related papers
- FLIP: Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction [49.510163437116645]
Click-through rate (CTR) prediction serves as a core function module in personalized online services.
Traditional ID-based models for CTR prediction take as inputs the one-hot encoded ID features of tabular modality.
Pretrained Language Models (PLMs) have given rise to another paradigm, which takes as inputs the sentences of textual modality.
We propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for CTR prediction.
arXiv Detail & Related papers (2023-10-30T11:25:03Z) - Distilling Ensemble of Explanations for Weakly-Supervised Pre-Training of Image Segmentation Models [54.49581189337848]
We propose a method to enable the end-to-end pre-training for image segmentation models based on classification datasets.
The proposed method leverages a weighted segmentation learning procedure to pre-train the segmentation network en masse.
Experiment results show that, with ImageNet accompanied by PSSL as the source dataset, the proposed end-to-end pre-training strategy successfully boosts the performance of various segmentation models.
arXiv Detail & Related papers (2022-07-04T13:02:32Z) - A Variational Hierarchical Model for Neural Cross-Lingual Summarization [85.44969140204026]
Cross-lingual summarization (CLS) converts a document in one language into a summary in another language.
Existing studies on CLS mainly focus on utilizing pipeline methods or jointly training an end-to-end model.
We propose a hierarchical model for the CLS task, based on the conditional variational auto-encoder.
arXiv Detail & Related papers (2022-03-08T02:46:11Z) - Sign Language Recognition via Skeleton-Aware Multi-Model Ensemble [71.97020373520922]
Sign language is commonly used by deaf or mute people to communicate.
We propose a novel Multi-modal Framework with a Global Ensemble Model (GEM) for isolated Sign Language Recognition (SLR).
Our proposed SAM-SLR-v2 framework is exceedingly effective and achieves state-of-the-art performance with significant margins.
arXiv Detail & Related papers (2021-10-12T16:57:18Z) - Rescoring Sequence-to-Sequence Models for Text Line Recognition with CTC-Prefixes [0.0]
We propose to use the CTC-Prefix-Score during S2S decoding.
During beam search, paths that are invalid according to the CTC confidence matrix are penalised.
We evaluate this setup on three HTR data sets: IAM, Rimes, and StAZH.
arXiv Detail & Related papers (2021-10-12T11:40:05Z) - Multi-Modal Zero-Shot Sign Language Recognition [51.07720650677784]
We propose a multi-modal Zero-Shot Sign Language Recognition model.
A Transformer-based model along with a C3D model is used for hand detection and deep feature extraction.
A semantic space is used to map the visual features to the lingual embedding of the class labels.
arXiv Detail & Related papers (2021-09-02T09:10:39Z) - Bidirectional LSTM-CRF Attention-based Model for Chinese Word Segmentation [2.3991565023534087]
We propose a Bidirectional LSTM-CRF Attention-based Model for Chinese word segmentation.
Our model performs better than baseline methods built on other neural networks.
arXiv Detail & Related papers (2021-05-20T11:46:53Z) - Depth-Adaptive Graph Recurrent Network for Text Classification [71.20237659479703]
Sentence-State LSTM (S-LSTM) is a powerful and highly efficient graph recurrent network.
We propose a depth-adaptive mechanism for the S-LSTM, which allows the model to learn how many computational steps to conduct for different words as required.
arXiv Detail & Related papers (2020-02-29T03:09:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.