PenSLR: Persian end-to-end Sign Language Recognition Using Ensembling
- URL: http://arxiv.org/abs/2406.16388v1
- Date: Mon, 24 Jun 2024 07:59:34 GMT
- Title: PenSLR: Persian end-to-end Sign Language Recognition Using Ensembling
- Authors: Amirparsa Salmankhah, Amirreza Rajabi, Negin Kheirmand, Ali Fadaeimanesh, Amirreza Tarabkhah, Amirreza Kazemzadeh, Hamed Farbeh
- Abstract summary: PenSLR is a glove-based sign language recognition system consisting of an Inertial Measurement Unit (IMU) and five flexible sensors, powered by a deep learning framework.
We propose a novel ensembling technique by leveraging a multiple sequence alignment algorithm known as Star Alignment.
Our evaluations show that PenSLR achieves a remarkable word accuracy of 94.58% and 96.70% in subject-independent and subject-dependent setups, respectively.
- Score: 0.953605234706973
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sign Language Recognition (SLR) is a fast-growing field that aims to fill the communication gaps between the hearing-impaired and people without hearing loss. Existing solutions for Persian Sign Language (PSL) are limited to word-level interpretations, underscoring the need for more advanced and comprehensive solutions. Moreover, previous work on other languages mainly focuses on manipulating the neural network architectures or hardware configurations instead of benefiting from the aggregated results of multiple models. In this paper, we introduce PenSLR, a glove-based sign language system consisting of an Inertial Measurement Unit (IMU) and five flexible sensors powered by a deep learning framework capable of predicting variable-length sequences. We achieve this in an end-to-end manner by leveraging the Connectionist Temporal Classification (CTC) loss function, eliminating the need for segmentation of input signals. To further enhance its capabilities, we propose a novel ensembling technique by leveraging a multiple sequence alignment algorithm known as Star Alignment. Furthermore, we introduce a new PSL dataset, including 16 PSL signs with more than 3000 time-series samples in total. We utilize this dataset to evaluate the performance of our system based on four word-level and sentence-level metrics. Our evaluations show that PenSLR achieves a remarkable word accuracy of 94.58% and 96.70% in subject-independent and subject-dependent setups, respectively. These achievements are attributable to our ensembling algorithm, which not only boosts the word-level performance by 0.51% and 1.32% in the respective scenarios but also yields significant enhancements of 1.46% and 4.00%, respectively, in sentence-level accuracy.
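The pipeline the abstract describes, CTC-decoded per-model predictions merged by a Star Alignment vote, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each model's frame-level output is greedy-decoded into a word sequence, and it approximates Star Alignment by aligning every prediction to a center sequence (the one minimizing total edit distance to the rest) and taking a per-position majority vote, dropping insertions relative to the center for simplicity. All function names and labels are illustrative.

```python
from collections import Counter

GAP = "<gap>"

def ctc_greedy_decode(frame_labels, blank="<blank>"):
    """Standard CTC best-path decoding: collapse repeated frame labels,
    then drop blanks, turning per-timestep outputs into a word sequence."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

def _dp(a, b):
    """Edit-distance DP table between label sequences a and b (unit costs)."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(dp[i - 1][j - 1] + (a[i - 1] != b[j - 1]),
                           dp[i - 1][j] + 1,
                           dp[i][j - 1] + 1)
    return dp

def _project(center, seq):
    """Align seq onto center; return the aligned label (or GAP) for each
    center position. Insertions relative to the center are dropped -- a
    simplification of a full multiple sequence alignment."""
    dp = _dp(center, seq)
    i, j = len(center), len(seq)
    proj = [GAP] * len(center)
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + (center[i - 1] != seq[j - 1]):
            proj[i - 1] = seq[j - 1]
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return proj

def star_ensemble(predictions):
    """Merge the word sequences predicted by several models: pick the
    center (star) sequence, align all predictions to it, majority-vote
    each column, and emit the non-gap winners."""
    center = min(predictions,
                 key=lambda s: sum(_dp(s, t)[len(s)][len(t)] for t in predictions))
    columns = [_project(center, s) for s in predictions]
    voted = []
    for col in zip(*columns):
        label, _ = Counter(col).most_common(1)[0]
        if label != GAP:
            voted.append(label)
    return voted
```

For example, if three models decode `["hello", "world"]`, `["hello", "word"]`, and `["hello", "world"]`, the column-wise vote recovers `["hello", "world"]`, which is how an ensemble of this kind can outvote a single model's substitution error.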
Related papers
- Sign language recognition based on deep learning and low-cost handcrafted descriptors [0.0]
It is important to consider as many linguistic parameters as possible in gesture execution to avoid ambiguity between words.
It is essential to ensure that the chosen technology is realistic, avoiding expensive, intrusive, or low-mobility sensors.
We propose an efficient sign language recognition system that utilizes low-cost sensors and techniques.
arXiv Detail & Related papers (2024-08-14T00:56:51Z) - Chain of Stance: Stance Detection with Large Language Models [3.528201746844624]
Stance detection is an active task in natural language processing (NLP).
We propose a new prompting method called Chain of Stance (CoS).
arXiv Detail & Related papers (2024-08-03T16:30:51Z) - Multi-Stream Keypoint Attention Network for Sign Language Recognition and Translation [3.976851945232775]
Current approaches for sign language recognition rely on RGB video inputs, which are vulnerable to fluctuations in the background.
We propose a multi-stream keypoint attention network to depict a sequence of keypoints produced by a readily available keypoint estimator.
We carry out comprehensive experiments on well-known benchmarks like Phoenix-2014, Phoenix-2014T, and CSL-Daily to showcase the efficacy of our methodology.
arXiv Detail & Related papers (2024-05-09T10:58:37Z) - Sign Language Recognition based on YOLOv5 Algorithm for the Telugu Sign Language [0.0]
This paper presents a novel approach for identifying gestures in TSL using the YOLOv5 object identification framework.
A deep learning model was created that used the YOLOv5 to recognize and classify gestures.
The system's stability and generalizability across various TSL gestures and settings were evaluated through rigorous testing and validation.
arXiv Detail & Related papers (2024-04-24T18:39:27Z) - Sign Language Conversation Interpretation Using Wearable Sensors and Machine Learning [0.0]
The number of people with various levels of hearing loss reached 1.57 billion in 2019.
This paper presents a proof of concept of an automatic sign language recognition system based on data obtained using a wearable device of 3 flex sensors.
arXiv Detail & Related papers (2023-12-19T07:06:32Z) - Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining [56.26550923909137]
Gloss-Free Sign Language Translation (SLT) is a challenging task due to its cross-domain nature.
We propose a novel Gloss-Free SLT framework based on Visual-Language Pretraining (GFSLT-VLP).
Our approach involves two stages: (i) integrating Contrastive Language-Image Pre-training with masked self-supervised learning to create pre-tasks that bridge the semantic gap between visual and textual representations and restore masked sentences, and (ii) constructing an end-to-end architecture with an encoder-decoder-like structure that inherits the parameters of the pre-trained Visual Encoder and Text Decoder from the first stage.
arXiv Detail & Related papers (2023-07-27T10:59:18Z) - TagCLIP: Improving Discrimination Ability of Open-Vocabulary Semantic Segmentation [53.974228542090046]
Contrastive Language-Image Pre-training (CLIP) has recently shown great promise in pixel-level zero-shot learning tasks.
Existing approaches utilizing CLIP's text and patch embeddings to generate semantic masks often misidentify input pixels from unseen classes.
We propose TagCLIP (Trusty-aware guided CLIP) to address this issue.
arXiv Detail & Related papers (2023-04-15T12:52:23Z) - Bridging the Gap between Language Models and Cross-Lingual Sequence Labeling [101.74165219364264]
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling tasks.
Despite the great success, we draw an empirical observation that there is a training objective gap between pre-training and fine-tuning stages.
In this paper, we first design a pre-training task tailored for xSL named Cross-lingual Language Informative Span Masking (CLISM) to eliminate the objective gap.
Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage consistency between representations of input parallel sequences.
arXiv Detail & Related papers (2022-04-11T15:55:20Z) - Sign Language Recognition via Skeleton-Aware Multi-Model Ensemble [71.97020373520922]
Sign language is commonly used by deaf or mute people to communicate.
We propose a novel Multi-modal Framework with a Global Ensemble Model (GEM) for isolated Sign Language Recognition (SLR).
Our proposed SAM-SLR-v2 framework is exceedingly effective and achieves state-of-the-art performance by significant margins.
arXiv Detail & Related papers (2021-10-12T16:57:18Z) - UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training [72.004873454347]
Two methods are introduced for enhancing unsupervised speaker information extraction.
Experiment results on SUPERB benchmark show that the proposed system achieves state-of-the-art performance.
We scale up the training dataset to 94 thousand hours of public audio data and achieve further performance improvements.
arXiv Detail & Related papers (2021-10-12T05:43:30Z) - FewJoint: A Few-shot Learning Benchmark for Joint Language Understanding [55.38905499274026]
Few-shot learning is one of the key future steps in machine learning.
FewJoint is a novel Few-Shot Learning benchmark for NLP.
arXiv Detail & Related papers (2020-09-17T08:17:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.