A Transformer-Based Multi-Stream Approach for Isolated Iranian Sign Language Recognition
- URL: http://arxiv.org/abs/2407.09544v1
- Date: Thu, 27 Jun 2024 06:54:25 GMT
- Title: A Transformer-Based Multi-Stream Approach for Isolated Iranian Sign Language Recognition
- Authors: Ali Ghadami, Alireza Taheri, Ali Meghdari
- Abstract summary: This research aims to recognize Iranian Sign Language words with the help of the latest deep learning tools such as transformers.
The dataset used includes 101 Iranian Sign Language words frequently used in academic environments such as universities.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Sign language is an essential means of communication for millions of people around the world and serves as their primary language. However, most communication tools are developed for spoken and written languages, which can cause difficulties for the deaf and hard of hearing community. By developing a sign language recognition system, we can bridge this communication gap and enable people who use sign language as their main form of expression to better communicate with people and their surroundings. Such a recognition system increases the quality of health services, improves public services, and creates equal opportunities for the deaf community. This research aims to recognize Iranian Sign Language words with the help of the latest deep learning tools such as transformers. The dataset used includes 101 Iranian Sign Language words frequently used in academic environments such as universities. The network used is a combination of early fusion and late fusion transformer encoder-based networks optimized with the help of a genetic algorithm. The features selected to train this network include hand and lip key points, and the distance and angle between the hands, extracted from the sign videos. In addition to training the model on the word classes, the embedding vectors of the words are used in a multi-task learning setup to make training smoother and more efficient. The model was also tested on sentences generated from our word dataset, using a windowing technique for sentence translation. Finally, sign language training software that provides real-time feedback to users with the help of the developed model, which has 90.2% accuracy on test data, was introduced, and a survey investigated the effectiveness and efficiency of this type of sign language learning software and the impact of feedback.
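The abstract lists the distance and angle between the hands among the input features but does not say how they are computed. The following is a minimal sketch under stated assumptions: each hand is reduced to a single 2D reference point (for example a wrist key point), the distance is Euclidean, and the angle is the direction of the vector from the left hand to the right hand. The function names and the choice of reference point are hypothetical and are not taken from the paper.

```python
import math


def hand_relation_features(left_point, right_point):
    """Return (distance, angle) between two 2D hand reference points.

    Assumption: each point is an (x, y) pair, e.g. a wrist key point.
    The angle is measured in radians from the x-axis, in (-pi, pi].
    """
    dx = right_point[0] - left_point[0]
    dy = right_point[1] - left_point[1]
    distance = math.hypot(dx, dy)  # Euclidean distance between the hands
    angle = math.atan2(dy, dx)     # direction of the left-to-right vector
    return distance, angle


def features_per_frame(frames):
    """Compute the two features for every frame of a sign video.

    `frames` is a list of (left_point, right_point) pairs, one per frame.
    """
    return [hand_relation_features(left, right) for left, right in frames]
```

A sequence of such per-frame values could then be concatenated with the key point coordinates before being passed to a sequence model; how the paper actually combines them is not stated in the abstract.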
Related papers
- EvSign: Sign Language Recognition and Translation with Streaming Events [59.51655336911345]
Event cameras can naturally perceive dynamic hand movements, providing rich manual cues for sign language tasks.
We propose an efficient transformer-based framework for event-based SLR and SLT tasks.
Our method performs favorably against existing state-of-the-art approaches with only 0.34% computational cost.
arXiv Detail & Related papers (2024-07-17T14:16:35Z)
- ChatDev: Communicative Agents for Software Development [84.90400377131962]
ChatDev is a chat-powered software development framework in which specialized agents are guided in what to communicate.
These agents actively contribute to the design, coding, and testing phases through unified language-based communication.
arXiv Detail & Related papers (2023-07-16T02:11:34Z)
- A two-way translation system of Chinese sign language based on computer vision [0.0]
TSM module is added to the lightweight neural network model for the large Chinese continuous sign language dataset.
We also improve the Bert-Base-Chinese model to divide Chinese sentences into words and map the natural word order to the statute sign language order.
Finally, we use the corresponding word videos to generate the sentence video, so as to achieve the function of text-to-sign language translation.
arXiv Detail & Related papers (2023-06-03T16:00:57Z)
- A Comparative Analysis of Techniques and Algorithms for Recognising Sign Language [0.9311364633437358]
Sign language is frequently used as the primary form of communication by people with hearing loss.
It is necessary to create human-computer interface systems that can offer hearing-impaired people a social platform.
Most commercial sign language translation systems are sensor-based, pricey, and challenging to use.
arXiv Detail & Related papers (2023-05-05T10:52:18Z)
- Plug-and-Play Multilingual Few-shot Spoken Words Recognition [3.591566487849146]
We propose PLiX, a multilingual and plug-and-play keyword spotting system.
Our few-shot deep models are learned with millions of one-second audio clips across 20 languages.
We show that PLiX can generalize to novel spoken words given as few as just one support example.
arXiv Detail & Related papers (2023-05-03T18:58:14Z)
- Gesture based Arabic Sign Language Recognition for Impaired People based on Convolution Neural Network [0.0]
The recognition of Arabic Sign Language (ArSL) has become a difficult subject of study because of variations within ArSL.
The proposed system takes Arabic sign language hand gestures as input and produces vocalized speech as output.
The results were recognized by 90% of the people.
arXiv Detail & Related papers (2022-03-10T19:36:04Z)
- A New Generation of Perspective API: Efficient Multilingual Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
arXiv Detail & Related papers (2022-02-22T20:55:31Z)
- Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech impaired people to communicate.
Skeleton-based recognition is becoming popular because it can be further ensembled with RGB-D based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation (Jin et al., 2020), we propose recognizing sign language based on whole-body key points and features.
arXiv Detail & Related papers (2021-03-16T03:38:17Z)
- Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision [110.66085917826648]
We develop a technique that extrapolates multimodal alignments to language-only data by contextually mapping language tokens to their related images.
"vokenization" is trained on relatively small image captioning datasets and we then apply it to generate vokens for large language corpora.
Trained with these contextually generated vokens, our visually-supervised language models show consistent improvements over self-supervised alternatives on multiple pure-language tasks.
arXiv Detail & Related papers (2020-10-14T02:11:51Z)
- Novel Approach to Use HU Moments with Image Processing Techniques for Real Time Sign Language Communication [0.0]
"Sign Language Communicator" (SLC) is designed to solve the language barrier between the sign language users and the rest of the world.
The system is able to recognize selected sign language signs with an accuracy of 84% without a controlled background and with small light adjustments.
arXiv Detail & Related papers (2020-07-20T03:10:18Z)
- Meta-Transfer Learning for Code-Switched Speech Recognition [72.84247387728999]
We propose a new learning method, meta-transfer learning, to transfer learn on a code-switched speech recognition system in a low-resource setting.
Our model learns to recognize individual languages, and transfer them so as to better recognize mixed-language speech by conditioning the optimization on the code-switching data.
arXiv Detail & Related papers (2020-04-29T14:27:19Z)