BAUST Lipi: A BdSL Dataset with Deep Learning Based Bangla Sign Language Recognition
- URL: http://arxiv.org/abs/2408.10518v1
- Date: Tue, 20 Aug 2024 03:35:42 GMT
- Title: BAUST Lipi: A BdSL Dataset with Deep Learning Based Bangla Sign Language Recognition
- Authors: Md Hadiuzzaman, Mohammed Sowket Ali, Tamanna Sultana, Abdur Raj Shafi, Abu Saleh Musa Miah, Jungpil Shin
- Abstract summary: Sign language research is burgeoning to enhance communication with the deaf community.
One significant barrier has been the lack of a comprehensive Bangla sign language dataset.
We introduce a new BdSL dataset comprising alphabets totaling 18,000 images, with each image being 224x224 pixels in size.
We devised a hybrid Convolutional Neural Network (CNN) model, integrating multiple convolutional layers, activation functions, dropout techniques, and LSTM layers.
- Score: 0.5497663232622964
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: People commonly communicate in English, Arabic, and Bengali spoken languages through various mediums. However, deaf and hard-of-hearing individuals primarily use body language and sign language to express their needs and achieve independence. Sign language research is burgeoning to enhance communication with the deaf community. While many researchers have made strides in recognizing sign languages such as French, British, Arabic, Turkish, and American, there has been limited research on Bangla sign language (BdSL) with less-than-satisfactory results. One significant barrier has been the lack of a comprehensive Bangla sign language dataset. In our work, we introduced a new BdSL dataset comprising alphabets totaling 18,000 images, with each image being 224x224 pixels in size. Our dataset encompasses 36 Bengali symbols, of which 30 are consonants and the remaining six are vowels. Despite our dataset contribution, many existing systems continue to grapple with achieving high-performance accuracy for BdSL. To address this, we devised a hybrid Convolutional Neural Network (CNN) model, integrating multiple convolutional layers, activation functions, dropout techniques, and LSTM layers. Upon evaluating our hybrid-CNN model with the newly created BdSL dataset, we achieved an accuracy rate of 97.92%. We are confident that both our BdSL dataset and hybrid CNN model will be recognized as significant milestones in BdSL research.
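The abstract names the ingredients of the hybrid model (convolutional layers followed by LSTM layers) but not the layer settings. As a rough illustration only, the sketch below traces how a 224x224 image could shrink through a stack of convolution and pooling blocks and then be read row by row as a sequence for an LSTM. The 3x3 convolutions, 2x2 pooling, channel counts, and row-wise sequence layout are all assumptions made for this example and are not taken from the paper; only the image size and the class count come from the abstract.

```python
# Shape bookkeeping for a hypothetical CNN -> LSTM pipeline.
# IMAGE_SIZE and NUM_CLASSES come from the abstract; every layer
# setting below is an assumption made for illustration.
IMAGE_SIZE = 224   # images are 224x224 pixels
NUM_CLASSES = 36   # 30 consonants + 6 vowels


def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def feature_sequence_shape(image_size, conv_channels):
    """Trace an image through conv(3x3, padding 1) + maxpool(2x2) blocks,
    then treat each row of the final feature map as one LSTM time step."""
    size = image_size
    for _ in conv_channels:
        size = conv_out(size, kernel=3, padding=1)  # 3x3 conv keeps the size
        size = conv_out(size, kernel=2, stride=2)   # 2x2 max pool halves it
    channels = conv_channels[-1]
    time_steps = size                    # one step per feature-map row
    features_per_step = size * channels  # row width times channel count
    return time_steps, features_per_step


steps, feats = feature_sequence_shape(IMAGE_SIZE, [32, 64, 128])
print(steps, feats)  # prints: 28 3584
```

Under these assumed settings, an LSTM would receive 28 time steps of 3584 features each, and a final dense layer would map its last hidden state to the 36 class scores.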
Related papers
- SCOPE: Sign Language Contextual Processing with Embedding from LLMs [49.5629738637893]
Sign languages, used by around 70 million Deaf individuals globally, are visual languages that convey visual and contextual information.
Current methods in vision-based sign language recognition (SLR) and translation (SLT) struggle with dialogue scenes due to limited dataset diversity and the neglect of contextually relevant information.
We introduce SCOPE, a novel context-aware vision-based SLR and SLT framework.
arXiv Detail & Related papers (2024-09-02T08:56:12Z)
- SignSpeak: Open-Source Time Series Classification for ASL Translation [0.12499537119440243]
We propose a low-cost, real-time ASL-to-speech translation glove and an exhaustive training dataset of sign language patterns.
We benchmarked this dataset with supervised learning models, such as LSTMs, GRUs and Transformers, where our best model achieved 92% accuracy.
Our open-source dataset, models and glove designs provide an accurate and efficient ASL translator while maintaining cost-effectiveness.
arXiv Detail & Related papers (2024-06-27T17:58:54Z)
- BdSLW60: A Word-Level Bangla Sign Language Dataset [3.8631510994883254]
We create a comprehensive BdSL word-level dataset named BdSLW60 in an unconstrained and natural setting.
The dataset encompasses 60 Bangla sign words, with a significant scale of 9307 video trials provided by 18 signers under the supervision of a sign language professional.
We report the benchmarking of our BdSLW60 dataset using the Support Vector Machine (SVM) with testing accuracy up to 67.6% and an attention-based bi-LSTM with testing accuracy up to 75.1%.
arXiv Detail & Related papers (2024-02-13T18:02:58Z)
- Connecting the Dots: Leveraging Spatio-Temporal Graph Neural Networks for Accurate Bangla Sign Language Recognition [2.624902795082451]
We present a new word-level Bangla Sign Language dataset - BdSL40 - consisting of 611 videos over 40 words.
This is the first study on word-level BdSL recognition, and the dataset was transcribed from Indian Sign Language (ISL) using the Bangla Sign Language Dictionary (1997).
The study highlights the significant lexical and semantic similarity between BdSL, West Bengal Sign Language, and ISL, and the lack of word-level datasets for BdSL in the literature.
arXiv Detail & Related papers (2024-01-22T18:52:51Z)
- Neural Sign Actors: A diffusion model for 3D sign language production from text [51.81647203840081]
Sign Languages (SL) serve as the primary mode of communication for the Deaf and Hard of Hearing communities.
This work makes an important step towards realistic neural sign avatars, bridging the communication gap between Deaf and hearing communities.
arXiv Detail & Related papers (2023-12-05T12:04:34Z)
- Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning [69.77973092264338]
We show that more powerful techniques can lead to more efficient pre-training, opening SSL to more research groups.
We propose WavLabLM, which extends WavLM's joint prediction and denoising to 40k hours of data across 136 languages.
We show that further efficiency can be achieved with a vanilla HuBERT Base model, which can maintain 94% of XLS-R's performance with only 3% of the data.
arXiv Detail & Related papers (2023-09-26T23:55:57Z)
- NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages [54.808217147579036]
We conduct a case study on Indonesian local languages.
We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets.
Our findings demonstrate that datasets generated through paragraph writing by native speakers exhibit superior quality in terms of lexical diversity and cultural content.
arXiv Detail & Related papers (2023-09-19T14:42:33Z)
- ASR2K: Speech Recognition for Around 2000 Languages without Audio [100.41158814934802]
We present a speech recognition pipeline that does not require any audio for the target language.
Our pipeline consists of three components: acoustic, pronunciation, and language models.
We build speech recognition for 1909 languages by combining it with Crubadan: a large endangered languages n-gram database.
arXiv Detail & Related papers (2022-09-06T22:48:29Z)
- BdSL36: A Dataset for Bangladeshi Sign Letters Recognition [4.010701467679244]
Bangladeshi Sign Language (BdSL) is a commonly used medium of communication for the hearing-impaired people in Bangladesh.
In this paper, we introduce a dataset named BdSL36 which incorporates background augmentation to make the dataset versatile.
Besides, we annotate about 40,000 images with bounding boxes to utilize the potentiality of object detection algorithms.
arXiv Detail & Related papers (2021-10-02T19:52:48Z)
- Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech impaired people to communicate.
Skeleton-based recognition is becoming popular because it can be ensembled with RGB-D based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation (Jin et al., 2020), we propose recognizing sign language based on whole-body key points and features.
arXiv Detail & Related papers (2021-03-16T03:38:17Z)
- Modeling Global Body Configurations in American Sign Language [2.8575516056239576]
American Sign Language (ASL) is the fourth most commonly used language in the United States.
ASL is the language most commonly used by Deaf people in the United States and the English-speaking regions of Canada.
arXiv Detail & Related papers (2020-09-03T06:20:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.