Connecting the Dots: Leveraging Spatio-Temporal Graph Neural Networks
for Accurate Bangla Sign Language Recognition
- URL: http://arxiv.org/abs/2401.12210v1
- Date: Mon, 22 Jan 2024 18:52:51 GMT
- Authors: Haz Sameen Shahgir, Khondker Salman Sayeed, Md Toki Tahmid, Tanjeem
Azwad Zaman, Md. Zarif Ul Alam
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in Deep Learning and Computer Vision have been successfully
leveraged to serve marginalized communities in various contexts. One such area
is Sign Language - a primary means of communication for the deaf community.
However, so far, the bulk of research efforts and investments have gone into
American Sign Language, and research activity into low-resource sign languages
- especially Bangla Sign Language - has lagged significantly. In this research
paper, we present a new word-level Bangla Sign Language dataset - BdSL40 -
consisting of 611 videos over 40 words, along with two different approaches:
one with a 3D Convolutional Neural Network model and another with a novel Graph
Neural Network approach for classifying the BdSL40 dataset. This is the
first study on word-level BdSL recognition, and the dataset was transcribed
from Indian Sign Language (ISL) using the Bangla Sign Language Dictionary
(1997). The proposed GNN model achieved an F1 score of 89%. The study
highlights the significant lexical and semantic similarity between BdSL, West
Bengal Sign Language, and ISL, and the lack of word-level datasets for BdSL in
the literature. We release the dataset and source code to stimulate further
research.
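The spatio-temporal idea behind such skeleton-based graph classifiers can be sketched in a few lines: keypoints in each frame are graph nodes, bones are edges, a graph convolution mixes features along those edges, and pooling over frames yields a word-level prediction. The toy skeleton, tensor shapes, and random weights below are illustrative assumptions, not the architecture proposed in the paper:

```python
import numpy as np

def normalized_adjacency(edges, num_nodes):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A+I) D^-1/2."""
    a = np.eye(num_nodes)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def st_gcn_layer(x, a_hat, w):
    """One spatial graph convolution with ReLU.

    x: (frames, nodes, in_feats); a_hat: (nodes, nodes); w: (in_feats, out_feats)
    """
    h = a_hat @ x @ w          # mix neighbouring joints, project features
    return np.maximum(h, 0.0)  # ReLU

def classify(video_keypoints, a_hat, w, w_out):
    """Pool node features over joints and frames, then apply a linear readout."""
    h = st_gcn_layer(video_keypoints, a_hat, w)
    pooled = h.mean(axis=(0, 1))  # global spatio-temporal average pooling
    return int(np.argmax(pooled @ w_out))

# Toy 5-joint "hand" chain observed over 8 frames as 2-D keypoint coordinates.
rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
a_hat = normalized_adjacency(edges, num_nodes=5)
x = rng.standard_normal((8, 5, 2))     # (frames, joints, xy)
w = rng.standard_normal((2, 16))
w_out = rng.standard_normal((16, 40))  # 40 word classes, as in BdSL40
print(classify(x, a_hat, w, w_out))
```

A real model would stack several such blocks and add temporal convolutions between them; the sketch only shows why a skeleton sequence maps naturally onto graph-convolution machinery.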
Related papers
- SCOPE: Sign Language Contextual Processing with Embedding from LLMs [49.5629738637893]
Sign languages, used by around 70 million Deaf individuals globally, are visual languages that convey visual and contextual information.
Current methods in vision-based sign language recognition (SLR) and translation (SLT) struggle with dialogue scenes due to limited dataset diversity and the neglect of contextually relevant information.
We introduce SCOPE, a novel context-aware vision-based SLR and SLT framework.
arXiv Detail & Related papers (2024-09-02T08:56:12Z)
- BAUST Lipi: A BdSL Dataset with Deep Learning Based Bangla Sign Language Recognition [0.5497663232622964]
Sign language research is burgeoning as a way to enhance communication with the deaf community.
One significant barrier has been the lack of a comprehensive Bangla sign language dataset.
We introduce a new BdSL dataset comprising 18,000 alphabet images, each 224x224 pixels in size.
We devised a hybrid Convolutional Neural Network (CNN) model, integrating multiple convolutional layers, activation functions, dropout techniques, and LSTM layers.
arXiv Detail & Related papers (2024-08-20T03:35:42Z)
- iSign: A Benchmark for Indian Sign Language Processing [5.967764101493575]
iSign is a benchmark for Indian Sign Language (ISL) processing.
We release one of the largest ISL-English datasets with more than 118K video-sentence/phrase pairs.
We provide insights into the proposed benchmarks, along with linguistic observations on the workings of ISL.
arXiv Detail & Related papers (2024-07-07T15:07:35Z)
- BdSLW60: A Word-Level Bangla Sign Language Dataset [3.8631510994883254]
We create a comprehensive BdSL word-level dataset named BdSLW60 in an unconstrained and natural setting.
The dataset encompasses 60 Bangla sign words, with a significant scale of 9307 video trials provided by 18 signers under the supervision of a sign language professional.
We report the benchmarking of our BdSLW60 dataset using the Support Vector Machine (SVM) with testing accuracy up to 67.6% and an attention-based bi-LSTM with testing accuracy up to 75.1%.
arXiv Detail & Related papers (2024-02-13T18:02:58Z)
- Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora and report superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models on dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z)
- Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning [69.77973092264338]
We show that more powerful techniques can lead to more efficient pre-training, opening SSL to more research groups.
We propose WavLabLM, which extends WavLM's joint prediction and denoising to 40k hours of data across 136 languages.
We show that further efficiency can be achieved with a vanilla HuBERT Base model, which can maintain 94% of XLS-R's performance with only 3% of the data.
arXiv Detail & Related papers (2023-09-26T23:55:57Z)
- NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages [54.808217147579036]
We conduct a case study on Indonesian local languages.
We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets.
Our findings demonstrate that datasets generated through paragraph writing by native speakers exhibit superior quality in terms of lexical diversity and cultural content.
arXiv Detail & Related papers (2023-09-19T14:42:33Z)
- LSA-T: The first continuous Argentinian Sign Language dataset for Sign Language Translation [52.87578398308052]
Sign language translation (SLT) is an active field of study that encompasses human-computer interaction, computer vision, natural language processing and machine learning.
This paper presents the first continuous Argentinian Sign Language (LSA) dataset.
It contains 14,880 sentence level videos of LSA extracted from the CN Sordos YouTube channel with labels and keypoints annotations for each signer.
arXiv Detail & Related papers (2022-11-14T14:46:44Z)
- Bangla Natural Language Processing: A Comprehensive Review of Classical, Machine Learning, and Deep Learning Based Methods [3.441093402715499]
The Bangla language is the seventh most spoken language, with 265 million native and non-native speakers worldwide.
English is the predominant language for online resources and technical knowledge, journals, and documentation.
Many efforts are also ongoing to make it easy to use the Bangla language in the online and technical domains.
arXiv Detail & Related papers (2021-05-31T10:58:58Z)
- Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech impaired people to communicate.
Skeleton-based recognition is becoming popular because it can be ensembled with RGB-D based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation (Jin et al., 2020), we propose recognizing sign language based on whole-body keypoints and features.
arXiv Detail & Related papers (2021-03-16T03:38:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.