ArabSign: A Multi-modality Dataset and Benchmark for Continuous Arabic
Sign Language Recognition
- URL: http://arxiv.org/abs/2210.03951v1
- Date: Sat, 8 Oct 2022 07:36:20 GMT
- Title: ArabSign: A Multi-modality Dataset and Benchmark for Continuous Arabic
Sign Language Recognition
- Authors: Hamzah Luqman
- Abstract summary: The ArabSign dataset consists of 9,335 samples performed by 6 signers.
The total time of the recorded sentences is around 10 hours, and the average sentence length is 3.1 signs.
We propose an encoder-decoder model for continuous ArSL recognition.
- Score: 1.2691047660244335
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Sign language recognition has attracted the interest of researchers in recent
years. While numerous approaches have been proposed for European and Asian sign
language recognition, very limited attempts have been made to develop similar
systems for the Arabic sign language (ArSL). This can be attributed partly to
the lack of a dataset at the sentence level. In this paper, we aim to make a
significant contribution by proposing ArabSign, a continuous ArSL dataset. The
proposed dataset consists of 9,335 samples performed by 6 signers. The total
time of the recorded sentences is around 10 hours, and the average sentence
length is 3.1 signs. The ArabSign dataset was recorded using a Kinect V2 camera
that provides three types of information (color, depth, and skeleton joint
points) recorded simultaneously for each sentence. In addition, we provide the
annotation of the dataset according to ArSL and Arabic language structures that
can help in studying the linguistic characteristics of ArSL. To benchmark this
dataset, we propose an encoder-decoder model for continuous ArSL recognition.
The model has been evaluated on the proposed dataset, and the obtained results
show that the encoder-decoder model outperformed the attention mechanism,
achieving an average word error rate (WER) of 0.50 compared with 0.62 for the
attention mechanism. The data and code are available at github.com/Hamzah-Luqman/ArabSign
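To make the benchmark concrete, the sketch below shows the two ingredients the reported evaluation rests on: a minimal GRU-based encoder-decoder that maps per-frame features to a gloss (sign) sequence, and a sign-level word error rate computed by edit distance. This is an illustrative sketch only, with placeholder feature dimensions and a toy vocabulary; it is not the authors' released implementation (see the linked repository for that).

```python
# Illustrative sketch, not the ArabSign release: a toy encoder-decoder for
# continuous sign recognition and a sign-level WER metric. All dimensions,
# names, and the vocabulary size are placeholders.
import torch
import torch.nn as nn


class SignSeq2Seq(nn.Module):
    """Toy encoder-decoder: per-frame features in, gloss (sign) token logits out."""

    def __init__(self, feat_dim=256, hidden=512, vocab_size=100):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.embed = nn.Embedding(vocab_size, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, frames, target_glosses):
        # frames: (B, T, feat_dim) per-frame features (e.g., CNN or skeleton vectors)
        # target_glosses: (B, L) gloss token ids used for teacher forcing
        _, h = self.encoder(frames)           # final hidden state summarizes the video
        dec_out, _ = self.decoder(self.embed(target_glosses), h)
        return self.out(dec_out)              # (B, L, vocab_size) logits


def wer(reference, hypothesis):
    """Word (sign) error rate: edit distance over gloss tokens / reference length."""
    r, h = reference, hypothesis
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i][j] = min(d[i - 1][j - 1] + (r[i - 1] != h[j - 1]),  # substitution
                          d[i - 1][j] + 1,                           # deletion
                          d[i][j - 1] + 1)                           # insertion
    return d[len(r)][len(h)] / max(len(r), 1)


# Shape check with random inputs: batch of 2 videos, 80 frames, 5-gloss targets.
model = SignSeq2Seq()
logits = model(torch.randn(2, 80, 256), torch.randint(0, 100, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 100])

# One substitution in a 3-sign sentence gives WER = 1/3; the reported 0.50 and
# 0.62 are averages of this quantity over the test data.
print(wer(["I", "GO", "SCHOOL"], ["I", "GO", "HOME"]))  # 0.333...
```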
Related papers
- Developing Lightweight DNN Models With Limited Data For Real-Time Sign Language Recognition [0.0]
We present a novel framework for real-time sign language recognition using lightweight DNNs trained on limited data.
Our system addresses key challenges in sign language recognition, including data scarcity, high computational costs, and discrepancies in frame rates between training and inference environments.
arXiv Detail & Related papers (2025-06-30T20:34:54Z)
- Logos as a Well-Tempered Pre-train for Sign Language Recognition [75.42794328290088]
This paper presents Logos, a novel Russian Sign Language (RSL) dataset.
It is shown that a model pre-trained on the Logos dataset can be used as a universal encoder for other-language SLR tasks.
We show that explicitly labeling visually similar signs improves the trained model's quality as a visual encoder for downstream tasks.
arXiv Detail & Related papers (2025-05-15T16:31:49Z)
- ISLR101: an Iranian Word-Level Sign Language Recognition Dataset [0.0]
ISLR101 is the first publicly available Iranian Sign Language dataset for isolated sign language recognition.
This dataset includes 4,614 videos covering 101 distinct signs, recorded by 10 different signers against varied backgrounds, with a resolution of 800x600 pixels and a frame rate of 25 frames per second.
arXiv Detail & Related papers (2025-03-16T10:57:01Z)
- SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction [65.1590372072555]
We introduce SHuBERT, a self-supervised transformer encoder that learns strong representations from American Sign Language (ASL) video content.
Inspired by the success of the HuBERT speech representation model, SHuBERT adapts masked prediction for multi-stream visual sign language input.
SHuBERT achieves state-of-the-art performance across multiple benchmarks.
arXiv Detail & Related papers (2024-11-25T03:13:08Z)
- AzSLD: Azerbaijani Sign Language Dataset for Fingerspelling, Word, and Sentence Translation with Baseline Software [0.0]
The dataset was created within the framework of a vision-based AzSL translation project.
AzSLD contains 30,000 videos, each carefully annotated with accurate sign labels and corresponding linguistic translations.
arXiv Detail & Related papers (2024-11-19T21:15:47Z)
- Hierarchical Windowed Graph Attention Network and a Large Scale Dataset for Isolated Indian Sign Language Recognition [0.20075899678041528]
We introduce a large-scale isolated ISL dataset and a novel SL recognition model based on skeleton graph structure.
The dataset covers 2,002 commonly used daily words in the deaf community, recorded by 20 (10 male and 10 female) deaf adult signers.
We propose an SL recognition model, the Hierarchical Windowed Graph Attention Network (HWGAT), which utilizes the human upper-body skeleton graph.
arXiv Detail & Related papers (2024-07-19T11:48:36Z)
- iSign: A Benchmark for Indian Sign Language Processing [5.967764101493575]
iSign is a benchmark for Indian Sign Language (ISL) processing.
We release one of the largest ISL-English datasets with more than 118K video-sentence/phrase pairs.
We provide insights into the proposed benchmarks, along with a few linguistic observations on the workings of ISL.
arXiv Detail & Related papers (2024-07-07T15:07:35Z)
- Cross-Lingual NER for Financial Transaction Data in Low-Resource Languages [70.25418443146435]
We propose an efficient modeling framework for cross-lingual named entity recognition in semi-structured text data.
We employ two independent datasets of SMSs in English and Arabic, each carrying semi-structured banking transaction information.
With access to only 30 labeled samples, our model can generalize the recognition of merchants, amounts, and other fields from English to Arabic.
arXiv Detail & Related papers (2023-07-16T00:45:42Z)
- Slovo: Russian Sign Language Dataset [83.93252084624997]
This paper presents the Russian Sign Language (RSL) video dataset Slovo, produced using crowdsourcing platforms.
The dataset contains 20,000 FullHD recordings, divided into 1,000 classes of isolated RSL gestures collected from 194 signers.
arXiv Detail & Related papers (2023-05-23T21:00:42Z)
- ASL-Homework-RGBD Dataset: An annotated dataset of 45 fluent and non-fluent signers performing American Sign Language homeworks [32.3809065803553]
This dataset contains videos of fluent and non-fluent signers using American Sign Language (ASL).
A total of 45 fluent and non-fluent participants were asked to perform signing homework assignments.
The data is annotated to identify several aspects of signing including grammatical features and non-manual markers.
arXiv Detail & Related papers (2022-07-08T17:18:49Z)
- BBC-Oxford British Sign Language Dataset [64.32108826673183]
We introduce the BBC-Oxford British Sign Language (BOBSL) dataset, a large-scale video collection of British Sign Language (BSL).
We describe the motivation for the dataset, together with statistics and available annotations.
We conduct experiments to provide baselines for the tasks of sign recognition, sign language alignment, and sign language translation.
arXiv Detail & Related papers (2021-11-05T17:35:58Z)
- Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech-impaired people to communicate.
Skeleton-based recognition is becoming popular because it can be further ensembled with RGB-D based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation (Jin et al., 2020), we propose recognizing sign language based on whole-body keypoints and features; see the keypoint-extraction sketch after this list.
arXiv Detail & Related papers (2021-03-16T03:38:17Z)
- How2Sign: A Large-scale Multimodal Dataset for Continuous American Sign Language [37.578776156503906]
How2Sign is a multimodal and multiview continuous American Sign Language (ASL) dataset.
It consists of a parallel corpus of more than 80 hours of sign language videos and a set of corresponding modalities including speech, English transcripts, and depth.
A three-hour subset was recorded in the Panoptic studio enabling detailed 3D pose estimation.
arXiv Detail & Related papers (2020-08-18T20:22:16Z)
- BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues [106.21067543021887]
We show how to use mouthing cues from signers to obtain high-quality annotations from video data.
The BSL-1K dataset is a collection of British Sign Language (BSL) signs of unprecedented scale.
arXiv Detail & Related papers (2020-07-23T16:59:01Z)
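For the skeleton- and keypoint-based approaches mentioned above, the following is a hypothetical preprocessing sketch: it turns a video into a per-frame whole-body keypoint sequence using MediaPipe's Holistic solution (an assumption of this sketch, not the pipeline used by any of the listed papers), which could then feed a sequence model such as the encoder-decoder sketched earlier.

```python
# Hypothetical sketch: whole-body keypoint features from a video. Assumes the
# mediapipe and opencv-python packages; not the code of any paper listed above.
import cv2
import mediapipe as mp
import numpy as np


def video_to_keypoints(path):
    """Return an array of shape (num_frames, 225) of pose + hand keypoints."""
    holistic = mp.solutions.holistic.Holistic(static_image_mode=False)
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, bgr = cap.read()
        if not ok:
            break
        result = holistic.process(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB))
        feats = []
        # 33 pose landmarks and 21 landmarks per hand; zeros when a part is missing.
        for lms, n in ((result.pose_landmarks, 33),
                       (result.left_hand_landmarks, 21),
                       (result.right_hand_landmarks, 21)):
            if lms is None:
                feats.extend([0.0] * n * 3)
            else:
                for lm in lms.landmark:
                    feats.extend([lm.x, lm.y, lm.z])
        frames.append(feats)
    cap.release()
    holistic.close()
    return np.asarray(frames, dtype=np.float32)


# Each row (33 + 21 + 21 = 75 points, 3 coordinates each = 225 features) can be
# fed frame-by-frame to a sequence model such as the GRU encoder-decoder above.
```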