Related papers: Slovo: Russian Sign Language Dataset

Slovo: Russian Sign Language Dataset

URL: http://arxiv.org/abs/2305.14527v3
Date: Tue, 12 Mar 2024 14:45:02 GMT
Title: Slovo: Russian Sign Language Dataset
Authors: Alexander Kapitanov, Karina Kvanchiani, Alexander Nagaev, Elizaveta Petrova
Abstract summary: This paper presents the Russian Sign Language (RSL) video dataset Slovo, produced using crowdsourcing platforms. The dataset contains 20,000 FullHD recordings, divided into 1,000 classes of isolated RSL gestures received by 194 signers.
Score: 83.93252084624997
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: One of the main challenges of the sign language recognition task is the difficulty of collecting a suitable dataset due to the gap between hard-of-hearing and hearing societies. In addition, the sign language in each country differs significantly, which obliges the creation of new data for each of them. This paper presents the Russian Sign Language (RSL) video dataset Slovo, produced using crowdsourcing platforms. The dataset contains 20,000 FullHD recordings, divided into 1,000 classes of isolated RSL gestures received by 194 signers. We also provide the entire dataset creation pipeline, from data collection to video annotation, with the following demo application. Several neural networks are trained and evaluated on the Slovo to demonstrate its teaching ability. Proposed data and pre-trained models are publicly available.

Related papers

Logos as a Well-Tempered Pre-train for Sign Language Recognition [75.42794328290088]
This paper presents Logos, a novel Russian Sign Language (RSL) dataset.<n>It is shown that a model, pre-trained on the Logos dataset can be used as a universal encoder for other language SLR tasks.<n>We show that explicitly labeling visually similar signs improves trained model quality as a visual encoder for downstream tasks.
arXiv Detail & Related papers (2025-05-15T16:31:49Z)
ISLR101: an Iranian Word-Level Sign Language Recognition Dataset [0.0]
ISLR101 is the first publicly available Iranian Sign Language dataset for isolated sign language recognition. This dataset includes 4,614 videos covering 101 distinct signs, recorded by 10 different signers against varied backgrounds, with a resolution of 800x600 pixels and a frame rate of 25 frames per second.
arXiv Detail & Related papers (2025-03-16T10:57:01Z)
SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction [65.1590372072555]
We introduce SHuBERT, a self-supervised transformer encoder that learns strong representations from American Sign Language (ASL) video content. Inspired by the success of the HuBERT speech representation model, SHuBERT adapts masked prediction for multi-stream visual sign language input. SHuBERT achieves state-of-the-art performance across multiple benchmarks.
arXiv Detail & Related papers (2024-11-25T03:13:08Z)
AzSLD: Azerbaijani Sign Language Dataset for Fingerspelling, Word, and Sentence Translation with Baseline Software [0.0]
The dataset was created within the framework of a vision-based AzSL translation project. AzSLD contains 30,000 videos, each carefully annotated with accurate sign labels and corresponding linguistic translations.
arXiv Detail & Related papers (2024-11-19T21:15:47Z)
Bukva: Russian Sign Language Alphabet [75.42794328290088]
This paper investigates the recognition of the Russian fingerspelling alphabet, also known as the Russian Sign Language (RSL) dactyl. Dactyl is a component of sign languages where distinct hand movements represent individual letters of a written language. We provide Bukva, the first full-fledged open-source video dataset for RSL dactyl recognition.
arXiv Detail & Related papers (2024-10-11T09:59:48Z)
ASL Citizen: A Community-Sourced Dataset for Advancing Isolated Sign Language Recognition [6.296362537531586]
Sign languages are used as a primary language by approximately 70 million D/deaf people world-wide. To help tackle this problem, we release ASL Citizen, the first crowdsourced Isolated Sign Language Recognition dataset. We propose that this dataset be used for sign language dictionary retrieval for American Sign Language (ASL), where a user demonstrates a sign to their webcam to retrieve matching signs from a dictionary.
arXiv Detail & Related papers (2023-04-12T15:52:53Z)
Learning from What is Already Out There: Few-shot Sign Language Recognition with Online Dictionaries [0.0]
We open-source the UWB-SL-Wild few-shot dataset, the first of its kind training resource consisting of dictionary-scraped videos. We introduce a novel approach to training sign language recognition models in a few-shot scenario, resulting in state-of-the-art results.
arXiv Detail & Related papers (2023-01-10T03:21:01Z)
LSA-T: The first continuous Argentinian Sign Language dataset for Sign Language Translation [52.87578398308052]
Sign language translation (SLT) is an active field of study that encompasses human-computer interaction, computer vision, natural language processing and machine learning. This paper presents the first continuous Argentinian Sign Language (LSA) dataset. It contains 14,880 sentence level videos of LSA extracted from the CN Sordos YouTube channel with labels and keypoints annotations for each signer.
arXiv Detail & Related papers (2022-11-14T14:46:44Z)
ASR2K: Speech Recognition for Around 2000 Languages without Audio [100.41158814934802]
We present a speech recognition pipeline that does not require any audio for the target language. Our pipeline consists of three components: acoustic, pronunciation, and language models. We build speech recognition for 1909 languages by combining it with Crubadan: a large endangered languages n-gram database.
arXiv Detail & Related papers (2022-09-06T22:48:29Z)
How2Sign: A Large-scale Multimodal Dataset for Continuous American Sign Language [37.578776156503906]
How2Sign is a multimodal and multiview continuous American Sign Language (ASL) dataset. It consists of a parallel corpus of more than 80 hours of sign language videos and a set of corresponding modalities including speech, English transcripts, and depth. A three-hour subset was recorded in the Panoptic studio enabling detailed 3D pose estimation.
arXiv Detail & Related papers (2020-08-18T20:22:16Z)
BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues [106.21067543021887]
We show how to use mouthing cues from signers to obtain high-quality annotations from video data. The BSL-1K dataset is a collection of British Sign Language (BSL) signs of unprecedented scale.
arXiv Detail & Related papers (2020-07-23T16:59:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.