Bukva: Russian Sign Language Alphabet
- URL: http://arxiv.org/abs/2410.08675v1
- Date: Fri, 11 Oct 2024 09:59:48 GMT
- Title: Bukva: Russian Sign Language Alphabet
- Authors: Karina Kvanchiani, Petr Surovtsev, Alexander Nagaev, Elizaveta Petrova, Alexander Kapitanov
- Abstract summary: This paper investigates the recognition of the Russian fingerspelling alphabet, also known as the Russian Sign Language (RSL) dactyl.
Dactyl is a component of sign languages where distinct hand movements represent individual letters of a written language.
We provide Bukva, the first full-fledged open-source video dataset for RSL dactyl recognition.
- Score: 75.42794328290088
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper investigates the recognition of the Russian fingerspelling alphabet, also known as the Russian Sign Language (RSL) dactyl. Dactyl is a component of sign languages where distinct hand movements represent individual letters of a written language. This method is used to spell words without specific signs, such as proper nouns or technical terms. An alphabet-learning simulator is an essential application of isolated dactyl recognition. There is a notable issue of data shortage in isolated dactyl recognition: existing Russian dactyl datasets lack subject heterogeneity, contain insufficient samples, or cover only static signs. We provide Bukva, the first full-fledged open-source video dataset for RSL dactyl recognition. It contains 3,757 videos with more than 101 samples for each RSL alphabet sign, including dynamic ones. We utilized crowdsourcing platforms to increase subject heterogeneity, resulting in the participation of 155 deaf and hard-of-hearing experts in the dataset creation. We use a TSM (Temporal Shift Module) block to handle static and dynamic signs effectively, achieving 83.6% top-1 accuracy with real-time inference on CPU only. The dataset, demo code, and pre-trained models are publicly available.
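The TSM block mentioned in the abstract works by shifting a fraction of feature channels one step along the time axis, letting a 2D backbone mix temporal information at no extra compute cost. A minimal sketch of that shift on a [T][C] feature array, using plain Python and hypothetical names (`temporal_shift`, `fold_div`) rather than the authors' released code:

```python
def temporal_shift(frames, fold_div=4):
    """Shift a fraction of channels along the time axis, TSM-style.

    frames: list of per-frame feature vectors, shape [T][C].
    The first C // fold_div channels take their value from the NEXT frame,
    the next C // fold_div from the PREVIOUS frame, and the rest stay put.
    Vacated positions at the sequence boundaries are zero-filled.
    """
    T, C = len(frames), len(frames[0])
    fold = C // fold_div
    out = [[0.0] * C for _ in range(T)]
    for t in range(T):
        for c in range(C):
            if c < fold:                # shift backward in time
                if t + 1 < T:
                    out[t][c] = frames[t + 1][c]
            elif c < 2 * fold:          # shift forward in time
                if t - 1 >= 0:
                    out[t][c] = frames[t - 1][c]
            else:                       # unshifted channels
                out[t][c] = frames[t][c]
    return out
```

In practice the shift is inserted inside each residual block of a 2D CNN, so temporal mixing happens at every stage of the network.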
Related papers
- SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction [65.1590372072555]
We introduce SHuBERT, a self-supervised transformer encoder that learns strong representations from American Sign Language (ASL) video content.
Inspired by the success of the HuBERT speech representation model, SHuBERT adapts masked prediction for multi-stream visual sign language input.
SHuBERT achieves state-of-the-art performance across multiple benchmarks.
arXiv Detail & Related papers (2024-11-25T03:13:08Z)
- SignCLIP: Connecting Text and Sign Language by Contrastive Learning [39.72545568965546]
SignCLIP is an efficient method of learning useful visual representations for sign language processing from large-scale, multilingual video-text pairs.
We pretrain SignCLIP on Spreadthesign, a prominent sign language dictionary consisting of 500 thousand video clips in up to 44 sign languages.
We analyze the latent space formed by the spoken language text and sign language poses, which provides additional linguistic insights.
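SignCLIP's contrastive objective pulls each video clip toward its paired text and away from the other pairs in the batch. A toy symmetric InfoNCE loss in pure Python, with illustrative names, not SignCLIP's actual implementation:

```python
import math

def clip_loss(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over N paired embeddings (CLIP-style sketch).

    video_emb, text_emb: lists of N unit-normalised vectors; pair i matches i.
    Returns the mean of the video-to-text and text-to-video cross-entropies.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    n = len(video_emb)
    # similarity logits, scaled by the temperature
    logits = [[dot(v, t) / temperature for t in text_emb] for v in video_emb]

    def xent(row, target):
        m = max(row)  # log-sum-exp with max-shift for stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        return log_z - row[target]

    loss_v2t = sum(xent(logits[i], i) for i in range(n)) / n
    loss_t2v = sum(xent([logits[j][i] for j in range(n)], i)
                   for i in range(n)) / n
    return (loss_v2t + loss_t2v) / 2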
arXiv Detail & Related papers (2024-07-01T13:17:35Z)
- Slovo: Russian Sign Language Dataset [83.93252084624997]
This paper presents the Russian Sign Language (RSL) video dataset Slovo, produced using crowdsourcing platforms.
The dataset contains 20,000 FullHD recordings, divided into 1,000 classes of isolated RSL gestures performed by 194 signers.
arXiv Detail & Related papers (2023-05-23T21:00:42Z)
- ASL Citizen: A Community-Sourced Dataset for Advancing Isolated Sign Language Recognition [6.296362537531586]
Sign languages are used as a primary language by approximately 70 million D/deaf people worldwide.
To help address the scarcity of sign language training data, we release ASL Citizen, the first crowdsourced Isolated Sign Language Recognition dataset.
We propose that this dataset be used for sign language dictionary retrieval for American Sign Language (ASL), where a user demonstrates a sign to their webcam to retrieve matching signs from a dictionary.
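The dictionary-retrieval use case described above amounts to nearest-neighbour search: embed the webcam sign and rank dictionary entries by similarity. A minimal sketch with hypothetical names (`retrieve`, the example glosses), not the ASL Citizen baseline:

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def retrieve(query_emb, dictionary, top_k=3):
    """Rank dictionary glosses by cosine similarity to the query embedding.

    dictionary: {gloss: embedding}. The query would come from a video
    encoder applied to the user's webcam clip.
    """
    ranked = sorted(dictionary,
                    key=lambda g: cosine(query_emb, dictionary[g]),
                    reverse=True)
    return ranked[:top_k]
```

The same pattern underlies most sign dictionary lookup systems; only the encoder producing the embeddings changes.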
arXiv Detail & Related papers (2023-04-12T15:52:53Z)
- Learning from What is Already Out There: Few-shot Sign Language Recognition with Online Dictionaries [0.0]
We open-source the UWB-SL-Wild few-shot dataset, the first of its kind training resource consisting of dictionary-scraped videos.
We introduce a novel approach to training sign language recognition models in a few-shot scenario, resulting in state-of-the-art results.
arXiv Detail & Related papers (2023-01-10T03:21:01Z)
- Weakly-supervised Fingerspelling Recognition in British Sign Language Videos [85.61513254261523]
Previous fingerspelling recognition methods have not focused on British Sign Language (BSL).
In contrast to previous methods, our method only uses weak annotations from subtitles for training.
We propose a Transformer architecture adapted to this task, with a multiple-hypothesis CTC loss function to learn from alternative annotation possibilities.
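A CTC loss marginalises over every frame-level alignment of a label sequence, which is what makes weak subtitle supervision possible; a multiple-hypothesis variant additionally scores several candidate annotations and keeps the best. A toy forward-algorithm sketch in probability space (hypothetical names, not the paper's Transformer code):

```python
def ctc_prob(probs, label, blank=0):
    """P(label | probs) via the CTC forward algorithm.

    probs: [T][V] per-frame distributions over the vocabulary.
    Works in plain probability space, which is fine for short toy inputs
    (real implementations use log-space for numerical stability).
    """
    ext = [blank]                       # label interleaved with blanks
    for l in label:
        ext += [l, blank]
    S, T = len(ext), len(probs)
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][blank]
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]         # stay
            if s > 0:
                a += alpha[t - 1][s - 1]  # advance one position
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]  # skip a blank between labels
            alpha[t][s] = a * probs[t][ext[s]]
    return alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)

def multi_hypothesis_ctc(probs, hypotheses, blank=0):
    """Score each candidate annotation and keep the best one: a sketch of
    a min-over-hypotheses CTC loss (max over probabilities)."""
    return max(ctc_prob(probs, h, blank) for h in hypotheses)
```

With two frames and uniform probabilities over {blank, a}, the paths `aa`, `a-`, and `-a` all collapse to "a", giving probability 0.75.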
arXiv Detail & Related papers (2022-11-16T15:02:36Z)
- Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech-impaired people to communicate.
Skeleton-based recognition is becoming popular because it can be ensembled with RGB-D-based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation (Jin et al., 2020), we propose recognizing sign language based on whole-body keypoints and features.
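Before whole-body keypoints are fed to a skeleton-based recognizer, they are typically normalised so the model is invariant to the signer's position and scale in the frame. A minimal sketch of one common scheme (centre on the mean, scale to unit extent); the function name is illustrative, not from the paper:

```python
def normalize_keypoints(points):
    """Centre 2D keypoints on their mean and scale to unit max extent.

    points: list of (x, y) whole-body keypoints for one frame.
    A degenerate frame where all points coincide is left at scale 1.
    """
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    scale = max(max(abs(x), abs(y)) for x, y in centered) or 1.0
    return [(x / scale, y / scale) for x, y in centered]
```

The normalised coordinates (per frame, per keypoint) then form the input sequence for a graph or sequence model.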
arXiv Detail & Related papers (2021-03-16T03:38:17Z)
- Transferring Cross-domain Knowledge for Video Sign Language Recognition [103.9216648495958]
Word-level sign language recognition (WSLR) is a fundamental task in sign language interpretation.
We propose a novel method that learns domain-invariant visual concepts and improves WSLR models by transferring knowledge from subtitled sign language news footage.
arXiv Detail & Related papers (2020-03-08T03:05:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.