Representing Signs as Signs: One-Shot ISLR to Facilitate Functional Sign Language Technologies
- URL: http://arxiv.org/abs/2502.20171v1
- Date: Thu, 27 Feb 2025 15:07:51 GMT
- Title: Representing Signs as Signs: One-Shot ISLR to Facilitate Functional Sign Language Technologies
- Authors: Toon Vandendriessche, Mathieu De Coster, Annelies Lejon, Joni Dambre
- Abstract summary: Isolated Sign Language Recognition is crucial for scalable sign language technology. We propose a one-shot learning approach that generalises across languages and evolving vocabularies. We achieve state-of-the-art results, including 50.8% one-shot MRR on a large dictionary containing 10,235 unique signs from a different language.
- Score: 6.403291706982091
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Isolated Sign Language Recognition (ISLR) is crucial for scalable sign language technology, yet language-specific approaches limit current models. To address this, we propose a one-shot learning approach that generalises across languages and evolving vocabularies. Our method involves pretraining a model to embed signs based on essential features and using a dense vector search for rapid, accurate recognition of unseen signs. We achieve state-of-the-art results, including 50.8% one-shot MRR on a large dictionary containing 10,235 unique signs from a different language than the training set. Our approach is robust across languages and support sets, offering a scalable, adaptable solution for ISLR. Co-created with the Deaf and Hard of Hearing (DHH) community, this method aligns with real-world needs, and advances scalable sign language recognition.
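The recipe the abstract describes, embedding each sign with a pretrained model and recognising unseen signs by dense vector search over a one-shot support set, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the support embeddings are random placeholders for encoder outputs, and the helper names `recognise` and `mean_reciprocal_rank` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder support set: one embedding per dictionary sign. In the paper a
# pretrained encoder produces these from sign videos; here they are random.
num_signs, dim = 10_235, 256  # dictionary size taken from the abstract
support = rng.standard_normal((num_signs, dim)).astype(np.float32)
support /= np.linalg.norm(support, axis=1, keepdims=True)  # unit-normalise

def recognise(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Dense vector search: rank dictionary signs by cosine similarity."""
    q = query / np.linalg.norm(query)
    return np.argsort(-(support @ q))[:k]  # indices of the k best matches

def mean_reciprocal_rank(queries: np.ndarray, labels: np.ndarray) -> float:
    """MRR over a query batch: mean of 1 / rank of the correct sign."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    order = np.argsort(-(q @ support.T), axis=1)           # ranked sign indices
    ranks = (order == labels[:, None]).argmax(axis=1) + 1  # 1-based ranks
    return float((1.0 / ranks).mean())
```

With random embeddings this lookup performs at chance; the paper's contribution is pretraining an embedding space in which the same nearest-neighbour search reaches the 50.8% one-shot MRR reported above.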
Related papers
- ISLR101: an Iranian Word-Level Sign Language Recognition Dataset [0.0]
ISLR101 is the first publicly available Iranian Sign Language dataset for isolated sign language recognition.
This dataset includes 4,614 videos covering 101 distinct signs, recorded by 10 different signers against varied backgrounds, with a resolution of 800x600 pixels and a frame rate of 25 frames per second.
arXiv Detail & Related papers (2025-03-16T10:57:01Z)
- Enhancing Multilingual ASR for Unseen Languages via Language Embedding Modeling [50.62091603179394]
Whisper, one of the most advanced ASR models, handles 99 languages effectively. However, Whisper struggles with unseen languages, those not included in its pre-training. We propose methods that exploit these relationships to enhance ASR performance on unseen languages.
arXiv Detail & Related papers (2024-12-21T04:05:43Z)
- Signs as Tokens: A Retrieval-Enhanced Multilingual Sign Language Generator [55.94334001112357]
We introduce a multilingual sign language model, Signs as Tokens (SOKE), which can generate 3D sign avatars autoregressively from text inputs.
We propose a retrieval-enhanced SLG approach, which incorporates external sign dictionaries to provide accurate word-level signs.
arXiv Detail & Related papers (2024-11-26T18:28:09Z)
- SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction [65.1590372072555]
We introduce SHuBERT, a self-supervised transformer encoder that learns strong representations from American Sign Language (ASL) video content.
Inspired by the success of the HuBERT speech representation model, SHuBERT adapts masked prediction for multi-stream visual sign language input.
SHuBERT achieves state-of-the-art performance across multiple benchmarks.
arXiv Detail & Related papers (2024-11-25T03:13:08Z)
- Scaling up Multimodal Pre-training for Sign Language Understanding [96.17753464544604]
Sign language serves as the primary means of communication for the deaf-mute community.
To facilitate communication between the deaf-mute and hearing people, a series of sign language understanding (SLU) tasks have been studied.
These tasks investigate sign language topics from diverse perspectives and raise challenges in learning effective representation of sign language videos.
arXiv Detail & Related papers (2024-08-16T06:04:25Z)
- Sign Language Recognition without frame-sequencing constraints: A proof of concept on the Argentinian Sign Language [42.27617228521691]
This paper presents a general probabilistic model for sign classification that combines sub-classifiers based on different types of features.
The proposed model achieved an accuracy rate of 97% on an Argentinian Sign Language dataset.
arXiv Detail & Related papers (2023-10-26T14:47:11Z)
- Improving Continuous Sign Language Recognition with Cross-Lingual Signs [29.077175863743484]
We study the feasibility of utilizing multilingual sign language corpora to facilitate continuous sign language recognition.
We first build two sign language dictionaries containing isolated signs that appear in two datasets.
Then we identify the sign-to-sign mappings between two sign languages via a well-optimized isolated sign language recognition model.
arXiv Detail & Related papers (2023-08-21T15:58:47Z)
- Learnt Contrastive Concept Embeddings for Sign Recognition [33.72708697077754]
We focus on explicitly creating sign embeddings that bridge the gap between sign language and spoken language.
We train a vocabulary of embeddings based on the linguistic labels of sign videos.
We develop a conceptual similarity loss that uses word embeddings from NLP methods to create sign embeddings with better correspondence between sign language and spoken language (a sketch of one plausible form of such a loss follows this list).
arXiv Detail & Related papers (2023-08-18T12:47:18Z)
- Improving Sign Recognition with Phonology [8.27285154257448]
We use insights from research on American Sign Language phonology to train models for isolated sign language recognition.
We train ISLR models that take pose estimates of a signer producing a single sign and predict not only the sign but also its phonological characteristics.
These auxiliary predictions lead to a nearly 9% absolute gain in sign recognition accuracy on the WLASL benchmark (a minimal multi-task sketch also follows this list).
arXiv Detail & Related papers (2023-02-11T18:51:23Z)
- Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech impaired people to communicate.
Skeleton-based recognition is becoming popular because it can be further ensembled with RGB-D based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation (Jin et al., 2020), we propose recognizing sign language based on whole-body keypoints and features.
arXiv Detail & Related papers (2021-03-16T03:38:17Z)
- Global-local Enhancement Network for NMFs-aware Sign Language Recognition [135.30357113518127]
We propose a simple yet effective architecture called the Global-local Enhancement Network (GLE-Net).
Of its two streams, one captures the global contextual relationship, while the other captures discriminative fine-grained cues.
We introduce the first non-manual-features-aware isolated Chinese sign language dataset with a total vocabulary size of 1,067 sign words in daily life.
arXiv Detail & Related papers (2020-08-24T13:28:55Z)
- BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues [106.21067543021887]
We show how to use mouthing cues from signers to obtain high-quality annotations from video data.
The BSL-1K dataset is a collection of British Sign Language (BSL) signs of unprecedented scale.
arXiv Detail & Related papers (2020-07-23T16:59:01Z)
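As noted above, here is a sketch of one plausible form of the conceptual similarity loss from "Learnt Contrastive Concept Embeddings for Sign Recognition". It is an assumption-laden illustration, not the paper's exact loss: the temperature `tau`, the soft-target construction, and the function name are hypothetical; only the idea of supervising sign embeddings with NLP word embeddings comes from the summary.

```python
import torch
import torch.nn.functional as F

def conceptual_similarity_loss(sign_emb: torch.Tensor,  # (B, d) sign encoder outputs
                               word_emb: torch.Tensor,  # (V, d) NLP word embeddings
                               labels: torch.Tensor,    # (B,) gloss indices
                               tau: float = 0.1) -> torch.Tensor:
    """Pull each sign embedding toward the word embedding of its gloss,
    with soft targets derived from word-word similarity (hypothetical)."""
    sign = F.normalize(sign_emb, dim=1)
    words = F.normalize(word_emb, dim=1)
    logits = sign @ words.T / tau                           # sign-to-word scores
    soft = F.softmax(words[labels] @ words.T / tau, dim=1)  # concept soft targets
    return F.cross_entropy(logits, soft)                    # soft-label CE
```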
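And the minimal multi-task sketch for the phonology-aware ISLR idea from "Improving Sign Recognition with Phonology": auxiliary heads classify phonological characteristics alongside the main sign classifier over a shared encoder. The GRU encoder, input dimension, feature inventories, and auxiliary weight are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PhonologyAwareISLR(nn.Module):
    """Shared pose encoder with a sign head plus auxiliary phonology heads."""
    def __init__(self, in_dim=150, hidden=512, num_signs=2000,
                 phon_classes=(50, 30, 20)):  # assumed feature inventories
        super().__init__()
        self.encoder = nn.GRU(in_dim, hidden, batch_first=True)
        self.sign_head = nn.Linear(hidden, num_signs)
        self.phon_heads = nn.ModuleList(nn.Linear(hidden, c) for c in phon_classes)

    def forward(self, pose_seq):      # pose_seq: (batch, time, in_dim)
        _, h = self.encoder(pose_seq)
        h = h.squeeze(0)              # final hidden state: (batch, hidden)
        return self.sign_head(h), [head(h) for head in self.phon_heads]

def multitask_loss(sign_logits, phon_logits, sign_y, phon_y, aux_weight=0.2):
    """Main ISLR cross-entropy plus weighted auxiliary phonology terms."""
    loss = F.cross_entropy(sign_logits, sign_y)
    for logits, y in zip(phon_logits, phon_y):
        loss = loss + aux_weight * F.cross_entropy(logits, y)
    return loss
```

Training on the summed objective nudges the shared encoder to represent phonological structure, the mechanism the paper credits for the roughly 9% absolute accuracy gain on WLASL.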