The Sem-Lex Benchmark: Modeling ASL Signs and Their Phonemes
- URL: http://arxiv.org/abs/2310.00196v1
- Date: Sat, 30 Sep 2023 00:25:43 GMT
- Title: The Sem-Lex Benchmark: Modeling ASL Signs and Their Phonemes
- Authors: Lee Kezar, Elana Pontecorvo, Adele Daniels, Connor Baer, Ruth Ferster,
Lauren Berger, Jesse Thomason, Zed Sevcikova Sehyr, Naomi Caselli
- Abstract summary: We introduce a new resource for American Sign Language (ASL) modeling, the Sem-Lex Benchmark.
The Benchmark is the current largest of its kind, consisting of over 84k videos of isolated sign productions from deaf ASL signers who gave informed consent and received compensation.
We present a suite of experiments which make use of the linguistic information in ASL-LEX, evaluating the practicality and fairness of the Sem-Lex Benchmark for isolated sign recognition (ISR).
- Score: 6.0179345110920455
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Sign language recognition and translation technologies have the potential to
increase access and inclusion of deaf signing communities, but research
progress is bottlenecked by a lack of representative data. We introduce a new
resource for American Sign Language (ASL) modeling, the Sem-Lex Benchmark. The
Benchmark is the current largest of its kind, consisting of over 84k videos of
isolated sign productions from deaf ASL signers who gave informed consent and
received compensation. Human experts aligned these videos with other sign
language resources including ASL-LEX, SignBank, and ASL Citizen, enabling
useful expansions for sign and phonological feature recognition. We present a
suite of experiments which make use of the linguistic information in ASL-LEX,
evaluating the practicality and fairness of the Sem-Lex Benchmark for isolated
sign recognition (ISR). We use an SL-GCN model to show that the phonological
features are recognizable with 85% accuracy, and that they are effective as an
auxiliary target to ISR. Learning to recognize phonological features alongside
gloss results in a 6% improvement for few-shot ISR accuracy and a 2%
improvement for ISR accuracy overall. Instructions for downloading the data can
be found at https://github.com/leekezar/SemLex.
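The abstract describes learning phonological features alongside gloss as an auxiliary target for ISR. A minimal sketch of that kind of multi-task objective follows, in plain Python. It is not the authors' SL-GCN code: the function names, the feature names `handshape` and `location`, the toy logits, and the `aux_weight` parameter are all assumptions made for illustration.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one example; `target` is a class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def multitask_loss(gloss_logits, gloss_target, phon_logits, phon_targets,
                   aux_weight=1.0):
    """Gloss loss plus a weighted mean loss over phonological feature heads.

    `phon_logits` and `phon_targets` are dicts keyed by a (hypothetical)
    feature name; each feature is treated as its own classification task.
    """
    gloss_loss = cross_entropy(gloss_logits, gloss_target)
    aux_losses = [
        cross_entropy(phon_logits[name], phon_targets[name])
        for name in phon_targets
    ]
    aux_loss = sum(aux_losses) / len(aux_losses) if aux_losses else 0.0
    return gloss_loss + aux_weight * aux_loss


# Toy example: 3 candidate glosses and 2 illustrative phonological features.
gloss_logits = [2.0, 0.5, -1.0]
phon_logits = {"handshape": [0.1, 1.2], "location": [0.0, 0.0, 0.0]}
phon_targets = {"handshape": 1, "location": 2}

loss = multitask_loss(gloss_logits, 0, phon_logits, phon_targets,
                      aux_weight=0.5)
print(f"combined loss: {loss:.4f}")
```

Setting `aux_weight=0.0` recovers gloss-only training, which gives a simple baseline for checking whether the auxiliary phonological targets help.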
Related papers
- SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction [65.1590372072555]
We introduce SHuBERT, a self-supervised transformer encoder that learns strong representations from American Sign Language (ASL) video content.
Inspired by the success of the HuBERT speech representation model, SHuBERT adapts masked prediction for multi-stream visual sign language input.
SHuBERT achieves state-of-the-art performance across multiple benchmarks.
arXiv Detail & Related papers (2024-11-25T03:13:08Z)
- The American Sign Language Knowledge Graph: Infusing ASL Models with Linguistic Knowledge [6.481946043182915]
We introduce the American Sign Language Knowledge Graph (ASLKG), compiled from twelve sources of expert linguistic knowledge.
We use the ASLKG to train neuro-symbolic models for 3 ASL understanding tasks, achieving accuracies of 91% on ISR, 14% for predicting the semantic features of unseen signs, and 36% for classifying the topic of YouTube-ASL videos.
arXiv Detail & Related papers (2024-11-06T00:16:16Z)
- ASL Citizen: A Community-Sourced Dataset for Advancing Isolated Sign Language Recognition [6.296362537531586]
Sign languages are used as a primary language by approximately 70 million D/deaf people world-wide.
To help tackle this problem, we release ASL Citizen, the first crowdsourced Isolated Sign Language Recognition dataset.
We propose that this dataset be used for sign language dictionary retrieval for American Sign Language (ASL), where a user demonstrates a sign to their webcam to retrieve matching signs from a dictionary.
arXiv Detail & Related papers (2023-04-12T15:52:53Z)
- Improving Sign Recognition with Phonology [8.27285154257448]
We use insights from research on American Sign Language phonology to train models for isolated sign language recognition.
We train ISLR models that take in pose estimations of a signer producing a single sign to predict not only the sign but additionally its phonological characteristics.
These auxiliary predictions lead to a nearly 9% absolute gain in sign recognition accuracy on the WLASL benchmark.
arXiv Detail & Related papers (2023-02-11T18:51:23Z)
- LSA-T: The first continuous Argentinian Sign Language dataset for Sign Language Translation [52.87578398308052]
Sign language translation (SLT) is an active field of study that encompasses human-computer interaction, computer vision, natural language processing and machine learning.
This paper presents the first continuous Argentinian Sign Language (LSA) dataset.
It contains 14,880 sentence level videos of LSA extracted from the CN Sordos YouTube channel with labels and keypoints annotations for each signer.
arXiv Detail & Related papers (2022-11-14T14:46:44Z)
- LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech [63.84741259993937]
Self-Supervised Learning (SSL) using huge unlabeled data has been successfully explored for image and natural language processing.
Recent works also investigated SSL from speech.
We propose LeBenchmark: a reproducible framework for assessing SSL from speech.
arXiv Detail & Related papers (2021-04-23T08:27:09Z)
- Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech impaired people to communicate.
Skeleton-based recognition is becoming popular because it can be ensembled with RGB-D based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation (Jin et al., 2020), we propose recognizing sign language based on whole-body keypoints and features.
arXiv Detail & Related papers (2021-03-16T03:38:17Z)
- BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues [106.21067543021887]
We show how to use mouthing cues from signers to obtain high-quality annotations from video data.
The BSL-1K dataset is a collection of British Sign Language (BSL) signs of unprecedented scale.
arXiv Detail & Related papers (2020-07-23T16:59:01Z)
- Transferring Cross-domain Knowledge for Video Sign Language Recognition [103.9216648495958]
Word-level sign language recognition (WSLR) is a fundamental task in sign language interpretation.
We propose a novel method that learns domain-invariant visual concepts and fertilizes WSLR models by transferring knowledge of subtitled news sign to them.
arXiv Detail & Related papers (2020-03-08T03:05:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.