3D-LEX v1.0: 3D Lexicons for American Sign Language and Sign Language of the Netherlands
- URL: http://arxiv.org/abs/2409.01901v1
- Date: Tue, 3 Sep 2024 13:44:56 GMT
- Title: 3D-LEX v1.0: 3D Lexicons for American Sign Language and Sign Language of the Netherlands
- Authors: Oline Ranum, Gomer Otterspeer, Jari I. Andersen, Robert G. Belleman, Floris Roelofsen
- Abstract summary: We present an efficient approach for capturing sign language in 3D, introduce the 3D-LEX dataset, and detail a method for semi-automatic annotation of phonetic properties.
Our procedure integrates three motion capture techniques encompassing high-resolution 3D poses, 3D handshapes, and depth-aware facial features.
The 3D-LEX dataset includes 1,000 signs from American Sign Language and an additional 1,000 signs from the Sign Language of the Netherlands.
- Score: 1.8641315013048299
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we present an efficient approach for capturing sign language in 3D, introduce the 3D-LEX v1.0 dataset, and detail a method for semi-automatic annotation of phonetic properties. Our procedure integrates three motion capture techniques encompassing high-resolution 3D poses, 3D handshapes, and depth-aware facial features, and attains an average sampling rate of one sign every 10 seconds. This includes the time for presenting a sign example, performing and recording the sign, and archiving the capture. The 3D-LEX dataset includes 1,000 signs from American Sign Language and an additional 1,000 signs from the Sign Language of the Netherlands. We showcase the dataset utility by presenting a simple method for generating handshape annotations directly from 3D-LEX. We produce handshape labels for 1,000 signs from American Sign Language and evaluate the labels in a sign recognition task. The labels enhance gloss recognition accuracy by 5% over using no handshape annotations, and by 1% over expert annotations. Our motion capture data supports in-depth analysis of sign features and facilitates the generation of 2D projections from any viewpoint. The 3D-LEX collection has been aligned with existing sign language benchmarks and linguistic resources, to support studies in 3D-aware sign language processing.
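Since the abstract notes that the motion capture data "facilitates the generation of 2D projections from any viewpoint," the sketch below shows how such a projection could be rendered with a standard pinhole camera model. This is a minimal illustration, not code from the 3D-LEX release: the `project_points` helper, the camera intrinsics, and the 67-keypoint frame are all assumptions.

```python
import numpy as np

def project_points(points_3d, R, t, f=1000.0, cx=640.0, cy=360.0):
    """Project Nx3 world-space keypoints to 2D pixels with a pinhole camera.

    R (3x3) and t (3,) map world coordinates into camera coordinates;
    f, cx, cy are assumed intrinsics (focal length and principal point).
    """
    K = np.array([[f,   0.0, cx],
                  [0.0, f,   cy],
                  [0.0, 0.0, 1.0]])
    cam = points_3d @ R.T + t        # world -> camera coordinates
    uv = cam @ K.T                   # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]    # perspective divide -> (N, 2) pixels

# Hypothetical usage: view one frame of pose data from 30 degrees off-center.
theta = np.radians(30.0)
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [ 0.0,           1.0, 0.0          ],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.0, 0.0, 3.0])         # place the camera 3 m from the signer
frame = np.random.rand(67, 3)         # stand-in for one captured 3D pose frame
pixels = project_points(frame, R, t)  # (67, 2) image-plane coordinates
```

Sweeping the rotation `R` moves the virtual camera around the signer, which is how a single 3D capture can stand in for many 2D camera angles.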
Related papers
- SignAvatar: Sign Language 3D Motion Reconstruction and Generation [10.342253593687781]
SignAvatar is a framework capable of both word-level sign language reconstruction and generation.
We contribute the ASL3DWord dataset, composed of 3D joint rotation data for the body, hands, and face.
arXiv Detail & Related papers (2024-05-13T17:48:22Z)
- A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars [49.60328609426056]
Spoken2Sign is a system for translating spoken languages into sign languages.
We present a simple baseline consisting of three steps: creating a gloss-video dictionary, estimating a 3D sign for each sign video, and training a Spoken2Sign model.
To the best of our knowledge, we are the first to present the Spoken2Sign task with 3D signs as the output format.
arXiv Detail & Related papers (2024-01-09T18:59:49Z)
- SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark [20.11364909443987]
SignAvatars is the first large-scale, multi-prompt 3D sign language (SL) motion dataset designed to bridge the communication gap for Deaf and hard-of-hearing individuals.
The dataset comprises 70,000 videos from 153 signers, totaling 8.34 million frames, covering both isolated signs and continuous, co-articulated signs.
arXiv Detail & Related papers (2023-10-31T13:15:49Z)
- Scalable 3D Captioning with Pretrained Models [63.16604472745202]
Cap3D is an automatic approach for generating descriptive text for 3D objects.
We apply Cap3D to the recently introduced large-scale 3D dataset Objaverse.
Our evaluation, conducted using 41k human annotations from the same dataset, demonstrates that Cap3D surpasses human descriptions in terms of quality, cost, and speed.
arXiv Detail & Related papers (2023-06-12T17:59:03Z)
- Reconstructing Signing Avatars From Video Using Linguistic Priors [54.5282429129769]
Sign language (SL) is the primary method of communication for the 70 million Deaf people around the world.
Replacing video dictionaries of isolated signs with 3D avatars can aid learning and enable AR/VR applications.
SGNify captures fine-grained hand pose, facial expression, and body movement fully automatically from in-the-wild monocular SL videos.
arXiv Detail & Related papers (2023-04-20T17:29:50Z)
- PLA: Language-Driven Open-Vocabulary 3D Scene Understanding [57.47315482494805]
Open-vocabulary scene understanding aims to localize and recognize unseen categories beyond the annotated label space.
The recent breakthrough in 2D open-vocabulary perception is driven by Internet-scale paired image-text data with rich vocabulary concepts.
We propose to distill knowledge encoded in pre-trained vision-language (VL) foundation models through captioning multi-view images from 3D.
arXiv Detail & Related papers (2022-11-29T15:52:22Z)
- Read and Attend: Temporal Localisation in Sign Language Videos [84.30262812057994]
We train a Transformer model to ingest a continuous signing stream and output a sequence of written tokens.
We show that it acquires the ability to attend to a large vocabulary of sign instances in the input sequence, enabling their localisation.
arXiv Detail & Related papers (2021-03-30T16:39:53Z)
- BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues [106.21067543021887]
We show how to use mouthing cues from signers to obtain high-quality annotations from video data.
The BSL-1K dataset is a collection of British Sign Language (BSL) signs of unprecedented scale.
arXiv Detail & Related papers (2020-07-23T16:59:01Z)