SDW-ASL: A Dynamic System to Generate Large Scale Dataset for Continuous American Sign Language
- URL: http://arxiv.org/abs/2210.06791v1
- Date: Thu, 13 Oct 2022 07:08:00 GMT
- Title: SDW-ASL: A Dynamic System to Generate Large Scale Dataset for Continuous American Sign Language
- Authors: Yehong Jiang
- Abstract summary: We release the first version of our ASL dataset, which contains 30k sentences and 416k words with a vocabulary of 18k words, totaling 104 hours of video.
This is the largest continuous sign language dataset published to date in terms of video duration.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite tremendous progress in natural language processing using deep
learning techniques in recent years, sign language production and comprehension
have advanced very little. One critical barrier is the lack of large-scale
datasets available to the public, due to the prohibitive cost of labeled data
generation. Efforts to provide public data for American Sign Language (ASL)
comprehension have yielded two datasets comprising more than a thousand video
clips. These datasets are large enough to enable a meaningful start to deep
learning research on sign languages but are far too small to lead to any
solution that can be practically deployed. So far, there is still no suitable
dataset for ASL production. We propose a system that can generate large-scale
datasets for continuous ASL. It is suitable for general ASL processing and
is particularly useful for ASL production. The continuous ASL dataset contains
English-labeled human articulations in condensed body-pose data formats. To
better serve the research community, we are releasing the first version of our
ASL dataset, which contains 30k sentences and 416k words with a vocabulary of
18k words, totaling 104 hours of video. This is the largest continuous sign
language dataset published to date in terms of video duration. We also describe
a system that can evolve and expand the dataset to incorporate better data
processing techniques and more content as they become available. It is our hope
that releasing this ASL dataset and the sustainable dataset-generation system to
the public will propel better deep learning research in ASL natural language
processing.
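
For illustration, here is a minimal Python sketch of how such an English-labeled pose record might be read; the JSON-lines layout and the field names ("text", "keypoints") are assumptions made for this example, not the dataset's documented schema.

import json
import numpy as np

def load_records(path):
    """Yield (sentence, pose) pairs from an assumed JSON-lines file.

    Hypothetical layout: one record per line, holding an English
    sentence and a (frames, keypoints, 3) array of condensed
    body-pose coordinates (x, y, confidence).
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            sentence = record["text"]               # English label
            pose = np.asarray(record["keypoints"])  # shape (T, K, 3)
            yield sentence, pose

# Usage sketch: stream sentence/pose pairs into a production model.
# for sentence, pose in load_records("sdw_asl_v1.jsonl"):
#     train_step(sentence, pose)
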
Related papers
- BAUST Lipi: A BdSL Dataset with Deep Learning Based Bangla Sign Language Recognition (arXiv 2024-08-20)
Sign language research is burgeoning in an effort to enhance communication with the deaf community.
One significant barrier has been the lack of a comprehensive Bangla sign language dataset.
We introduce a new BdSL dataset of alphabet signs comprising 18,000 images, each 224x224 pixels in size.
We devised a hybrid Convolutional Neural Network (CNN) model, integrating multiple convolutional layers, activation functions, dropout techniques, and LSTM layers.
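
As an illustration of such a hybrid, here is a minimal PyTorch sketch in which CNN feature-map rows are fed to an LSTM before classification; the layer sizes, the row-wise sequencing scheme, and the class count are assumptions for this example, not the paper's exact architecture.

import torch
import torch.nn as nn

class HybridCNNLSTM(nn.Module):
    """Illustrative CNN+LSTM hybrid for 224x224 sign images."""

    def __init__(self, num_classes=38):  # hypothetical alphabet size
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Dropout(0.3),
        )
        # Treat each row of the 56x56 feature map as one LSTM time step.
        self.lstm = nn.LSTM(input_size=64 * 56, hidden_size=128,
                            batch_first=True)
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):                       # x: (B, 3, 224, 224)
        f = self.features(x)                    # (B, 64, 56, 56)
        seq = f.permute(0, 2, 1, 3).flatten(2)  # (B, 56, 64*56)
        _, (h, _) = self.lstm(seq)
        return self.classifier(h[-1])           # (B, num_classes)
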
- iSign: A Benchmark for Indian Sign Language Processing (arXiv 2024-07-07)
iSign is a benchmark for Indian Sign Language (ISL) processing.
We release one of the largest ISL-English datasets with more than 118K video-sentence/phrase pairs.
We provide insights into the proposed benchmarks, along with a few linguistic observations on the workings of ISL.
- Towards Robust Speech Representation Learning for Thousands of Languages (arXiv 2024-06-30)
Self-supervised learning (SSL) has helped extend speech technologies to more languages by reducing the need for labeled data.
We propose XEUS, a Cross-lingual Encoder for Universal Speech, trained on over 1 million hours of data across 4057 languages.
- Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning (arXiv 2023-09-26)
We show that more powerful techniques can lead to more efficient pre-training, opening SSL to more research groups.
We propose WavLabLM, which extends WavLM's joint prediction and denoising to 40k hours of data across 136 languages.
We show that further efficiency can be achieved with a vanilla HuBERT Base model, which can maintain 94% of XLS-R's performance with only 3% of the data.
- A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends (arXiv 2023-01-13)
Self-supervised learning (SSL) aims to learn discriminative features from unlabeled data without relying on human-annotated labels.
SSL has garnered significant attention recently, leading to the development of numerous related algorithms.
This paper presents a review of diverse SSL methods, encompassing algorithmic aspects, application domains, three key trends, and open research questions.
- LSA-T: The first continuous Argentinian Sign Language dataset for Sign Language Translation (arXiv 2022-11-14)
Sign language translation (SLT) is an active field of study that encompasses human-computer interaction, computer vision, natural language processing and machine learning.
This paper presents the first continuous Argentinian Sign Language (LSA) dataset.
It contains 14,880 sentence-level videos of LSA extracted from the CN Sordos YouTube channel, with labels and keypoint annotations for each signer.
- ASL-Homework-RGBD Dataset: An annotated dataset of 45 fluent and non-fluent signers performing American Sign Language homeworks (arXiv 2022-07-08)
This dataset contains videos of fluent and non-fluent signers using American Sign Language (ASL).
A total of 45 fluent and non-fluent participants were asked to perform signing homework assignments.
The data is annotated to identify several aspects of signing including grammatical features and non-manual markers.
- Open-Domain Sign Language Translation Learned from Online Video (arXiv 2022-05-25)
We introduce OpenASL, a large-scale ASL-English dataset collected from online video sites.
OpenASL contains 288 hours of ASL videos in various domains from over 200 signers.
We propose a set of techniques including sign search as a pretext task for pre-training and fusion of mouthing and handshape features.
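
For illustration, one plausible reading of the fusion step is concatenation of frame-level streams followed by a linear projection; the dimensions and the fusion scheme here are assumptions, not the paper's reported design.

import torch
import torch.nn as nn

class MouthHandFusion(nn.Module):
    """Illustrative fusion of per-frame mouthing and handshape features."""

    def __init__(self, mouth_dim=256, hand_dim=256, out_dim=512):
        super().__init__()
        self.proj = nn.Linear(mouth_dim + hand_dim, out_dim)

    def forward(self, mouth_feats, hand_feats):
        # Both inputs: (batch, frames, dim) frame-level feature streams.
        fused = torch.cat([mouth_feats, hand_feats], dim=-1)
        return self.proj(fused)
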
- BBC-Oxford British Sign Language Dataset (arXiv 2021-11-05)
We introduce the BBC-Oxford British Sign Language (BOBSL) dataset, a large-scale video collection of British Sign Language (BSL).
We describe the motivation for the dataset, together with statistics and available annotations.
We conduct experiments to provide baselines for the tasks of sign recognition, sign language alignment, and sign language translation.
- Improving Sign Language Translation with Monolingual Data by Sign Back-Translation (arXiv 2021-05-26)
We propose a sign back-translation (SignBT) approach, which incorporates massive spoken language texts into sign training.
With a text-to-gloss translation model, we first back-translate the monolingual text to its gloss sequence.
Then, the paired sign sequence is generated by splicing pieces from an estimated gloss-to-sign bank at the feature level.
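
The two-step recipe above can be sketched in Python as follows; text_to_gloss and sign_bank are hypothetical stand-ins for the trained text-to-gloss model and the estimated gloss-to-sign feature bank.

def sign_back_translate(monolingual_texts, text_to_gloss, sign_bank):
    """Illustrative SignBT-style data synthesis.

    text_to_gloss: assumed callable mapping a sentence to a gloss sequence.
    sign_bank: assumed dict mapping a gloss to a sign feature segment
        (a list of frame-level feature vectors) estimated from real data.
    Returns synthetic (sign_features, text) pairs for training.
    """
    pairs = []
    for text in monolingual_texts:
        # Step 1: back-translate the monolingual text to a gloss sequence.
        glosses = text_to_gloss(text)
        segments = [sign_bank[g] for g in glosses if g in sign_bank]
        if not segments:
            continue
        # Step 2: splice the per-gloss segments at the feature level.
        features = [frame for seg in segments for frame in seg]
        pairs.append((features, text))
    return pairs
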
- How2Sign: A Large-scale Multimodal Dataset for Continuous American Sign Language (arXiv 2020-08-18)
How2Sign is a multimodal and multiview continuous American Sign Language (ASL) dataset.
It consists of a parallel corpus of more than 80 hours of sign language videos and a set of corresponding modalities including speech, English transcripts, and depth.
A three-hour subset was recorded in the Panoptic studio, enabling detailed 3D pose estimation.