YouTube-ASL: A Large-Scale, Open-Domain American Sign Language-English
Parallel Corpus
- URL: http://arxiv.org/abs/2306.15162v2
- Date: Thu, 26 Oct 2023 22:57:49 GMT
- Title: YouTube-ASL: A Large-Scale, Open-Domain American Sign Language-English
Parallel Corpus
- Authors: David Uthus, Garrett Tanzer, Manfred Georg
- Abstract summary: We present YouTube-ASL, a large-scale, open-domain corpus of American Sign Language (ASL) videos and accompanying English captions from YouTube.
We train baseline models for ASL to English translation on YouTube-ASL and evaluate them on How2Sign.
We achieve a new finetuned state of the art of 12.39 BLEU and, for the first time, report zero-shot results.
- Score: 2.5782420501870296
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning for sign languages is bottlenecked by data. In this paper,
we present YouTube-ASL, a large-scale, open-domain corpus of American Sign
Language (ASL) videos and accompanying English captions drawn from YouTube.
With ~1000 hours of videos and >2500 unique signers, YouTube-ASL is ~3x as
large and has ~10x as many unique signers as the largest prior ASL dataset. We
train baseline models for ASL to English translation on YouTube-ASL and
evaluate them on How2Sign, where we achieve a new finetuned state of the art of
12.39 BLEU and, for the first time, report zero-shot results.
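For context on how the headline number is computed: BLEU is a corpus-level n-gram overlap metric between model outputs and reference translations. Below is a minimal sketch using the sacrebleu library; the sentences are toy placeholders, and this is not the authors' evaluation pipeline.

```python
# Minimal sketch of corpus-level BLEU scoring with sacrebleu.
# Hypothetical inputs; not the authors' evaluation code.
import sacrebleu

# Model outputs and reference translations, one sentence per entry.
hypotheses = [
    "the weather is nice today",
    "i am going to the store",
]
references = [[
    "the weather is nice today",
    "i am going to the store later",
]]  # sacrebleu expects a list of reference streams

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")  # the paper reports 12.39 on How2Sign
```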
Related papers
- The American Sign Language Knowledge Graph: Infusing ASL Models with Linguistic Knowledge [6.481946043182915]
We introduce the American Sign Language Knowledge Graph (ASLKG), compiled from twelve sources of expert linguistic knowledge.
We use the ASLKG to train neuro-symbolic models for 3 ASL understanding tasks, achieving accuracies of 91% on isolated sign recognition (ISR), 14% for predicting the semantic features of unseen signs, and 36% for classifying the topic of YouTube-ASL videos.
arXiv Detail & Related papers (2024-11-06T00:16:16Z)
- YouTube-SL-25: A Large-Scale, Open-Domain Multilingual Sign Language Parallel Corpus [6.389882065284251]
We present YouTube-SL-25, a large-scale, open-domain multilingual corpus of sign language videos.
With >3000 hours of videos across >25 sign languages, YouTube-SL-25 is the largest parallel sign language dataset to date.
arXiv Detail & Related papers (2024-07-15T18:08:34Z)
- Towards Robust Speech Representation Learning for Thousands of Languages [77.2890285555615]
Self-supervised learning (SSL) has helped extend speech technologies to more languages by reducing the need for labeled data.
We propose XEUS, a Cross-lingual Encoder for Universal Speech, trained on over 1 million hours of data across 4057 languages.
arXiv Detail & Related papers (2024-06-30T21:40:26Z)
- Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning [69.77973092264338]
We show that more powerful techniques can lead to more efficient pre-training, opening SSL to more research groups.
We propose WavLabLM, which extends WavLM's joint prediction and denoising to 40k hours of data across 136 languages.
We show that further efficiency can be achieved with a vanilla HuBERT Base model, which can maintain 94% of XLS-R's performance with only 3% of the data.
arXiv Detail & Related papers (2023-09-26T23:55:57Z)
- SDW-ASL: A Dynamic System to Generate Large Scale Dataset for Continuous American Sign Language [0.0]
We release the first version of our ASL dataset, which contains 30k sentences and 416k words, with a vocabulary of 18k words, in a total of 104 hours of video.
This is the largest continuous sign language dataset published to date in terms of video duration.
arXiv Detail & Related papers (2022-10-13T07:08:00Z)
- LAMDA-SSL: Semi-Supervised Learning in Python [56.14115592683035]
LAMDA-SSL is open-sourced on GitHub and its detailed usage documentation is available at https://ygzwqzd.github.io/LAMDA-SSL/.
This documentation greatly reduces the cost for users of familiarizing themselves with the LAMDA-SSL toolkit and with SSL algorithms.
arXiv Detail & Related papers (2022-08-09T09:06:48Z)
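LAMDA-SSL's own API is not shown in this summary; as a generic, minimal illustration of what a semi-supervised learner does (train on a mix of labeled and unlabeled data), here is a sketch using scikit-learn's SelfTrainingClassifier instead, on a synthetic dataset.

```python
# Generic semi-supervised learning illustration with scikit-learn,
# NOT the LAMDA-SSL API. Unlabeled points are marked with label -1.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
y_partial = y.copy()
y_partial[50:] = -1  # pretend only the first 50 labels are known

# Self-training: iteratively pseudo-label the unlabeled points.
clf = SelfTrainingClassifier(SVC(probability=True, random_state=0))
clf.fit(X, y_partial)
print(f"accuracy on all data: {clf.score(X, y):.2f}")
```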
- Open-Domain Sign Language Translation Learned from Online Video [32.89182994277633]
We introduce OpenASL, a large-scale ASL-English dataset collected from online video sites.
OpenASL contains 288 hours of ASL videos in various domains from over 200 signers.
We propose a set of techniques including sign search as a pretext task for pre-training and fusion of mouthing and handshape features.
arXiv Detail & Related papers (2022-05-25T15:43:31Z)
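The OpenASL entry above mentions fusing mouthing and handshape features. A minimal sketch of one common way to do such fusion, late concatenation followed by a learned projection, is below; the dimensions and module names are illustrative assumptions, not the paper's implementation.

```python
# Tiny sketch of late fusion of per-frame mouthing and handshape features.
# Illustrative assumption only; not the OpenASL implementation.
import torch
import torch.nn as nn

class MouthHandFusion(nn.Module):
    def __init__(self, mouth_dim=128, hand_dim=256, fused_dim=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(mouth_dim + hand_dim, fused_dim),
            nn.ReLU(),
        )

    def forward(self, mouth_feats, hand_feats):
        # Concatenate the two channels per frame, then project.
        return self.fuse(torch.cat([mouth_feats, hand_feats], dim=-1))

fusion = MouthHandFusion()
mouth = torch.randn(2, 50, 128)   # (batch, frames, mouth_dim)
hand = torch.randn(2, 50, 256)    # (batch, frames, hand_dim)
fused = fusion(mouth, hand)       # (2, 50, 256)
```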
- Read and Attend: Temporal Localisation in Sign Language Videos [84.30262812057994]
We train a Transformer model to ingest a continuous signing stream and output a sequence of written tokens.
We show that it acquires the ability to attend to a large vocabulary of sign instances in the input sequence, enabling their localisation.
arXiv Detail & Related papers (2021-03-30T16:39:53Z)
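The Read and Attend entry above describes a Transformer that consumes a continuous stream of visual features and emits written tokens. A generic PyTorch sketch of such an encoder-decoder follows; the feature dimension, vocabulary size, and layer counts are illustrative assumptions, not the paper's configuration.

```python
# Generic encoder-decoder sketch: video feature stream -> written tokens.
# Shapes and hyperparameters are assumptions, not the paper's model.
import torch
import torch.nn as nn

class SignStreamTranslator(nn.Module):
    def __init__(self, feat_dim=512, vocab_size=8000, d_model=256):
        super().__init__()
        self.in_proj = nn.Linear(feat_dim, d_model)   # project video features
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=3, num_decoder_layers=3,
            batch_first=True,
        )
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, video_feats, token_ids):
        # video_feats: (batch, frames, feat_dim); token_ids: (batch, seq)
        src = self.in_proj(video_feats)
        tgt = self.tok_emb(token_ids)
        causal = self.transformer.generate_square_subsequent_mask(token_ids.size(1))
        out = self.transformer(src, tgt, tgt_mask=causal)
        return self.out_proj(out)  # per-position vocabulary logits

model = SignStreamTranslator()
feats = torch.randn(2, 100, 512)            # 100 frames of pooled features
tokens = torch.randint(0, 8000, (2, 12))    # shifted target tokens
logits = model(feats, tokens)               # (2, 12, 8000)
```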
- End-to-end Generative Zero-shot Learning via Few-shot Learning [76.9964261884635]
State-of-the-art approaches to Zero-Shot Learning (ZSL) train generative nets to synthesize examples conditioned on the provided metadata.
We introduce an end-to-end generative ZSL framework that uses such an approach as a backbone and feeds its synthesized output to a Few-Shot Learning algorithm.
arXiv Detail & Related papers (2021-02-08T17:35:37Z)
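The zero-shot entry above describes a two-stage recipe: a conditional generator synthesizes features for unseen classes from their metadata, and a few-shot-style learner is then trained on the synthetic examples. A minimal sketch under those assumptions follows; the generator is assumed pre-trained, and all names and sizes are hypothetical.

```python
# Sketch of generative zero-shot learning feeding a simple learner.
# All dimensions are hypothetical; the generator is assumed pre-trained
# on seen classes (here it is just randomly initialized so the code runs).
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

ATTR_DIM, FEAT_DIM, NOISE_DIM = 16, 64, 8

# Conditional generator: class attributes + noise -> synthetic features.
generator = nn.Sequential(
    nn.Linear(ATTR_DIM + NOISE_DIM, 128), nn.ReLU(),
    nn.Linear(128, FEAT_DIM),
)

unseen_attrs = torch.randn(5, ATTR_DIM)  # metadata for 5 unseen classes

# Synthesize a small support set per unseen class.
feats, labels = [], []
for cls, attr in enumerate(unseen_attrs):
    noise = torch.randn(20, NOISE_DIM)
    cond = attr.expand(20, -1)
    with torch.no_grad():
        feats.append(generator(torch.cat([cond, noise], dim=1)))
    labels += [cls] * 20

# Few-shot-style learner trained purely on the synthetic examples.
clf = LogisticRegression(max_iter=1000)
clf.fit(torch.cat(feats).numpy(), labels)
```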
- Modeling Global Body Configurations in American Sign Language [2.8575516056239576]
American Sign Language (ASL) is the fourth most commonly used language in the United States.
ASL is the language most commonly used by Deaf people in the United States and the English-speaking regions of Canada.
arXiv Detail & Related papers (2020-09-03T06:20:10Z)
- BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues [106.21067543021887]
We show how to use mouthing cues from signers to obtain high-quality annotations from video data.
The BSL-1K dataset is a collection of British Sign Language (BSL) signs of unprecedented scale.
arXiv Detail & Related papers (2020-07-23T16:59:01Z)