Open-Domain Sign Language Translation Learned from Online Video
- URL: http://arxiv.org/abs/2205.12870v1
- Date: Wed, 25 May 2022 15:43:31 GMT
- Title: Open-Domain Sign Language Translation Learned from Online Video
- Authors: Bowen Shi and Diane Brentari and Greg Shakhnarovich and Karen Livescu
- Abstract summary: We introduce OpenASL, a large-scale ASL-English dataset collected from online video sites.
OpenASL contains 288 hours of ASL videos in various domains from over 200 signers.
We propose a set of techniques including sign search as a pretext task for pre-training and fusion of mouthing and handshape features.
- Score: 32.89182994277633
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing work on sign language translation--that is, translation from sign
language videos into sentences in a written language--has focused mainly on (1)
data collected in a controlled environment or (2) data in a specific domain,
which limits the applicability to real-world settings. In this paper, we
introduce OpenASL, a large-scale ASL-English dataset collected from online
video sites (e.g., YouTube). OpenASL contains 288 hours of ASL videos in
various domains (news, VLOGs, etc.) from over 200 signers and is the largest
publicly available ASL translation dataset to date. To tackle the challenges of
sign language translation in realistic settings and without glosses, we propose
a set of techniques including sign search as a pretext task for pre-training
and fusion of mouthing and handshape features. The proposed techniques produce
consistent and large improvements in translation quality over baseline models
based on prior work. Our data, code and model will be publicly available at
https://github.com/chevalierNoir/OpenASL
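To make the second technique concrete, the sketch below shows one way mouthing and handshape feature streams could be fused with the base sign features before a sequence-to-sequence translation model. This is a minimal PyTorch-style illustration, not the paper's implementation: the class name FusedSignEncoder, the concatenate-project-encode design, and all dimensions are assumed for the example.

```python
# Minimal sketch (assumptions, not the OpenASL code): fuse per-frame sign,
# mouthing, and handshape feature streams, then encode the fused sequence.
import torch
import torch.nn as nn

class FusedSignEncoder(nn.Module):
    def __init__(self, d_sign=512, d_mouth=256, d_hand=256, d_model=512, n_layers=4):
        super().__init__()
        # Project the concatenated streams into a shared model dimension.
        self.fuse = nn.Linear(d_sign + d_mouth + d_hand, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, sign_feats, mouth_feats, hand_feats):
        # Each input: (batch, time, dim), assumed temporally aligned per frame.
        x = torch.cat([sign_feats, mouth_feats, hand_feats], dim=-1)
        x = self.fuse(x)
        return self.encoder(x)  # (batch, time, d_model), fed to a text decoder

# Toy usage with random tensors standing in for video-derived features.
enc = FusedSignEncoder()
out = enc(torch.randn(2, 100, 512), torch.randn(2, 100, 256), torch.randn(2, 100, 256))
print(out.shape)  # torch.Size([2, 100, 512])
```

Concatenation followed by a shared encoder is only one plausible fusion strategy; attention-based or gated fusion would slot into the same place, and the sign-search pretext task mentioned in the abstract would pre-train the visual feature extractor upstream of this module.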
Related papers
- FLEURS-ASL: Including American Sign Language in Massively Multilingual Multitask Evaluation [0.9790236766474201]
We introduce FLEURS-ASL, an extension of the multiway parallel benchmarks FLORES (for text) and FLEURS (for speech).
FLEURS-ASL can be used to evaluate a variety of tasks between ASL and 200 other languages as text, or 102 languages as speech.
We provide baselines for tasks from ASL to English text using a unified modeling approach that incorporates timestamp tokens and previous text tokens in a 34-second context window.
We also use FLEURS-ASL to show that multimodal frontier models have virtually no understanding of ASL, underscoring the importance of including sign languages in standard evaluation suites.
arXiv Detail & Related papers (2024-08-24T13:59:41Z)
- iSign: A Benchmark for Indian Sign Language Processing [5.967764101493575]
iSign is a benchmark for Indian Sign Language (ISL) processing.
We release one of the largest ISL-English datasets with more than 118K video-sentence/phrase pairs.
We provide insights into the proposed benchmarks, along with a few linguistic observations about the workings of ISL.
arXiv Detail & Related papers (2024-07-07T15:07:35Z)
- YouTube-ASL: A Large-Scale, Open-Domain American Sign Language-English Parallel Corpus [2.5782420501870296]
We present YouTube-ASL, a large-scale, open-domain corpus of American Sign Language (ASL) videos and accompanying English captions from YouTube.
We train baseline models for ASL to English translation on YouTube-ASL and evaluate them on How2Sign.
We achieve a new finetuned state of the art of 12.39 BLEU and, for the first time, report zero-shot results.
arXiv Detail & Related papers (2023-06-27T02:44:07Z)
- Slovo: Russian Sign Language Dataset [83.93252084624997]
This paper presents the Russian Sign Language (RSL) video dataset Slovo, produced using crowdsourcing platforms.
The dataset contains 20,000 FullHD recordings, divided into 1,000 classes of isolated RSL gestures recorded by 194 signers.
arXiv Detail & Related papers (2023-05-23T21:00:42Z)
- Cross-modality Data Augmentation for End-to-End Sign Language Translation [66.46877279084083]
End-to-end sign language translation (SLT) aims to convert sign language videos into spoken language texts directly without intermediate representations.
It has been a challenging task due to the modality gap between sign videos and texts and the scarcity of labeled data.
We propose a novel Cross-modality Data Augmentation (XmDA) framework to transfer the powerful gloss-to-text translation capabilities to end-to-end sign language translation.
arXiv Detail & Related papers (2023-05-18T16:34:18Z)
- SDW-ASL: A Dynamic System to Generate Large Scale Dataset for Continuous American Sign Language [0.0]
We release the first version of our ASL dataset, which contains 30k sentences, 416k words, a vocabulary of 18k words, in a total of 104 hours.
This is the largest continuous sign language dataset published to date in terms of video duration.
arXiv Detail & Related papers (2022-10-13T07:08:00Z)
- A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation [54.29679610921429]
Existing sign language datasets contain only about 10K-20K pairs of sign videos, gloss annotations and texts.
Data is thus a bottleneck for training effective sign language translation models.
The proposed simple multi-modality transfer learning baseline surpasses the previous state-of-the-art results on two sign language translation benchmarks.
arXiv Detail & Related papers (2022-03-08T18:59:56Z)
- SimulSLT: End-to-End Simultaneous Sign Language Translation [55.54237194555432]
Existing sign language translation methods need to read the entire video before starting translation.
We propose SimulSLT, the first end-to-end simultaneous sign language translation model.
SimulSLT achieves BLEU scores that exceed those of the latest end-to-end non-simultaneous sign language translation model.
arXiv Detail & Related papers (2021-12-08T11:04:52Z)
- BBC-Oxford British Sign Language Dataset [64.32108826673183]
We introduce the BBC-Oxford British Sign Language (BOBSL) dataset, a large-scale video collection of British Sign Language (BSL).
We describe the motivation for the dataset, together with statistics and available annotations.
We conduct experiments to provide baselines for the tasks of sign recognition, sign language alignment, and sign language translation.
arXiv Detail & Related papers (2021-11-05T17:35:58Z)
- Transferring Cross-domain Knowledge for Video Sign Language Recognition [103.9216648495958]
Word-level sign language recognition (WSLR) is a fundamental task in sign language interpretation.
We propose a novel method that learns domain-invariant visual concepts and strengthens WSLR models by transferring knowledge from subtitled news sign language to them.
arXiv Detail & Related papers (2020-03-08T03:05:21Z)