YouTube-SL-25: A Large-Scale, Open-Domain Multilingual Sign Language Parallel Corpus
- URL: http://arxiv.org/abs/2407.11144v1
- Date: Mon, 15 Jul 2024 18:08:34 GMT
- Authors: Garrett Tanzer, Biao Zhang
- Abstract summary: We present YouTube-SL-25, a large-scale, open-domain multilingual corpus of sign language videos.
With >3000 hours of videos across >25 sign languages, YouTube-SL-25 is the largest parallel sign language dataset to date.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Even for better-studied sign languages like American Sign Language (ASL), data is the bottleneck for machine learning research. The situation is worse yet for the many other sign languages used by Deaf/Hard of Hearing communities around the world. In this paper, we present YouTube-SL-25, a large-scale, open-domain multilingual corpus of sign language videos with seemingly well-aligned captions drawn from YouTube. With >3000 hours of videos across >25 sign languages, YouTube-SL-25 is a) >3x the size of YouTube-ASL, b) the largest parallel sign language dataset to date, and c) the first or largest parallel dataset for many of its component languages. We provide baselines for sign-to-text tasks using a unified multilingual multitask model based on T5 and report scores on benchmarks across 4 sign languages. The results demonstrate that multilingual transfer benefits both higher- and lower-resource sign languages within YouTube-SL-25.
Related papers
- FLEURS-ASL: Including American Sign Language in Massively Multilingual Multitask Evaluation [0.9790236766474201]
We introduce FLEURS-ASL, an extension of the multiway parallel benchmarks FLORES (for text) and FLEURS (for speech).
FLEURS-ASL can be used to evaluate a variety of tasks between ASL and 200 other languages as text, or 102 languages as speech.
We provide baselines for tasks from ASL to English text using a unified modeling approach that incorporates timestamp tokens and previous text tokens in a 34-second context window.
We also use FLEURS-ASL to show that multimodal frontier models have virtually no understanding of ASL, underscoring the importance of including sign languages in standard evaluations.
arXiv Detail & Related papers (2024-08-24T13:59:41Z)
- iSign: A Benchmark for Indian Sign Language Processing [5.967764101493575]
iSign is a benchmark for Indian Sign Language (ISL) processing.
We release one of the largest ISL-English datasets with more than 118K video-sentence/phrase pairs.
We provide insights into the proposed benchmarks, along with linguistic observations on the workings of ISL.
arXiv Detail & Related papers (2024-07-07T15:07:35Z)
- The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments [57.273662221547056]
In this study, we investigate a novel, counterintuitive driver of cross-lingual generalisation: language imbalance.
We observe that the existence of a predominant language during training boosts the performance of less frequent languages.
As we extend our analysis to real languages, we find that infrequent languages still benefit from frequent ones, yet whether language imbalance causes cross-lingual generalisation in that setting remains inconclusive.
arXiv Detail & Related papers (2024-04-11T17:58:05Z)
- MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling [70.34758460372629]
We introduce a new paradigm that encodes the same information with segments of consistent size across diverse languages.
MYTE produces shorter encodings for all 99 analyzed languages.
This, in turn, improves multilingual LM performance and diminishes the perplexity gap throughout diverse languages.
arXiv Detail & Related papers (2024-03-15T21:21:11Z)
- Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon [78.12363425794214]
We focus on zero-shot sentiment analysis tasks across 34 languages, including 6 high/medium-resource languages, 25 low-resource languages, and 3 code-switching datasets.
We demonstrate that pretraining using multilingual lexicons, without using any sentence-level sentiment data, achieves superior zero-shot performance compared to models fine-tuned on English sentiment datasets.
arXiv Detail & Related papers (2024-02-03T10:41:05Z)
- YouTube-ASL: A Large-Scale, Open-Domain American Sign Language-English Parallel Corpus [2.5782420501870296]
We present YouTube-ASL, a large-scale, open-domain corpus of American Sign Language (ASL) videos and accompanying English captions from YouTube.
We train baseline models for ASL to English translation on YouTube-ASL and evaluate them on How2Sign.
We achieve a new fine-tuned state of the art of 12.39 BLEU and, for the first time, report zero-shot results.
arXiv Detail & Related papers (2023-06-27T02:44:07Z)
- AudioPaLM: A Large Language Model That Can Speak and Listen [79.44757696533709]
We introduce AudioPaLM, a large language model for speech understanding and generation.
AudioPaLM fuses text-based and speech-based language models.
It can process and generate text and speech with applications including speech recognition and speech-to-speech translation.
arXiv Detail & Related papers (2023-06-22T14:37:54Z)
- Chain-of-Dictionary Prompting Elicits Translation in Large Language Models [100.47154959254937]
Large language models (LLMs) have shown surprisingly good performance in multilingual neural machine translation (MNMT).
We present a novel method, CoD, which augments LLMs with prior knowledge in the form of chained multilingual dictionaries for a subset of input words, eliciting their translation abilities.
arXiv Detail & Related papers (2023-05-11T05:19:47Z)
- Open-Domain Sign Language Translation Learned from Online Video [32.89182994277633]
We introduce OpenASL, a large-scale ASL-English dataset collected from online video sites.
OpenASL contains 288 hours of ASL videos in various domains from over 200 signers.
We propose a set of techniques including sign search as a pretext task for pre-training and fusion of mouthing and handshape features.
arXiv Detail & Related papers (2022-05-25T15:43:31Z)
- A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation [54.29679610921429]
Existing sign language datasets contain only about 10K-20K pairs of sign videos, gloss annotations and texts.
Data is thus a bottleneck for training effective sign language translation models.
The proposed simple multi-modality transfer learning baseline surpasses the previous state-of-the-art results on two sign language translation benchmarks.
arXiv Detail & Related papers (2022-03-08T18:59:56Z)
- How2Sign: A Large-scale Multimodal Dataset for Continuous American Sign Language [37.578776156503906]
How2Sign is a multimodal and multiview continuous American Sign Language (ASL) dataset.
It consists of a parallel corpus of more than 80 hours of sign language videos and a set of corresponding modalities including speech, English transcripts, and depth.
A three-hour subset was recorded in the Panoptic studio enabling detailed 3D pose estimation.
arXiv Detail & Related papers (2020-08-18T20:22:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.