XTREME-S: Evaluating Cross-lingual Speech Representations
- URL: http://arxiv.org/abs/2203.10752v2
- Date: Tue, 22 Mar 2022 10:10:19 GMT
- Title: XTREME-S: Evaluating Cross-lingual Speech Representations
- Authors: Alexis Conneau, Ankur Bapna, Yu Zhang, Min Ma, Patrick von Platen,
Anton Lozhkov, Colin Cherry, Ye Jia, Clara Rivera, Mihir Kale, Daan Van Esch,
Vera Axelrod, Simran Khanuja, Jonathan H. Clark, Orhan Firat, Michael Auli,
Sebastian Ruder, Jason Riesa, Melvin Johnson
- Abstract summary: XTREME-S is a new benchmark to evaluate universal cross-lingual speech representations in many languages.
This paper describes the new benchmark and establishes the first speech-only and speech-text baselines.
- Score: 88.78720838743772
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce XTREME-S, a new benchmark to evaluate universal cross-lingual
speech representations in many languages. XTREME-S covers four task families:
speech recognition, classification, speech-to-text translation and retrieval.
Covering 102 languages from 10+ language families, 3 different domains and 4
task families, XTREME-S aims to simplify multilingual speech representation
evaluation, as well as catalyze research in "universal" speech representation
learning. This paper describes the new benchmark and establishes the first
speech-only and speech-text baselines using XLS-R and mSLAM on all downstream
tasks. We motivate the design choices and detail how to use the benchmark.
Datasets and fine-tuning scripts are made easily accessible at
https://hf.co/datasets/google/xtreme_s.
Related papers
- Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond [36.660499609887886]
Speech-MASSIVE is a multilingual Spoken Language Understanding dataset.
It covers 12 languages from different families and inherits MASSIVE's annotations for the intent prediction and slot-filling tasks.
We demonstrate the suitability of Speech-MASSIVE for other tasks such as speech transcription, language identification, and speech translation.
arXiv Detail & Related papers (2024-08-07T16:55:28Z)
- XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages [105.54207724678767]
Data scarcity is a crucial issue for the development of highly multilingual NLP systems.
We propose XTREME-UP, a benchmark defined by its focus on the scarce-data scenario rather than zero-shot.
XTREME-UP evaluates the capabilities of language models across 88 under-represented languages over 9 key user-centric technologies.
arXiv Detail & Related papers (2023-05-19T18:00:03Z)
- Learning Cross-lingual Visual Speech Representations [108.68531445641769]
Cross-lingual self-supervised visual representation learning has been a growing research topic in the last few years.
We use the recently proposed Raw Audio-Visual Speech Encoders (RAVEn) framework to pre-train an audio-visual model with unlabelled data.
Our experiments show that: (1) multi-lingual models with more data outperform monolingual ones, but, when keeping the amount of data fixed, monolingual models tend to reach better performance.
arXiv Detail & Related papers (2023-03-14T17:05:08Z)
- SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community.
There are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers.
Recent work has begun to introduce such benchmarks for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z)
- SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation [11.552745999302905]
We propose SAMU-XLSR, a Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation learning framework.
We combine XLS-R, a state-of-the-art multilingual acoustic frame-level speech representation learning model, with the Language Agnostic BERT Sentence Embedding (LaBSE) model to create SAMU-XLSR, an utterance-level multimodal multilingual speech encoder.
arXiv Detail & Related papers (2022-05-17T08:58:48Z)
- XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation [93.80733419450225]
This paper analyzes the current state of cross-lingual transfer learning.
We extend XTREME to XTREME-R, which consists of an improved set of ten natural language understanding tasks.
arXiv Detail & Related papers (2021-04-15T12:26:12Z)
- The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling [23.517751578968344]
We introduce a new unsupervised task, spoken language modeling: the learning of linguistic representations from raw audio signals without any labels.
We present the results and analyses of a composite baseline made of self-supervised contrastive representation learning (CPC), clustering (k-means) and language modeling (LSTM or BERT).
This simple pipeline shows better-than-chance performance on all four metrics, demonstrating the feasibility of spoken language modeling from raw speech.
arXiv Detail & Related papers (2020-11-23T18:01:37Z)
- XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization [98.61159823343036]
We present the Word-in-Context dataset (WiC) for assessing the ability to correctly model distinct meanings of a word.
We put forward a large multilingual benchmark, XL-WiC, featuring gold standards in 12 new languages.
Experimental results show that even when no tagged instances are available for a target language, models trained solely on the English data can attain competitive performance.
arXiv Detail & Related papers (2020-10-13T15:32:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.