FLEURS: Few-shot Learning Evaluation of Universal Representations of
Speech
- URL: http://arxiv.org/abs/2205.12446v1
- Date: Wed, 25 May 2022 02:29:03 GMT
- Title: FLEURS: Few-shot Learning Evaluation of Universal Representations of
Speech
- Authors: Alexis Conneau, Min Ma, Simran Khanuja, Yu Zhang, Vera Axelrod,
Siddharth Dalmia, Jason Riesa, Clara Rivera, Ankur Bapna
- Abstract summary: We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of Speech benchmark.
FLEURS is an n-way parallel speech dataset in 102 languages built on top of the machine translation FLoRes-101 benchmark.
- Score: 33.71744518887916
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce FLEURS, the Few-shot Learning Evaluation of Universal
Representations of Speech benchmark. FLEURS is an n-way parallel speech dataset
in 102 languages built on top of the machine translation FLoRes-101 benchmark,
with approximately 12 hours of speech supervision per language. FLEURS can be
used for a variety of speech tasks, including Automatic Speech Recognition
(ASR), Speech Language Identification (Speech LangID), Translation and
Retrieval. In this paper, we provide baselines for the tasks based on
multilingual pre-trained models like mSLAM. The goal of FLEURS is to enable
speech technology in more languages and catalyze research in low-resource
speech understanding.
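As a concrete starting point, the sketch below shows how FLEURS could be plugged into an ASR evaluation loop. It assumes the Hugging Face Hub mirror "google/fleurs" plus the datasets and jiwer packages, none of which appear in the abstract itself; the config name "en_us" and the field names follow that mirror's convention, and transcribe is a hypothetical stand-in for whatever system is under test.

```python
# Minimal sketch: scoring an ASR system on one FLEURS language.
# Assumes the Hugging Face mirror "google/fleurs"; the config "en_us"
# and field names follow that mirror, not the paper itself.
from datasets import load_dataset
from jiwer import wer

fleurs = load_dataset("google/fleurs", "en_us", split="test")

def transcribe(waveform, sampling_rate: int) -> str:
    """Hypothetical stand-in: replace with any ASR system."""
    raise NotImplementedError

sample = fleurs[0]
audio = sample["audio"]   # {"array": ..., "sampling_rate": 16000}
hypothesis = transcribe(audio["array"], audio["sampling_rate"])
print("WER:", wer(sample["transcription"], hypothesis))
```

Swapping the config string selects another of the 102 languages; LangID and retrieval would use other fields of the same rows.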
Related papers
- Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data [84.01401439030265]
Recent end-to-end speech language models (SLMs) have expanded upon the capabilities of large language models (LLMs).
We present a simple yet effective automatic process for creating speech-text pair data.
Our model demonstrates general capabilities for speech-related tasks without the need for speech instruction-tuning data.
arXiv Detail & Related papers (2024-09-30T07:01:21Z)
- SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks [94.10497337235083]
We are the first to explore the potential of prompting speech LMs in the domain of speech processing.
We reformulate speech processing tasks into speech-to-unit generation tasks.
We show that the prompting method can achieve competitive performance compared to the strong fine-tuning method.
arXiv Detail & Related papers (2024-08-23T13:00:10Z)
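To make SpeechPrompt's speech-to-unit reformulation concrete, here is a minimal prompt-tuning sketch in PyTorch: speech is assumed to be pre-discretized into unit IDs, the unit LM and its embeddings stay frozen, and only a small set of prompt vectors is trained per task. All module names and sizes are illustrative, not the paper's.

```python
# Conceptual sketch of prompt tuning on a frozen unit language model.
import torch
import torch.nn as nn

NUM_UNITS, DIM, PROMPT_LEN = 100, 256, 10  # illustrative sizes

class PromptedUnitLM(nn.Module):
    def __init__(self, unit_lm: nn.Module, embed: nn.Embedding):
        super().__init__()
        self.unit_lm = unit_lm            # frozen pretrained unit LM
        self.embed = embed                # frozen unit embeddings
        for p in self.parameters():
            p.requires_grad = False
        # Only these prompt vectors are trained for a given task.
        self.prompt = nn.Parameter(torch.randn(PROMPT_LEN, DIM) * 0.01)

    def forward(self, units: torch.Tensor) -> torch.Tensor:
        x = self.embed(units)                          # (B, T, D)
        p = self.prompt.expand(x.size(0), -1, -1)      # (B, P, D)
        return self.unit_lm(torch.cat([p, x], dim=1))  # unit logits

# Toy frozen "LM" standing in for a pretrained unit model.
lm = nn.Sequential(nn.Linear(DIM, DIM), nn.ReLU(), nn.Linear(DIM, NUM_UNITS))
model = PromptedUnitLM(lm, nn.Embedding(NUM_UNITS, DIM))
logits = model(torch.randint(0, NUM_UNITS, (2, 50)))
print(logits.shape)  # (2, 60, 100): prompt positions + input positions
```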
- Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training [29.47243668154796]
BLOOMZMMS is a novel model that integrates a multilingual LLM with a multilingual speech encoder.
We demonstrate the transferability of linguistic knowledge from the text to the speech modality.
Our zero-shot evaluation results confirm the robustness of our approach across multiple tasks.
arXiv Detail & Related papers (2024-04-16T21:45:59Z)
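The integration BLOOMZMMS's summary describes follows a common recipe: downsample and project speech-encoder frames into the LLM's embedding space so they can be concatenated with text embeddings. The sketch below shows that generic adapter pattern; the dimensions and the frame-stacking choice are assumptions, not the paper's exact design.

```python
# Generic speech-to-LLM adapter: stack frames, project to LLM width.
import torch
import torch.nn as nn

SPEECH_DIM, LLM_DIM = 1024, 4096  # illustrative sizes

class SpeechToLLMAdapter(nn.Module):
    def __init__(self, downsample: int = 4):
        super().__init__()
        self.downsample = downsample
        self.proj = nn.Linear(SPEECH_DIM * downsample, LLM_DIM)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        b, t, d = feats.shape
        t = t - t % self.downsample                # drop ragged tail
        x = feats[:, :t].reshape(b, t // self.downsample,
                                 d * self.downsample)
        return self.proj(x)                        # (B, T/k, LLM_DIM)

adapter = SpeechToLLMAdapter()
speech_feats = torch.randn(1, 200, SPEECH_DIM)  # encoder output frames
llm_inputs = adapter(speech_feats)              # ready to concatenate
print(llm_inputs.shape)                         # with text embeddings
```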
- AudioPaLM: A Large Language Model That Can Speak and Listen [79.44757696533709]
We introduce AudioPaLM, a large language model for speech understanding and generation.
AudioPaLM fuses text-based and speech-based language models.
It can process and generate text and speech with applications including speech recognition and speech-to-speech translation.
arXiv Detail & Related papers (2023-06-22T14:37:54Z)
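The fusion AudioPaLM describes rests on a single decoder whose vocabulary covers both text tokens and discrete audio tokens. The toy sketch below illustrates only that vocabulary arithmetic; the sizes and the offset scheme are assumptions for illustration, not the paper's values.

```python
# Shared text+audio vocabulary via an ID offset (illustrative sizes).
TEXT_VOCAB = 32_000   # ordinary text tokens (assumed size)
AUDIO_VOCAB = 1_024   # discrete acoustic units (assumed size)

def audio_token_id(unit: int) -> int:
    """Map an acoustic unit into the shared vocabulary after text IDs."""
    assert 0 <= unit < AUDIO_VOCAB
    return TEXT_VOCAB + unit

def is_audio(token_id: int) -> bool:
    return token_id >= TEXT_VOCAB

# A mixed sequence the shared decoder could emit, e.g. for
# speech-to-speech translation: text and audio IDs interleave freely.
mixed = [17, 905, audio_token_id(3), audio_token_id(512)]
print([("audio", t - TEXT_VOCAB) if is_audio(t) else ("text", t)
       for t in mixed])
```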
- ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation [79.66359274050885]
We present ComSL, a speech-language model built atop a composite architecture of public pretrained speech-only and language-only models.
Our approach has demonstrated effectiveness in end-to-end speech-to-text translation tasks.
arXiv Detail & Related papers (2023-05-24T07:42:15Z)
- Scaling Speech Technology to 1,000+ Languages [66.31120979098483]
The Massively Multilingual Speech (MMS) project increases the number of supported languages by 10-40x, depending on the task.
A key ingredient is a new dataset based on readings of publicly available religious texts.
We built pre-trained wav2vec 2.0 models covering 1,406 languages, a single multilingual automatic speech recognition model for 1,107 languages, speech synthesis models for the same number of languages, and a language identification model for 4,017 languages.
arXiv Detail & Related papers (2023-05-22T22:09:41Z)
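For reference, the released MMS ASR checkpoints can be driven from Hugging Face transformers roughly as below. The checkpoint name "facebook/mms-1b-all" and the per-language adapter calls reflect that integration as I understand it; treat the exact API as something to verify against current documentation rather than guaranteed.

```python
# Hedged sketch: MMS ASR via transformers, switching to French ("fra").
import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

processor = AutoProcessor.from_pretrained("facebook/mms-1b-all")
model = Wav2Vec2ForCTC.from_pretrained("facebook/mms-1b-all")

# MMS ships per-language adapters; switch both tokenizer and adapter.
processor.tokenizer.set_target_lang("fra")
model.load_adapter("fra")

waveform = torch.zeros(16_000)  # 1 s of silence at 16 kHz as a stand-in
inputs = processor(waveform.numpy(), sampling_rate=16_000,
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(processor.batch_decode(torch.argmax(logits, dim=-1)))
```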
- Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages [76.95115818308918]
We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages.
This is achieved by pre-training the encoder of the model on a large unlabeled multilingual dataset of 12 million hours spanning over 300 languages.
We use multilingual pre-training with random-projection quantization and speech-text modality matching to achieve state-of-the-art performance on downstream multilingual ASR and speech-to-text translation tasks.
arXiv Detail & Related papers (2023-03-02T07:47:18Z)
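The random-projection quantization that USM pre-trains with (in the style of BEST-RQ) turns continuous frames into discrete targets using a fixed random projection and a fixed random codebook, so no quantizer is ever trained. A minimal sketch with illustrative sizes:

```python
# BEST-RQ-style random-projection quantization (illustrative sizes).
import torch

torch.manual_seed(0)
FEAT_DIM, CODE_DIM, CODEBOOK_SIZE = 80, 16, 8192

# Neither the projection nor the codebook is ever trained.
projection = torch.randn(FEAT_DIM, CODE_DIM)
codebook = torch.nn.functional.normalize(
    torch.randn(CODEBOOK_SIZE, CODE_DIM), dim=-1)

def quantize(features: torch.Tensor) -> torch.Tensor:
    """Map (T, FEAT_DIM) frames to (T,) discrete target labels."""
    z = torch.nn.functional.normalize(features @ projection, dim=-1)
    return (z @ codebook.T).argmax(dim=-1)  # nearest code by cosine

frames = torch.randn(100, FEAT_DIM)  # e.g. log-mel frames
labels = quantize(frames)            # BERT-style masked-prediction targets
print(labels[:10])
```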
- SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations [38.058120432870126]
SpeechMatrix is a large-scale multilingual corpus of speech-to-speech translations.
It contains speech alignments in 136 language pairs with a total of 418 thousand hours of speech.
arXiv Detail & Related papers (2022-11-08T19:09:27Z)
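Mined corpora like SpeechMatrix typically score candidate pairs in a shared embedding space with a margin criterion: raw cosine similarity divided by the average similarity of each side's nearest neighbours. The sketch below implements that standard scoring on precomputed, unit-normalized embeddings; the speech encoder producing them is assumed, and this is the generic mining recipe rather than SpeechMatrix's exact pipeline.

```python
# Margin-based mining score over unit-normalized embeddings.
import numpy as np

def margin_scores(x: np.ndarray, y: np.ndarray, k: int = 4) -> np.ndarray:
    """x: (n, d) source embeddings, y: (m, d) target embeddings."""
    sim = x @ y.T                                    # cosine similarities
    knn_x = np.sort(sim, axis=1)[:, -k:].mean(axis=1, keepdims=True)
    knn_y = np.sort(sim, axis=0)[-k:, :].mean(axis=0, keepdims=True)
    return sim / ((knn_x + knn_y) / 2)               # ratio margin

def unit(a: np.ndarray) -> np.ndarray:
    """Normalize rows to unit length."""
    return a / np.linalg.norm(a, axis=1, keepdims=True)

rng = np.random.default_rng(0)
src, tgt = unit(rng.normal(size=(5, 64))), unit(rng.normal(size=(7, 64)))
scores = margin_scores(src, tgt)
print(scores.argmax(axis=1))  # best candidate alignment per source clip
```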
- Multitask Training with Text Data for End-to-End Speech Recognition [45.35605825009208]
We propose a multitask training method for attention-based end-to-end speech recognition models.
We regularize the decoder in a Listen, Attend and Spell (LAS) model by multitask training it on both audio-text and text-only data.
arXiv Detail & Related papers (2020-10-27T14:29:28Z)
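The joint objective that last entry describes can be written as a weighted sum of the paired audio-text loss and a text-only decoder loss. A schematic, with the weight as an assumed hyperparameter rather than the paper's value:

```python
# Schematic multitask objective: L = L_paired + lambda * L_text_only.
import torch

def multitask_loss(loss_audio_text: torch.Tensor,
                   loss_text_only: torch.Tensor,
                   lam: float = 0.1) -> torch.Tensor:
    """Combine paired and text-only losses; lam is illustrative."""
    return loss_audio_text + lam * loss_text_only

# Example with precomputed per-batch cross-entropy values:
print(multitask_loss(torch.tensor(2.3), torch.tensor(4.1)))
```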