The Multilingual TEDx Corpus for Speech Recognition and Translation
- URL: http://arxiv.org/abs/2102.01757v1
- Date: Tue, 2 Feb 2021 21:16:25 GMT
- Title: The Multilingual TEDx Corpus for Speech Recognition and Translation
- Authors: Elizabeth Salesky, Matthew Wiesner, Jacob Bremerman, Roldano Cattoni,
Matteo Negri, Marco Turchi, Douglas W. Oard, Matt Post
- Abstract summary: We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and speech translation (ST) research across many non-English source languages.
The corpus is a collection of audio recordings from TEDx talks in 8 source languages.
We segment transcripts into sentences and align them to the source-language audio and target-language translations.
- Score: 30.993199499048824
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present the Multilingual TEDx corpus, built to support speech recognition
(ASR) and speech translation (ST) research across many non-English source
languages. The corpus is a collection of audio recordings from TEDx talks in 8
source languages. We segment transcripts into sentences and align them to the
source-language audio and target-language translations. The corpus is released
along with open-sourced code enabling extension to new talks and languages as
they become available. Our corpus creation methodology can be applied to more
languages than previous work, and creates multi-way parallel evaluation sets.
We provide baselines in multiple ASR and ST settings, including multilingual
models to improve translation performance for low-resource language pairs.
Related papers
- Cross-Lingual Transfer Learning for Speech Translation [7.802021866251242]
This paper examines how to expand the speech translation capability of speech foundation models with restricted data.
Whisper, a speech foundation model with strong performance on speech recognition and English translation, is used as the example model.
Using speech-to-speech retrieval to analyse the audio representations generated by the encoder, we show that utterances from different languages are mapped to a shared semantic space.
arXiv Detail & Related papers (2024-07-01T09:51:48Z) - Fine-Tuned Self-Supervised Speech Representations for Language
Diarization in Multilingual Code-Switched Speech [4.39549503760707]
We develop a continuous multilingual language diarizer using fine-tuned speech representations extracted from a large self-supervised architecture (WavLM)
We experiment with a code-switched corpus consisting of five South African languages (isiZulu, isiXa, Setswana, Sesotho and English)
arXiv Detail & Related papers (2023-12-15T09:40:41Z) - Towards a Deep Understanding of Multilingual End-to-End Speech
Translation [52.26739715012842]
We analyze representations learnt in a multilingual end-to-end speech translation model trained over 22 languages.
We derive three major findings from our analysis.
arXiv Detail & Related papers (2023-10-31T13:50:55Z) - AudioPaLM: A Large Language Model That Can Speak and Listen [79.44757696533709]
We introduce AudioPaLM, a large language model for speech understanding and generation.
AudioPaLM fuses text-based and speech-based language models.
It can process and generate text and speech with applications including speech recognition and speech-to-speech translation.
arXiv Detail & Related papers (2023-06-22T14:37:54Z) - Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages [20.25236081418051]
Zambezi Voice is an open-source multilingual speech resource for Zambian languages.
To our knowledge, this is the first multilingual speech dataset created for Zambian languages.
arXiv Detail & Related papers (2023-06-07T13:36:37Z) - ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual
Multi-Speaker Text-to-Speech [58.93395189153713]
We extend the pretraining method for cross-lingual multi-speaker speech synthesis tasks.
We propose a speech-text joint pretraining framework, where we randomly mask the spectrogram and the phonemes.
Our model shows great improvements over speaker-embedding-based multi-speaker TTS methods.
arXiv Detail & Related papers (2022-11-07T13:35:16Z) - Continual Mixed-Language Pre-Training for Extremely Low-Resource Neural
Machine Translation [53.22775597051498]
We present a continual pre-training framework on mBART to effectively adapt it to unseen languages.
Results show that our method can consistently improve the fine-tuning performance upon the mBART baseline.
Our approach also boosts the performance on translation pairs where both languages are seen in the original mBART's pre-training.
arXiv Detail & Related papers (2021-05-09T14:49:07Z) - Consecutive Decoding for Speech-to-text Translation [51.155661276936044]
COnSecutive Transcription and Translation (COSTT) is an integral approach for speech-to-text translation.
The key idea is to generate source transcript and target translation text with a single decoder.
Our method is verified on three mainstream datasets.
arXiv Detail & Related papers (2020-09-21T10:10:45Z) - CoVoST 2 and Massively Multilingual Speech-to-Text Translation [24.904548615918355]
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages.
This represents the largest open dataset available to date from total volume and language coverage perspective.
arXiv Detail & Related papers (2020-07-20T17:53:35Z) - CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus [57.641761472372814]
CoVoST is a multilingual speech-to-text translation corpus from 11 languages into English.
It diversified with over 11,000 speakers and over 60 accents.
CoVoST is released under CC0 license and free to use.
arXiv Detail & Related papers (2020-02-04T14:35:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.