Related papers: Connecting Voices: LoReSpeech as a Low-Resource Speech Parallel Corpus

Connecting Voices: LoReSpeech as a Low-Resource Speech Parallel Corpus

URL: http://arxiv.org/abs/2502.18215v1
Date: Tue, 25 Feb 2025 14:00:15 GMT
Title: Connecting Voices: LoReSpeech as a Low-Resource Speech Parallel Corpus
Authors: Samy Ouzerrout,
Abstract summary: This paper introduces a methodology for constructing LoReSpeech, a low-resource speech-to-speech translation corpus.<n>LoReSpeech delivers both intra- and inter-language alignments, enabling advancements in multilingual ASR systems.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Aligned audio corpora are fundamental to NLP technologies such as ASR and speech translation, yet they remain scarce for underrepresented languages, hindering their technological integration. This paper introduces a methodology for constructing LoReSpeech, a low-resource speech-to-speech translation corpus. Our approach begins with LoReASR, a sub-corpus of short audios aligned with their transcriptions, created through a collaborative platform. Building on LoReASR, long-form audio recordings, such as biblical texts, are aligned using tools like the MFA. LoReSpeech delivers both intra- and inter-language alignments, enabling advancements in multilingual ASR systems, direct speech-to-speech translation models, and linguistic preservation efforts, while fostering digital inclusivity. This work is conducted within Tutlayt AI project (https://tutlayt.fr).

Related papers

FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks [27.894172151026044]
FLEURS-R is a speech restoration applied version of the Few-shot Learning Evaluation of Universal Representations of Speech corpus. The aim of FLEURS-R is to advance speech technology in more languages and catalyze research including text-to-speech.
arXiv Detail & Related papers (2024-08-12T15:28:51Z)
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation [97.54885207518946]
We introduce a novel model framework TransVIP that leverages diverse datasets in a cascade fashion. We propose two separated encoders to preserve the speaker's voice characteristics and isochrony from the source speech during the translation process. Our experiments on the French-English language pair demonstrate that our model outperforms the current state-of-the-art speech-to-speech translation model.
arXiv Detail & Related papers (2024-05-28T04:11:37Z)
Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation [65.13824257448564]
This paper proposes a textless training method for many-to-many multilingual speech-to-speech translation. By treating the speech units as pseudo-text, we can focus on the linguistic content of the speech. We demonstrate that the proposed UTUT model can be effectively utilized not only for Speech-to-Speech Translation (S2ST) but also for multilingual Text-to-Speech Synthesis (T2S) and Text-to-Speech Translation (T2ST)
arXiv Detail & Related papers (2023-08-03T15:47:04Z)
On decoder-only architecture for speech-to-text and large language model integration [59.49886892602309]
Speech-LLaMA is a novel approach that effectively incorporates acoustic information into text-based large language models. We conduct experiments on multilingual speech-to-text translation tasks and demonstrate a significant improvement over strong baselines.
arXiv Detail & Related papers (2023-07-08T06:47:58Z)
AudioPaLM: A Large Language Model That Can Speak and Listen [79.44757696533709]
We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models. It can process and generate text and speech with applications including speech recognition and speech-to-speech translation.
arXiv Detail & Related papers (2023-06-22T14:37:54Z)
ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus [3.1925030748447747]
We present a speech corpus for Classical Arabic Text-to-Speech (ClArTTS) to support the development of end-to-end TTS systems for Arabic. The speech is extracted from a LibriVox audiobook, which is then processed, segmented, and manually transcribed and annotated. The final ClArTTS corpus contains about 12 hours of speech from a single male speaker sampled at 40100 kHz.
arXiv Detail & Related papers (2023-02-28T20:18:59Z)
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech [58.93395189153713]
We extend the pretraining method for cross-lingual multi-speaker speech synthesis tasks. We propose a speech-text joint pretraining framework, where we randomly mask the spectrogram and the phonemes. Our model shows great improvements over speaker-embedding-based multi-speaker TTS methods.
arXiv Detail & Related papers (2022-11-07T13:35:16Z)
LibriS2S: A German-English Speech-to-Speech Translation Corpus [12.376309678270275]
We create the first publicly available speech-to-speech training corpus between German and English. This allows the creation of a new text-to-speech and speech-to-speech translation model. We propose Text-to-Speech models based on the example of the recently proposed FastSpeech 2 model.
arXiv Detail & Related papers (2022-04-22T09:33:31Z)
The Multilingual TEDx Corpus for Speech Recognition and Translation [30.993199499048824]
We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and speech translation (ST) research across many non-English source languages. The corpus is a collection of audio recordings from TEDx talks in 8 source languages. We segment transcripts into sentences and align them to the source-language audio and target-language translations.
arXiv Detail & Related papers (2021-02-02T21:16:25Z)
Bridging the Modality Gap for Speech-to-Text Translation [57.47099674461832]
End-to-end speech translation aims to translate speech in one language into text in another language via an end-to-end way. Most existing methods employ an encoder-decoder structure with a single encoder to learn acoustic representation and semantic information simultaneously. We propose a Speech-to-Text Adaptation for Speech Translation model which aims to improve the end-to-end model performance by bridging the modality gap between speech and text.
arXiv Detail & Related papers (2020-10-28T12:33:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.