FT Speech: Danish Parliament Speech Corpus
- URL: http://arxiv.org/abs/2005.12368v2
- Date: Wed, 28 Oct 2020 13:36:44 GMT
- Title: FT Speech: Danish Parliament Speech Corpus
- Authors: Andreas Kirkedal, Marija Stepanović, Barbara Plank
- Abstract summary: This paper introduces FT Speech, a new speech corpus created from the recorded meetings of the Danish Parliament.
The corpus contains over 1,800 hours of transcribed speech by a total of 434 speakers.
It is significantly larger in duration, vocabulary, and amount of spontaneous speech than the existing public speech corpora for Danish.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces FT Speech, a new speech corpus created from the
recorded meetings of the Danish Parliament, otherwise known as the Folketing
(FT). The corpus contains over 1,800 hours of transcribed speech by a total of
434 speakers. It is significantly larger in duration, vocabulary, and amount of
spontaneous speech than the existing public speech corpora for Danish, which
are largely limited to read-aloud and dictation data. We outline design
considerations, including the preprocessing methods and the alignment
procedure. To evaluate the quality of the corpus, we train automatic speech
recognition systems on the new resource and compare them to the systems trained
on the Danish part of Språkbanken, the largest public ASR corpus for Danish
to date. Our baseline results show a word error rate (WER) of 14.01 on the new
corpus. A combination of FT Speech with in-domain language data provides
results comparable to models trained specifically on Språkbanken, showing
that FT Speech transfers well to this data set. Interestingly, our results
demonstrate that the opposite is not the case. This shows that FT Speech
provides a valuable resource for promoting research on Danish ASR with more
spontaneous speech.
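As an illustration of how the ASR baselines above are scored, word error rate (WER) is the Levenshtein edit distance over word tokens divided by the number of reference words. The sketch below is a minimal implementation; the Danish example strings are invented for illustration.

```python
# Minimal WER sketch: (substitutions + deletions + insertions) / reference length,
# computed with dynamic programming over word tokens.

def wer(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# Invented example: 1 substitution over 5 reference words -> 0.2
print(wer("det er en god dag", "det er en go dag"))
```

In practice, published WER figures also depend on text normalization (casing, punctuation, number formatting) applied before scoring.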
Related papers
- The Faetar Benchmark: Speech Recognition in a Very Under-Resourced Language [4.077418516695122]
Faetar has no standard orthography and virtually no existing textual or speech resources beyond what is included in the benchmark.
The corpus comes from field recordings, most of which are noisy, and only 5 hours of which have matching transcriptions.
We report baseline results from state-of-the-art multilingual speech foundation models with a best phone error rate of 30.4%.
arXiv Detail & Related papers (2024-09-12T14:55:33Z)
- FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks [27.894172151026044]
FLEURS-R is a version of the Few-shot Learning Evaluation of Universal Representations of Speech (FLEURS) corpus with speech restoration applied.
The aim of FLEURS-R is to advance speech technology in more languages and to catalyze research, including on text-to-speech.
arXiv Detail & Related papers (2024-08-12T15:28:51Z)
- TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation [97.54885207518946]
We introduce TransVIP, a novel model framework that leverages diverse datasets in a cascaded fashion.
We propose two separate encoders to preserve the speaker's voice characteristics and isochrony from the source speech during the translation process.
Our experiments on the French-English language pair demonstrate that our model outperforms the current state-of-the-art speech-to-speech translation model.
arXiv Detail & Related papers (2024-05-28T04:11:37Z)
- Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study [68.88536866933038]
Speech signals, typically sampled at tens of thousands of samples per second, contain considerable redundancy.
Recent investigations proposed the use of discrete speech units derived from self-supervised learning representations.
Applying various methods, such as de-duplication and subword modeling, can further compress the speech sequence length.
arXiv Detail & Related papers (2023-09-27T17:21:13Z)
- SeamlessM4T: Massively Multilingual & Multimodal Machine Translation [90.71078166159295]
We introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-text translation, and automatic speech recognition for up to 100 languages.
We developed the first multilingual system capable of translating from and into English for both speech and text.
On FLEURS, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous SOTA in direct speech-to-text translation.
arXiv Detail & Related papers (2023-08-22T17:44:18Z)
- ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation [79.66359274050885]
We present ComSL, a speech-language model built atop a composite architecture of public pretrained speech-only and language-only models.
Our approach has demonstrated effectiveness in end-to-end speech-to-text translation tasks.
arXiv Detail & Related papers (2023-05-24T07:42:15Z)
- BASPRO: a balanced script producer for speech corpus collection based on the genetic algorithm [29.701197643765674]
The performance of speech-processing models is heavily influenced by the speech corpus that is used for training and evaluation.
We propose the BAlanced Script PROducer (BASPRO) system, which can automatically construct a phonetically balanced and rich set of Chinese sentences.
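BASPRO itself uses a genetic algorithm; as a rough illustration of the underlying goal only, the sketch below swaps in a simple greedy selection that repeatedly picks the sentence covering the most not-yet-covered phonemes. The sentences and phoneme sets are hypothetical.

```python
# Greedy stand-in (NOT the authors' genetic algorithm) for selecting a
# small, phonetically broad subset of candidate sentences.

def greedy_balanced_subset(sentences, phonemes_of, k):
    """Pick up to k sentences maximizing phoneme coverage.

    sentences:   list of candidate sentence strings
    phonemes_of: function mapping a sentence to its set of phonemes
    k:           maximum number of sentences to select
    """
    covered, chosen = set(), []
    pool = list(sentences)
    for _ in range(k):
        # Sentence contributing the most phonemes not yet covered.
        best = max(pool, key=lambda s: len(phonemes_of(s) - covered), default=None)
        if best is None or not (phonemes_of(best) - covered):
            break  # nothing left to gain
        chosen.append(best)
        covered |= phonemes_of(best)
        pool.remove(best)
    return chosen, covered

# Hypothetical candidates and phoneme inventories.
phoneme_map = {"ba da": {"b", "a", "d"},
               "ka ta": {"k", "a", "t"},
               "ba ka": {"b", "a", "k"}}
chosen, covered = greedy_balanced_subset(phoneme_map, phoneme_map.get, 3)
print(chosen)  # ["ba da", "ka ta"] -- "ba ka" adds nothing new, so it stops early
```

A genetic algorithm explores the same search space globally (mutating and recombining candidate subsets), whereas this greedy pass is a one-shot approximation.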
arXiv Detail & Related papers (2022-12-11T02:05:30Z)
- SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations [38.058120432870126]
SpeechMatrix is a large-scale multilingual corpus of speech-to-speech translations.
It contains speech alignments in 136 language pairs with a total of 418 thousand hours of speech.
arXiv Detail & Related papers (2022-11-08T19:09:27Z)
- Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation [94.80029087828888]
Direct speech-to-speech translation (S2ST) is an attractive research topic with many advantages compared to cascaded S2ST.
Direct S2ST suffers from data scarcity because parallel corpora pairing source-language speech with target-language speech are very rare.
In this paper, we propose Speech2S, a model jointly pre-trained with unpaired speech and bilingual text data for direct speech-to-speech translation tasks.
arXiv Detail & Related papers (2022-10-31T02:55:51Z)
- Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation [76.13334392868208]
Direct speech-to-speech translation (S2ST) models suffer from data scarcity issues.
In this work, we explore self-supervised pre-training with unlabeled speech data and data augmentation to tackle this issue.
arXiv Detail & Related papers (2022-04-06T17:59:22Z)
- The Norwegian Parliamentary Speech Corpus [0.5874142059884521]
The Norwegian Parliamentary Speech Corpus (NPSC) is a speech dataset with recordings of meetings from Stortinget, the Norwegian parliament.
It is the first publicly available dataset of unscripted Norwegian speech designed for training automatic speech recognition (ASR) systems.
Training on the NPSC is shown to have a "democratizing" effect in terms of dialects, as improvements are generally larger for dialects with higher WER from the baseline system.
arXiv Detail & Related papers (2022-01-26T11:41:55Z)