SDS-200: A Swiss German Speech to Standard German Text Corpus
- URL: http://arxiv.org/abs/2205.09501v1
- Date: Thu, 19 May 2022 12:16:29 GMT
- Title: SDS-200: A Swiss German Speech to Standard German Text Corpus
- Authors: Michel Plüss, Manuela Hürlimann, Marc Cuny, Alla Stöckli,
Nikolaos Kapotis, Julia Hartmann, Malgorzata Anna Ulasik, Christian Scheller,
Yanick Schraner, Amit Jain, Jan Deriu, Mark Cieliebak, Manfred Vogel
- Abstract summary: We present SDS-200, a corpus of Swiss German dialectal speech with Standard German text translations.
The data was collected using a web recording tool that is open to the public.
The data consists of 200 hours of speech by around 4000 different speakers and covers a large part of the Swiss-German dialect landscape.
- Score: 5.370317759946287
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present SDS-200, a corpus of Swiss German dialectal speech with Standard
German text translations, annotated with dialect, age, and gender information
of the speakers. The dataset allows for training speech translation, dialect
recognition, and speech synthesis systems, among others. The data was collected
using a web recording tool that is open to the public. Each participant was
given a text in Standard German and asked to translate it to their Swiss German
dialect before recording it. To increase the corpus quality, recordings were
validated by other participants. The data consists of 200 hours of speech by
around 4000 different speakers and covers a large part of the Swiss-German
dialect landscape. We release SDS-200 alongside a baseline speech translation
model, which achieves a word error rate (WER) of 30.3 and a BLEU score of 53.1
on the SDS-200 test set. Furthermore, we use SDS-200 to fine-tune a pre-trained
XLS-R model, achieving 21.6 WER and 64.0 BLEU.
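The abstract reports results as word error rate (WER) and BLEU. As a reference point for the first metric, here is a minimal Python sketch of corpus-level WER: the word-level Levenshtein distance (substitutions, deletions, insertions) summed over all utterances, divided by the total number of reference words, and scaled to 0-100. The function names, the whitespace tokenization, and the absence of any casing or punctuation normalization are assumptions for illustration. The abstract does not describe how the SDS-200 scores were computed, so this sketch is not guaranteed to reproduce them.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] holds the distance between the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level WER on a 0-100 scale (can exceed 100 when there are many insertions)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    # Hypothetical tokenization: plain whitespace split, case-sensitive.
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    if words == 0:
        raise ValueError("references must contain at least one word")
    return 100.0 * errors / words
```

Usage with a made-up Standard German reference and system output:

```python
refs = ["ich habe heute keine Zeit"]
hyps = ["ich hab heute keine Zeit"]
print(corpus_wer(refs, hyps))  # 20.0: one substitution over five reference words
```

Summing errors over the whole test set before dividing, rather than averaging per-utterance WER, keeps short utterances from dominating the score.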
Related papers
- Towards Robust Speech Representation Learning for Thousands of Languages [77.2890285555615]
Self-supervised learning (SSL) has helped extend speech technologies to more languages by reducing the need for labeled data.
We propose XEUS, a Cross-lingual Encoder for Universal Speech, trained on over 1 million hours of data across 4057 languages.
(2024-06-30T21:40:26Z)
- DiariST: Streaming Speech Translation with Speaker Diarization [53.595990270899414]
We propose DiariST, the first streaming ST and SD solution.
It is built upon a neural transducer-based streaming ST system and integrates token-level serialized output training and t-vector.
Our system achieves a strong ST and SD capability compared to offline systems based on Whisper, while performing streaming inference for overlapping speech.
(2023-09-14T19:33:27Z)
- SeamlessM4T: Massively Multilingual & Multimodal Machine Translation [90.71078166159295]
We introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-text translation, and automatic speech recognition for up to 100 languages.
We developed the first multilingual system capable of translating from and into English for both speech and text.
On FLEURS, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous SOTA in direct speech-to-text translation.
(2023-08-22T17:44:18Z)
- KIT's Multilingual Speech Translation System for IWSLT 2023 [58.5152569458259]
We describe our speech translation system for the multilingual track of IWSLT 2023.
The task requires translation into 10 languages of varying amounts of resources.
Our cascaded speech system substantially outperforms its end-to-end counterpart on scientific talk translation.
(2023-06-08T16:13:20Z)
- STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions [5.6787416472329495]
We present STT4SG-350 (Speech-to-Text for Swiss German), a corpus of Swiss German speech annotated with Standard German text at the sentence level.
The data is collected using a web app in which the speakers are shown Standard German sentences, which they translate to Swiss German and record.
It contains 343 hours of speech from all dialect regions and is the largest public speech corpus for Swiss German to date.
(2023-05-30T08:49:38Z)
- Textless Speech-to-Speech Translation With Limited Parallel Data [51.3588490789084]
PFB is a framework for training textless S2ST models that require just dozens of hours of parallel speech data.
We train and evaluate our models for English-to-German, German-to-English and Marathi-to-English translation on three different domains.
(2023-05-24T17:59:05Z)
- The Norwegian Parliamentary Speech Corpus [0.5874142059884521]
The Norwegian Parliamentary Speech Corpus (NPSC) is a speech dataset with recordings of meetings from Stortinget, the Norwegian parliament.
It is the first publicly available dataset of unscripted Norwegian speech designed for training automatic speech recognition (ASR) systems.
Training on the NPSC is shown to have a "democratizing" effect in terms of dialects, as improvements are generally larger for dialects with higher WER from the baseline system.
(2022-01-26T11:41:55Z)
- Textless Speech-to-Speech Translation on Real Data [49.134208897722246]
We present a textless speech-to-speech translation (S2ST) system that can translate speech from one language into another language.
We tackle the challenge in modeling multi-speaker target speech and train the systems with real-world S2ST data.
(2021-12-15T18:56:35Z)
- Dialectal Speech Recognition and Translation of Swiss German Speech to Standard German Text: Microsoft's Submission to SwissText 2021 [17.675379299410054]
Swiss German refers to the multitude of Alemannic dialects spoken in the German-speaking parts of Switzerland.
We propose a hybrid automatic speech recognition system with a lexicon that incorporates translations.
Our submission reaches 46.04% BLEU on a blind conversational test set and outperforms the second best competitor by a 12% relative margin.
(2021-06-15T13:34:02Z)
- SwissDial: Parallel Multidialectal Corpus of Spoken Swiss German [22.30271453485001]
We introduce the first annotated parallel corpus of spoken Swiss German across 8 major dialects, plus a Standard German reference.
Our goal has been to create and to make available a basic dataset for employing data-driven NLP applications in Swiss German.
(2021-03-21T14:00:09Z)
- Swiss Parliaments Corpus, an Automatically Aligned Swiss German Speech to Standard German Text Corpus [2.610806620660055]
This first version of the corpus is based on publicly available data of the Bernese cantonal parliament and consists of 293 hours of data.
It was created using a novel forced sentence alignment procedure and an alignment quality estimator.
We trained Automatic Speech Recognition (ASR) models as baselines on different subsets of the data and achieved a Word Error Rate (WER) of 0.278 and a BLEU score of 0.586 on the SPC test set.
(2020-10-06T15:18:21Z)
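The Swiss Parliaments Corpus entry reports WER 0.278 and BLEU 0.586 as fractions, whereas the SDS-200 abstract reports WER 30.3 and BLEU 53.1, which appear to be on a 0-100 scale. The two sets of numbers therefore differ in scale as well as in test set. For the BLEU side, here is a minimal Python sketch of corpus-level BLEU as originally defined: the geometric mean of clipped 1- to 4-gram precisions, multiplied by a brevity penalty, using one reference per hypothesis. The function names, whitespace tokenization, and lack of smoothing are assumptions. Published scores are usually produced with a standard tool that applies its own tokenization, and the entries above do not say which tool was used, so this sketch will not necessarily reproduce the reported numbers.

```python
import math
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references: list[str], hypotheses: list[str], max_n: int = 4) -> float:
    """Corpus BLEU on a 0-100 scale, one reference per hypothesis, no smoothing."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    matches = [0] * max_n  # clipped n-gram matches, per order n
    totals = [0] * max_n   # hypothesis n-grams, per order n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        r, h = ref.split(), hyp.split()  # hypothetical whitespace tokenization
        ref_len += len(r)
        hyp_len += len(h)
        for n in range(1, max_n + 1):
            r_counts = ngram_counts(r, n)
            # Clip each hypothesis n-gram count by how often it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in ngram_counts(h, n).items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    # Without smoothing, any zero precision makes the geometric mean zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only hypotheses shorter than the references overall are penalized.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity * math.exp(log_precision)
```

For example, `corpus_bleu(["the cat sat on the mat"], ["the cat sat on"])` gives about 60.65. Every n-gram precision is 1, so only the brevity penalty exp(1 - 6/4) lowers the score. Dividing the result by 100 gives a value on the 0-1 scale used in the Swiss Parliaments Corpus entry.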