STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions
- URL: http://arxiv.org/abs/2305.18855v1
- Date: Tue, 30 May 2023 08:49:38 GMT
- Title: STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions
- Authors: Michel Plüss, Jan Deriu, Yanick Schraner, Claudio Paonessa, Julia
Hartmann, Larissa Schmidt, Christian Scheller, Manuela Hürlimann, Tanja
Samardžić, Manfred Vogel, Mark Cieliebak
- Abstract summary: We present STT4SG-350 (Speech-to-Text for Swiss German), a corpus of Swiss German speech annotated with Standard German text at the sentence level.
The data is collected using a web app in which the speakers are shown Standard German sentences, which they translate to Swiss German and record.
It contains 343 hours of speech from all dialect regions and is the largest public speech corpus for Swiss German to date.
- Score: 5.6787416472329495
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present STT4SG-350 (Speech-to-Text for Swiss German), a corpus of Swiss
German speech, annotated with Standard German text at the sentence level. The
data is collected using a web app in which the speakers are shown Standard
German sentences, which they translate to Swiss German and record. We make the
corpus publicly available. It contains 343 hours of speech from all dialect
regions and is the largest public speech corpus for Swiss German to date.
Application areas include automatic speech recognition (ASR), text-to-speech,
dialect identification, and speaker recognition. Dialect information, age
group, and gender of the 316 speakers are provided. Genders are equally
represented and the corpus includes speakers of all ages. Roughly the same
amount of speech is provided per dialect region, which makes the corpus ideally
suited for experiments with speech technology for different dialects. We
provide training, validation, and test splits of the data. The test set
consists of the same spoken sentences for each dialect region and allows a fair
evaluation of the quality of speech technologies in different dialects. We
train an ASR model on the training set and achieve an average BLEU score of
74.7 on the test set. The model beats the best published BLEU scores on 2 other
Swiss German ASR test sets, demonstrating the quality of the corpus.
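Because the references are Standard German sentences rather than verbatim dialect transcripts, the ASR results above are reported as BLEU scores. The following is a minimal sketch of how such a corpus-level BLEU score can be computed with the sacrebleu package; the file names and one-sentence-per-line layout are assumptions for illustration, not part of the STT4SG-350 release.

```python
# Minimal sketch: corpus-level BLEU for ASR hypotheses scored against
# Standard German references. File names and format are hypothetical.
import sacrebleu

def corpus_bleu_score(hyp_path: str, ref_path: str) -> float:
    with open(hyp_path, encoding="utf-8") as f:
        hypotheses = [line.strip() for line in f]
    with open(ref_path, encoding="utf-8") as f:
        references = [line.strip() for line in f]
    assert len(hypotheses) == len(references), "hypotheses and references must align"
    # sacrebleu takes the system output plus a list of reference streams
    result = sacrebleu.corpus_bleu(hypotheses, [references])
    return result.score  # 0-100 scale, comparable to the 74.7 reported above

if __name__ == "__main__":
    print(f"BLEU: {corpus_bleu_score('asr_hypotheses.txt', 'standard_german_refs.txt'):.1f}")
```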
Related papers
- Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects [72.18753241750964]
Yorùbá is an African language with roughly 47 million speakers.
Recent efforts to develop NLP technologies for African languages have focused on their standard dialects.
We take steps towards bridging this gap by introducing a new high-quality parallel text and speech corpus.
arXiv Detail & Related papers (2024-06-27T22:38:04Z) - EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation [83.29199726650899]
The EARS dataset comprises 107 speakers from diverse backgrounds, totaling 100 hours of clean, anechoic speech data.
The dataset covers a large range of different speaking styles, including emotional speech, different reading styles, non-verbal sounds, and conversational freeform speech.
We benchmark various methods for speech enhancement and dereverberation on the dataset and evaluate their performance through a set of instrumental metrics.
arXiv Detail & Related papers (2024-06-10T11:28:29Z) - SeamlessM4T: Massively Multilingual & Multimodal Machine Translation [90.71078166159295]
We introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-text translation, and automatic speech recognition for up to 100 languages.
We developed the first multilingual system capable of translating from and into English for both speech and text.
On FLEURS, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous SOTA in direct speech-to-text translation.
arXiv Detail & Related papers (2023-08-22T17:44:18Z) - ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual
Multi-Speaker Text-to-Speech [58.93395189153713]
We extend the pretraining method for cross-lingual multi-speaker speech synthesis tasks.
We propose a speech-text joint pretraining framework, where we randomly mask the spectrogram and the phonemes.
Our model shows great improvements over speaker-embedding-based multi-speaker TTS methods.
arXiv Detail & Related papers (2022-11-07T13:35:16Z) - SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data [100.46303484627045]
We propose a cross-modal Speech and Language Model (SpeechLM) to align speech and text pre-training with a pre-defined unified representation.
Specifically, we introduce two alternative discrete tokenizers to bridge the speech and text modalities.
We evaluate SpeechLM on various spoken language processing tasks including speech recognition, speech translation, and universal representation evaluation framework SUPERB.
arXiv Detail & Related papers (2022-09-30T09:12:10Z) - SDS-200: A Swiss German Speech to Standard German Text Corpus [5.370317759946287]
We present SDS-200, a corpus of Swiss German dialectal speech with Standard German text translations.
The data was collected using a web recording tool that is open to the public.
The data consists of 200 hours of speech by around 4000 different speakers and covers a large part of the Swiss German dialect landscape.
arXiv Detail & Related papers (2022-05-19T12:16:29Z) - Lahjoita puhetta -- a large-scale corpus of spoken Finnish with some
benchmarks [9.160401226886947]
The Donate Speech campaign has so far succeeded in gathering approximately 3600 hours of ordinary, colloquial Finnish speech.
The primary goals of the collection were to create a representative, large-scale resource to study spontaneous spoken Finnish and to accelerate the development of language technology and speech-based services.
We present the collection process and the collected corpus, and showcase its versatility through multiple use cases.
arXiv Detail & Related papers (2022-03-24T07:50:25Z) - Textless Speech-to-Speech Translation on Real Data [49.134208897722246]
We present a textless speech-to-speech translation (S2ST) system that can translate speech from one language into another.
We tackle the challenge of modeling multi-speaker target speech and train the systems on real-world S2ST data.
arXiv Detail & Related papers (2021-12-15T18:56:35Z) - Dialectal Speech Recognition and Translation of Swiss German Speech to
Standard German Text: Microsoft's Submission to SwissText 2021 [17.675379299410054]
Swiss German refers to the multitude of Alemannic dialects spoken in the German-speaking parts of Switzerland.
We propose a hybrid automatic speech recognition system with a lexicon that incorporates translations.
Our submission reaches 46.04% BLEU on a blind conversational test set and outperforms the second best competitor by a 12% relative margin.
arXiv Detail & Related papers (2021-06-15T13:34:02Z) - English Accent Accuracy Analysis in a State-of-the-Art Automatic Speech
Recognition System [3.4888132404740797]
We evaluate a state-of-the-art automatic speech recognition model, using unseen data from a corpus with a wide variety of labeled English accents.
We show that there is indeed an accuracy bias in terms of accentual variety, favoring the accents most prevalent in the training corpus.
arXiv Detail & Related papers (2021-05-09T08:24:33Z) - Swiss Parliaments Corpus, an Automatically Aligned Swiss German Speech
to Standard German Text Corpus [2.610806620660055]
This first version of the corpus is based on publicly available data from the Bernese cantonal parliament and consists of 293 hours of data.
It was created using a novel forced sentence alignment procedure and an alignment quality estimator.
We trained Automatic Speech Recognition (ASR) models as baselines on different subsets of the data and achieved a Word Error Rate (WER) of 0.278 and a BLEU score of 0.586 on the SPC test set.
arXiv Detail & Related papers (2020-10-06T15:18:21Z)
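The Swiss Parliaments Corpus entry above reports both a WER and a BLEU baseline. As a companion to the BLEU sketch earlier, the snippet below shows how a corpus-level WER could be computed with the jiwer package; the sentence pairs are placeholders, not data from the SPC release.

```python
# Minimal sketch: corpus-level word error rate (WER). jiwer pools edit
# operations over all sentence pairs, so this is a corpus-level rate,
# not an average of per-sentence WERs. The sentences are placeholders.
import jiwer

references = [
    "das ist ein beispielsatz",
    "wir evaluieren die erkennung",
]
hypotheses = [
    "das ist ein beispiel satz",
    "wir evaluieren die erkennung",
]

wer = jiwer.wer(references, hypotheses)
print(f"WER: {wer:.3f}")  # the SPC baseline above reports 0.278 on its test set
```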