Related papers: A Multi-Dialectal Dataset for German Dialect ASR and Dialect-to-Standard Speech Translation

URL: http://arxiv.org/abs/2506.02894v1
Date: Tue, 03 Jun 2025 14:02:52 GMT
Title: A Multi-Dialectal Dataset for German Dialect ASR and Dialect-to-Standard Speech Translation
Authors: Verena Blaschke, Miriam Winkler, Constantin Förster, Gabriele Wenger-Glemser, Barbara Plank
Abstract summary: Betthupferl is an evaluation dataset containing four hours of read speech in three dialect groups spoken in Southeast Germany. We provide both dialectal and Standard German transcriptions, and analyze the linguistic differences between them. We benchmark several multilingual state-of-the-art ASR models on speech translation into Standard German, and find differences between how much the output resembles the dialectal vs. standardized transcriptions.
Score: 19.535404632372042
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Although Germany has a diverse landscape of dialects, they are underrepresented in current automatic speech recognition (ASR) research. To enable studies of how robust models are towards dialectal variation, we present Betthupferl, an evaluation dataset containing four hours of read speech in three dialect groups spoken in Southeast Germany (Franconian, Bavarian, Alemannic), and half an hour of Standard German speech. We provide both dialectal and Standard German transcriptions, and analyze the linguistic differences between them. We benchmark several multilingual state-of-the-art ASR models on speech translation into Standard German, and find differences between how much the output resembles the dialectal vs. standardized transcriptions. Qualitative error analyses of the best ASR model reveal that it sometimes normalizes grammatical differences, but often stays closer to the dialectal constructions.

Related papers

Standard-to-Dialect Transfer Trends Differ across Text and Speech: A Case Study on Intent and Topic Classification in German Dialects [36.91800117379075]
We compare standard-to-dialect transfer in three settings: text models, speech models, and cascaded systems. In our experiments, we focus on German and multiple German dialects in the context of written and spoken intent and topic classification. We find that the speech-only setup provides the best results on the dialect data while the text-only setup works best on the standard data.
arXiv Detail & Related papers (2025-10-09T07:43:08Z)
Large Language Models Discriminate Against Speakers of German Dialects [44.05620251584259]
In Germany, more than 40% of the population speaks a regional dialect. We examine whether such stereotypes are mirrored by large language models (LLMs). We find that explicitly labeling linguistic demographics--German dialect speakers--amplifies bias more than implicit cues like dialect usage.
arXiv Detail & Related papers (2025-09-17T09:05:37Z)
SpeechR: A Benchmark for Speech Reasoning in Large Audio-Language Models [60.72029578488467]
SpeechR is a unified benchmark for evaluating reasoning over speech in large audio-language models. It evaluates models along three key dimensions: factual retrieval, procedural inference, and normative judgment. Evaluations on eleven state-of-the-art LALMs reveal that high transcription accuracy does not translate into strong reasoning capabilities.
arXiv Detail & Related papers (2025-08-04T03:28:04Z)
Languages in Multilingual Speech Foundation Models Align Both Phonetically and Semantically [58.019484208091534]
Cross-lingual alignment in pretrained language models (LMs) has enabled efficient transfer in text-based LMs. It remains an open question whether findings and methods from text-based cross-lingual alignment apply to speech.
arXiv Detail & Related papers (2025-05-26T07:21:20Z)
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation [97.54885207518946]
We introduce a novel model framework TransVIP that leverages diverse datasets in a cascade fashion. We propose two separated encoders to preserve the speaker's voice characteristics and isochrony from the source speech during the translation process. Our experiments on the French-English language pair demonstrate that our model outperforms the current state-of-the-art speech-to-speech translation model.
arXiv Detail & Related papers (2024-05-28T04:11:37Z)
What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects [60.8361859783634]
We survey speakers of dialects and regional languages related to German. We find that respondents are especially in favour of potential NLP tools that work with dialectal input.
arXiv Detail & Related papers (2024-02-19T09:15:28Z)
Dialect Transfer for Swiss German Speech Translation [9.373232685350844]
This paper investigates the challenges in building Swiss German speech translation systems. It focuses on the impact of dialect diversity and differences between Swiss German and Standard German.
arXiv Detail & Related papers (2023-10-13T13:16:57Z)
STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions [5.6787416472329495]
We present STT4SG-350 (Speech-to-Text for Swiss German), a corpus of Swiss German speech annotated with Standard German text at the sentence level. The data is collected using a web app in which the speakers are shown Standard German sentences, which they translate to Swiss German and record. It contains 343 hours of speech from all dialect regions and is the largest public speech corpus for Swiss German to date.
arXiv Detail & Related papers (2023-05-30T08:49:38Z)
Textless Speech-to-Speech Translation With Limited Parallel Data [51.3588490789084]
PFB is a framework for training textless S2ST models that require just dozens of hours of parallel speech data. We train and evaluate our models for English-to-German, German-to-English and Marathi-to-English translation on three different domains.
arXiv Detail & Related papers (2023-05-24T17:59:05Z)
North Sámi Dialect Identification with Self-supervised Speech Models [1.1548853370822343]
The North Sámi (NS) language encapsulates four primary dialectal variants that are related but have differences in their phonology, morphology, and vocabulary. We investigate an extensive set of acoustic features, including MFCCs and prosodic features, for the automatic detection of the four NS variants. Our results show that NS dialects are influenced by the state language and that the four dialects are separable, reaching high classification accuracy.
arXiv Detail & Related papers (2023-05-19T17:53:12Z)
Multi-VALUE: A Framework for Cross-Dialectal English NLP [49.55176102659081]
Multi-VALUE is a controllable rule-based translation system spanning 50 English dialects. Stress tests reveal significant performance disparities for leading models on non-standard dialects. We partner with native speakers of Chicano and Indian English to release new gold-standard variants of the popular CoQA task.
arXiv Detail & Related papers (2022-12-15T18:17:01Z)
M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval [56.49878599920353]
This work investigates the use of large-scale, English-only pre-trained models (CLIP and HuBERT) for multilingual image-speech retrieval. For non-English image-speech retrieval, we outperform the current state-of-the-art performance by a wide margin both when training separate models for each language, and with a single model which processes speech in all three languages.
arXiv Detail & Related papers (2022-11-02T14:54:45Z)
A Highly Adaptive Acoustic Model for Accurate Multi-Dialect Speech Recognition [80.87085897419982]
We propose a novel acoustic modeling technique for accurate multi-dialect speech recognition with a single AM. Our proposed AM is dynamically adapted based on both dialect information and its internal representation, which results in a highly adaptive AM for handling multiple dialects simultaneously. The experimental results on large scale speech datasets show that the proposed AM outperforms all the previous ones, reducing word error rates (WERs) by 8.11% relative compared to a single all-dialects AM and by 7.31% relative compared to dialect-specific AMs.
arXiv Detail & Related papers (2022-05-06T06:07:09Z)
Towards One Model to Rule All: Multilingual Strategy for Dialectal Code-Switching Arabic ASR [11.363966269198064]
We design a large multilingual end-to-end ASR using self-attention based conformer architecture. We trained the system using Arabic (Ar), English (En) and French (Fr) languages. Our findings demonstrate the strength of such a model by outperforming state-of-the-art monolingual dialectal Arabic and code-switching Arabic ASR.
arXiv Detail & Related papers (2021-05-31T08:20:38Z)
SwissDial: Parallel Multidialectal Corpus of Spoken Swiss German [22.30271453485001]
We introduce the first annotated parallel corpus of spoken Swiss German across 8 major dialects, plus a Standard German reference. Our goal has been to create and to make available a basic dataset for employing data-driven NLP applications in Swiss German.
arXiv Detail & Related papers (2021-03-21T14:00:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.