Related papers: Standard-to-Dialect Transfer Trends Differ across Text and Speech: A Case Study on Intent and Topic Classification in German Dialects

URL: http://arxiv.org/abs/2510.07890v1
Date: Thu, 09 Oct 2025 07:43:08 GMT
Title: Standard-to-Dialect Transfer Trends Differ across Text and Speech: A Case Study on Intent and Topic Classification in German Dialects
Authors: Verena Blaschke, Miriam Winkler, Barbara Plank
Abstract summary: We compare standard-to-dialect transfer in three settings: text models, speech models, and cascaded systems. In our experiments, we focus on German and multiple German dialects in the context of written and spoken intent and topic classification. We find that the speech-only setup provides the best results on the dialect data while the text-only setup works best on the standard data.
Score: 36.91800117379075
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Research on cross-dialectal transfer from a standard to a non-standard dialect variety has typically focused on text data. However, dialects are primarily spoken, and non-standard spellings are known to cause issues in text processing. We compare standard-to-dialect transfer in three settings: text models, speech models, and cascaded systems where speech first gets automatically transcribed and then further processed by a text model. In our experiments, we focus on German and multiple German dialects in the context of written and spoken intent and topic classification. To that end, we release the first dialectal audio intent classification dataset. We find that the speech-only setup provides the best results on the dialect data while the text-only setup works best on the standard data. While the cascaded systems lag behind the text-only models for German, they perform relatively well on the dialectal data if the transcription system generates normalized, standard-like output.

Related papers

A Case Against Implicit Standards: Homophone Normalization in Machine Translation for Languages that use the Ge'ez Script [3.5149312379702127]
Homophone normalization is a pre-processing step applied in Amharic Natural Language Processing literature. We propose a post-inference intervention in which normalization is applied to model predictions instead of training data. Our work contributes to the broader discussion on technology-facilitated language change and calls for more language-aware interventions.
arXiv Detail & Related papers (2025-07-20T22:35:08Z)
A Multi-Dialectal Dataset for German Dialect ASR and Dialect-to-Standard Speech Translation [19.535404632372042]
Betthupferl is an evaluation dataset containing four hours of read speech in three dialect groups spoken in Southeast Germany. We provide both dialectal and Standard German transcriptions, and analyze the linguistic differences between them. We benchmark several multilingual state-of-the-art ASR models on speech translation into Standard German, and find differences between how much the output resembles the dialectal vs. standardized transcriptions.
arXiv Detail & Related papers (2025-06-03T14:02:52Z)
Languages in Multilingual Speech Foundation Models Align Both Phonetically and Semantically [58.019484208091534]
Cross-lingual alignment in pretrained language models (LMs) has enabled efficient transfer in text-based LMs. It remains an open question whether findings and methods from text-based cross-lingual alignment apply to speech.
arXiv Detail & Related papers (2025-05-26T07:21:20Z)
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation [97.54885207518946]
We introduce a novel model framework TransVIP that leverages diverse datasets in a cascade fashion. We propose two separated encoders to preserve the speaker's voice characteristics and isochrony from the source speech during the translation process. Our experiments on the French-English language pair demonstrate that our model outperforms the current state-of-the-art speech-to-speech translation model.
arXiv Detail & Related papers (2024-05-28T04:11:37Z)
A Benchmark for Evaluating Machine Translation Metrics on Dialects Without Standard Orthography [40.04973667048665]
We evaluate how robust metrics are to non-standardized dialects. We collect a dataset of human translations and human judgments for automatic machine translations from English to two Swiss German dialects.
arXiv Detail & Related papers (2023-11-28T15:12:11Z)
SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data [100.46303484627045]
We propose a cross-modal Speech and Language Model (SpeechLM) to align speech and text pre-training with a pre-defined unified representation. Specifically, we introduce two alternative discrete tokenizers to bridge the speech and text modalities. We evaluate SpeechLM on various spoken language processing tasks, including speech recognition, speech translation, and the universal representation evaluation framework SUPERB.
arXiv Detail & Related papers (2022-09-30T09:12:10Z)
Textless Speech-to-Speech Translation on Real Data [49.134208897722246]
We present a textless speech-to-speech translation (S2ST) system that can translate speech from one language into another language. We tackle the challenge in modeling multi-speaker target speech and train the systems with real-world S2ST data.
arXiv Detail & Related papers (2021-12-15T18:56:35Z)
SwissDial: Parallel Multidialectal Corpus of Spoken Swiss German [22.30271453485001]
We introduce the first annotated parallel corpus of spoken Swiss German across 8 major dialects, plus a Standard German reference. Our goal has been to create and to make available a basic dataset for employing data-driven NLP applications in Swiss German.
arXiv Detail & Related papers (2021-03-21T14:00:09Z)
Consecutive Decoding for Speech-to-text Translation [51.155661276936044]
COnSecutive Transcription and Translation (COSTT) is an integral approach for speech-to-text translation. The key idea is to generate source transcript and target translation text with a single decoder. Our method is verified on three mainstream datasets.
arXiv Detail & Related papers (2020-09-21T10:10:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.