Dialect Transfer for Swiss German Speech Translation
- URL: http://arxiv.org/abs/2310.09088v1
- Date: Fri, 13 Oct 2023 13:16:57 GMT
- Title: Dialect Transfer for Swiss German Speech Translation
- Authors: Claudio Paonessa, Yanick Schraner, Jan Deriu, Manuela Hürlimann,
Manfred Vogel, Mark Cieliebak
- Abstract summary: This paper investigates the challenges in building Swiss German speech translation systems.
It focuses on the impact of dialect diversity and differences between Swiss German and Standard German.
- Score: 9.373232685350844
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper investigates the challenges in building Swiss German speech
translation systems, specifically focusing on the impact of dialect diversity
and differences between Swiss German and Standard German. Swiss German is a
spoken language with no formal writing system; it comprises many diverse
dialects and is a low-resource language with only around 5 million speakers.
The study is guided by two key research questions: how does the inclusion and
exclusion of dialects during the training of speech translation models for
Swiss German impact the performance on specific dialects, and how do the
differences between Swiss German and Standard German impact the performance of
the systems? We show that dialect diversity and linguistic differences pose
significant challenges to Swiss German speech translation, which is in line
with linguistic hypotheses derived from empirical investigations.
Related papers
- What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects [60.8361859783634]
We survey speakers of dialects and regional languages related to German.
We find that respondents are especially in favour of potential NLP tools that work with dialectal input.
arXiv Detail & Related papers (2024-02-19T09:15:28Z)
- Modular Adaptation of Multilingual Encoders to Written Swiss German Dialect [52.1701152610258]
Adding a Swiss German adapter to a modular encoder achieves 97.5% of fully monolithic adaptation performance.
For the task of retrieving Swiss German sentences given Standard German queries, adapting a character-level model is more effective than the other adaptation strategies.
arXiv Detail & Related papers (2024-01-25T18:59:32Z)
- STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions [5.6787416472329495]
We present STT4SG-350 (Speech-to-Text for Swiss German), a corpus of Swiss German speech annotated with Standard German text at the sentence level.
The data is collected using a web app in which the speakers are shown Standard German sentences, which they translate to Swiss German and record.
It contains 343 hours of speech from all dialect regions and is the largest public speech corpus for Swiss German to date.
arXiv Detail & Related papers (2023-05-30T08:49:38Z)
- SwissBERT: The Multilingual Language Model for Switzerland [52.1701152610258]
SwissBERT is a masked language model created specifically for processing Switzerland-related text.
SwissBERT is a pre-trained model that we adapted to news articles written in the national languages of Switzerland.
Since SwissBERT uses language adapters, it may be extended to Swiss German dialects in future work.
arXiv Detail & Related papers (2023-03-23T14:44:47Z)
- 2nd Swiss German Speech to Standard German Text Shared Task at SwissText 2022 [3.910747992453137]
The objective was to maximize the BLEU score on a test set of Grisons speech.
Three teams participated, with the best-performing system achieving a BLEU score of 70.1.
arXiv Detail & Related papers (2023-01-17T10:31:11Z)
- Cross-Linguistic Syntactic Difference in Multilingual BERT: How Good is It and How Does It Affect Transfer? [50.48082721476612]
Multilingual BERT (mBERT) has demonstrated considerable cross-lingual syntactic ability.
We investigate the distributions of grammatical relations induced from mBERT in the context of 24 typologically different languages.
arXiv Detail & Related papers (2022-12-21T09:44:08Z)
- Multi-VALUE: A Framework for Cross-Dialectal English NLP [49.55176102659081]
Multi-VALUE is a controllable rule-based translation system spanning 50 English dialects.
Stress tests reveal significant performance disparities for leading models on non-standard dialects.
We partner with native speakers of Chicano and Indian English to release new gold-standard variants of the popular CoQA task.
arXiv Detail & Related papers (2022-12-15T18:17:01Z)
- Dialectal Speech Recognition and Translation of Swiss German Speech to Standard German Text: Microsoft's Submission to SwissText 2021 [17.675379299410054]
Swiss German refers to the multitude of Alemannic dialects spoken in the German-speaking parts of Switzerland.
We propose a hybrid automatic speech recognition system with a lexicon that incorporates translations.
Our submission reaches 46.04% BLEU on a blind conversational test set and outperforms the second best competitor by a 12% relative margin.
arXiv Detail & Related papers (2021-06-15T13:34:02Z)
- SwissDial: Parallel Multidialectal Corpus of Spoken Swiss German [22.30271453485001]
We introduce the first annotated parallel corpus of spoken Swiss German across 8 major dialects, plus a Standard German reference.
Our goal has been to create and to make available a basic dataset for employing data-driven NLP applications in Swiss German.
arXiv Detail & Related papers (2021-03-21T14:00:09Z)
- A Swiss German Dictionary: Variation in Speech and Writing [45.82374977939355]
We introduce a dictionary containing forms of common words in various Swiss German dialects normalized into High German.
To alleviate the uncertainty associated with this diversity, we complement the pairs of Swiss German - High German words with the Swiss German phonetic transcriptions (SAMPA).
This dictionary thus becomes the first resource to combine large-scale spontaneous translation with phonetic transcriptions.
arXiv Detail & Related papers (2020-03-31T22:10:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.