A Swiss German Dictionary: Variation in Speech and Writing
- URL: http://arxiv.org/abs/2004.00139v1
- Date: Tue, 31 Mar 2020 22:10:43 GMT
- Title: A Swiss German Dictionary: Variation in Speech and Writing
- Authors: Larissa Schmidt (1), Lucy Linder (2), Sandra Djambazovska (3),
Alexandros Lazaridis (3), Tanja Samardžić (1), and Claudiu Musat (3)
((1) University of Zurich: URPP Language and Space, (2) University of
Fribourg, (3) Swisscom AG: Data Analytics & AI (DNA))
- Abstract summary: We introduce a dictionary containing forms of common words in various Swiss German dialects normalized into High German.
To alleviate the uncertainty caused by variation in the written forms, we complement the pairs of Swiss German - High German words with the Swiss German phonetic transcriptions (SAMPA).
This dictionary thus becomes the first resource to combine large-scale spontaneous translation with phonetic transcriptions.
- Score: 45.82374977939355
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce a dictionary containing forms of common words in various Swiss
German dialects normalized into High German. As Swiss German is, for now, a
predominantly spoken language, there is a significant variation in the written
forms, even between speakers of the same dialect. To alleviate the uncertainty
associated with this diversity, we complement the pairs of Swiss German - High
German words with the Swiss German phonetic transcriptions (SAMPA). This
dictionary thus becomes the first resource to combine large-scale spontaneous
translation with phonetic transcriptions. Moreover, we control for the regional
distribution and ensure the equal representation of the major Swiss dialects.
The coupling of the phonetic and written Swiss German forms is powerful. We
show that they are sufficient to train a Transformer-based phoneme-to-grapheme
model that generates credible novel Swiss German writings. In addition, we show
that the inverse mapping - from graphemes to phonemes - can be modeled with a
transformer trained with the novel dictionary. This generation of
pronunciations for previously unknown words is key in training extensible
automated speech recognition (ASR) systems, which are key beneficiaries of this
dictionary.
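The entry structure described in the abstract (a Swiss German written form, its High German normalization, and a SAMPA transcription) can be sketched roughly as follows. This is only an illustration: the field names, the `dialect` label, the space-separated SAMPA convention, and the sample rows are all assumptions made for this sketch and are not taken from the published dictionary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    high_german: str   # normalized High German form
    swiss_german: str  # one written Swiss German variant
    sampa: str         # SAMPA symbols, space-separated (assumed convention)
    dialect: str       # region label (hypothetical field)


# Made-up placeholder rows, for illustration only.
SAMPLE = [
    Entry("nicht", "nid", "n I d", "Bern"),
    Entry("nicht", "ned", "n e d", "Zurich"),
    Entry("Kind", "Chind", "x I n d", "Zurich"),
]


def group_by_normalization(entries):
    """Collect all Swiss German variants that share a High German form."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.high_german].append(entry)
    return dict(grouped)


def g2p_pairs(entries):
    """Build (grapheme sequence, phoneme sequence) pairs for a G2P model.

    Swapping the two elements of each pair gives phoneme-to-grapheme data.
    """
    return [(list(e.swiss_german), e.sampa.split()) for e in entries]


if __name__ == "__main__":
    for lemma, variants in group_by_normalization(SAMPLE).items():
        print(lemma, [v.swiss_german for v in variants])
```

Pairs of this shape could serve as training data for either mapping direction; the Transformer models mentioned in the abstract are not reproduced here.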
Related papers
- MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank [56.810282574817414]
We present the first multi-dialect Bavarian treebank (MaiBaam) manually annotated with part-of-speech and syntactic dependency information in Universal Dependencies (UD).
We highlight the morphosyntactic differences between the closely related Bavarian and German and showcase the rich variability of speakers' orthographies.
Our corpus includes 15k tokens, covering dialects from all Bavarian-speaking areas spanning three countries.
arXiv Detail & Related papers (2024-03-15T13:33:10Z)
- What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects [60.8361859783634]
We survey speakers of dialects and regional languages related to German.
We find that respondents are especially in favour of potential NLP tools that work with dialectal input.
arXiv Detail & Related papers (2024-02-19T09:15:28Z)
- Modular Adaptation of Multilingual Encoders to Written Swiss German Dialect [52.1701152610258]
Adding a Swiss German adapter to a modular encoder achieves 97.5% of fully monolithic adaptation performance.
For the task of retrieving Swiss German sentences given Standard German queries, adapting a character-level model is more effective than the other adaptation strategies.
arXiv Detail & Related papers (2024-01-25T18:59:32Z)
- Dialect Transfer for Swiss German Speech Translation [9.373232685350844]
This paper investigates the challenges in building Swiss German speech translation systems.
It focuses on the impact of dialect diversity and differences between Swiss German and Standard German.
arXiv Detail & Related papers (2023-10-13T13:16:57Z)
- SwissBERT: The Multilingual Language Model for Switzerland [52.1701152610258]
SwissBERT is a masked language model created specifically for processing Switzerland-related text.
SwissBERT is a pre-trained model that we adapted to news articles written in the national languages of Switzerland.
Since SwissBERT uses language adapters, it may be extended to Swiss German dialects in future work.
arXiv Detail & Related papers (2023-03-23T14:44:47Z)
- Differentiable Allophone Graphs for Language-Universal Speech Recognition [77.2981317283029]
Building language-universal speech recognition systems entails producing phonological units of spoken sound that can be shared across languages.
We present a general framework to derive phone-level supervision from only phonemic transcriptions and phone-to-phoneme mappings.
We build a universal phone-based speech recognition model with interpretable probabilistic phone-to-phoneme mappings for each language.
arXiv Detail & Related papers (2021-07-24T15:09:32Z)
- Dialectal Speech Recognition and Translation of Swiss German Speech to Standard German Text: Microsoft's Submission to SwissText 2021 [17.675379299410054]
Swiss German refers to the multitude of Alemannic dialects spoken in the German-speaking parts of Switzerland.
We propose a hybrid automatic speech recognition system with a lexicon that incorporates translations.
Our submission reaches 46.04% BLEU on a blind conversational test set and outperforms the second-best competitor by a 12% relative margin.
arXiv Detail & Related papers (2021-06-15T13:34:02Z)
- Multitask Learning for Grapheme-to-Phoneme Conversion of Anglicisms in German Speech Recognition [1.3381749415517017]
Anglicisms are a challenge in German speech recognition due to irregular pronunciation compared to native German words.
We propose a multitask sequence-to-sequence approach for grapheme-to-phoneme conversion to improve the phonetization of Anglicisms.
We show that multitask learning can help solve the challenge of loanwords in German speech recognition.
arXiv Detail & Related papers (2021-05-26T17:42:13Z)
- SwissDial: Parallel Multidialectal Corpus of Spoken Swiss German [22.30271453485001]
We introduce the first annotated parallel corpus of spoken Swiss German across 8 major dialects, plus a Standard German reference.
Our goal has been to create and to make available a basic dataset for employing data-driven NLP applications in Swiss German.
arXiv Detail & Related papers (2021-03-21T14:00:09Z)