Multitask Learning for Grapheme-to-Phoneme Conversion of Anglicisms in
German Speech Recognition
- URL: http://arxiv.org/abs/2105.12708v1
- Date: Wed, 26 May 2021 17:42:13 GMT
- Title: Multitask Learning for Grapheme-to-Phoneme Conversion of Anglicisms in
German Speech Recognition
- Authors: Julia Pritzen, Michael Gref, Christoph Schmidt, Dietlind Zühlke
- Abstract summary: Anglicisms are a challenge in German speech recognition due to irregular pronunciation compared to native German words.
We propose a multitask sequence-to-sequence approach for grapheme-to-phoneme conversion to improve the phonetization of Anglicisms.
We show that multitask learning can help solve the challenge of loanwords in German speech recognition.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Loanwords, such as Anglicisms, are a challenge in German speech recognition.
Due to their irregular pronunciation compared to native German words,
automatically generated pronunciation dictionaries often include faulty phoneme
sequences for Anglicisms. In this work, we propose a multitask
sequence-to-sequence approach for grapheme-to-phoneme conversion to improve the
phonetization of Anglicisms. We extended a grapheme-to-phoneme model with a
classifier to distinguish Anglicisms from native German words. With this
approach, the model learns to generate pronunciations differently depending on
the classification result. We used our model to create supplementary Anglicism
pronunciation dictionaries that are added to an existing German speech
recognition model. Tested on a dedicated Anglicism evaluation set, we improved
the recognition of Anglicisms compared to a baseline model, reducing the word
error rate by 1 % and the Anglicism error rate by 3 %. We show that multitask
learning can help solve the challenge of loanwords in German speech
recognition.
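The multitask setup described in the abstract pairs a grapheme-to-phoneme sequence task with a word-level Anglicism/German classification task, so training optimizes both objectives jointly. The following toy sketch (an illustration only, not the authors' implementation; the weighting factor `lambda_cls` is an assumed hyperparameter) shows how such a combined loss could be computed from per-step phoneme distributions and a classifier distribution:

```python
import math

def cross_entropy(predicted_probs, target_index):
    """Negative log-likelihood of the target class."""
    return -math.log(predicted_probs[target_index])

def multitask_loss(phoneme_probs, phoneme_targets, class_probs, class_target,
                   lambda_cls=0.5):
    """Sum the per-phoneme sequence loss and the weighted classifier loss."""
    g2p_loss = sum(cross_entropy(p, t)
                   for p, t in zip(phoneme_probs, phoneme_targets))
    cls_loss = cross_entropy(class_probs, class_target)
    return g2p_loss + lambda_cls * cls_loss

# Example: the model is fairly confident in each gold phoneme and classifies
# the word as an Anglicism (class index 1).
phoneme_probs = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]]  # one distribution per step
phoneme_targets = [1, 0]
class_probs = [0.2, 0.8]
loss = multitask_loss(phoneme_probs, phoneme_targets, class_probs, 1)
print(round(loss, 4))  # → 0.6914
```

Because the classifier shares parameters with the G2P model, gradients from the classification loss push the shared representation to separate Anglicisms from native words, which is what lets the decoder generate pronunciations differently depending on the classification result.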
Related papers
- Improving grapheme-to-phoneme conversion by learning pronunciations from
speech recordings [12.669655363646257]
The Grapheme-to-Phoneme (G2P) task aims to convert orthographic input into a discrete phonetic representation.
We propose a method to improve the G2P conversion task by learning pronunciation examples from audio recordings.
arXiv Detail & Related papers (2023-07-31T13:25:38Z)
- SoundChoice: Grapheme-to-Phoneme Models with Semantic Disambiguation [10.016862617549991]
This paper proposes SoundChoice, a novel Grapheme-to-Phoneme (G2P) architecture that processes entire sentences rather than operating at the word level.
SoundChoice achieves a Phoneme Error Rate (PER) of 2.65% on whole-sentence transcription using data from LibriSpeech and Wikipedia.
arXiv Detail & Related papers (2022-07-27T01:14:59Z)
- Computer-assisted Pronunciation Training -- Speech synthesis is almost all you need [18.446969150062586]
Existing CAPT methods are not able to detect pronunciation errors with high accuracy.
We present three innovative techniques based on phoneme-to-phoneme (P2P), text-to-speech (T2S), and speech-to-speech (S2S) conversion.
We show that these techniques not only improve the accuracy of three machine learning models for detecting pronunciation errors but also help establish a new state-of-the-art in the field.
arXiv Detail & Related papers (2022-07-02T08:33:33Z)
- Differentiable Allophone Graphs for Language-Universal Speech Recognition [77.2981317283029]
Building language-universal speech recognition systems entails producing phonological units of spoken sound that can be shared across languages.
We present a general framework to derive phone-level supervision from only phonemic transcriptions and phone-to-phoneme mappings.
We build a universal phone-based speech recognition model with interpretable probabilistic phone-to-phoneme mappings for each language.
arXiv Detail & Related papers (2021-07-24T15:09:32Z)
- Grapheme-to-Phoneme Transformer Model for Transfer Learning Dialects [1.3786433185027864]
Grapheme-to-Phoneme (G2P) models convert words to their phonetic pronunciations.
Dictionary-based methods typically require significant manual effort to build and adapt poorly to unseen words.
We propose a novel use of transformer-based attention model that can adapt to unseen dialects of English language, while using a small dictionary.
arXiv Detail & Related papers (2021-04-08T21:36:21Z)
- UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data [54.733889961024445]
We propose a unified pre-training approach called UniSpeech to learn speech representations with both unlabeled and labeled data.
We evaluate the effectiveness of UniSpeech for cross-lingual representation learning on public CommonVoice corpus.
arXiv Detail & Related papers (2021-01-19T12:53:43Z)
- Seeing wake words: Audio-visual Keyword Spotting [103.12655603634337]
KWS-Net is a novel convolutional architecture that uses a similarity map intermediate representation to separate the task into sequence matching and pattern detection.
We show that our method generalises to other languages, specifically French and German, and achieves a comparable performance to English with less language specific data.
arXiv Detail & Related papers (2020-09-02T17:57:38Z)
- A Swiss German Dictionary: Variation in Speech and Writing [45.82374977939355]
We introduce a dictionary containing forms of common words in various Swiss German dialects normalized into High German.
To alleviate the uncertainty associated with this diversity, we complement the pairs of Swiss German - High German words with Swiss German phonetic transcriptions (SAMPA).
This dictionary becomes thus the first resource to combine large-scale spontaneous translation with phonetic transcriptions.
arXiv Detail & Related papers (2020-03-31T22:10:43Z)
- Universal Phone Recognition with a Multilingual Allophone System [135.2254086165086]
We propose a joint model of language-independent phone and language-dependent phoneme distributions.
In multilingual ASR experiments over 11 languages, we find that this model improves testing performance by 2% phoneme error rate absolute.
Our recognizer achieves phone accuracy improvements of more than 17%, moving a step closer to speech recognition for all languages in the world.
arXiv Detail & Related papers (2020-02-26T21:28:57Z)
- Towards Zero-shot Learning for Automatic Phonemic Transcription [82.9910512414173]
A more challenging problem is to build phonemic transcribers for languages with zero training data.
Our model is able to recognize unseen phonemes in the target language without any training data.
It achieves 7.7% better phoneme error rate on average over a standard multilingual model.
arXiv Detail & Related papers (2020-02-26T20:38:42Z)
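Several of the related papers above (the differentiable allophone graphs and the multilingual allophone system) rely on probabilistic phone-to-phoneme mappings. A minimal sketch of the underlying idea (an assumed toy formulation, not code from any of the papers): language-independent phone posteriors are marginalized through a per-language mapping to obtain language-dependent phoneme posteriors.

```python
def phoneme_posteriors(phone_probs, phone_to_phoneme):
    """Marginalize phone probabilities into phoneme probabilities.

    phone_probs: dict phone -> probability (sums to 1)
    phone_to_phoneme: dict phone -> dict phoneme -> P(phoneme | phone)
    """
    result = {}
    for phone, p in phone_probs.items():
        for phoneme, cond in phone_to_phoneme[phone].items():
            result[phoneme] = result.get(phoneme, 0.0) + p * cond
    return result

# Example: the allophones [t] and [th] (aspirated) both realize the
# phoneme /t/, while [d] realizes /d/; their probability mass is pooled.
phone_probs = {"t": 0.5, "th": 0.25, "d": 0.25}
mapping = {
    "t": {"/t/": 1.0},
    "th": {"/t/": 1.0},
    "d": {"/d/": 1.0},
}
posteriors = phoneme_posteriors(phone_probs, mapping)
print(posteriors)  # → {'/t/': 0.75, '/d/': 0.25}
```

In the cited work the mapping weights are learned and interpretable per language; here they are fixed by hand purely for illustration.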
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.