Pronunciation Generation for Foreign Language Words in Intra-Sentential
Code-Switching Speech Recognition
- URL: http://arxiv.org/abs/2210.14691v1
- Date: Wed, 26 Oct 2022 13:19:35 GMT
- Title: Pronunciation Generation for Foreign Language Words in Intra-Sentential
Code-Switching Speech Recognition
- Authors: Wei Wang, Chao Zhang and Xiaopei Wu
- Abstract summary: Code-Switching refers to the phenomenon of switching languages within a sentence or discourse.
In this paper, we make use of limited code-switching data as driving material and explore a shortcut to quickly develop intra-sentential code-switching recognition on a commissioned native-language acoustic model.
- Score: 14.024346215923972
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code-switching refers to the phenomenon of switching languages
within a sentence or discourse. However, limited code-switching data,
mismatched phoneme sets across languages, and high rebuilding costs make it
challenging to build a specialized acoustic model for code-switching speech
recognition. In this paper, we use limited code-switching data as driving
material and explore a shortcut to quickly develop intra-sentential
code-switching recognition on top of a commissioned native-language acoustic
model: we propose a data-driven method to build a seed lexicon, which is then
used to train a grapheme-to-phoneme (G2P) model that predicts mapped
pronunciations for foreign-language words in code-switching sentences. The
core of this data-driven technology consists of a phonetic decoding method and
several selection methods. For the problem of imbalanced word-level driving
material, we adopt an internal-assistance tactic: the G2P model learns good
pronunciation rules from the words with sufficient material and uses them to
help the scarce ones. Our experiments show that the Mixed Error Rate in
intra-sentential Chinese-English code-switching recognition drops from 29.15%,
obtained with the pure Chinese recognizer, to 12.13% after adding
foreign-language words' pronunciations through our data-driven approach, and
reaches a best result of 11.14% when the different selection methods are
combined with the internal-assistance tactic.
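To make the pipeline concrete, here is a minimal Python sketch of the two ideas the abstract describes: frequency-based selection over pronunciation candidates obtained by phonetic decoding, and the internal-assistance fallback in which a G2P model (assumed to be trained externally on the well-attested entries) supplies pronunciations for scarce words. The threshold, phone symbols, and the g2p_predict stand-in are illustrative assumptions, not the paper's exact settings.

```python
from collections import Counter

MIN_COUNT = 5  # hypothetical threshold separating "sufficient" from "scarce" words

def build_seed_lexicon(candidates):
    """candidates: word -> list of phone-string hypotheses from phonetic decoding."""
    lexicon, scarce = {}, []
    for word, prons in candidates.items():
        if len(prons) >= MIN_COUNT:
            # stand-in for the paper's selection methods: keep the most
            # frequent decoded pronunciation
            lexicon[word] = Counter(prons).most_common(1)[0][0]
        else:
            scarce.append(word)
    return lexicon, scarce

def assist_scarce_words(lexicon, scarce, g2p_predict):
    """Internal assistance: a G2P model trained on the reliable seed entries
    (e.g. with an external toolkit) supplies pronunciations for scarce words."""
    for word in scarce:
        lexicon[word] = g2p_predict(word)
    return lexicon

# toy usage; the lambda stands in for a real G2P model
cands = {"offer": ["ao f er"] * 6 + ["ou f er"], "cache": ["k ae sh"]}
seed, scarce = build_seed_lexicon(cands)
full = assist_scarce_words(seed, scarce, g2p_predict=lambda w: "k ae sh i")
print(full)  # {'offer': 'ao f er', 'cache': 'k ae sh i'}
```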
Related papers
- Leveraging Language ID to Calculate Intermediate CTC Loss for Enhanced Code-Switching Speech Recognition [5.3545957730615905]
We introduce language identification information into the middle layer of the ASR model's encoder.
We aim to generate acoustic features that convey language distinctions in a more implicit way, reducing the model's confusion when dealing with language switching.
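As a rough illustration of this idea, the sketch below attaches an auxiliary CTC loss over language-ID labels to a middle encoder layer, alongside the usual ASR CTC loss at the top. The layer index, label inventory, and 0.3 weight are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class InterLIDEncoder(nn.Module):
    """Transformer encoder with an auxiliary language-ID CTC branch mid-stack."""
    def __init__(self, dim=256, vocab=5000, n_lang=2, n_layers=12, mid_layer=6):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            for _ in range(n_layers)
        )
        self.mid_layer = mid_layer
        self.asr_head = nn.Linear(dim, vocab)       # ASR token logits
        self.lid_head = nn.Linear(dim, n_lang + 1)  # language IDs + CTC blank

    def forward(self, x):                # x: (batch, frames, dim)
        mid = None
        for i, layer in enumerate(self.layers, start=1):
            x = layer(x)
            if i == self.mid_layer:
                mid = x                  # middle-layer features carry LID info
        return (self.asr_head(x).log_softmax(-1),
                self.lid_head(mid).log_softmax(-1))

ctc = nn.CTCLoss(blank=0, zero_infinity=True)

def joint_loss(asr_lp, lid_lp, asr_tgt, lid_tgt, in_len, asr_len, lid_len, w=0.3):
    # nn.CTCLoss expects (time, batch, classes); w is an illustrative weight
    l_asr = ctc(asr_lp.transpose(0, 1), asr_tgt, in_len, asr_len)
    l_lid = ctc(lid_lp.transpose(0, 1), lid_tgt, in_len, lid_len)
    return (1 - w) * l_asr + w * l_lid
```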
arXiv Detail & Related papers (2023-12-15T07:46:35Z)
- Multilingual self-supervised speech representations improve the speech recognition of low-resource African languages with codeswitching [65.74653592668743]
Finetuning self-supervised multilingual representations reduces absolute word error rates by up to 20%.
With limited training data, finetuning self-supervised representations is the better-performing and more viable solution.
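A hedged sketch of the general recipe, not the paper's exact setup: load a pretrained multilingual wav2vec 2.0 encoder via torchaudio and finetune it end to end with a fresh CTC head on the target data. The checkpoint choice, vocabulary size, and dummy batch below are assumptions.

```python
import torch
import torch.nn as nn
import torchaudio

# pretrained multilingual wav2vec 2.0 encoder; the paper's checkpoint may differ
bundle = torchaudio.pipelines.WAV2VEC2_XLSR53
encoder = bundle.get_model()
vocab_size = 40                          # illustrative: target charset + blank
ctc_head = nn.Linear(1024, vocab_size)   # fresh output layer for finetuning

wave = torch.randn(2, 16000)             # dummy batch of 1 s audio at 16 kHz
feats, feat_lens = encoder(wave, torch.tensor([16000, 16000]))
log_probs = ctc_head(feats).log_softmax(-1)

targets = torch.randint(1, vocab_size, (2, 12))   # dummy transcripts
loss = nn.CTCLoss(blank=0, zero_infinity=True)(
    log_probs.transpose(0, 1), targets, feat_lens, torch.tensor([12, 12]))
loss.backward()   # gradients flow into both the head and the encoder
```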
arXiv Detail & Related papers (2023-11-25T17:05:21Z)
- Towards General-Purpose Text-Instruction-Guided Voice Conversion [84.78206348045428]
This paper introduces a novel voice conversion model guided by text instructions such as "articulate slowly with a deep tone" or "speak in a cheerful boyish voice".
The proposed VC model is a neural language model that processes a sequence of discrete codes and outputs the code sequence of the converted speech.
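One way to read this formulation, sketched under heavy assumptions: a decoder-only model consumes the instruction tokens and source speech codes as a single sequence and autoregressively predicts the converted speech codes. The vocabulary sizes and sequence layout below are invented for illustration.

```python
import torch
import torch.nn as nn

class InstructVC(nn.Module):
    """Decoder-only LM over a shared text-token / speech-code vocabulary."""
    def __init__(self, n_text=32000, n_code=1024, dim=512, n_layers=8):
        super().__init__()
        self.emb = nn.Embedding(n_text + n_code, dim)   # one joint token space
        block = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(block, n_layers)
        self.head = nn.Linear(dim, n_code)              # next speech code only

    def forward(self, tokens):
        # tokens: (batch, seq) laid out as [instruction ids ; source codes ;
        # converted codes generated so far]
        T = tokens.size(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool,
                                       device=tokens.device), diagonal=1)
        h = self.backbone(self.emb(tokens), mask=causal)
        return self.head(h)   # logits for the next discrete speech code
```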
arXiv Detail & Related papers (2023-09-25T17:52:09Z)
- Simple yet Effective Code-Switching Language Identification with Multitask Pre-Training and Transfer Learning [0.7242530499990028]
Code-switching is the linguistic phenomenon whereby, in casual settings, multilingual speakers mix words from different languages in one utterance.
We propose two novel approaches toward improving language identification accuracy on an English-Mandarin child-directed speech dataset.
Our best model achieves a balanced accuracy of 0.781 on a real English-Mandarin code-switching child-directed speech corpus and outperforms the previous baseline by 55.3%.
arXiv Detail & Related papers (2023-05-31T11:43:16Z)
- Reducing language context confusion for end-to-end code-switching automatic speech recognition [50.89821865949395]
We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model.
By calculating the respective attention of multiple languages, our method can efficiently transfer language knowledge from rich monolingual data.
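The summary leaves the mechanism open; one plausible reading, sketched below, keeps separate attention parameters per language and mixes their outputs with a frame-level language posterior. This is our interpretation for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LanguageRelatedAttention(nn.Module):
    """Per-language attention branches mixed by a predicted language posterior."""
    def __init__(self, dim=256, n_lang=2):
        super().__init__()
        self.attn = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
            for _ in range(n_lang)
        )
        self.lang_gate = nn.Linear(dim, n_lang)  # frame-level language posterior

    def forward(self, query, memory):
        gate = self.lang_gate(query).softmax(-1)           # (B, T, n_lang)
        outs = [a(query, memory, memory)[0] for a in self.attn]
        mixed = torch.stack(outs, dim=-1)                  # (B, T, D, n_lang)
        return (mixed * gate.unsqueeze(2)).sum(-1)         # language-weighted mix
```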
arXiv Detail & Related papers (2022-01-28T14:39:29Z)
- Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information [55.75018546938499]
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts power-set encoded labels.
Our method achieves a lower diarization error rate than target-speaker voice activity detection.
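To illustrate what power-set encoding means here: with K candidate speakers, each frame's set of simultaneously active speakers maps to one of 2^K classes, so overlapping speech becomes single-label classification. The toy bitmask encoding below is our illustration, not the paper's implementation.

```python
K = 3  # number of candidate speakers (toy value)

def encode(active):   # set of active speaker indices -> power-set class id
    return sum(1 << s for s in active)

def decode(label):    # class id -> set of active speakers
    return {s for s in range(K) if label >> s & 1}

frames = [set(), {0}, {0, 2}, {1, 2}]   # toy per-frame speaker activity
labels = [encode(f) for f in frames]    # [0, 1, 5, 6]
assert [decode(l) for l in labels] == frames
```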
arXiv Detail & Related papers (2021-11-28T12:51:04Z)
- Exploring Retraining-Free Speech Recognition for Intra-sentential Code-Switching [17.973043287866986]
We present our initial efforts for building a code-switching (CS) speech recognition system.
We have designed an automatic approach to obtain high-quality pronunciations of foreign language words.
Our best system achieves a 55.5% relative word error rate reduction from the 34.4% obtained with a conventional monolingual ASR system.
arXiv Detail & Related papers (2021-08-27T19:15:16Z)
- Meta-Transfer Learning for Code-Switched Speech Recognition [72.84247387728999]
We propose a new learning method, meta-transfer learning, for transfer learning on a code-switched speech recognition system in a low-resource setting.
Our model learns to recognize individual languages and transfers that knowledge so as to better recognize mixed-language speech by conditioning the optimization on code-switching data.
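A first-order sketch of this conditioning idea, under our own simplifications: adapt a copy of the model on monolingual data, then update the original weights with the gradient the adapted copy incurs on code-switching data. The learning rates and the FOMAML-style approximation are assumptions, not the paper's algorithm.

```python
import copy
import torch

def meta_transfer_step(model, loss_fn, mono_batch, cs_batch,
                       inner_lr=1e-3, meta_lr=1e-3):
    # inner loop: adapt a throwaway copy on monolingual data
    fast = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
    loss_fn(fast, mono_batch).backward()
    inner_opt.step()
    inner_opt.zero_grad()

    # outer step: the adapted copy's gradient on code-switching data
    # moves the original weights (first-order approximation)
    loss_fn(fast, cs_batch).backward()
    with torch.no_grad():
        for p, q in zip(model.parameters(), fast.parameters()):
            if q.grad is not None:
                p -= meta_lr * q.grad
```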
arXiv Detail & Related papers (2020-04-29T14:27:19Z)
- Towards Zero-shot Learning for Automatic Phonemic Transcription [82.9910512414173]
A more challenging problem is to build phonemic transcribers for languages with zero training data.
Our model is able to recognize unseen phonemes in the target language without any training data.
It achieves a 7.7% lower phoneme error rate on average than a standard multilingual model.
arXiv Detail & Related papers (2020-02-26T20:38:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.