Integrating Knowledge in End-to-End Automatic Speech Recognition for
Mandarin-English Code-Switching
- URL: http://arxiv.org/abs/2112.10202v1
- Date: Sun, 19 Dec 2021 17:31:15 GMT
- Title: Integrating Knowledge in End-to-End Automatic Speech Recognition for
Mandarin-English Code-Switching
- Authors: Chia-Yu Li and Ngoc Thang Vu
- Abstract summary: Code-Switching (CS) is a common linguistic phenomenon in multilingual communities.
This paper presents our investigations on end-to-end speech recognition for Mandarin-English CS speech.
- Score: 41.88097793717185
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code-Switching (CS) is a common linguistic phenomenon in multilingual
communities that consists of switching between languages while speaking. This
paper presents our investigations on end-to-end speech recognition for
Mandarin-English CS speech. We analyse CS-specific issues such as the mismatch
in properties between the languages of a CS pair, the unpredictable nature of
switching points, and the data scarcity problem. We exploit and improve the
state-of-the-art end-to-end system by merging nonlinguistic symbols, by
integrating language identification using hierarchical softmax, by modeling
sub-word units, by artificially lowering the speaking rate, and by augmenting
data with the speed perturbation technique and several monolingual datasets, so
that performance improves not only on CS speech but also on monolingual
benchmarks, making the system more applicable to real-life settings. Finally,
we explore the effect of different
language model integration methods on the performance of the proposed model.
Our experimental results reveal that all the proposed techniques improve
recognition performance. The best combined system improves over the baseline by
up to 35% relative in terms of mixed error rate and delivers acceptable
performance on monolingual benchmarks.
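As an illustration of one of the augmentation steps above, here is a minimal sketch of speed perturbation by resampling. This is a generic recipe, not necessarily the authors' exact pipeline; the file name and warp factors are placeholders. Factors above 1.0 speed the speech up, while factors below 1.0 emulate the artificially lowered speaking rate mentioned in the abstract.

```python
# Minimal sketch of speed-perturbation data augmentation (generic illustration,
# not the paper's exact pipeline). Interpreting the waveform as if it had been
# recorded at sr*factor and resampling back to sr changes duration and pitch
# together, the standard speed-perturbation recipe.
import librosa

def speed_perturb(wave, sr, factor):
    """Return the waveform played back `factor` times faster (slower if factor < 1)."""
    return librosa.resample(wave, orig_sr=int(sr * factor), target_sr=sr)

# "utterance.wav" is a placeholder path; 0.9/1.0/1.1 are commonly used factors.
wave, sr = librosa.load("utterance.wav", sr=16000)
augmented = [speed_perturb(wave, sr, f) for f in (0.9, 1.0, 1.1)]
slowed = speed_perturb(wave, sr, 0.8)  # artificially lowered speaking rate
```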
Related papers
- Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection [49.27067541740956]
Speech Emotion Recognition (SER) is a crucial component in developing general-purpose AI agents capable of natural human-computer interaction.
Building robust multilingual SER systems remains challenging due to the scarcity of labeled data in languages other than English and Chinese.
We propose an approach to enhance SER performance in low-resource languages by leveraging data from high-resource languages.
arXiv Detail & Related papers (2024-09-17T08:36:45Z)
- Large Language Models for Dysfluency Detection in Stuttered Speech [16.812800649507302]
Accurately detecting dysfluencies in spoken language can help to improve the performance of automatic speech and language processing components.
Inspired by the recent trend towards the deployment of large language models (LLMs) as universal learners and processors of non-lexical inputs, we approach the task of multi-label dysfluency detection as a language modeling problem.
We present hypothesis candidates generated with an automatic speech recognition system, together with acoustic representations extracted from an audio encoder model, to an LLM, and fine-tune the system to predict dysfluency labels on three datasets containing English and German stuttered speech.
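As a rough, text-only sketch of casting multi-label dysfluency detection as a classification problem over ASR hypothesis candidates: the backbone model, label set, and separator below are placeholders, and the acoustic-representation branch described in the abstract is omitted.

```python
# Hedged sketch: join the N-best ASR hypotheses into one text input and fine-tune
# a sequence-classification head for multi-label dysfluency prediction. The model
# name and label set are illustrative placeholders, not the paper's configuration.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["block", "prolongation", "sound_repetition", "word_repetition", "interjection"]

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # sigmoid outputs + BCE loss
)

def make_example(nbest_hypotheses, active_labels):
    # The model sees the recognizer's uncertainty through the joined hypotheses.
    text = " | ".join(nbest_hypotheses)
    enc = tok(text, truncation=True, return_tensors="pt")
    target = torch.zeros(1, len(LABELS))
    for lab in active_labels:
        target[0, LABELS.index(lab)] = 1.0
    return enc, target

enc, target = make_example(["i i want to to go", "i want to go"], ["word_repetition"])
loss = model(**enc, labels=target).loss  # BCE-with-logits over the label vector
loss.backward()
```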
arXiv Detail & Related papers (2024-06-16T17:51:22Z)
- An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios [76.11409260727459]
This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system.
We demonstrate that the similarity in phonetics between the pre-training and target languages, as well as the language category, affects the target language's adaptation performance.
arXiv Detail & Related papers (2024-06-13T08:16:52Z)
- Leveraging Language ID to Calculate Intermediate CTC Loss for Enhanced Code-Switching Speech Recognition [5.3545957730615905]
We introduce language identification information into the middle layer of the ASR model's encoder.
The aim is to generate acoustic features that encode language distinctions implicitly, reducing the model's confusion when handling language switches.
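A minimal sketch of attaching an auxiliary CTC loss over language-ID targets to an intermediate encoder layer; the layer split, vocabulary sizes, and the 0.3 weight are assumptions for illustration, not the paper's exact configuration.

```python
# Sketch: auxiliary CTC loss over language-ID targets computed at a middle
# encoder layer, added to the main token-level CTC loss. Index 0 is the CTC
# blank, so token and language-ID labels start at 1.
import torch.nn as nn

class LIDIntermediateCTC(nn.Module):
    def __init__(self, d_model=256, n_tokens=5000, n_langs=3, n_layers=12, mid=6):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.lower = nn.TransformerEncoder(layer, num_layers=mid)
        self.upper = nn.TransformerEncoder(layer, num_layers=n_layers - mid)
        self.token_head = nn.Linear(d_model, n_tokens)  # main CTC head (tokens)
        self.lid_head = nn.Linear(d_model, n_langs)     # auxiliary CTC head (language IDs)
        self.ctc = nn.CTCLoss(blank=0, zero_infinity=True)

    def forward(self, feats, feat_lens, tokens, token_lens, lids, lid_lens):
        mid_out = self.lower(feats)   # (B, T, D) features from the middle layer
        top_out = self.upper(mid_out) # (B, T, D) features from the top layer
        # nn.CTCLoss expects (T, B, C) log-probabilities.
        token_lp = self.token_head(top_out).log_softmax(-1).transpose(0, 1)
        lid_lp = self.lid_head(mid_out).log_softmax(-1).transpose(0, 1)
        main = self.ctc(token_lp, tokens, feat_lens, token_lens)
        inter = self.ctc(lid_lp, lids, feat_lens, lid_lens)
        return main + 0.3 * inter     # assumed weighting of the auxiliary loss
```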
arXiv Detail & Related papers (2023-12-15T07:46:35Z)
- Language-agnostic Code-Switching in Sequence-To-Sequence Speech Recognition [62.997667081978825]
Code-Switching (CS) refers to the phenomenon of alternately using words and phrases from different languages.
We propose a simple yet effective data augmentation in which audio and corresponding labels of different source languages are concatenated.
We show that this augmentation can even improve the model's performance on inter-sentential language switches not seen during training by 5.03% WER.
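A minimal sketch of this concatenation-style augmentation, assuming utterances are stored as (waveform, transcript) pairs; the short silence gap and random pairing strategy are assumptions for illustration.

```python
# Sketch of concatenation-based code-switching augmentation: splice a Mandarin
# utterance and an English utterance (audio and labels) into one synthetic CS
# training example.
import random
import numpy as np

def make_cs_example(mandarin, english, sr=16000, gap_sec=0.1):
    """Each argument is a (waveform: np.ndarray, transcript: str) pair."""
    (wav_a, txt_a), (wav_b, txt_b) = mandarin, english
    gap = np.zeros(int(sr * gap_sec), dtype=wav_a.dtype)
    # Randomize the order so both switch directions appear in training.
    if random.random() < 0.5:
        return np.concatenate([wav_a, gap, wav_b]), f"{txt_a} {txt_b}"
    return np.concatenate([wav_b, gap, wav_a]), f"{txt_b} {txt_a}"

def augment(mandarin_set, english_set, n_examples):
    return [make_cs_example(random.choice(mandarin_set), random.choice(english_set))
            for _ in range(n_examples)]
```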
arXiv Detail & Related papers (2022-10-17T12:15:57Z)
- Code-Switching without Switching: Language Agnostic End-to-End Speech Translation [68.8204255655161]
We treat speech recognition and translation as one unified end-to-end speech translation problem.
By training LAST with both input languages, we decode speech into one target language, regardless of the input language.
arXiv Detail & Related papers (2022-10-04T10:34:25Z)
- Learning not to Discriminate: Task Agnostic Learning for Improving Monolingual and Code-switched Speech Recognition [12.354292498112347]
We present further improvements over our previous work by using domain adversarial learning to train task models.
Our proposed technique leads to reductions in Word Error Rates (WER) in monolingual and code-switched test sets across three language pairs.
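Domain adversarial training is usually realized with a gradient reversal layer between the shared encoder and a domain classifier; a minimal sketch in PyTorch follows, where the reversal strength and the usage comment are assumptions rather than the paper's exact setup.

```python
# Sketch of a gradient reversal layer, the usual building block for domain
# adversarial training: the forward pass is the identity, while the backward
# pass flips the gradient so the shared encoder learns features the domain
# classifier cannot separate (e.g. monolingual vs. code-switched speech).
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambda_):
        ctx.lambda_ = lambda_
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambda_ * grad_output, None

def grad_reverse(x, lambda_=1.0):
    # lambda_ controls the strength of the adversarial signal (assumed value).
    return GradReverse.apply(x, lambda_)

# Usage sketch: encoder features feed the ASR head normally and pass through the
# reversal layer into a small domain classifier whose loss is added to the ASR loss:
#   domain_logits = domain_classifier(grad_reverse(encoder_out, 0.5))
```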
arXiv Detail & Related papers (2020-06-09T13:45:30Z)
- Meta-Transfer Learning for Code-Switched Speech Recognition [72.84247387728999]
We propose a new learning method, meta-transfer learning, for transferring knowledge to a code-switched speech recognition system in a low-resource setting.
Our model learns to recognize individual languages and transfers that knowledge to better recognize mixed-language speech by conditioning the optimization on the code-switching data.
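A simplified, first-order sketch of meta-transfer style training, where an inner step adapts on a monolingual batch and an outer step updates the original parameters using the loss on a code-switched batch; the first-order approximation, the learning rate, and the loss interface are assumptions, not the paper's exact algorithm.

```python
# First-order meta-learning step (FOMAML-style) conditioning the outer update on
# code-switched data. `loss_fn(model, batch)` is an assumed user-supplied function
# returning a differentiable scalar loss for the given model and batch.
import copy
import torch

def meta_transfer_step(model, optimizer, mono_batch, cs_batch, loss_fn, inner_lr=1e-3):
    # Inner step: adapt a clone of the model on a monolingual batch.
    adapted = copy.deepcopy(model)
    inner_loss = loss_fn(adapted, mono_batch)
    grads = torch.autograd.grad(inner_loss, list(adapted.parameters()))
    with torch.no_grad():
        for p, g in zip(adapted.parameters(), grads):
            p -= inner_lr * g
    # Outer step: evaluate the adapted clone on a code-switched batch and apply
    # its gradients to the original parameters (first-order approximation).
    outer_loss = loss_fn(adapted, cs_batch)
    outer_grads = torch.autograd.grad(outer_loss, list(adapted.parameters()))
    optimizer.zero_grad()
    for p, g in zip(model.parameters(), outer_grads):
        p.grad = g
    optimizer.step()
    return outer_loss.item()
```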
arXiv Detail & Related papers (2020-04-29T14:27:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.