Learning to Recognize Code-switched Speech Without Forgetting
Monolingual Speech Recognition
- URL: http://arxiv.org/abs/2006.00782v1
- Date: Mon, 1 Jun 2020 08:16:24 GMT
- Title: Learning to Recognize Code-switched Speech Without Forgetting
Monolingual Speech Recognition
- Authors: Sanket Shah, Basil Abraham, Gurunath Reddy M, Sunayana Sitaram, Vikas
Joshi
- Abstract summary: We show that fine-tuning ASR models on code-switched speech harms performance on monolingual speech.
We propose regularization strategies for fine-tuning models for code-switching without sacrificing monolingual accuracy.
- Score: 14.559210845981605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, there has been significant progress made in Automatic Speech
Recognition (ASR) of code-switched speech, leading to gains in accuracy on
code-switched datasets in many language pairs. Code-switched speech co-occurs
with monolingual speech in one or both languages being mixed. In this work, we
show that fine-tuning ASR models on code-switched speech harms performance on
monolingual speech. We point out the need to optimize models for code-switching
while also ensuring that monolingual performance is not sacrificed. Monolingual
models may be trained on thousands of hours of speech which may not be
available for re-training a new model. We propose using the Learning Without
Forgetting (LWF) framework for code-switched ASR when we only have access to a
monolingual model and do not have the data it was trained on. We show that it
is possible to train models using this framework that perform well on both
code-switched and monolingual test sets. In cases where we have access to
monolingual training data as well, we propose regularization strategies for
fine-tuning models for code-switching without sacrificing monolingual accuracy.
We report improvements in Word Error Rate (WER) in monolingual and
code-switched test sets compared to baselines that use pooled data and simple
fine-tuning.
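As a rough illustration of the Learning Without Forgetting idea mentioned in the abstract, the sketch below combines a supervised loss on new (code-switched) data with a distillation term that keeps the fine-tuned model's outputs close to those of the frozen monolingual model on the same inputs. This is a minimal sketch under stated assumptions, not the paper's exact formulation: the function names, the single shared output layer, the weight `lam`, and the `temperature` value are all hypothetical choices made for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, optionally softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))


def lwf_loss(new_logits, label, old_logits, lam=1.0, temperature=2.0):
    """Hypothetical LWF-style loss for a single frame or token.

    new_logits: outputs of the model being fine-tuned on code-switched data
    label:      index of the correct class for this code-switched example
    old_logits: outputs of the frozen monolingual model on the same input
    lam:        assumed weight trading off new-task fit against retention
    """
    # Supervised term: fit the code-switched label.
    one_hot = [1.0 if i == label else 0.0 for i in range(len(new_logits))]
    supervised = cross_entropy(one_hot, softmax(new_logits))

    # Distillation term: stay close to the old model's softened outputs,
    # which stands in for the monolingual training data we do not have.
    teacher = softmax(old_logits, temperature)
    student = softmax(new_logits, temperature)
    distillation = cross_entropy(teacher, student)

    return supervised + lam * distillation


if __name__ == "__main__":
    old_logits = [2.0, 0.5, -1.0]  # made-up outputs of the monolingual model
    new_logits = [0.2, 1.8, -0.5]  # made-up outputs of the fine-tuned model
    print("plain fine-tuning loss:", lwf_loss(new_logits, 1, old_logits, lam=0.0))
    print("LWF-style loss:        ", lwf_loss(new_logits, 1, old_logits, lam=1.0))
```

Setting `lam=0.0` reduces the sketch to simple fine-tuning, which is the baseline the abstract reports as harming monolingual performance; larger values of `lam` penalize drift away from the monolingual model's predictions.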
Related papers
- Multilingual self-supervised speech representations improve the speech
recognition of low-resource African languages with codeswitching [65.74653592668743]
Finetuning self-supervised multilingual representations reduces absolute word error rates by up to 20%.
In circumstances with limited training data, finetuning self-supervised representations is a better-performing and viable solution.
arXiv Detail & Related papers (2023-11-25T17:05:21Z)
- Adapting the adapters for code-switching in multilingual ASR [10.316724084739892]
Large pre-trained multilingual speech models have shown potential in scaling Automatic Speech Recognition to many low-resource languages.
Some of these models employ language adapters in their formulation, which helps to improve monolingual performance.
This formulation restricts the usability of these models on code-switched speech, where two languages are mixed together in the same utterance.
We propose ways to effectively fine-tune such models on code-switched speech, by assimilating information from both language adapters at each language adaptation point in the network.
arXiv Detail & Related papers (2023-10-11T12:15:24Z)
- Simple yet Effective Code-Switching Language Identification with
Multitask Pre-Training and Transfer Learning [0.7242530499990028]
Code-switching is the linguistic phenomenon in which multilingual speakers, in casual settings, mix words from different languages in one utterance.
We propose two novel approaches toward improving language identification accuracy on an English-Mandarin child-directed speech dataset.
Our best model achieves a balanced accuracy of 0.781 on a real English-Mandarin code-switching child-directed speech corpus and outperforms the previous baseline by 55.3%.
arXiv Detail & Related papers (2023-05-31T11:43:16Z)
- Learning Cross-lingual Visual Speech Representations [108.68531445641769]
Cross-lingual self-supervised visual representation learning has been a growing research topic in the last few years.
We use the recently proposed Raw Audio-Visual Speech Encoders (RAVEn) framework to pre-train an audio-visual model with unlabelled data.
Our experiments show that: (1) multi-lingual models with more data outperform monolingual ones, but, when keeping the amount of data fixed, monolingual models tend to reach better performance.
arXiv Detail & Related papers (2023-03-14T17:05:08Z)
- Language-agnostic Code-Switching in Sequence-To-Sequence Speech
Recognition [62.997667081978825]
Code-Switching (CS) refers to the phenomenon of alternately using words and phrases from different languages.
We propose a simple yet effective data augmentation in which audio and corresponding labels of different source languages are concatenated.
We show that this augmentation can even improve the model's performance on inter-sentential language switches not seen during training by 5.03% WER.
arXiv Detail & Related papers (2022-10-17T12:15:57Z)
- LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between different languages at the frame level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z)
- Reducing language context confusion for end-to-end code-switching
automatic speech recognition [50.89821865949395]
We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model.
By calculating the respective attention of multiple languages, our method can efficiently transfer language knowledge from rich monolingual data.
arXiv Detail & Related papers (2022-01-28T14:39:29Z)
- Call Larisa Ivanovna: Code-Switching Fools Multilingual NLU Models [1.827510863075184]
Novel benchmarks for multilingual natural language understanding (NLU) include monolingual sentences in several languages, annotated with intents and slots.
Existing benchmarks lack code-switched utterances, which are difficult to gather and label due to the complexity of their grammatical structure.
Our work adopts recognized methods to generate plausible and natural-sounding code-switched utterances and uses them to create a synthetic code-switched test set.
arXiv Detail & Related papers (2021-09-29T11:15:00Z)
- How Good is Your Tokenizer? On the Monolingual Performance of
Multilingual Language Models [96.32118305166412]
We study a set of nine typologically diverse languages with readily available pretrained monolingual models on a set of five diverse monolingual downstream tasks.
We find that languages which are adequately represented in the multilingual model's vocabulary exhibit negligible performance decreases over their monolingual counterparts.
arXiv Detail & Related papers (2020-12-31T14:11:00Z)
- Learning not to Discriminate: Task Agnostic Learning for Improving
Monolingual and Code-switched Speech Recognition [12.354292498112347]
We present further improvements over our previous work by using domain adversarial learning to train task models.
Our proposed technique leads to reductions in Word Error Rates (WER) in monolingual and code-switched test sets across three language pairs.
arXiv Detail & Related papers (2020-06-09T13:45:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.