Learning not to Discriminate: Task Agnostic Learning for Improving
Monolingual and Code-switched Speech Recognition
- URL: http://arxiv.org/abs/2006.05257v1
- Date: Tue, 9 Jun 2020 13:45:30 GMT
- Title: Learning not to Discriminate: Task Agnostic Learning for Improving
Monolingual and Code-switched Speech Recognition
- Authors: Gurunath Reddy Madhumani, Sanket Shah, Basil Abraham, Vikas Joshi,
Sunayana Sitaram
- Abstract summary: We present further improvements over our previous work by using domain adversarial learning to train task models.
Our proposed technique leads to reductions in Word Error Rates (WER) in monolingual and code-switched test sets across three language pairs.
- Score: 12.354292498112347
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recognizing code-switched speech is challenging for Automatic Speech
Recognition (ASR) for a variety of reasons, including the lack of code-switched
training data. Recently, we showed that monolingual ASR systems fine-tuned on
code-switched data deteriorate in performance on monolingual speech
recognition, which is not desirable as ASR systems deployed in multilingual
scenarios should recognize both monolingual and code-switched speech with high
accuracy. Our experiments indicated that this loss in performance could be
mitigated by using certain strategies for fine-tuning and regularization,
leading to improvements in both monolingual and code-switched ASR. In this
work, we present further improvements over our previous work by using domain
adversarial learning to train task agnostic models. We evaluate the
classification accuracy of an adversarial discriminator and show that it can
learn shared layer parameters that are task agnostic. We train end-to-end ASR
systems starting with a pooled model that uses monolingual and code-switched
data along with the adversarial discriminator. Our proposed technique leads to
reductions in Word Error Rates (WER) in monolingual and code-switched test sets
across three language pairs.
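  The listing contains no code, but domain adversarial learning of the kind the abstract describes is commonly implemented with a gradient reversal layer placed between a shared encoder and a task discriminator. The sketch below shows that general pattern only; the module names, dimensions, two-task (monolingual vs. code-switched) labels, and the CTC objective are illustrative assumptions, not the authors' released implementation.
```python
# Minimal sketch of domain adversarial training with a gradient reversal layer (GRL).
# All shapes, names, and the CTC task loss are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies gradients by -lambda in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class SharedEncoder(nn.Module):
    """Stand-in for the shared (pooled) acoustic encoder."""

    def __init__(self, feat_dim=80, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)

    def forward(self, feats):
        out, _ = self.rnn(feats)        # (batch, time, 2 * hidden)
        return out


class TaskDiscriminator(nn.Module):
    """Classifies whether an utterance is monolingual or code-switched."""

    def __init__(self, in_dim=512, n_tasks=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, n_tasks))

    def forward(self, enc_out):
        pooled = enc_out.mean(dim=1)    # utterance-level average pooling
        return self.net(pooled)


def adversarial_step(encoder, asr_head, discriminator, ctc_loss,
                     feats, targets, feat_lens, target_lens, task_labels, lambd=0.1):
    """One training step combining the ASR loss with the adversarial task loss."""
    enc = encoder(feats)

    # ASR branch (CTC used here as an example objective); asr_head is e.g.
    # nn.Linear(512, vocab_size). CTCLoss expects (time, batch, vocab) log-probs.
    log_probs = F.log_softmax(asr_head(enc), dim=-1).transpose(0, 1)
    loss_asr = ctc_loss(log_probs, targets, feat_lens, target_lens)

    # Adversarial branch: the GRL flips gradients flowing into the encoder, so the
    # discriminator trains normally while the shared layers are pushed toward
    # task-agnostic representations.
    logits = discriminator(GradReverse.apply(enc, lambd))
    loss_disc = F.cross_entropy(logits, task_labels)

    return loss_asr + loss_disc
```
  In this pattern, raising lambd strengthens the pressure toward task-invariant encoder features; monitoring how well the discriminator can still separate monolingual from code-switched inputs is one way to probe whether the shared layers have become task agnostic, in the spirit of the evaluation the abstract describes.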
Related papers
- Leveraging Language ID to Calculate Intermediate CTC Loss for Enhanced
Code-Switching Speech Recognition [5.3545957730615905]
We introduce language identification information into the middle layer of the ASR model's encoder.
We aim to generate acoustic features that imply language distinctions in a more implicit way, reducing the model's confusion when dealing with language switching.
arXiv Detail & Related papers (2023-12-15T07:46:35Z)
- Multilingual self-supervised speech representations improve the speech
recognition of low-resource African languages with codeswitching [65.74653592668743]
Finetuning self-supervised multilingual representations reduces absolute word error rates by up to 20%.
With limited training data, fine-tuning self-supervised representations is a viable and better-performing solution.
arXiv Detail & Related papers (2023-11-25T17:05:21Z)
- Generative error correction for code-switching speech recognition using
large language models [49.06203730433107]
Code-switching (CS) speech refers to the phenomenon of mixing two or more languages within the same sentence.
We propose to leverage large language models (LLMs) and lists of hypotheses generated by an ASR system to address the CS problem.
arXiv Detail & Related papers (2023-10-17T14:49:48Z)
- Label Aware Speech Representation Learning For Language Identification [49.197215416945596]
We propose a novel framework that combines self-supervised representation learning with language label information for the pre-training task.
This framework, termed as Label Aware Speech Representation (LASR) learning, uses a triplet based objective function to incorporate language labels along with the self-supervised loss function.
arXiv Detail & Related papers (2023-06-07T12:14:16Z)
- LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between different languages at the frame level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z)
- Reducing language context confusion for end-to-end code-switching
automatic speech recognition [50.89821865949395]
We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model.
By calculating the respective attention of multiple languages, our method can efficiently transfer language knowledge from rich monolingual data.
arXiv Detail & Related papers (2022-01-28T14:39:29Z)
- Integrating Knowledge in End-to-End Automatic Speech Recognition for
Mandarin-English Code-Switching [41.88097793717185]
Code-Switching (CS) is a common linguistic phenomenon in multilingual communities.
This paper presents our investigations on end-to-end speech recognition for Mandarin-English CS speech.
arXiv Detail & Related papers (2021-12-19T17:31:15Z)
- Learning to Recognize Code-switched Speech Without Forgetting
Monolingual Speech Recognition [14.559210845981605]
We show that fine-tuning ASR models on code-switched speech harms performance on monolingual speech.
We propose regularization strategies for fine-tuning models for code-switching without sacrificing monolingual accuracy.
arXiv Detail & Related papers (2020-06-01T08:16:24Z)
- Meta-Transfer Learning for Code-Switched Speech Recognition [72.84247387728999]
We propose a new learning method, meta-transfer learning, to transfer learn on a code-switched speech recognition system in a low-resource setting.
Our model learns to recognize individual languages and transfers this knowledge to better recognize mixed-language speech by conditioning the optimization on the code-switching data.
arXiv Detail & Related papers (2020-04-29T14:27:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.