Transformer-Transducers for Code-Switched Speech Recognition
- URL: http://arxiv.org/abs/2011.15023v2
- Date: Mon, 15 Feb 2021 02:46:51 GMT
- Title: Transformer-Transducers for Code-Switched Speech Recognition
- Authors: Siddharth Dalmia, Yuzong Liu, Srikanth Ronanki, Katrin Kirchhoff
- Abstract summary: We present an end-to-end ASR system using a transformer-transducer model architecture for code-switched speech recognition.
First, we introduce two auxiliary loss functions to handle the low-resource scenario of code-switching.
Second, we propose a novel mask-based training strategy with language ID information to improve the label encoder training towards intra-sentential code-switching.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We live in a world where 60% of the population can speak two or more
languages fluently. Members of these communities constantly switch between
languages when having a conversation. As automatic speech recognition (ASR)
systems are deployed in the real world, there is a need for practical
systems that can handle multiple languages both within an utterance and across
utterances. In this paper, we present an end-to-end ASR system using a
transformer-transducer model architecture for code-switched speech recognition.
We propose three modifications over the vanilla model in order to handle
various aspects of code-switching. First, we introduce two auxiliary loss
functions to handle the low-resource scenario of code-switching. Second, we
propose a novel mask-based training strategy with language ID information to
improve the label encoder training towards intra-sentential code-switching.
Finally, we propose a multi-label/multi-audio encoder structure that leverages
vast monolingual speech corpora for code-switching. We demonstrate the
efficacy of our proposed approaches on the SEAME dataset, a public
Mandarin-English code-switching corpus, achieving a mixed error rate of 18.5%
and 26.3% on test_man and test_sge sets respectively.
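The abstract describes the mask-based training strategy only at a high level. As a rough illustration of the underlying idea, a code-switch-aware mask over the label sequence could be derived from per-token language ID tags as sketched below; the function names, the `<mask>` token, and the masking probability are all hypothetical choices for illustration, not details taken from the paper:

```python
import random

def switch_points(lang_ids):
    """Indices where the language ID changes, i.e. intra-sentential switch points."""
    return [i for i in range(1, len(lang_ids)) if lang_ids[i] != lang_ids[i - 1]]

def lid_mask(tokens, lang_ids, mask_lang, mask_token="<mask>", p=0.3, seed=0):
    """Randomly replace tokens of one language with a mask token, so a label
    encoder trained on the masked sequence must rely on cross-lingual context
    around switch points. `mask_token` and `p` are illustrative, not the
    paper's actual hyperparameters."""
    rng = random.Random(seed)
    return [mask_token if lid == mask_lang and rng.random() < p else tok
            for tok, lid in zip(tokens, lang_ids)]

# Example: a Mandarin-English code-switched label sequence with LID tags.
tokens = ["我", "想", "book", "a", "table", "晚上", "七点"]
lids   = ["zh", "zh", "en", "en", "en", "zh", "zh"]
print(switch_points(lids))                  # language changes at indices 2 and 5
print(lid_mask(tokens, lids, "en", p=1.0))  # with p=1.0, every English token is masked
```

In an actual transducer setup the masked sequence would feed the label (prediction) network during training, while the unmasked sequence remains the target.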
Related papers
- Multilingual self-supervised speech representations improve the speech recognition of low-resource African languages with codeswitching (arXiv, 2023-11-25)
  Finetuning self-supervised multilingual representations reduces absolute word error rates by up to 20%. In circumstances with limited training data, finetuning self-supervised representations is a better-performing and viable solution.
- Adapting the adapters for code-switching in multilingual ASR (arXiv, 2023-10-11)
  Large pre-trained multilingual speech models have shown potential in scaling automatic speech recognition to many low-resource languages. Some of these models employ language adapters, which help to improve monolingual performance but restrict the usability of the models on code-switched speech, where two languages are mixed in the same utterance. We propose ways to effectively fine-tune such models on code-switched speech by assimilating information from both language adapters at each language adaptation point in the network.
- Zero Resource Code-switched Speech Benchmark Using Speech Utterance Pairs For Multiple Spoken Languages (arXiv, 2023-10-04)
  We introduce a new zero-resource code-switched speech benchmark designed to assess the code-switching capabilities of self-supervised speech encoders. We showcase a baseline system of language modeling on discrete units to demonstrate how the code-switching abilities of speech encoders can be assessed.
- Simple yet Effective Code-Switching Language Identification with Multitask Pre-Training and Transfer Learning (arXiv, 2023-05-31)
  Code-switching is the linguistic phenomenon in which multilingual speakers mix words from different languages within one utterance, often in casual settings. We propose two novel approaches to improving language identification accuracy on an English-Mandarin child-directed speech dataset. Our best model achieves a balanced accuracy of 0.781 on a real English-Mandarin code-switching child-directed speech corpus, outperforming the previous baseline by 55.3%.
- LAE: Language-Aware Encoder for Monolingual and Multilingual ASR (arXiv, 2022-06-05)
  A novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information. Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between languages at the frame level.
- Code Switched and Code Mixed Speech Recognition for Indic languages (arXiv, 2022-03-30)
  Training multilingual automatic speech recognition (ASR) systems is challenging because acoustic and lexical information is typically language-specific. We compare the performance of an end-to-end multilingual speech recognition system to that of monolingual models conditioned on language identification (LID). We also propose a similar technique for the code-switching problem, achieving WERs of 21.77 and 28.27 on Hindi-English and Bengali-English respectively.
- Reducing language context confusion for end-to-end code-switching automatic speech recognition (arXiv, 2022-01-28)
  We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model. By calculating the respective attention of multiple languages, our method can efficiently transfer language knowledge from rich monolingual data.
- Multilingual and code-switching ASR challenges for low resource Indian languages (arXiv, 2021-04-01)
  We focus on building multilingual and code-switching ASR systems through two different subtasks covering a total of seven Indian languages. We provide a total of 600 hours of transcribed speech data, comprising train and test sets, in these languages. We also provide a baseline recipe for both tasks, with WERs of 30.73% and 32.45% on the test sets of the multilingual and code-switching subtasks, respectively.
- Meta-Transfer Learning for Code-Switched Speech Recognition (arXiv, 2020-04-29)
  We propose a new learning method, meta-transfer learning, for a code-switched speech recognition system in a low-resource setting. Our model learns to recognize individual languages and transfers this knowledge to better recognize mixed-language speech by conditioning the optimization on the code-switching data.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.