Towards Zero-Shot Code-Switched Speech Recognition
- URL: http://arxiv.org/abs/2211.01458v1
- Date: Wed, 2 Nov 2022 19:52:54 GMT
- Title: Towards Zero-Shot Code-Switched Speech Recognition
- Authors: Brian Yan, Matthew Wiesner, Ondrej Klejch, Preethi Jyothi, Shinji
Watanabe
- Abstract summary: We seek to build effective code-switched (CS) automatic speech recognition systems (ASR) under the zero-shot setting.
We propose to simplify each monolingual module by allowing it to transcribe all speech segments indiscriminately with a monolingual script.
We apply this transliteration-based approach in an end-to-end differentiable neural network and demonstrate its efficacy for zero-shot CS ASR on Mandarin-English SEAME test sets.
- Score: 44.76492452463019
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we seek to build effective code-switched (CS) automatic speech
recognition systems (ASR) under the zero-shot setting where no transcribed CS
speech data is available for training. Previously proposed frameworks which
conditionally factorize the bilingual task into its constituent monolingual
parts are a promising starting point for leveraging monolingual data
efficiently. However, these methods require the monolingual modules to perform
language segmentation. That is, each monolingual module has to simultaneously
detect CS points and transcribe speech segments of one language while ignoring
those of other languages -- not a trivial task. We propose to simplify each
monolingual module by allowing it to transcribe all speech segments
indiscriminately with a monolingual script (i.e., transliteration). This simple
modification passes the responsibility of CS point detection to subsequent
bilingual modules which determine the final output by considering multiple
monolingual transliterations along with external language model information. We
apply this transliteration-based approach in an end-to-end differentiable
neural network and demonstrate its efficacy for zero-shot CS ASR on
Mandarin-English SEAME test sets.
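The factorization described above can be illustrated with a toy sketch. This is not the authors' end-to-end differentiable implementation; the segment scores, hypothetical strings, and the stand-in language model below are all invented for illustration. Each monolingual module labels every segment in its own script, and the bilingual stage picks, per segment, the transliteration whose combined acoustic and language-model score is highest:

```python
# Toy sketch (illustrative only) of transliteration-based combination:
# both monolingual modules transcribe EVERY segment, including speech
# from the "other" language, and a bilingual combiner resolves the
# code-switch points. Scores are made-up log-probabilities.

def combine(mandarin_hyps, english_hyps, lm_score):
    """Per segment, keep the better-scoring monolingual transliteration."""
    output = []
    for zh, en in zip(mandarin_hyps, english_hyps):
        zh_total = zh["score"] + lm_score(zh["text"])
        en_total = en["score"] + lm_score(en["text"])
        output.append(zh["text"] if zh_total > en_total else en["text"])
    return output

# Hypothetical per-segment module outputs for a Mandarin-English utterance.
mandarin_hyps = [
    {"text": "我想", "score": -1.0},   # genuine Mandarin: strong score
    {"text": "够狗", "score": -6.0},   # English transliterated into Mandarin script
]
english_hyps = [
    {"text": "wo shiang", "score": -5.5},  # Mandarin transliterated into Latin script
    {"text": "go go", "score": -1.5},      # genuine English: strong score
]

# Stand-in for external language model information: favors known strings.
def lm_score(text):
    vocab = {"我想", "go go"}
    return 0.0 if text in vocab else -2.0

print(combine(mandarin_hyps, english_hyps, lm_score))  # → ['我想', 'go go']
```

In the paper's actual system this selection is performed by learned bilingual modules inside one differentiable network, not by a hard per-segment argmax; the sketch only shows why indiscriminate transliteration shifts code-switch-point detection out of the monolingual modules.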
Related papers
- Unified model for code-switching speech recognition and language
identification based on a concatenated tokenizer [17.700515986659063]
Code-Switching (CS) multilingual Automatic Speech Recognition (ASR) models can transcribe speech containing two or more alternating languages during a conversation.
This paper proposes a new method for creating code-switching ASR datasets from purely monolingual data sources.
A novel Concatenated Tokenizer enables ASR models to generate language ID for each emitted text token while reusing existing monolingual tokenizers.
arXiv Detail & Related papers (2023-06-14T21:24:11Z)
- Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation [71.35243644890537]
End-to-end Speech Translation (ST) aims at translating the source language speech into target language text without generating the intermediate transcriptions.
Existing zero-shot methods fail to align the two modalities of speech and text into a shared semantic space.
We propose a novel Discrete Cross-Modal Alignment (DCMA) method that employs a shared discrete vocabulary space to accommodate and match both modalities of speech and text.
arXiv Detail & Related papers (2022-10-18T03:06:47Z)
- Language-agnostic Code-Switching in Sequence-To-Sequence Speech Recognition [62.997667081978825]
Code-Switching (CS) refers to the phenomenon of alternately using words and phrases from different languages.
We propose a simple yet effective data augmentation in which audio and corresponding labels of different source languages are concatenated.
We show that this augmentation can even improve the model's performance on inter-sentential language switches not seen during training by 5.03% WER.
arXiv Detail & Related papers (2022-10-17T12:15:57Z)
- LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between different languages at the frame level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z)
- Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization [75.98664099579392]
We propose a general framework to jointly model the likelihoods of the monolingual and code-switch sub-tasks that comprise bilingual speech recognition.
We demonstrate the efficacy of our proposed model on bilingual Mandarin-English speech recognition across both monolingual and code-switched corpora.
arXiv Detail & Related papers (2021-11-29T23:14:54Z)
- Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models [55.82292352607321]
Code-switching (CS) is common in daily conversations where more than one language is used within a sentence.
This paper uses the recently successful self-supervised learning (SSL) methods to leverage many unlabeled speech data without CS.
arXiv Detail & Related papers (2021-10-07T14:43:35Z)
- Learning to Recognize Code-switched Speech Without Forgetting Monolingual Speech Recognition [14.559210845981605]
We show that fine-tuning ASR models on code-switched speech harms performance on monolingual speech.
We propose regularization strategies for fine-tuning models for code-switching without sacrificing monolingual accuracy.
arXiv Detail & Related papers (2020-06-01T08:16:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.