Mandarin-English Code-switching Speech Recognition with Self-supervised
Speech Representation Models
- URL: http://arxiv.org/abs/2110.03504v1
- Date: Thu, 7 Oct 2021 14:43:35 GMT
- Title: Mandarin-English Code-switching Speech Recognition with Self-supervised
Speech Representation Models
- Authors: Liang-Hsuan Tseng, Yu-Kuan Fu, Heng-Jui Chang, Hung-yi Lee
- Abstract summary: Code-switching (CS) is common in daily conversations where more than one language is used within a sentence.
This paper uses recently successful self-supervised learning (SSL) methods to leverage large amounts of unlabeled speech data without CS.
- Score: 55.82292352607321
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code-switching (CS) is common in daily conversations where more than one
language is used within a sentence. The difficulties of CS speech recognition
lie in alternating languages and the lack of transcribed data. Therefore, this
paper uses recently successful self-supervised learning (SSL) methods to leverage
large amounts of unlabeled speech data without CS. We show that hidden
representations of SSL models carry frame-level language identity even when the
models are trained on English speech only. Jointly training CTC and language
identification modules on self-supervised speech representations improves CS
speech recognition performance. Furthermore, pre-training on multilingual speech
data yields the best CS speech recognition results.
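To make the joint objective above concrete, the following is a minimal PyTorch sketch assuming precomputed hidden states from a pre-trained SSL model (e.g. wav2vec 2.0). The module names, feature dimension, vocabulary size, and the 0.3 loss weight are illustrative assumptions, not the authors' exact configuration: a CTC head transcribes while a frame-level language-identification head is trained jointly, and the two losses are summed.

```python
# Minimal sketch (assumed setup, not the paper's exact architecture):
# SSL features -> linear CTC head + frame-level LID head, losses summed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointCTCLID(nn.Module):
    def __init__(self, feat_dim=768, vocab_size=5000, num_langs=2, lid_weight=0.3):
        super().__init__()
        self.ctc_head = nn.Linear(feat_dim, vocab_size)   # blank = index 0
        self.lid_head = nn.Linear(feat_dim, num_langs)    # e.g. Mandarin / English
        self.lid_weight = lid_weight

    def forward(self, ssl_feats, feat_lens, targets, target_lens, lid_labels):
        # ssl_feats: (B, T, D) hidden states from a pre-trained SSL encoder
        log_probs = F.log_softmax(self.ctc_head(ssl_feats), dim=-1)      # (B, T, V)
        ctc_loss = F.ctc_loss(log_probs.transpose(0, 1), targets,
                              feat_lens, target_lens, blank=0, zero_infinity=True)
        # Frame-level language identification; lid_labels: (B, T), -100 marks padding.
        lid_logits = self.lid_head(ssl_feats)                            # (B, T, L)
        lid_loss = F.cross_entropy(lid_logits.reshape(-1, lid_logits.size(-1)),
                                   lid_labels.reshape(-1), ignore_index=-100)
        return ctc_loss + self.lid_weight * lid_loss

# Toy usage with random tensors standing in for real SSL features and labels.
model = JointCTCLID()
feats = torch.randn(2, 100, 768)
feat_lens = torch.tensor([100, 80])
targets = torch.randint(1, 5000, (2, 20))
target_lens = torch.tensor([20, 15])
lid_labels = torch.randint(0, 2, (2, 100))
loss = model(feats, feat_lens, targets, target_lens, lid_labels)
loss.backward()
```

The same two heads could sit on top of any SSL encoder that exposes frame-level hidden states, whether the encoder is kept frozen or fine-tuned.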
Related papers
- Leveraging Language ID to Calculate Intermediate CTC Loss for Enhanced
Code-Switching Speech Recognition [5.3545957730615905]
We introduce language identification information into the middle layer of the ASR model's encoder.
We aim to generate acoustic features that convey language distinctions in a more implicit way, reducing the model's confusion when dealing with language switching; a minimal sketch of this intermediate loss appears after this list.
arXiv Detail & Related papers (2023-12-15T07:46:35Z)
- Multilingual self-supervised speech representations improve the speech
recognition of low-resource African languages with codeswitching [65.74653592668743]
Finetuning self-supervised multilingual representations reduces absolute word error rates by up to 20%.
In circumstances with limited training data, finetuning self-supervised representations is a better-performing and viable solution.
arXiv Detail & Related papers (2023-11-25T17:05:21Z)
- SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge? [45.901645659694935]
Self-supervised learning (SSL) for speech representation has been successfully applied in various downstream tasks.
In this paper, we aim to clarify whether speech SSL techniques can capture linguistic knowledge well.
arXiv Detail & Related papers (2023-06-14T09:04:29Z)
- Language-agnostic Code-Switching in Sequence-To-Sequence Speech
Recognition [62.997667081978825]
Code-Switching (CS) refers to the phenomenon of alternately using words and phrases from different languages.
We propose a simple yet effective data augmentation in which audio and corresponding labels of different source languages are concatenated (see the sketch after this list).
We show that this augmentation can even improve the model's performance on inter-sentential language switches not seen during training by 5.03% WER.
arXiv Detail & Related papers (2022-10-17T12:15:57Z)
- Code-Switching without Switching: Language Agnostic End-to-End Speech
Translation [68.8204255655161]
We treat speech recognition and translation as one unified end-to-end speech translation problem.
By training LAST with both input languages, we decode speech into one target language, regardless of the input language.
arXiv Detail & Related papers (2022-10-04T10:34:25Z)
- LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating different languages at the frame level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z)
- Continual-wav2vec2: an Application of Continual Learning for
Self-Supervised Automatic Speech Recognition [0.23872611575805824]
We present a method for continual learning of speech representations for multiple languages using self-supervised learning (SSL).
Wav2vec models perform SSL on raw audio in a pretraining phase and then finetune on a small fraction of annotated data.
We use ideas from continual learning to transfer knowledge from a previous task to speed up pretraining a new language task.
arXiv Detail & Related papers (2021-07-26T10:39:03Z)
- Meta-Transfer Learning for Code-Switched Speech Recognition [72.84247387728999]
We propose a new learning method, meta-transfer learning, to transfer learn on a code-switched speech recognition system in a low-resource setting.
Our model learns to recognize individual languages and transfers this knowledge to better recognize mixed-language speech by conditioning the optimization on code-switching data.
arXiv Detail & Related papers (2020-04-29T14:27:19Z)
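As referenced in the "Leveraging Language ID to Calculate Intermediate CTC Loss" entry above, one way to picture an intermediate language-ID loss is the hypothetical sketch below: a Transformer encoder whose final output feeds the main CTC head, while a middle layer feeds an auxiliary CTC head trained on language-ID tokens. The layer count, choice of middle layer, and 0.3 weight are assumptions, not that paper's recipe.

```python
# Sketch (assumed design): main CTC loss on the final encoder layer plus an
# auxiliary CTC loss over language-ID tokens computed from a middle layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntermediateLIDCTC(nn.Module):
    def __init__(self, feat_dim=80, d_model=256, num_layers=6, mid_layer=3,
                 vocab_size=5000, num_lid_tokens=3, lid_weight=0.3):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
             for _ in range(num_layers)])
        self.mid_layer = mid_layer
        self.ctc_head = nn.Linear(d_model, vocab_size)      # blank = 0
        self.lid_head = nn.Linear(d_model, num_lid_tokens)  # blank, Mandarin, English
        self.lid_weight = lid_weight

    def forward(self, feats, feat_lens, targets, target_lens, lid_targets, lid_lens):
        x = self.proj(feats)
        mid = None
        for i, layer in enumerate(self.layers, start=1):
            x = layer(x)
            if i == self.mid_layer:
                mid = x                                      # middle-layer states
        main = F.log_softmax(self.ctc_head(x), dim=-1).transpose(0, 1)
        aux = F.log_softmax(self.lid_head(mid), dim=-1).transpose(0, 1)
        main_loss = F.ctc_loss(main, targets, feat_lens, target_lens,
                               blank=0, zero_infinity=True)
        lid_loss = F.ctc_loss(aux, lid_targets, feat_lens, lid_lens,
                              blank=0, zero_infinity=True)
        return main_loss + self.lid_weight * lid_loss

# Toy usage with random filterbank-like features and per-utterance LID sequences.
model = IntermediateLIDCTC()
feats = torch.randn(2, 120, 80)
feat_lens = torch.tensor([120, 100])
targets = torch.randint(1, 5000, (2, 20))
target_lens = torch.tensor([20, 16])
lid_targets = torch.randint(1, 3, (2, 6))   # e.g. [zh, en, zh, ...] per utterance
lid_lens = torch.tensor([6, 4])
loss = model(feats, feat_lens, targets, target_lens, lid_targets, lid_lens)
loss.backward()
```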
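Similarly, the concatenation-based augmentation referenced in the "Language-agnostic Code-Switching" entry amounts to joining monolingual utterances into synthetic code-switched ones. The sketch below uses assumed, hypothetical function and field names rather than the paper's implementation.

```python
# Minimal sketch (assumed data layout): concatenate audio and labels from two
# monolingual utterances to synthesize an inter-sentential code-switched example.
import numpy as np

def concat_augment(utt_a, utt_b, sample_rate=16000, gap_sec=0.1):
    """utt_a / utt_b: dicts with 'audio' (1-D float array) and 'text' (str)."""
    gap = np.zeros(int(gap_sec * sample_rate), dtype=np.float32)  # short silence
    audio = np.concatenate([utt_a["audio"], gap, utt_b["audio"]])
    text = utt_a["text"] + " " + utt_b["text"]                    # labels are simply joined
    return {"audio": audio, "text": text}

# Toy usage with random audio standing in for real Mandarin / English utterances.
mandarin = {"audio": np.random.randn(16000).astype(np.float32), "text": "你好 世界"}
english = {"audio": np.random.randn(16000).astype(np.float32), "text": "hello world"}
mixed = concat_augment(mandarin, english)
print(len(mixed["audio"]), mixed["text"])
```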
This list is automatically generated from the titles and abstracts of the papers in this site.