Data Augmentation for End-to-end Code-switching Speech Recognition
- URL: http://arxiv.org/abs/2011.02160v2
- Date: Sun, 03 Nov 2024 04:25:14 GMT
- Title: Data Augmentation for End-to-end Code-switching Speech Recognition
- Authors: Chenpeng Du, Hao Li, Yizhou Lu, Lan Wang, Yanmin Qian
- Abstract summary: Three novel approaches are proposed for code-switching data augmentation:
audio splicing with the existing code-switching data, and TTS with new code-switching texts generated by word translation or word insertion.
Experiments on a 200-hour Mandarin-English code-switching dataset show that each approach individually yields significant improvements on code-switching ASR.
- Score: 54.0507000473827
- Abstract: Training a code-switching end-to-end automatic speech recognition (ASR) model normally requires a large amount of data, while code-switching data is often limited. In this paper, three novel approaches are proposed for code-switching data augmentation. Specifically, they are audio splicing with the existing code-switching data, and TTS with new code-switching texts generated by word translation or word insertion. Our experiments on a 200-hour Mandarin-English code-switching dataset show that all three proposed approaches individually yield significant improvements on code-switching ASR. Moreover, all the proposed approaches can be combined with the recently popular SpecAugment, and an additional gain is obtained. The WER is reduced by a relative 24.0% compared to the system without any data augmentation, and by a relative 13.0% compared to the system with only SpecAugment.
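The word-translation idea from the abstract (synthesizing new code-switching texts by swapping a Mandarin word for its English counterpart, then feeding the result to TTS) can be sketched as follows. This is a hedged illustration, not the paper's implementation: the toy lexicon and the function `make_code_switched` are hypothetical stand-ins for whatever bilingual dictionary or translation model the authors actually used.

```python
import random

# Hypothetical toy lexicon; a real system would use a bilingual
# dictionary or an MT model to translate candidate words.
ZH_EN_LEXICON = {
    "会议": "meeting",
    "项目": "project",
    "周末": "weekend",
}

def make_code_switched(tokens, lexicon, rng=random):
    """Replace one translatable Mandarin token with its English
    counterpart to synthesize a code-switched sentence."""
    candidates = [i for i, t in enumerate(tokens) if t in lexicon]
    if not candidates:
        return tokens  # nothing to switch; return the sentence unchanged
    i = rng.choice(candidates)
    switched = list(tokens)
    switched[i] = lexicon[tokens[i]]
    return switched

tokens = ["明天", "的", "会议", "取消", "了"]
print(make_code_switched(tokens, ZH_EN_LEXICON))
# e.g. ['明天', '的', 'meeting', '取消', '了']
```

The synthesized sentence would then be passed to a TTS system to produce paired audio, which is the augmentation step the abstract describes.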
Related papers
- Speech collage: code-switched audio generation by collaging monolingual
corpora [50.356820349870986]
Speech Collage is a method that synthesizes CS data from monolingual corpora by splicing audio segments.
We investigate the impact of generated data on speech recognition in two scenarios.
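The splicing operation that both Speech Collage and the main paper rely on can be sketched as concatenating waveform segments with a short crossfade to soften the boundary. This is a minimal illustration under assumed parameters (16 kHz audio, a 10 ms linear crossfade); the actual systems likely apply additional smoothing and energy normalization.

```python
import numpy as np

def splice(seg_a, seg_b, sr=16000, fade_ms=10):
    """Concatenate two waveforms with a short linear crossfade
    across the splice point."""
    n = min(int(sr * fade_ms / 1000), len(seg_a), len(seg_b))
    fade_out = np.linspace(1.0, 0.0, n)
    overlap = seg_a[-n:] * fade_out + seg_b[:n] * (1.0 - fade_out)
    return np.concatenate([seg_a[:-n], overlap, seg_b[n:]])

# Two synthetic 0.1 s tones standing in for monolingual segments.
t = np.arange(1600) / 16000
mandarin_seg = 0.3 * np.sin(2 * np.pi * 220 * t)
english_seg = 0.3 * np.sin(2 * np.pi * 440 * t)
spliced = splice(mandarin_seg, english_seg)
print(spliced.shape)  # 160 samples shorter than the sum, due to overlap
```

In practice the segments would come from forced-aligned monolingual corpora, cut at word boundaries, rather than from synthetic tones.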
arXiv Detail & Related papers (2023-09-27T14:17:53Z) - Improving Code-Switching and Named Entity Recognition in ASR with Speech
Editing based Data Augmentation [22.38340990398735]
We propose a novel data augmentation method by applying the text-based speech editing model.
The experimental results on code-switching and NER tasks show that our proposed method can significantly outperform the audio splicing and neural TTS based data augmentation systems.
arXiv Detail & Related papers (2023-06-14T15:50:13Z) - Optimizing Bilingual Neural Transducer with Synthetic Code-switching
Text Generation [10.650573361117669]
Semi-supervised training and synthetic code-switched data can improve the bilingual ASR system on code-switching speech.
Our final system achieves 25% mixed error rate (MER) on the ASCEND English/Mandarin code-switching test set.
arXiv Detail & Related papers (2022-10-21T19:42:41Z) - Reducing language context confusion for end-to-end code-switching
automatic speech recognition [50.89821865949395]
We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model.
By calculating the respective attention of multiple languages, our method can efficiently transfer language knowledge from rich monolingual data.
arXiv Detail & Related papers (2022-01-28T14:39:29Z) - Advanced Long-context End-to-end Speech Recognition Using
Context-expanded Transformers [56.56220390953412]
We extend our prior work by introducing the Conformer architecture to further improve the accuracy.
We demonstrate that the extended Transformer provides state-of-the-art end-to-end ASR performance.
arXiv Detail & Related papers (2021-04-19T16:18:00Z) - Transformer-Transducers for Code-Switched Speech Recognition [23.281314397784346]
We present an end-to-end ASR system using a transformer-transducer model architecture for code-switched speech recognition.
First, we introduce two auxiliary loss functions to handle the low-resource scenario of code-switching.
Second, we propose a novel mask-based training strategy with language ID information to improve the label encoder training towards intra-sentential code-switching.
arXiv Detail & Related papers (2020-11-30T17:27:41Z) - Decoupling Pronunciation and Language for End-to-end Code-switching
Automatic Speech Recognition [66.47000813920617]
We propose a decoupled transformer model to use monolingual paired data and unpaired text data.
The model is decoupled into two parts: audio-to-phoneme (A2P) network and phoneme-to-text (P2T) network.
By using monolingual data and unpaired text data, the decoupled transformer model reduces the high dependency on code-switching paired training data of E2E model.
arXiv Detail & Related papers (2020-10-28T07:46:15Z) - Improving Low Resource Code-switched ASR using Augmented Code-switched
TTS [29.30430160611224]
Building Automatic Speech Recognition systems for code-switched speech has recently gained renewed attention.
End-to-end systems require large amounts of labeled speech.
We report significant improvements in ASR performance achieving absolute word error rate (WER) reductions of up to 5%.
arXiv Detail & Related papers (2020-10-12T09:15:12Z) - You Do Not Need More Data: Improving End-To-End Speech Recognition by
Text-To-Speech Data Augmentation [59.31769998728787]
We build our TTS system on an ASR training database and then extend the data with synthesized speech to train a recognition model.
Our system establishes a competitive result for end-to-end ASR trained on LibriSpeech train-clean-100 set with WER 4.3% for test-clean and 13.5% for test-other.
arXiv Detail & Related papers (2020-05-14T17:24:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.