Optimizing Bilingual Neural Transducer with Synthetic Code-switching
Text Generation
- URL: http://arxiv.org/abs/2210.12214v1
- Date: Fri, 21 Oct 2022 19:42:41 GMT
- Title: Optimizing Bilingual Neural Transducer with Synthetic Code-switching
Text Generation
- Authors: Thien Nguyen, Nathalie Tran, Liuhui Deng, Thiago Fraga da Silva,
Matthew Radzihovsky, Roger Hsiao, Henry Mason, Stefan Braun, Erik McDermott,
Dogan Can, Pawel Swietojanski, Lyan Verwimp, Sibel Oyman, Tresi Arvizo, Honza
Silovsky, Arnab Ghoshal, Mathieu Martel, Bharat Ram Ambati and Mohamed Ali
- Abstract summary: Semi-supervised training and synthetic code-switched data can improve the bilingual ASR system on code-switching speech.
Our final system achieves 25% mixed error rate (MER) on the ASCEND English/Mandarin code-switching test set.
- Score: 10.650573361117669
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code-switching describes the practice of using more than one language in the
same sentence. In this study, we investigate how to optimize a neural
transducer based bilingual automatic speech recognition (ASR) model for
code-switching speech. Focusing on the scenario where the ASR model is trained
without supervised code-switching data, we found that semi-supervised training
and synthetic code-switched data can improve the bilingual ASR system on
code-switching speech. We analyze how each of the neural transducer's encoders
contributes towards code-switching performance by measuring encoder-specific
recall values, and evaluate our English/Mandarin system on the ASCEND data set.
Our final system achieves 25% mixed error rate (MER) on the ASCEND
English/Mandarin code-switching test set -- reducing the MER by 2.1% absolute
compared to the previous literature -- while maintaining good accuracy on the
monolingual test sets.
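The mixed error rate (MER) reported above scores hypotheses against references using mixed token units, conventionally English words and Mandarin characters. A minimal sketch of such a metric, assuming a simple regex tokenizer and standard Levenshtein distance (this is an illustrative reimplementation, not the authors' scoring code):

```python
import re

def mixed_tokens(text):
    # Treat each CJK character as one token and each Latin word as one token.
    return re.findall(r"[\u4e00-\u9fff]|[A-Za-z']+", text)

def edit_distance(ref, hyp):
    # Levenshtein distance over token sequences, single-row dynamic programming.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,        # deletion
                                   d[j - 1] + 1,    # insertion
                                   prev + (r != h)) # substitution (0 if match)
    return d[len(hyp)]

def mer(ref_text, hyp_text):
    # Mixed error rate: token edits divided by the reference token count.
    ref, hyp = mixed_tokens(ref_text), mixed_tokens(hyp_text)
    return edit_distance(ref, hyp) / max(len(ref), 1)
```

For example, `mer("吃 apple", "吃 apples")` counts one substitution over two reference tokens, giving 0.5.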
Related papers
- Zero Resource Code-switched Speech Benchmark Using Speech Utterance Pairs For Multiple Spoken Languages [49.6922490267701]
We introduce a new zero resource code-switched speech benchmark designed to assess the code-switching capabilities of self-supervised speech encoders.
We showcase a baseline system of language modeling on discrete units to demonstrate how the code-switching abilities of speech encoders can be assessed.
arXiv Detail & Related papers (2023-10-04T17:58:11Z)
- Code-Switching Text Generation and Injection in Mandarin-English ASR [57.57570417273262]
We investigate text generation and injection for improving the performance of the Transformer-Transducer (T-T), a streaming model commonly used in industry.
We first propose a strategy to generate code-switching text data, then investigate injecting the generated text into the T-T model either explicitly via Text-To-Speech (TTS) conversion or implicitly by tying the speech and text latent spaces.
Experimental results on a T-T model trained with a dataset containing 1,800 hours of real Mandarin-English code-switched speech show that injecting the generated code-switching text significantly boosts the performance of T-T models.
arXiv Detail & Related papers (2023-03-20T09:13:27Z)
- Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification [14.197869575012925]
We propose to modify the structure of the cascaded-encoder-based recurrent neural network transducer (RNN-T) model by integrating a per-frame language identifier (LID) predictor.
RNN-T with cascaded encoders can achieve streaming ASR with low latency using first-pass decoding with no right-context, and achieve lower word error rates (WERs) using second-pass decoding with longer right-context.
Experimental results on a voice search dataset with 9 language locales show that the proposed method achieves an average LID prediction accuracy of 96.2% and the same second-pass WER.
arXiv Detail & Related papers (2022-09-13T15:10:41Z)
- Reducing language context confusion for end-to-end code-switching automatic speech recognition [50.89821865949395]
We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model.
By calculating the respective attention of multiple languages, our method can efficiently transfer language knowledge from rich monolingual data.
arXiv Detail & Related papers (2022-01-28T14:39:29Z)
- Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates [59.678108707409606]
We propose Fast-MD, a fast MD model that generates HI by non-autoregressive decoding based on connectionist temporal classification (CTC) outputs followed by an ASR decoder.
Fast-MD achieved about 2x and 4x faster decoding speed than the naïve MD model on GPU and CPU, respectively, with comparable translation quality.
arXiv Detail & Related papers (2021-09-27T05:21:30Z)
- Arabic Code-Switching Speech Recognition using Monolingual Data [13.513655231184261]
Code-switching in automatic speech recognition (ASR) is an important challenge due to globalization.
Recent research in multilingual ASR shows potential improvement over monolingual systems.
We study key issues related to multilingual modeling for ASR through a series of large-scale ASR experiments.
arXiv Detail & Related papers (2021-07-04T08:40:49Z)
- Using heterogeneity in semi-supervised transcription hypotheses to improve code-switched speech recognition [6.224255518500385]
We show that monolingual data may be more closely matched to one of the languages in the code-switch pair.
We propose a semi-supervised approach for code-switched ASR.
arXiv Detail & Related papers (2021-06-14T18:39:18Z)
- Transformer-Transducers for Code-Switched Speech Recognition [23.281314397784346]
We present an end-to-end ASR system using a transformer-transducer model architecture for code-switched speech recognition.
First, we introduce two auxiliary loss functions to handle the low-resource scenario of code-switching.
Second, we propose a novel mask-based training strategy with language ID information to improve the label encoder training towards intra-sentential code-switching.
arXiv Detail & Related papers (2020-11-30T17:27:41Z)
- Data Augmentation for End-to-end Code-switching Speech Recognition [54.0507000473827]
Three novel approaches are proposed for code-switching data augmentation: audio splicing with the existing code-switching data, and TTS with new code-switching texts generated by word translation or by word insertion.
Experiments on a 200-hour Mandarin-English code-switching dataset show that each approach individually yields significant improvements on code-switching ASR.
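The word-translation and word-insertion text augmentations mentioned above can be sketched as simple token-level operations. The lexicon and phrase list below are purely illustrative placeholders, and the sampling scheme is an assumption; the paper's actual generation pipeline may differ:

```python
import random

# Toy bilingual lexicon; a real system would use a translation model
# or an aligned dictionary (these entries are purely illustrative).
LEXICON = {"会议": "meeting", "周五": "Friday", "取消": "cancel"}

def translate_switch(tokens, lexicon, p=0.3, rng=random):
    # Word-translation augmentation: replace each Mandarin token that has a
    # lexicon entry with its English translation, with probability p.
    return [lexicon[t] if t in lexicon and rng.random() < p else t
            for t in tokens]

def insert_switch(tokens, phrases, rng=random):
    # Word-insertion augmentation: splice an English phrase into the
    # monolingual sentence at a random position.
    i = rng.randrange(len(tokens) + 1)
    return tokens[:i] + [rng.choice(phrases)] + tokens[i:]
```

With `p=1.0`, `translate_switch(["会议", "在", "周五"], LEXICON)` yields `["meeting", "在", "Friday"]`, a synthetic intra-sentential code-switch.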
arXiv Detail & Related papers (2020-11-04T07:12:44Z)
- Rnn-transducer with language bias for end-to-end Mandarin-English code-switching speech recognition [58.105818353866354]
We propose an improved recurrent neural network transducer (RNN-T) model with language bias to alleviate the problem.
We use the language identities to bias the model to predict the CS points.
This encourages the model to learn the language identity information directly from the transcriptions, so no additional LID model is needed.
arXiv Detail & Related papers (2020-02-19T12:01:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.