Using heterogeneity in semi-supervised transcription hypotheses to
improve code-switched speech recognition
- URL: http://arxiv.org/abs/2106.07699v1
- Date: Mon, 14 Jun 2021 18:39:18 GMT
- Title: Using heterogeneity in semi-supervised transcription hypotheses to
improve code-switched speech recognition
- Authors: Andrew Slottje, Shannon Wotherspoon, William Hartmann, Matthew Snover,
Owen Kimball
- Abstract summary: We show that monolingual data may be more closely matched to one of the languages in the code-switch pair, biasing prediction toward that language.
We propose a semi-supervised approach for code-switched ASR.
- Score: 6.224255518500385
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modeling code-switched speech is an important problem in automatic speech
recognition (ASR). Labeled code-switched data are rare, so monolingual data are
often used to model code-switched speech. These monolingual data may be more
closely matched to one of the languages in the code-switch pair. We show that
such asymmetry can bias prediction toward the better-matched language and
degrade overall model performance. To address this issue, we propose a
semi-supervised approach for code-switched ASR. We consider the case of
English-Mandarin code-switching, and the problem of using monolingual data to
build bilingual "transcription models" for annotation of unlabeled
code-switched data. We first build multiple transcription models so that their
individual predictions are variously biased toward either English or Mandarin.
We then combine these biased transcriptions using confidence-based selection.
This strategy generates a superior transcript for semi-supervised training, and
obtains a 19% relative improvement compared to a semi-supervised system that
relies on a transcription model built with only the best-matched monolingual
data.
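
To make the combination step concrete, below is a minimal Python sketch of confidence-based selection, assuming each biased transcription model emits utterance-level hypotheses with confidence scores; the class, function names, and granularity are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: combine hypotheses from several differently biased
# transcription models by keeping, per utterance, the most confident one.
# All names and the utterance-level granularity are assumptions.
from dataclasses import dataclass

@dataclass
class Hypothesis:
    utterance_id: str
    text: str          # decoded transcript
    confidence: float  # model-assigned confidence; higher is better

def select_transcripts(model_outputs: list[list[Hypothesis]]) -> dict[str, str]:
    """For each utterance, keep the transcript from whichever biased model
    is most confident; the winners become semi-supervised training labels."""
    best: dict[str, Hypothesis] = {}
    for hyps in model_outputs:  # one hypothesis list per transcription model
        for hyp in hyps:
            cur = best.get(hyp.utterance_id)
            if cur is None or hyp.confidence > cur.confidence:
                best[hyp.utterance_id] = hyp
    return {uid: h.text for uid, h in best.items()}

# Example: an English-biased and a Mandarin-biased model disagree on "utt1".
english_biased = [Hypothesis("utt1", "please give me the report", 0.62)]
mandarin_biased = [Hypothesis("utt1", "please 给 我 the report", 0.81)]
print(select_transcripts([english_biased, mandarin_biased]))
# -> {'utt1': 'please 给 我 the report'}
```

Since every model decodes the same unlabeled audio, selection under this reading reduces to a per-utterance argmax over model confidences; the winning transcripts then serve as labels for semi-supervised training.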
Related papers
- Multilingual self-supervised speech representations improve the speech
recognition of low-resource African languages with codeswitching [65.74653592668743]
Finetuning self-supervised multilingual representations reduces absolute word error rates by up to 20%.
In circumstances with limited training data, finetuning self-supervised representations is a better-performing and viable solution.
arXiv Detail & Related papers (2023-11-25T17:05:21Z)
- The Effect of Alignment Objectives on Code-Switching Translation [0.0]
We propose a way of training a single machine translation model that can translate monolingual sentences from one language to another.
This model can be considered a bilingual model in the human sense.
arXiv Detail & Related papers (2023-09-10T14:46:31Z)
- LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers [71.76680102779765]
Automatic speech recognition (ASR) and speech translation (ST) can both use neural transducers as the model structure.
We propose LAMASSU, a streaming language-agnostic multilingual speech recognition and translation model using neural transducers.
arXiv Detail & Related papers (2022-11-05T04:03:55Z)
- Optimizing Bilingual Neural Transducer with Synthetic Code-switching Text Generation [10.650573361117669]
Semi-supervised training and synthetic code-switched data can improve the bilingual ASR system on code-switching speech.
Our final system achieves 25% mixed error rate (MER) on the ASCEND English/Mandarin code-switching test set.
arXiv Detail & Related papers (2022-10-21T19:42:41Z)
- Language-agnostic Code-Switching in Sequence-To-Sequence Speech Recognition [62.997667081978825]
Code-Switching (CS) refers to the phenomenon of alternately using words and phrases from different languages.
We propose a simple yet effective data augmentation in which audio and corresponding labels of different source languages are concatenated, as sketched below.
We show that this augmentation can even improve the model's performance on inter-sentential language switches not seen during training, by 5.03% WER.
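A hedged sketch of this concatenation-style augmentation, assuming 16 kHz waveform arrays and token-level labels; the pause padding and function name are illustrative, not the paper's exact recipe:

```python
import numpy as np

def make_code_switched_example(audio_a: np.ndarray, labels_a: list[str],
                               audio_b: np.ndarray, labels_b: list[str],
                               sample_rate: int = 16000,
                               pause_sec: float = 0.1):
    """Concatenate two monolingual utterances (audio and labels) into one
    artificial code-switched training example."""
    pause = np.zeros(int(sample_rate * pause_sec), dtype=audio_a.dtype)
    audio = np.concatenate([audio_a, pause, audio_b])
    labels = labels_a + labels_b  # label sequences are concatenated too
    return audio, labels

# Toy usage with random noise standing in for real speech:
en_audio = np.random.randn(16000).astype(np.float32)  # 1 s of "English"
zh_audio = np.random.randn(16000).astype(np.float32)  # 1 s of "Mandarin"
audio, labels = make_code_switched_example(
    en_audio, ["the", "report"], zh_audio, ["在", "这里"])
print(audio.shape, labels)  # (33600,) ['the', 'report', '在', '这里']
```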
arXiv Detail & Related papers (2022-10-17T12:15:57Z)
- Reducing language context confusion for end-to-end code-switching automatic speech recognition [50.89821865949395]
We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model.
By calculating the respective attention of multiple languages, our method can efficiently transfer language knowledge from rich monolingual data.
arXiv Detail & Related papers (2022-01-28T14:39:29Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
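As a rough illustration of the contrast, the sketch below corrupts a token sequence with MLM-style masking versus local reordering, which keeps the input looking like a full sentence; the mask ratio and window size are assumptions, not the paper's configuration.

```python
import random

def mask_tokens(tokens: list[str], mask_ratio: float = 0.35,
                mask_token: str = "<mask>") -> list[str]:
    """MLM-style corruption: hide a fraction of tokens; the decoder
    must reconstruct the original sequence."""
    return [mask_token if random.random() < mask_ratio else t for t in tokens]

def shuffle_locally(tokens: list[str], window: int = 3) -> list[str]:
    """Reordering corruption: permute tokens within small windows, so the
    input still resembles a real (if scrambled) full sentence."""
    out: list[str] = []
    for i in range(0, len(tokens), window):
        chunk = tokens[i:i + window]
        random.shuffle(chunk)
        out.extend(chunk)
    return out

random.seed(0)
src = ["the", "model", "reads", "a", "full", "sentence"]
print(mask_tokens(src))      # e.g. ['<mask>', 'model', ...]
print(shuffle_locally(src))  # e.g. ['reads', 'the', 'model', ...]
```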
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
- One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech [3.42658286826597]
We introduce an approach to multilingual speech synthesis which uses the meta-learning concept of contextual parameter generation.
Our model is shown to effectively share information across languages and according to a subjective evaluation test, it produces more natural and accurate code-switching speech than the baselines.
arXiv Detail & Related papers (2020-08-03T10:43:30Z)
- Rnn-transducer with language bias for end-to-end Mandarin-English code-switching speech recognition [58.105818353866354]
We propose an improved recurrent neural network transducer (RNN-T) model with language bias to alleviate the problem.
We use the language identities to bias the model to predict the CS points.
This encourages the model to learn language identity information directly from the transcription, so no additional LID model is needed.
arXiv Detail & Related papers (2020-02-19T12:01:33Z)
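
One plausible way to expose language identity in the training transcripts, in the spirit of this entry, is to insert language tags wherever the language changes; the tag names and the CJK-character test below are illustrative assumptions, not the paper's exact scheme.

```python
import re

# Matches common CJK ideographs; used here as a crude Mandarin detector.
CJK = re.compile(r"[\u4e00-\u9fff]")

def tag_languages(tokens: list[str]) -> list[str]:
    """Insert <en>/<zh> tags at every language switch so a transducer can
    learn switch points from the transcription alone."""
    tagged: list[str] = []
    prev = None
    for tok in tokens:
        lang = "zh" if CJK.search(tok) else "en"
        if lang != prev:
            tagged.append(f"<{lang}>")
            prev = lang
        tagged.append(tok)
    return tagged

print(tag_languages(["please", "给", "我", "the", "report"]))
# ['<en>', 'please', '<zh>', '给', '我', '<en>', 'the', 'report']
```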