Bilingual Streaming ASR with Grapheme units and Auxiliary Monolingual Loss
- URL: http://arxiv.org/abs/2308.06327v1
- Date: Fri, 11 Aug 2023 18:06:33 GMT
- Title: Bilingual Streaming ASR with Grapheme units and Auxiliary Monolingual Loss
- Authors: Mohammad Soleymanpour, Mahmoud Al Ismail, Fahimeh Bahmaninezhad,
Kshitiz Kumar, Jian Wu
- Abstract summary: We introduce a bilingual solution to support English as a secondary locale for most primary locales in automatic speech recognition (ASR).
Our key developments constitute: (a) pronunciation lexicon with grapheme units instead of phone units, (b) a fully bilingual alignment model and subsequently bilingual streaming transformer model.
We evaluate our work on large-scale training and test tasks for bilingual Spanish (ES) and bilingual Italian (IT) applications.
- Score: 11.447307867370064
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce a bilingual solution to support English as a secondary
locale for most primary locales in hybrid automatic speech recognition (ASR) settings. Our
key developments constitute: (a) pronunciation lexicon with grapheme units
instead of phone units, (b) a fully bilingual alignment model and subsequently
bilingual streaming transformer model, (c) a parallel encoder structure with
language identification (LID) loss, (d) parallel encoder with an auxiliary loss
for monolingual projections. We conclude that in comparison to LID loss, our
proposed auxiliary loss is superior in specializing the parallel encoders to
respective monolingual locales, and that contributes to stronger bilingual
learning. We evaluate our work on large-scale training and test tasks for
bilingual Spanish (ES) and bilingual Italian (IT) applications. Our bilingual
models demonstrate strong English code-mixing capability. In particular, the
bilingual IT model improves the word error rate (WER) for a code-mix IT task
from 46.5% to 13.8%, while also achieving a close parity (9.6%) with the
monolingual IT model (9.5%) over IT tests.
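The auxiliary monolingual loss described in the abstract can be illustrated as a weighted combination of the main bilingual objective with per-encoder monolingual cross-entropy terms. The sketch below is a minimal, hypothetical illustration of that loss composition, not the paper's implementation: the function names, the per-frame cross-entropy formulation, and the `aux_weight` value are all assumptions for demonstration.

```python
import math

def softmax_xent(logits, target):
    """Cross-entropy of a single frame's logits against a target index."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def bilingual_loss(main_logits, main_target,
                   es_logits, es_target,
                   en_logits, en_target,
                   aux_weight=0.3):
    """Total loss = main bilingual loss + weighted auxiliary monolingual
    losses on each parallel encoder's projection.

    `aux_weight` is an illustrative hyperparameter, not a value from the
    paper; the ES/EN logits stand in for the two parallel encoders'
    monolingual projections."""
    main = softmax_xent(main_logits, main_target)
    aux = softmax_xent(es_logits, es_target) + softmax_xent(en_logits, en_target)
    return main + aux_weight * aux
```

In this framing, the auxiliary terms push each parallel encoder toward its own monolingual targets, whereas an LID loss would instead supervise a language-classification head; the abstract's claim is that the former specializes the encoders more effectively.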
Related papers
- Towards a Deep Understanding of Multilingual End-to-End Speech
Translation [52.26739715012842]
We analyze representations learnt in a multilingual end-to-end speech translation model trained over 22 languages.
We derive three major findings from our analysis.
arXiv Detail & Related papers (2023-10-31T13:50:55Z)
- Building High-accuracy Multilingual ASR with Gated Language Experts and
Curriculum Training [45.48362355283723]
We propose gated language experts and curriculum training to enhance multilingual transformer transducer models.
Our method incorporates a gating mechanism and LID loss, enabling transformer experts to learn language-specific information.
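A gating mechanism of the kind this summary mentions can be sketched as LID-driven soft weighting over per-language expert outputs. The snippet below is a hypothetical illustration under that assumption; the function name and shapes are not from the cited paper.

```python
import math

def gate_experts(lid_logits, expert_outputs):
    """Blend per-language expert encoder outputs using softmax gate
    weights derived from LID logits.

    lid_logits: one logit per language/expert.
    expert_outputs: one feature vector per expert (same length each).
    Returns the gated mixture of the expert vectors."""
    m = max(lid_logits)
    exps = [math.exp(x - m) for x in lid_logits]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(expert_outputs[0])
    return [sum(w, 0.0) if False else
            sum(w_i * out[d] for w_i, out in zip(weights, expert_outputs))
            for w, d in zip(weights + [0.0] * dim, range(dim))]
```

With uniform LID logits the experts are averaged; as the LID prediction sharpens toward one language, that expert dominates the mixture.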
arXiv Detail & Related papers (2023-03-01T19:20:01Z) - Scaling Up Deliberation for Multilingual ASR [36.860327600638705]
We investigate second-pass deliberation for multilingual speech recognition.
Our proposed deliberation is multilingual, i.e., the text encoder encodes hypothesis text from multiple languages, and the decoder attends to multilingual text and audio.
We show that deliberation improves the average WER on 9 languages by 4% relative compared to the single-pass model.
arXiv Detail & Related papers (2022-10-11T21:07:00Z) - LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between different languages at the frame level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z)
- Bilingual End-to-End ASR with Byte-Level Subwords [4.268218327369146]
We study different representations including character-level, byte-level, byte pair encoding (BPE), and byte-level byte pair encoding (BBPE).
We focus on developing a single end-to-end model to support utterance-based bilingual ASR, where speakers do not alternate between two languages in a single utterance but may change languages across utterances.
We find that BBPE with penalty schemes can improve utterance-based bilingual ASR performance by 2% to 5% relative, even with a smaller number of outputs and fewer parameters.
arXiv Detail & Related papers (2022-05-01T15:01:01Z)
- Code Switched and Code Mixed Speech Recognition for Indic languages [0.0]
Training multilingual automatic speech recognition (ASR) systems is challenging because acoustic and lexical information is typically language specific.
We compare the performance of an end-to-end multilingual speech recognition system to that of monolingual models conditioned on language identification (LID).
We also propose a similar technique to address the code-switching problem, achieving WERs of 21.77 and 28.27 on Hindi-English and Bengali-English respectively.
arXiv Detail & Related papers (2022-03-30T18:09:28Z)
- Reducing language context confusion for end-to-end code-switching
automatic speech recognition [50.89821865949395]
We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model.
By calculating the respective attention of multiple languages, our method can efficiently transfer language knowledge from rich monolingual data.
arXiv Detail & Related papers (2022-01-28T14:39:29Z)
- Breaking Down Multilingual Machine Translation [74.24795388967907]
We show that multilingual training is beneficial to encoders in general, while it only benefits decoders for low-resource languages (LRLs).
Our many-to-one models for high-resource languages and one-to-many models for LRLs outperform the best results reported by Aharoni et al.
arXiv Detail & Related papers (2021-10-15T14:57:12Z)
- Magic dust for cross-lingual adaptation of monolingual wav2vec-2.0 [7.378368959253632]
We show that a monolingual wav2vec-2.0 is a good few-shot ASR learner in several languages.
A key finding of this work is that the adapted monolingual wav2vec-2.0 achieves similar performance as the topline multilingual XLSR model.
arXiv Detail & Related papers (2021-10-07T15:29:22Z)
- Cross-lingual Machine Reading Comprehension with Language Branch
Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
arXiv Detail & Related papers (2020-10-27T13:12:17Z)
- How Phonotactics Affect Multilingual and Zero-shot ASR Performance [74.70048598292583]
A Transformer encoder-decoder model has been shown to leverage multilingual data well when trained on IPA transcriptions of the languages presented during training.
We replace the encoder-decoder with a hybrid ASR system consisting of a separate AM and LM.
We show that the gain from modeling crosslingual phonotactics is limited, and that imposing too strong a model can hurt zero-shot transfer.
arXiv Detail & Related papers (2020-10-22T23:07:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.