Streaming Bilingual End-to-End ASR model using Attention over Multiple Softmax
- URL: http://arxiv.org/abs/2401.11645v1
- Date: Mon, 22 Jan 2024 01:44:42 GMT
- Title: Streaming Bilingual End-to-End ASR model using Attention over Multiple Softmax
- Authors: Aditya Patil, Vikas Joshi, Purvi Agrawal, Rupesh Mehta
- Abstract summary: We propose a novel bilingual end-to-end (E2E) modeling approach, where a single neural model can recognize both languages.
The proposed model has shared encoder and prediction networks, with language-specific joint networks that are combined via a self-attention mechanism.
- Score: 6.386371634323785
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Even with several advancements in multilingual modeling, it is challenging to
recognize multiple languages using a single neural model without knowing the input language, and most multilingual models assume the availability of the input language. In this work, we propose a novel bilingual end-to-end (E2E)
modeling approach, where a single neural model can recognize both languages and
also support switching between the languages, without any language input from
the user. The proposed model has shared encoder and prediction networks, with
language-specific joint networks that are combined via a self-attention
mechanism. Because the language-specific posteriors are combined, the model produces a single posterior distribution over all output symbols, enabling a single beam-search decoding pass and allowing dynamic switching between the languages. The proposed approach outperforms the conventional bilingual baseline, with relative word error rate reductions of 13.3%, 8.23%, and 1.3% on the Hindi, English, and code-mixed test sets, respectively.
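The combination described in the abstract, language-specific joint networks whose posteriors are merged by attention into one distribution, can be pictured with a short sketch. The layer sizes, the simple per-position attention scorer, and the class name below are illustrative assumptions rather than the authors' implementation:
```python
# Sketch of "attention over multiple softmax": shared encoder/prediction
# features feed one joint network per language, and the language-specific
# posteriors are mixed by learned attention weights into a single distribution.
# Layer sizes, the attention scorer, and all names are illustrative assumptions.
import torch
import torch.nn as nn


class MultiSoftmaxJoint(nn.Module):
    def __init__(self, enc_dim=512, pred_dim=512, joint_dim=640,
                 vocab_size=4000, n_langs=2):
        super().__init__()
        # One joint network per language, all fed by the shared networks.
        self.joints = nn.ModuleList(
            [nn.Sequential(nn.Linear(enc_dim + pred_dim, joint_dim), nn.Tanh(),
                           nn.Linear(joint_dim, vocab_size))
             for _ in range(n_langs)]
        )
        # Scorer that assigns a weight to each language branch per position.
        self.attn = nn.Linear(enc_dim + pred_dim, n_langs)

    def forward(self, enc, pred):
        # enc: (B, enc_dim), pred: (B, pred_dim) for one (time, label) position.
        x = torch.cat([enc, pred], dim=-1)
        # Language-specific posteriors: (B, n_langs, vocab_size).
        posteriors = torch.stack(
            [torch.softmax(joint(x), dim=-1) for joint in self.joints], dim=1)
        # Attention weights over languages: (B, n_langs, 1).
        weights = torch.softmax(self.attn(x), dim=-1).unsqueeze(-1)
        # Single combined posterior over all output symbols.
        return (weights * posteriors).sum(dim=1)
```
Since the result is one distribution over the union of both languages' output symbols, a single beam search can decode it, and the per-position attention weights let the model switch languages mid-utterance.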
Related papers
- Code-mixed Sentiment and Hate-speech Prediction [2.9140539998069803]
Large language models have dominated most natural language processing tasks.
We created four new bilingual pre-trained masked language models for English-Hindi and English-Slovene languages.
We performed an evaluation of monolingual, bilingual, few-lingual, and massively multilingual models on several languages.
arXiv Detail & Related papers (2024-05-21T16:56:36Z)
- Adapting the adapters for code-switching in multilingual ASR [10.316724084739892]
Large pre-trained multilingual speech models have shown potential in scaling Automatic Speech Recognition to many low-resource languages.
Some of these models employ language adapters in their formulation, which helps to improve monolingual performance.
However, this formulation restricts the usability of these models on code-switched speech, where two languages are mixed together in the same utterance.
We propose ways to effectively fine-tune such models on code-switched speech by assimilating information from both language adapters at each language adaptation point in the network (a sketch follows this entry).
arXiv Detail & Related papers (2023-10-11T12:15:24Z)
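A minimal sketch of the adapter-fusion idea in the entry above, assuming residual bottleneck adapters and a small per-frame gate; the sizes, the gating form, and all names are illustrative, not the paper's formulation:
```python
# Sketch of fusing two language adapters at one adaptation point: residual
# bottleneck adapters plus a small per-frame gate. Sizes and names are assumed.
import torch
import torch.nn as nn


class FusedAdapters(nn.Module):
    def __init__(self, d_model=512, bottleneck=64, n_langs=2):
        super().__init__()
        # One residual bottleneck adapter per language.
        self.adapters = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, bottleneck), nn.ReLU(),
                           nn.Linear(bottleneck, d_model)) for _ in range(n_langs)]
        )
        # Gate deciding, per frame, how much each language adapter contributes.
        self.gate = nn.Linear(d_model, n_langs)

    def forward(self, x):
        # x: (batch, time, d_model) hidden states at the adaptation point.
        weights = torch.softmax(self.gate(x), dim=-1)   # (batch, time, n_langs)
        fused = sum(weights[..., i:i + 1] * adapter(x)
                    for i, adapter in enumerate(self.adapters))
        return x + fused                                # residual connection
```
The soft gate stands in for the hard selection of a single language adapter, which is the restriction the entry identifies for code-switched speech.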
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers [71.76680102779765]
Automatic speech recognition (ASR) and speech translation (ST) can both use neural transducers as the model structure.
We propose LAMASSU, a streaming language-agnostic multilingual speech recognition and translation model using neural transducers.
arXiv Detail & Related papers (2022-11-05T04:03:55Z)
- A Configurable Multilingual Model is All You Need to Recognize All Languages [52.274446882747455]
We propose a novel Configurable Multilingual Model (CMM), which is trained only once but can be configured as different models based on users' choices.
CMM improves over the universal multilingual model by 26.4%, 16.9%, and 10.4% relative word error rate reduction when the user selects 1, 2, or 3 languages, respectively.
arXiv Detail & Related papers (2021-07-13T06:52:41Z)
- Efficient Weight factorization for Multilingual Speech Recognition [67.00151881207792]
End-to-end multilingual speech recognition involves training a single model on a compositional speech corpus that includes many languages.
Because each language in the training data has different characteristics, the shared network may struggle to optimize for all of them simultaneously.
We propose a novel multilingual architecture that targets the core operation in neural networks: linear transformation functions (a factorization sketch follows this entry).
arXiv Detail & Related papers (2021-05-07T00:12:02Z)
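One way to picture the factorized linear transformation mentioned in the entry above is a shared weight matrix modulated by cheap language-dependent vectors. The rank-1 multiplicative form and the names below are assumptions about the general technique, not the paper's exact factorization:
```python
# Sketch of a factorized linear layer: a shared weight matrix rescaled by
# per-language vectors (rank-1 multiplicative factors). Assumed, not exact.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FactorizedLinear(nn.Module):
    def __init__(self, d_in, d_out, n_langs):
        super().__init__()
        self.shared = nn.Linear(d_in, d_out)
        # Per-language scaling vectors; their outer product rescales the weights.
        self.row = nn.Parameter(torch.ones(n_langs, d_out))
        self.col = nn.Parameter(torch.ones(n_langs, d_in))

    def forward(self, x, lang_id):
        # Language-specific weight, built from the shared matrix on the fly.
        weight = self.shared.weight * torch.outer(self.row[lang_id], self.col[lang_id])
        return F.linear(x, weight, self.shared.bias)
```
Only the small per-language vectors grow with the number of languages, so the shared matrix carries most of the capacity.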
- Are Multilingual Models Effective in Code-Switching? [57.78477547424949]
We study the effectiveness of multilingual language models to understand their capability and adaptability to the mixed-language setting.
Our findings suggest that pre-trained multilingual models do not necessarily guarantee high-quality representations on code-switching.
arXiv Detail & Related papers (2021-03-24T16:20:02Z)
- Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models into a single model for all target languages (a sketch follows this entry).
arXiv Detail & Related papers (2020-10-27T13:12:17Z)
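A minimal sketch of the multi-teacher distillation step in the entry above: the single student matches soft targets from each language-branch model. The temperature, the uniform branch weighting, and the function name are assumptions:
```python
# Sketch of multi-teacher distillation: one student trained against soft
# targets from several language-branch models. Details are assumed.
import torch
import torch.nn.functional as F


def branch_distillation_loss(student_logits, teacher_logits_list, temperature=2.0):
    """Average KL divergence between the student and each language-branch teacher.
    All logits have shape (batch, num_classes)."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    losses = []
    for teacher_logits in teacher_logits_list:
        p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        losses.append(F.kl_div(log_p_student, p_teacher, reduction="batchmean"))
    # The usual T^2 scaling, averaged over all branch teachers.
    return (temperature ** 2) * torch.stack(losses).mean()
```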
- Improved acoustic word embeddings for zero-resource languages using multilingual transfer [37.78342106714364]
We train a single supervised embedding model on labelled data from multiple well-resourced languages and apply it to unseen zero-resource languages.
We consider three multilingual recurrent neural network (RNN) models: a classifier trained on the joint vocabularies of all training languages; a Siamese RNN trained to discriminate between same and different words from multiple languages; and a correspondence autoencoder (CAE) RNN trained to reconstruct word pairs.
All of these models outperform state-of-the-art unsupervised models trained on the zero-resource languages themselves, giving relative improvements of more than 30% in average precision.
arXiv Detail & Related papers (2020-06-02T12:28:34Z)
- Language-agnostic Multilingual Modeling [23.06484126933893]
We build a language-agnostic multilingual ASR system which transforms all languages to one writing system through a many-to-one transliteration transducer (a toy illustration follows this entry).
We show with four Indic languages, namely, Hindi, Bengali, Tamil and Kannada, that the language-agnostic multilingual model achieves up to 10% relative reduction in Word Error Rate (WER) over a language-dependent multilingual model.
arXiv Detail & Related papers (2020-04-20T18:57:43Z)
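A toy illustration of the many-to-one mapping to a single writing system described in the entry above. The real system uses a transliteration transducer over full scripts; the tiny character table below is only an assumed example:
```python
# Toy illustration of mapping several Indic scripts onto one common writing
# system so a single grapheme inventory covers all languages. Assumed example.
TO_COMMON = {
    "क": "ka", "म": "ma",   # Devanagari (Hindi)
    "ক": "ka", "ম": "ma",   # Bengali
    "க": "ka", "ம": "ma",   # Tamil
    "ಕ": "ka", "ಮ": "ma",   # Kannada
}


def to_common_script(text: str) -> str:
    # Map known characters to the shared writing system; pass others through.
    return "".join(TO_COMMON.get(ch, ch) for ch in text)


print(to_common_script("कम"))  # -> "kama"
```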
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.