A language score based output selection method for multilingual speech
recognition
- URL: http://arxiv.org/abs/2005.00851v1
- Date: Sat, 2 May 2020 15:07:14 GMT
- Title: A language score based output selection method for multilingual speech
recognition
- Authors: Van Huy Nguyen, Thi Quynh Khanh Dinh, Truong Thinh Nguyen, Dang Khoa
Mac
- Abstract summary: A language model rescoring method is applied to produce all possible candidates for target languages.
A simple score is proposed to automatically select the output without any identifier model or language specification of the input language.
In addition, we present the design of an English-Vietnamese end-to-end model that addresses not only the problem of cross-lingual speakers but also improves the accuracy of English words borrowed into Vietnamese.
- Score: 2.294014185517203
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The quality of a multilingual speech recognition system can be improved by
adaptation methods if the input language is specified. For systems that accept
multilingual input, the popular approach is to apply a language identifier to the
input and then switch or configure decoders in the next step, or to use an
additional downstream model to select the output from a set of candidates.
Motivated by the goal of reducing latency for real-time applications, in this
paper a language model rescoring method is first applied to produce all possible
candidates for the target languages, and then a simple score is proposed to
automatically select the output without any identifier model or language
specification of the input. The main point is that this score can be estimated
simply and automatically on the fly, so the whole decoding pipeline becomes
simpler and more compact. Experimental results showed that this method achieves
the same quality as when the input language is specified. In addition, we present
the design of an English-Vietnamese end-to-end model that addresses not only the
problem of cross-lingual speakers but also improves the accuracy of English words
borrowed into Vietnamese.
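
To make the selection step concrete, below is a minimal Python sketch of this kind of rescoring-and-selection pipeline. The per-language candidate structure, the `lm_log_prob` interface, the length normalisation, and the `lm_weight` interpolation are all illustrative assumptions; the paper's exact scoring formula is not reproduced here.

```python
from typing import Callable, Dict, List, Tuple

def select_output(
    candidates: Dict[str, List[Tuple[str, float]]],
    lm_log_prob: Callable[[str, str], float],
    lm_weight: float = 0.5,
) -> Tuple[str, str]:
    """Return the (language, hypothesis) pair with the highest combined
    score, with no language identifier model in the loop.

    candidates: per-language N-best lists of (text, decoder_log_score)
        produced by the shared E2E decoder.
    lm_log_prob: log-probability of `text` under the language model for
        `lang` (hypothetical interface, assumed here for illustration).
    """
    best_lang, best_text, best_score = "", "", float("-inf")
    for lang, nbest in candidates.items():
        for text, decoder_score in nbest:
            # Length-normalise the LM term so short and long hypotheses
            # remain comparable across languages.
            n_tokens = max(len(text.split()), 1)
            score = decoder_score + lm_weight * lm_log_prob(lang, text) / n_tokens
            if score > best_score:
                best_lang, best_text, best_score = lang, text, score
    return best_lang, best_text

# Toy usage with stub scores: the Vietnamese LM strongly prefers the
# Vietnamese candidate, so it wins without any LID model.
if __name__ == "__main__":
    stub_lm = lambda lang, text: -2.0 if lang == "vi" else -8.0
    cands = {
        "en": [("this is a test", -12.0)],
        "vi": [("đây là một bài kiểm tra", -11.5)],
    }
    print(select_output(cands, stub_lm))
```

Because the score is computed on the fly from quantities the decoder and rescoring LMs already produce, selection adds essentially no extra latency, which is the compactness argument the abstract makes.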
Related papers
- Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting [45.161909551392085]
We introduce an encoder prompting technique within the self-conditioned CTC framework, enabling language-specific adaptation of the CTC model in a zero-shot manner.
Our method has been shown to reduce errors significantly, by 28% on average and by 41% on low-resource languages.
arXiv Detail & Related papers (2024-06-18T13:38:58Z)
- Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation [40.0365339798752]
Many-to-one end-to-end speech translation models can convert audio signals from different source languages into text in a target language.
In some cases, the input language can be given or estimated.
We exploit this additional language information by introducing a simple and effective linear input network.
arXiv Detail & Related papers (2024-06-12T00:00:39Z)
- Gujarati-English Code-Switching Speech Recognition using ensemble prediction of spoken language [29.058108207186816]
We propose two methods of introducing language-specific parameters and explainability into the multi-head attention mechanism.
Despite being unable to reduce WER significantly, our method shows promise in predicting the correct language from just spoken data.
arXiv Detail & Related papers (2024-03-12T18:21:20Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- Code-Switching without Switching: Language Agnostic End-to-End Speech Translation [68.8204255655161]
We treat speech recognition and translation as one unified end-to-end speech translation problem.
By training LAST with both input languages, we decode speech into one target language, regardless of the input language.
arXiv Detail & Related papers (2022-10-04T10:34:25Z)
- Zero-shot Cross-lingual Transfer of Prompt-based Tuning with a Unified Multilingual Prompt [98.26682501616024]
We propose a novel model that uses a unified prompt for all languages, called UniPrompt.
The unified prompt is computed by a multilingual PLM to produce language-independent representations.
Our proposed methods can significantly outperform the strong baselines across different languages.
arXiv Detail & Related papers (2022-02-23T11:57:52Z)
- Integrating Knowledge in End-to-End Automatic Speech Recognition for Mandarin-English Code-Switching [41.88097793717185]
Code-Switching (CS) is a common linguistic phenomenon in multilingual communities.
This paper presents our investigations on end-to-end speech recognition for Mandarin-English CS speech.
arXiv Detail & Related papers (2021-12-19T17:31:15Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
We further propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for the translated text in the target language.
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
- Rnn-transducer with language bias for end-to-end Mandarin-English code-switching speech recognition [58.105818353866354]
We propose an improved recurrent neural network transducer (RNN-T) model with language bias to alleviate the code-switching problem.
We use language identities to bias the model toward predicting the code-switching points.
This encourages the model to learn language identity information directly from the transcription, so that no additional LID model is needed.
arXiv Detail & Related papers (2020-02-19T12:01:33Z)
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models that fit the word order of the source language may fail to handle target languages with different word orders.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.