A Dual-Decoder Conformer for Multilingual Speech Recognition
- URL: http://arxiv.org/abs/2109.03277v1
- Date: Sun, 22 Aug 2021 09:22:28 GMT
- Title: A Dual-Decoder Conformer for Multilingual Speech Recognition
- Authors: Krishna D N
- Abstract summary: This work proposes a dual-decoder transformer model for low-resource multilingual speech recognition for Indian languages.
We use a phoneme decoder (PHN-DEC) for the phoneme recognition task and a grapheme decoder (GRP-DEC) to predict the grapheme sequence along with language information.
Our experiments show that we can obtain a significant reduction in WER over the baseline approaches.
- Score: 4.594159253008448
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer-based models have recently become very popular for
sequence-to-sequence applications such as machine translation and speech
recognition. This work proposes a dual-decoder transformer model for
low-resource multilingual speech recognition for Indian languages. Our proposed
model consists of a Conformer [1] encoder, two parallel transformer decoders,
and a language classifier. We use a phoneme decoder (PHN-DEC) for the phoneme
recognition task and a grapheme decoder (GRP-DEC) to predict the grapheme sequence
along with language information. We consider phoneme recognition and language
identification as auxiliary tasks in the multi-task learning framework. We
jointly optimize the network for phoneme recognition, grapheme recognition, and
language identification tasks with Joint CTC-Attention [2] training. Our
experiments show that we can obtain a significant reduction in WER over the
baseline approaches. We also show that our dual-decoder approach obtains
significant improvement over the single decoder approach.
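As a concrete illustration of the architecture and training objective described above, here is a minimal PyTorch sketch of a shared Conformer encoder feeding a phoneme decoder (PHN-DEC), a grapheme decoder (GRP-DEC), and a language classifier, trained with a joint CTC-attention style multi-task loss. The paper does not publish code, so the use of torchaudio's Conformer, all layer sizes, the mean-pooled language classifier, and the loss weights `lam`, `alpha`, and `beta` are illustrative assumptions, not the authors' exact configuration.

```python
# dual_decoder_sketch.py -- illustrative sketch only; hyperparameters and
# loss weights are assumptions, not values from the paper.
import torch
import torch.nn as nn
from torchaudio.models import Conformer


class DualDecoderConformer(nn.Module):
    def __init__(self, n_mels=80, d_model=256, n_phn=64, n_grp=128, n_lang=4):
        super().__init__()
        self.frontend = nn.Linear(n_mels, d_model)
        # Shared Conformer encoder (torchaudio's implementation; it does not
        # subsample, so encoder lengths equal input lengths).
        self.encoder = Conformer(input_dim=d_model, num_heads=4, ffn_dim=1024,
                                 num_layers=6, depthwise_conv_kernel_size=31)

        def make_decoder():
            layer = nn.TransformerDecoderLayer(d_model, nhead=4,
                                               dim_feedforward=1024,
                                               batch_first=True)
            return nn.TransformerDecoder(layer, num_layers=2)

        self.phn_dec = make_decoder()   # PHN-DEC: phoneme recognition
        self.grp_dec = make_decoder()   # GRP-DEC: grapheme recognition
        self.phn_emb = nn.Embedding(n_phn, d_model)
        self.grp_emb = nn.Embedding(n_grp, d_model)
        self.phn_out = nn.Linear(d_model, n_phn)
        self.grp_out = nn.Linear(d_model, n_grp)
        self.ctc_out = nn.Linear(d_model, n_grp)    # CTC branch on the encoder
        self.lid_out = nn.Linear(d_model, n_lang)   # language classifier

    def forward(self, feats, feat_lens, phn_in, grp_in):
        # feats: (B, T, n_mels); phn_in / grp_in: decoder input tokens (B, U).
        # Causal and padding masks are omitted for brevity.
        enc, _ = self.encoder(self.frontend(feats), feat_lens)
        ctc_logp = self.ctc_out(enc).log_softmax(-1)      # (B, T, n_grp)
        lid_logits = self.lid_out(enc.mean(dim=1))        # utterance-level LID
        phn_logits = self.phn_out(self.phn_dec(self.phn_emb(phn_in), enc))
        grp_logits = self.grp_out(self.grp_dec(self.grp_emb(grp_in), enc))
        return ctc_logp, lid_logits, phn_logits, grp_logits


def multitask_loss(outputs, phn_tgt, grp_tgt, lang_tgt, enc_lens, grp_lens,
                   lam=0.3, alpha=0.3, beta=0.1):
    # Joint CTC-attention on the grapheme task plus auxiliary phoneme and
    # language-ID terms. For brevity the same padded grapheme tensor serves as
    # both the CTC and the attention target, and blank id 0 is an assumption.
    ctc_logp, lid_logits, phn_logits, grp_logits = outputs
    ce = nn.CrossEntropyLoss()
    ctc = nn.CTCLoss(blank=0)(ctc_logp.transpose(0, 1), grp_tgt,
                              enc_lens, grp_lens)
    att = ce(grp_logits.reshape(-1, grp_logits.size(-1)), grp_tgt.reshape(-1))
    phn = ce(phn_logits.reshape(-1, phn_logits.size(-1)), phn_tgt.reshape(-1))
    lid = ce(lid_logits, lang_tgt)
    return lam * ctc + (1 - lam) * att + alpha * phn + beta * lid


if __name__ == "__main__":
    model = DualDecoderConformer()
    B, T, U = 2, 50, 10
    feats = torch.randn(B, T, 80)
    feat_lens = torch.full((B,), T)
    phn = torch.randint(1, 64, (B, U))
    grp = torch.randint(1, 128, (B, U))
    lang = torch.randint(0, 4, (B,))
    outs = model(feats, feat_lens, phn, grp)
    print(multitask_loss(outs, phn, grp, lang, feat_lens,
                         torch.full((B,), U)).item())
```

In this sketch the grapheme branch carries the joint CTC-attention pair, while the phoneme decoder and the language classifier provide the auxiliary training signals; since those tasks are auxiliary, decoding would rely on the grapheme branch.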
Related papers
- Online Gesture Recognition using Transformer and Natural Language Processing [0.0]
Transformer architecture is shown to provide a powerful machine framework for online gestures corresponding to glyph strokes of natural language sentences.
arXiv Detail & Related papers (2023-05-05T10:17:22Z)
- LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers [71.76680102779765]
Automatic speech recognition (ASR) and speech translation (ST) can both use neural transducers as the model structure.
We propose LAMASSU, a streaming language-agnostic multilingual speech recognition and translation model using neural transducers.
arXiv Detail & Related papers (2022-11-05T04:03:55Z)
- Language-agnostic Code-Switching in Sequence-To-Sequence Speech Recognition [62.997667081978825]
Code-Switching (CS) refers to the phenomenon of alternately using words and phrases from different languages.
We propose a simple yet effective data augmentation in which audio and corresponding labels of different source languages are concatenated.
We show that this augmentation can even improve the model's performance on inter-sentential language switches not seen during training by 5.03% WER.
arXiv Detail & Related papers (2022-10-17T12:15:57Z)
- LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between different languages at the frame level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z)
- Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages [58.43299730989809]
We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data.
We induce a pseudo language as a compact discrete representation, and formulate a self-supervised pseudo speech recognition task.
This process stands on its own, or can be applied as low-cost second-stage pre-training.
arXiv Detail & Related papers (2022-05-02T17:59:02Z)
- Multilingual Speech Recognition for Low-Resource Indian Languages using Multi-Task conformer [4.594159253008448]
We propose a multi-task learning-based transformer model for low-resource multilingual speech recognition for Indian languages.
We use a phoneme decoder for the phoneme recognition task and a grapheme decoder to predict grapheme sequence.
Our proposed approach can obtain significant improvement over previous approaches.
arXiv Detail & Related papers (2021-08-22T09:32:15Z)
- Transformer-Transducers for Code-Switched Speech Recognition [23.281314397784346]
We present an end-to-end ASR system using a transformer-transducer model architecture for code-switched speech recognition.
First, we introduce two auxiliary loss functions to handle the low-resource scenario of code-switching.
Second, we propose a novel mask-based training strategy with language ID information to improve the label encoder training towards intra-sentential code-switching.
arXiv Detail & Related papers (2020-11-30T17:27:41Z)
- Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation [71.54816893482457]
We introduce dual-decoder Transformer, a new model architecture that jointly performs automatic speech recognition (ASR) and multilingual speech translation (ST).
Our models are based on the original Transformer architecture but consist of two decoders, each responsible for one task (ASR or ST).
arXiv Detail & Related papers (2020-11-02T04:59:50Z)
- Bi-Decoder Augmented Network for Neural Machine Translation [108.3931242633331]
We propose a novel Bi-Decoder Augmented Network (BiDAN) for the neural machine translation task.
Since each decoder transforms the representations of the input text into its corresponding language, jointly training with two target ends gives the shared encoder the potential to produce a language-independent semantic space.
arXiv Detail & Related papers (2020-01-14T02:05:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.