KARI: KAnari/QCRI's End-to-End systems for the INTERSPEECH 2021 Indian
Languages Code-Switching Challenge
- URL: http://arxiv.org/abs/2106.05885v1
- Date: Thu, 10 Jun 2021 16:12:51 GMT
- Authors: Amir Hussein, Shammur Chowdhury, Ahmed Ali
- Abstract summary: We present the Kanari/QCRI system and the modeling strategies used to participate in the Interspeech 2021 Code-switching (CS) challenge for low-resource Indian languages.
The subtask involved developing a speech recognition system for two CS datasets: Hindi-English and Bengali-English.
To tackle the CS challenges, we use transfer learning for incorporating the publicly available monolingual Hindi, Bengali, and English speech data.
- Score: 7.711092265101041
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we present the Kanari/QCRI (KARI) system and the modeling
strategies used to participate in the Interspeech 2021 Code-switching (CS)
challenge for low-resource Indian languages. The subtask involved developing a
speech recognition system for two CS datasets: Hindi-English and
Bengali-English, collected in a real-life scenario. To tackle the CS
challenges, we use transfer learning for incorporating the publicly available
monolingual Hindi, Bengali, and English speech data. In this work, we study the
effectiveness of a two-step transfer learning protocol for low-resource CS
data: monolingual pretraining, followed by fine-tuning. For acoustic modeling,
we develop an end-to-end convolution-augmented transformer (Conformer). We show
that the percentage selected from each monolingual dataset affects the model's bias
towards using one language's character set over the other in a CS scenario. The
models pretrained on well-aligned and accurate monolingual data showed
robustness against misalignment between the segments and the transcription.
Finally, we develop word-level n-gram language models (LM) to rescore the ASR
recognition hypotheses.
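The rescoring step mentioned at the end of the abstract can be illustrated with a minimal sketch. It assumes a bigram order, add-one smoothing, and a hypothetical interpolation weight `lm_weight`; the abstract specifies none of these, and the toy romanized sentences and ASR scores below are invented for illustration, not taken from the challenge data.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def lm_logprob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a sentence."""
    vocab_size = len(unigrams) + 1  # +1 reserves mass for unseen words
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total


def rescore(nbest, unigrams, bigrams, lm_weight=0.5):
    """Sort (text, asr_logprob) hypotheses by ASR score + lm_weight * LM score."""
    scored = [
        (asr + lm_weight * lm_logprob(text, unigrams, bigrams), text)
        for text, asr in nbest
    ]
    return [text for _, text in sorted(scored, reverse=True)]


# Toy LM training text (hypothetical code-switched sentences).
corpus = [
    "mujhe yeh movie pasand hai",
    "yeh movie bahut acchi hai",
    "mujhe yeh song pasand hai",
]
unigrams, bigrams = train_bigram_lm(corpus)

# Hypothetical n-best list: the ASR slightly prefers the misspelled hypothesis.
nbest = [
    ("mujhe yeh movie pasand hai", -4.2),
    ("mujhe yeh move pasand hai", -4.0),
]
best = rescore(nbest, unigrams, bigrams)[0]
print(best)
```

With `lm_weight=0.0` the ranking falls back to the ASR scores alone, which makes the weight a convenient knob for checking how much the LM changes the output.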
Related papers
- CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving [61.73180469072787]
We focus on the problem of spoken translation (ST) of code-switched speech in Indian languages to English text.
We present a new end-to-end model architecture CoSTA that scaffolds on pretrained automatic speech recognition (ASR) and machine translation (MT) modules.
CoSTA significantly outperforms many competitive cascaded and end-to-end multimodal baselines by up to 3.5 BLEU points.
arXiv Detail & Related papers (2024-06-16T16:10:51Z)
- MunTTS: A Text-to-Speech System for Mundari [18.116359188623832]
We present MunTTS, an end-to-end text-to-speech (TTS) system specifically for Mundari, a low-resource Indian language of the Austro-Asiatic family.
Our work addresses the gap in linguistic technology for underrepresented languages by collecting and processing data to build a speech synthesis system.
arXiv Detail & Related papers (2024-01-28T06:27:17Z)
- Speech collage: code-switched audio generation by collaging monolingual corpora [50.356820349870986]
Speech Collage is a method that synthesizes CS data from monolingual corpora by splicing audio segments.
We investigate the impact of generated data on speech recognition in two scenarios.
arXiv Detail & Related papers (2023-09-27T14:17:53Z)
- Speech-to-Speech Translation For A Real-world Unwritten Language [62.414304258701804]
We study speech-to-speech translation (S2ST) that translates speech from one language into another language.
We present an end-to-end solution, from training data collection and modeling choices to benchmark dataset release.
arXiv Detail & Related papers (2022-11-11T20:21:38Z)
- Language-agnostic Code-Switching in Sequence-To-Sequence Speech Recognition [62.997667081978825]
Code-Switching (CS) refers to the phenomenon of alternately using words and phrases from different languages.
We propose a simple yet effective data augmentation in which audio and corresponding labels of different source languages are concatenated.
We show that this augmentation can even improve the model's performance on inter-sentential language switches not seen during training by 5.03% WER.
arXiv Detail & Related papers (2022-10-17T12:15:57Z)
- End-to-End Speech Translation for Code Switched Speech [13.97982457879585]
Code switching (CS) refers to the phenomenon of interchangeably using words and phrases from different languages.
We focus on CS in the context of English/Spanish conversations for the task of speech translation (ST), generating and evaluating both transcript and translation.
We show that our ST architectures, and especially our bidirectional end-to-end architecture, perform well on CS speech, even when no CS training data is used.
arXiv Detail & Related papers (2022-04-11T13:25:30Z)
- Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models [55.82292352607321]
Code-switching (CS) is common in daily conversations where more than one language is used within a sentence.
This paper uses recently successful self-supervised learning (SSL) methods to leverage large amounts of unlabeled speech data without CS.
arXiv Detail & Related papers (2021-10-07T14:43:35Z)
- The ASRU 2019 Mandarin-English Code-Switching Speech Recognition Challenge: Open Datasets, Tracks, Methods and Results [9.089285414356969]
This paper describes the design and main outcomes of the ASRU 2019 Mandarin-English code-switching speech recognition challenge.
500 hours of Mandarin speech data and 240 hours of Mandarin-English intra-sentential CS data are released to the participants.
arXiv Detail & Related papers (2020-07-12T05:38:57Z)
- Style Variation as a Vantage Point for Code-Switching [54.34370423151014]
Code-Switching (CS) is a common phenomenon observed in several bilingual and multilingual communities.
We present a novel vantage point that treats CS as style variation between the two participating languages.
We propose a two-stage generative adversarial training approach where the first stage generates competitive negative examples for CS and the second stage generates more realistic CS sentences.
arXiv Detail & Related papers (2020-05-01T15:53:16Z)