Semi-supervised Development of ASR Systems for Multilingual
Code-switched Speech in Under-resourced Languages
- URL: http://arxiv.org/abs/2003.03135v1
- Date: Fri, 6 Mar 2020 11:08:38 GMT
- Title: Semi-supervised Development of ASR Systems for Multilingual
Code-switched Speech in Under-resourced Languages
- Authors: Astik Biswas, Emre Yılmaz, Febe de Wet, Ewald van der Westhuizen,
Thomas Niesler
- Abstract summary: Two approaches are considered for under-resourced, code-switched speech in five South African languages.
The first constructs four separate bilingual automatic speech recognisers corresponding to four different language pairs.
The second uses a single, unified, five-lingual ASR system that represents all the languages.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper reports on the semi-supervised development of acoustic and
language models for under-resourced, code-switched speech in five South African
languages. Two approaches are considered. The first constructs four separate
bilingual automatic speech recognisers (ASRs) corresponding to four different
language pairs between which speakers switch frequently. The second uses a
single, unified, five-lingual ASR system that represents all the languages
(English, isiZulu, isiXhosa, Setswana and Sesotho). We evaluate the
effectiveness of these two approaches when used to add additional data to our
extremely sparse training sets. Results indicate that batch-wise
semi-supervised training yields better results than a non-batch-wise approach.
Furthermore, while the separate bilingual systems achieved better recognition
performance than the unified system, they benefited more from pseudo-labels
generated by the five-lingual system than from those generated by the bilingual
systems.
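The batch-wise procedure the abstract describes can be sketched as a loop in which a seed model, trained on the sparse manually transcribed set, pseudo-labels successive batches of untranscribed audio and is retrained after each batch rather than after labelling everything at once. The sketch below is a minimal illustration of that loop only; the `Model` class and its methods are toy stand-ins, not the authors' actual acoustic-model recipe.

```python
# Minimal sketch of batch-wise semi-supervised training with pseudo-labels.
# All names here are illustrative assumptions, not the paper's implementation.

class Model:
    """Toy stand-in for a bilingual or five-lingual ASR model."""

    def __init__(self):
        self.training_set = []

    def train(self, data):
        # A real system would update acoustic/language model parameters;
        # here we just accumulate the training material.
        self.training_set.extend(data)

    def transcribe(self, utterance):
        # A real system would decode audio into text; here we tag the
        # utterance so the pseudo-label is visible in the output.
        return f"pseudo-label({utterance})"


def batchwise_semi_supervised(labeled, unlabeled, batch_size):
    model = Model()
    model.train(labeled)  # seed model on the sparse manual transcriptions
    for start in range(0, len(unlabeled), batch_size):
        batch = unlabeled[start:start + batch_size]
        # Pseudo-label the current batch with the latest model ...
        pseudo = [(utt, model.transcribe(utt)) for utt in batch]
        # ... and retrain after EACH batch (batch-wise), so later batches
        # are labelled by a progressively better model.
        model.train(pseudo)
    return model


model = batchwise_semi_supervised(
    labeled=[("utt1", "manual transcript")],
    unlabeled=["utt2", "utt3", "utt4"],
    batch_size=2,
)
print(len(model.training_set))  # → 4
```

The key design point, reflected in the paper's finding, is that retraining inside the loop lets each new batch benefit from the pseudo-labels already absorbed, which is what distinguishes the batch-wise scheme from labelling the entire untranscribed pool with the seed model alone.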
Related papers
- Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection [49.27067541740956]
Speech Emotion Recognition (SER) is a crucial component in developing general-purpose AI agents capable of natural human-computer interaction.
Building robust multilingual SER systems remains challenging due to the scarcity of labeled data in languages other than English and Chinese.
We propose an approach to enhance SER performance in languages with scarce SER resources by leveraging data from high-resource languages.
arXiv Detail & Related papers (2024-09-17T08:36:45Z)
- A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge [16.813582262700415]
The objective of the challenge is to establish a multi-speaker, multi-lingual Indic Text-to-Speech system with voice cloning capabilities.
The system was trained using challenge data and fine-tuned for few-shot voice cloning on target speakers.
arXiv Detail & Related papers (2024-06-22T10:49:36Z)
- Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples [89.16814518860357]
The objective of this work is to explore the learning of visually grounded speech models (VGS) from a multilingual perspective.
Our key contribution is to leverage the power of a high-resource language in a bilingual visually grounded speech model to improve the performance of a low-resource language.
arXiv Detail & Related papers (2023-03-30T16:34:10Z)
- LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between languages at the frame level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z)
- Bilingual End-to-End ASR with Byte-Level Subwords [4.268218327369146]
We study different representations, including character-level, byte-level, byte pair encoding (BPE), and byte-level byte pair encoding (BBPE).
We focus on developing a single end-to-end model to support utterance-based bilingual ASR, where speakers do not alternate between two languages within a single utterance but may change languages across utterances.
We find that BBPE with penalty schemes can improve utterance-based bilingual ASR performance by 2% to 5% relative, even with a smaller number of outputs and fewer parameters.
arXiv Detail & Related papers (2022-05-01T15:01:01Z)
- Code Switched and Code Mixed Speech Recognition for Indic languages [0.0]
Training multilingual automatic speech recognition (ASR) systems is challenging because acoustic and lexical information is typically language-specific.
We compare the performance of an end-to-end multilingual speech recognition system to that of monolingual models conditioned on language identification (LID).
We also propose a similar technique to address the code-switched problem and achieve WERs of 21.77 and 28.27 on Hindi-English and Bengali-English respectively.
arXiv Detail & Related papers (2022-03-30T18:09:28Z)
- Exploring Teacher-Student Learning Approach for Multi-lingual Speech-to-Intent Classification [73.5497360800395]
We develop an end-to-end system that supports multiple languages.
We exploit knowledge from a pre-trained multi-lingual natural language processing model.
arXiv Detail & Related papers (2021-09-28T04:43:11Z)
- BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue Modeling [52.99188200886738]
BiToD is the first bilingual multi-domain dataset for end-to-end task-oriented dialogue modeling.
BiToD contains over 7k multi-domain dialogues (144k utterances) with a large and realistic bilingual knowledge base.
arXiv Detail & Related papers (2021-06-05T03:38:42Z)
- Dual Script E2E framework for Multilingual and Code-Switching ASR [4.697788649564087]
We train multilingual and code-switching ASR systems for Indian languages.
Inspired by results in text-to-speech synthesis, we use an in-house rule-based common label set (CLS) representation.
We show our results on the multilingual and code-switching tasks of the Indic ASR Challenge 2021.
arXiv Detail & Related papers (2021-06-02T18:08:27Z)
- Acoustics Based Intent Recognition Using Discovered Phonetic Units for Low Resource Languages [51.0542215642794]
We propose a novel acoustics-based intent recognition system that uses discovered phonetic units for intent classification.
We present results for two language families, Indic languages and Romance languages, on two different intent recognition tasks.
arXiv Detail & Related papers (2020-11-07T00:35:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.