Exploiting Spectral Augmentation for Code-Switched Spoken Language
Identification
- URL: http://arxiv.org/abs/2010.07130v1
- Date: Wed, 14 Oct 2020 14:37:03 GMT
- Title: Exploiting Spectral Augmentation for Code-Switched Spoken Language
Identification
- Authors: Pradeep Rangan, Sundeep Teki, and Hemant Misra
- Abstract summary: We perform spoken LID on three Indian languages code-mixed with English.
This task was organized by the Microsoft research team as a spoken LID challenge.
- Score: 2.064612766965483
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spoken Language Identification (LID) systems are needed to identify
the language(s) present in a given audio sample, and they typically form the
first step in many speech-processing tasks such as automatic speech
recognition (ASR). Automatic identification of the languages present in a
speech signal is not only scientifically interesting but also of practical
importance in a multilingual country such as India. In many Indian cities, as
many as three languages may get mixed when people interact with each other:
the official language of the province, Hindi, and English (at times the
languages of neighboring provinces may also get mixed into these
interactions). This makes the spoken LID task extremely challenging in the
Indian context. While quite a few LID systems for Indian languages have been
implemented, most have used small-scale speech data collected internally
within an organization. In the current work, we perform spoken LID on three
Indian languages (Gujarati, Telugu, and Tamil) code-mixed with English. This
task was organized by the Microsoft research team as a spoken LID challenge.
In our work, we modify the usual spectral augmentation approach and propose a
language mask that discriminates the language ID pairs, which leads to a
noise-robust spoken LID system. The proposed method gives a relative
improvement of approximately 3-5% in LID accuracy over a baseline system
proposed by Microsoft on the three language pairs for the two shared tasks in
the challenge.
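The proposed method builds on spectral augmentation, i.e. SpecAugment-style
masking of the input Mel spectrogram. The abstract does not describe how the
language mask itself is constructed, so the sketch below only shows the
generic frequency/time masking that the paper modifies; the function name
spec_augment, the NumPy implementation, and all mask-width parameters are
illustrative assumptions rather than the authors' code.

import numpy as np

def spec_augment(mel, num_freq_masks=2, max_freq_width=8,
                 num_time_masks=2, max_time_width=20, rng=None):
    """Randomly mask frequency bands and time spans of a (n_mels, n_frames) array."""
    rng = rng or np.random.default_rng()
    aug = mel.copy()
    n_mels, n_frames = aug.shape
    # Frequency masking: zero out a few contiguous bands of Mel channels.
    for _ in range(num_freq_masks):
        width = int(rng.integers(0, max_freq_width + 1))
        start = int(rng.integers(0, max(1, n_mels - width)))
        aug[start:start + width, :] = 0.0
    # Time masking: zero out a few contiguous spans of frames.
    for _ in range(num_time_masks):
        width = int(rng.integers(0, max_time_width + 1))
        start = int(rng.integers(0, max(1, n_frames - width)))
        aug[:, start:start + width] = 0.0
    return aug

# Example usage with a dummy 80-band, 300-frame log-Mel spectrogram.
if __name__ == "__main__":
    mel = np.random.randn(80, 300).astype(np.float32)
    print(spec_augment(mel).shape)  # -> (80, 300)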
Related papers
- Simple yet Effective Code-Switching Language Identification with
Multitask Pre-Training and Transfer Learning [0.7242530499990028]
Code-switching is the linguistic phenomenon whereby, in casual settings, multilingual speakers mix words from different languages in one utterance.
We propose two novel approaches toward improving language identification accuracy on an English-Mandarin child-directed speech dataset.
Our best model achieves a balanced accuracy of 0.781 on a real English-Mandarin code-switching child-directed speech corpus and outperforms the previous baseline by 55.3%.
arXiv Detail & Related papers (2023-05-31T11:43:16Z)
- Language-agnostic Code-Switching in Sequence-To-Sequence Speech
  Recognition [62.997667081978825]
Code-Switching (CS) refers to the phenomenon of alternately using words and phrases from different languages.
We propose a simple yet effective data augmentation in which audio and corresponding labels of different source languages are concatenated.
We show that this augmentation can even improve the model's performance on inter-sentential language switches not seen during training by 5.03% WER.
arXiv Detail & Related papers (2022-10-17T12:15:57Z)
- LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating different languages at the frame level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z)
- Adversarial synthesis based data-augmentation for code-switched spoken
  language identification [0.0]
Spoken Language Identification (LID) is an important sub-task of Automatic Speech Recognition (ASR).
This study focuses on Indic language code-mixed with English.
A Generative Adversarial Network (GAN) based data augmentation technique is performed using Mel spectrograms of the audio data.
arXiv Detail & Related papers (2022-05-30T06:41:13Z)
- Code Switched and Code Mixed Speech Recognition for Indic languages [0.0]
Training multilingual automatic speech recognition (ASR) systems is challenging because acoustic and lexical information is typically language specific.
We compare the performance of an end-to-end multilingual speech recognition system to that of monolingual models conditioned on language identification (LID).
We also propose a similar technique to solve the code-switched problem and achieve WERs of 21.77 and 28.27 on Hindi-English and Bengali-English, respectively.
arXiv Detail & Related papers (2022-03-30T18:09:28Z)
- Reducing language context confusion for end-to-end code-switching
  automatic speech recognition [50.89821865949395]
We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model.
By calculating the respective attention of multiple languages, our method can efficiently transfer language knowledge from rich monolingual data.
arXiv Detail & Related papers (2022-01-28T14:39:29Z)
- Cross-lingual Transfer for Speech Processing using Acoustic Language
  Similarity [81.51206991542242]
Cross-lingual transfer offers a compelling way to help bridge the digital divide between high-resource and low-resource languages.
Current cross-lingual algorithms have shown success in text-based tasks and speech-related tasks over some low-resource languages.
We propose a language similarity approach that can efficiently identify acoustic cross-lingual transfer pairs across hundreds of languages.
arXiv Detail & Related papers (2021-11-02T01:55:17Z)
- Crossing the Conversational Chasm: A Primer on Multilingual
  Task-Oriented Dialogue Systems [51.328224222640614]
Current state-of-the-art task-oriented dialogue (ToD) models based on large pretrained neural language models are data-hungry.
Data acquisition for ToD use cases is expensive and tedious.
arXiv Detail & Related papers (2021-04-17T15:19:56Z)
- Multilingual and code-switching ASR challenges for low resource Indian
  languages [59.2906853285309]
We focus on building multilingual and code-switching ASR systems through two different subtasks related to a total of seven Indian languages.
We provide a total of 600 hours of transcribed speech data, comprising train and test sets, in these languages.
We also provide a baseline recipe for both tasks, with WERs of 30.73% and 32.45% on the test sets of the multilingual and code-switching subtasks, respectively.
arXiv Detail & Related papers (2021-04-01T03:37:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.