Improving Language Identification for Multilingual Speakers
- URL: http://arxiv.org/abs/2001.11019v1
- Date: Wed, 29 Jan 2020 18:58:11 GMT
- Title: Improving Language Identification for Multilingual Speakers
- Authors: Andrew Titus, Jan Silovsky, Nanxin Chen, Roger Hsiao, Mary Young and
Arnab Ghoshal
- Abstract summary: Spoken language identification (LID) technologies have improved in recent years from discriminating largely distinct languages to discriminating highly similar languages or even dialects of the same language.
One aspect that has been mostly neglected is discrimination of languages for multilingual speakers, despite being a primary target audience of many systems that utilize LID technologies.
We show in this work, LID systems can have a high average accuracy for most combinations of languages while greatly underperforming for others when accented speech is present.
- Score: 12.032095029281441
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spoken language identification (LID) technologies have improved in recent
years from discriminating largely distinct languages to discriminating highly
similar languages or even dialects of the same language. One aspect that has
been mostly neglected, however, is discrimination of languages for multilingual
speakers, despite being a primary target audience of many systems that utilize
LID technologies. As we show in this work, LID systems can have a high average
accuracy for most combinations of languages while greatly underperforming for
others when accented speech is present. We address this by using
coarser-grained targets for the acoustic LID model and integrating its outputs
with interaction context signals in a context-aware model to tailor the system
to each user. This combined system achieves an average 97% accuracy across all
language combinations while improving worst-case accuracy by over 60% relative
to our baseline.
Related papers
- Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
Lens is a novel approach to enhance multilingual capabilities of large language models (LLMs)
It operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from top layers of LLMs.
It achieves superior results with much fewer computational resources compared to existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z) - Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection [49.27067541740956]
Speech Emotion Recognition (SER) is a crucial component in developing general-purpose AI agents capable of natural human-computer interaction.
Building robust multilingual SER systems remains challenging due to the scarcity of labeled data in languages other than English and Chinese.
We propose an approach to enhance SER performance in low SER resource languages by leveraging data from high-resource languages.
arXiv Detail & Related papers (2024-09-17T08:36:45Z) - Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech [3.812148920168377]
We propose a cascaded system consisting of speaker diarization and language identification.
Results show that the proposed system often achieves lower language classification and language diarization error rates.
At the same time does not negatively affect speech recognition on monolingual audio.
arXiv Detail & Related papers (2024-06-13T16:27:56Z) - An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios [76.11409260727459]
This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system.
We demonstrate that the similarity in phonetics between the pre-training and target languages, as well as the language category, affects the target language's adaptation performance.
arXiv Detail & Related papers (2024-06-13T08:16:52Z) - Investigating the Impact of Cross-lingual Acoustic-Phonetic Similarities
on Multilingual Speech Recognition [31.575930914290762]
A novel data-driven approach is proposed to investigate the cross-lingual acoustic-phonetic similarities.
Deep neural networks are trained as mapping networks to transform the distributions from different acoustic models into a directly comparable form.
A relative improvement of 8% over monolingual counterpart is achieved.
arXiv Detail & Related papers (2022-07-07T15:55:41Z) - LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating different languages in frame-level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z) - Code Switched and Code Mixed Speech Recognition for Indic languages [0.0]
Training multilingual automatic speech recognition (ASR) systems is challenging because acoustic and lexical information is typically language specific.
We compare the performance of end to end multilingual speech recognition system to the performance of monolingual models conditioned on language identification (LID)
We also propose a similar technique to solve the Code Switched problem and achieve a WER of 21.77 and 28.27 over Hindi-English and Bengali-English respectively.
arXiv Detail & Related papers (2022-03-30T18:09:28Z) - Integrating Knowledge in End-to-End Automatic Speech Recognition for
Mandarin-English Code-Switching [41.88097793717185]
Code-Switching (CS) is a common linguistic phenomenon in multilingual communities.
This paper presents our investigations on end-to-end speech recognition for Mandarin-English CS speech.
arXiv Detail & Related papers (2021-12-19T17:31:15Z) - Cross-lingual Transfer for Speech Processing using Acoustic Language
Similarity [81.51206991542242]
Cross-lingual transfer offers a compelling way to help bridge this digital divide.
Current cross-lingual algorithms have shown success in text-based tasks and speech-related tasks over some low-resource languages.
We propose a language similarity approach that can efficiently identify acoustic cross-lingual transfer pairs across hundreds of languages.
arXiv Detail & Related papers (2021-11-02T01:55:17Z) - That Sounds Familiar: an Analysis of Phonetic Representations Transfer
Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.