Improving Language Identification for Multilingual Speakers
- URL: http://arxiv.org/abs/2001.11019v1
- Date: Wed, 29 Jan 2020 18:58:11 GMT
- Title: Improving Language Identification for Multilingual Speakers
- Authors: Andrew Titus, Jan Silovsky, Nanxin Chen, Roger Hsiao, Mary Young and
Arnab Ghoshal
- Abstract summary: Spoken language identification (LID) technologies have improved in recent years from discriminating largely distinct languages to discriminating highly similar languages or even dialects of the same language.
One aspect that has been mostly neglected is discrimination of languages for multilingual speakers, despite being a primary target audience of many systems that utilize LID technologies.
As we show in this work, LID systems can have high average accuracy for most combinations of languages while greatly underperforming for others when accented speech is present.
- Score: 12.032095029281441
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spoken language identification (LID) technologies have improved in recent
years from discriminating largely distinct languages to discriminating highly
similar languages or even dialects of the same language. One aspect that has
been mostly neglected, however, is discrimination of languages for multilingual
speakers, despite being a primary target audience of many systems that utilize
LID technologies. As we show in this work, LID systems can have a high average
accuracy for most combinations of languages while greatly underperforming for
others when accented speech is present. We address this by using
coarser-grained targets for the acoustic LID model and integrating its outputs
with interaction context signals in a context-aware model to tailor the system
to each user. This combined system achieves an average 97% accuracy across all
language combinations while improving worst-case accuracy by over 60% relative
to our baseline.
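The abstract describes integrating acoustic LID outputs with interaction-context signals in a context-aware model. The paper does not give the model's form, so the following is only a minimal illustrative sketch (function names, the linear-interpolation weighting, and all values are hypothetical) of how acoustic posteriors could be blended with a per-user context prior derived from interaction history:

```python
def combine_lid_with_context(acoustic_scores, context_counts, alpha=0.7):
    """Blend acoustic LID posteriors with a usage-based context prior.

    acoustic_scores: dict mapping language -> acoustic LID posterior.
    context_counts: dict mapping language -> how often this user has
        interacted in that language (an interaction-context signal).
    alpha: weight on the acoustic evidence (hypothetical choice).
    """
    total = sum(context_counts.values()) or 1
    prior = {lang: context_counts.get(lang, 0) / total
             for lang in acoustic_scores}
    combined = {lang: alpha * acoustic_scores[lang] + (1 - alpha) * prior[lang]
                for lang in acoustic_scores}
    # Renormalize to a proper distribution over the candidate languages.
    z = sum(combined.values())
    return {lang: s / z for lang, s in combined.items()}

# Accented speech leaves the acoustic model uncertain between two similar
# languages, but the user's history says they mostly speak Hindi and English.
acoustic = {"hi": 0.40, "ur": 0.35, "en": 0.25}
context = {"hi": 80, "en": 15, "ur": 5}
scores = combine_lid_with_context(acoustic, context)
print(max(scores, key=scores.get))  # hi
```

The point of the sketch is the tailoring effect: the same acoustic scores yield different decisions for users with different interaction histories, which is how a context-aware layer can repair worst-case accuracy on accented speech.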
Related papers
- Enhancing Multilingual ASR for Unseen Languages via Language Embedding Modeling [50.62091603179394]
Whisper, one of the most advanced ASR models, handles 99 languages effectively.
However, Whisper struggles with unseen languages, those not included in its pre-training.
We propose methods that exploit relationships between languages, modeled via language embeddings, to enhance ASR performance on unseen languages.
arXiv Detail & Related papers (2024-12-21T04:05:43Z)
- Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
Lens is a novel approach to enhance multilingual capabilities of large language models (LLMs)
It operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from top layers of LLMs.
It achieves superior results with much fewer computational resources compared to existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z) - Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection [49.27067541740956]
Speech Emotion Recognition (SER) is a crucial component in developing general-purpose AI agents capable of natural human-computer interaction.
Building robust multilingual SER systems remains challenging due to the scarcity of labeled data in languages other than English and Chinese.
We propose an approach to enhance SER performance in low SER resource languages by leveraging data from high-resource languages.
arXiv Detail & Related papers (2024-09-17T08:36:45Z) - Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech [3.812148920168377]
We propose a cascaded system consisting of speaker diarization and language identification.
Results show that the proposed system often achieves lower language classification and language diarization error rates.
At the same time, it does not negatively affect speech recognition on monolingual audio.
arXiv Detail & Related papers (2024-06-13T16:27:56Z) - An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios [76.11409260727459]
This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system.
We demonstrate that the similarity in phonetics between the pre-training and target languages, as well as the language category, affects the target language's adaptation performance.
arXiv Detail & Related papers (2024-06-13T08:16:52Z) - Code Switched and Code Mixed Speech Recognition for Indic languages [0.0]
Training multilingual automatic speech recognition (ASR) systems is challenging because acoustic and lexical information is typically language-specific.
We compare the performance of an end-to-end multilingual speech recognition system to that of monolingual models conditioned on language identification (LID).
We also propose a similar technique for the code-switched setting, achieving WERs of 21.77 and 28.27 on Hindi-English and Bengali-English, respectively.
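The WER figures quoted here are the standard ASR metric: word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. As a minimal illustration (the example strings are hypothetical, not from the paper's data):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed as word-level Levenshtein distance, reported as a percentage."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution/match
    return 100.0 * dp[len(ref)][len(hyp)] / len(ref)

# One substitution in a four-word code-switched reference:
print(word_error_rate("mujhe coffee chahiye please",
                      "mujhe coffee chahie please"))  # 25.0
```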
arXiv Detail & Related papers (2022-03-30T18:09:28Z)
- Integrating Knowledge in End-to-End Automatic Speech Recognition for Mandarin-English Code-Switching [41.88097793717185]
Code-Switching (CS) is a common linguistic phenomenon in multilingual communities.
This paper presents our investigations on end-to-end speech recognition for Mandarin-English CS speech.
arXiv Detail & Related papers (2021-12-19T17:31:15Z)
- Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity [81.51206991542242]
Cross-lingual transfer offers a compelling way to help bridge the digital divide between high-resource and low-resource languages.
Current cross-lingual algorithms have shown success in text-based tasks and speech-related tasks on some low-resource languages.
We propose a language similarity approach that can efficiently identify acoustic cross-lingual transfer pairs across hundreds of languages.
arXiv Detail & Related papers (2021-11-02T01:55:17Z)
- That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.