Related papers: Improving Language Identification for Multilingual Speakers

Improving Language Identification for Multilingual Speakers

URL: http://arxiv.org/abs/2001.11019v1
Date: Wed, 29 Jan 2020 18:58:11 GMT
Title: Improving Language Identification for Multilingual Speakers
Authors: Andrew Titus, Jan Silovsky, Nanxin Chen, Roger Hsiao, Mary Young and Arnab Ghoshal
Abstract summary: Spoken language identification (LID) technologies have improved in recent years from discriminating largely distinct languages to discriminating highly similar languages or even dialects of the same language. One aspect that has been mostly neglected is discrimination of languages for multilingual speakers, despite being a primary target audience of many systems that utilize LID technologies. We show in this work, LID systems can have a high average accuracy for most combinations of languages while greatly underperforming for others when accented speech is present.
Score: 12.032095029281441
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Spoken language identification (LID) technologies have improved in recent years from discriminating largely distinct languages to discriminating highly similar languages or even dialects of the same language. One aspect that has been mostly neglected, however, is discrimination of languages for multilingual speakers, despite being a primary target audience of many systems that utilize LID technologies. As we show in this work, LID systems can have a high average accuracy for most combinations of languages while greatly underperforming for others when accented speech is present. We address this by using coarser-grained targets for the acoustic LID model and integrating its outputs with interaction context signals in a context-aware model to tailor the system to each user. This combined system achieves an average 97% accuracy across all language combinations while improving worst-case accuracy by over 60% relative to our baseline.

Related papers

Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages [0.43498389175652036]
This study integrates traditional and novel language models with fine-tuned Whisper models to raise their performance in less commonly studied languages. We demonstrate substantial improvements in word error rate, particularly in low-resource scenarios. While the integration reliably benefits all model sizes, the extent of improvement varies, highlighting the importance of optimized language model parameters.
arXiv Detail & Related papers (2025-03-30T18:03:52Z)
Enhancing Multilingual ASR for Unseen Languages via Language Embedding Modeling [50.62091603179394]
Whisper, one of the most advanced ASR models, handles 99 languages effectively. However, Whisper struggles with unseen languages, those not included in its pre-training. We propose methods that exploit these relationships to enhance ASR performance on unseen languages.
arXiv Detail & Related papers (2024-12-21T04:05:43Z)
Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
Lens is a novel approach to enhance multilingual capabilities of large language models (LLMs) It operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from top layers of LLMs. It achieves superior results with much fewer computational resources compared to existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z)
Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection [49.27067541740956]
Speech Emotion Recognition (SER) is a crucial component in developing general-purpose AI agents capable of natural human-computer interaction. Building robust multilingual SER systems remains challenging due to the scarcity of labeled data in languages other than English and Chinese. We propose an approach to enhance SER performance in low SER resource languages by leveraging data from high-resource languages.
arXiv Detail & Related papers (2024-09-17T08:36:45Z)
Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech [3.812148920168377]
We propose a cascaded system consisting of speaker diarization and language identification. Results show that the proposed system often achieves lower language classification and language diarization error rates. At the same time does not negatively affect speech recognition on monolingual audio.
arXiv Detail & Related papers (2024-06-13T16:27:56Z)
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios [76.11409260727459]
This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system. We demonstrate that the similarity in phonetics between the pre-training and target languages, as well as the language category, affects the target language's adaptation performance.
arXiv Detail & Related papers (2024-06-13T08:16:52Z)
Investigating the Impact of Cross-lingual Acoustic-Phonetic Similarities on Multilingual Speech Recognition [31.575930914290762]
A novel data-driven approach is proposed to investigate the cross-lingual acoustic-phonetic similarities. Deep neural networks are trained as mapping networks to transform the distributions from different acoustic models into a directly comparable form. A relative improvement of 8% over monolingual counterpart is achieved.
arXiv Detail & Related papers (2022-07-07T15:55:41Z)
LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information. Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating different languages in frame-level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z)
Code Switched and Code Mixed Speech Recognition for Indic languages [0.0]
Training multilingual automatic speech recognition (ASR) systems is challenging because acoustic and lexical information is typically language specific. We compare the performance of end to end multilingual speech recognition system to the performance of monolingual models conditioned on language identification (LID) We also propose a similar technique to solve the Code Switched problem and achieve a WER of 21.77 and 28.27 over Hindi-English and Bengali-English respectively.
arXiv Detail & Related papers (2022-03-30T18:09:28Z)
Integrating Knowledge in End-to-End Automatic Speech Recognition for Mandarin-English Code-Switching [41.88097793717185]
Code-Switching (CS) is a common linguistic phenomenon in multilingual communities. This paper presents our investigations on end-to-end speech recognition for Mandarin-English CS speech.
arXiv Detail & Related papers (2021-12-19T17:31:15Z)
Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity [81.51206991542242]
Cross-lingual transfer offers a compelling way to help bridge this digital divide. Current cross-lingual algorithms have shown success in text-based tasks and speech-related tasks over some low-resource languages. We propose a language similarity approach that can efficiently identify acoustic cross-lingual transfer pairs across hundreds of languages.
arXiv Detail & Related papers (2021-11-02T01:55:17Z)
That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model. We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting. Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.