Geolocation-Aware Robust Spoken Language Identification
- URL: http://arxiv.org/abs/2508.17148v1
- Date: Sat, 23 Aug 2025 21:49:27 GMT
- Title: Geolocation-Aware Robust Spoken Language Identification
- Authors: Qingzheng Wang, Hye-jin Shim, Jiancheng Sun, Shinji Watanabe,
- Abstract summary: geolocation-aware LID is a novel approach that incorporates language-level geolocation information into the SSL-based LID model.<n>We introduce geolocation prediction as an auxiliary task and inject the predicted vectors into intermediate representations as conditioning signals.<n> Experiments across six multilingual datasets demonstrate that our approach improves robustness to intra-language variations and unseen domains.
- Score: 46.38730484136585
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While Self-supervised Learning (SSL) has significantly improved Spoken Language Identification (LID), existing models often struggle to consistently classify dialects and accents of the same language as a unified class. To address this challenge, we propose geolocation-aware LID, a novel approach that incorporates language-level geolocation information into the SSL-based LID model. Specifically, we introduce geolocation prediction as an auxiliary task and inject the predicted vectors into intermediate representations as conditioning signals. This explicit conditioning encourages the model to learn more unified representations for dialectal and accented variations. Experiments across six multilingual datasets demonstrate that our approach improves robustness to intra-language variations and unseen domains, achieving new state-of-the-art accuracy on FLEURS (97.7%) and 9.7% relative improvement on ML-SUPERB 2.0 dialect set.
Related papers
- Beyond the Rosetta Stone: Unification Forces in Generalization Dynamics [56.145578792496714]
Large language models (LLMs) struggle with cross-lingual knowledge transfer.<n>We study the causes and dynamics of this phenomenon by training small Transformer models from scratch on synthetic multilingual datasets.
arXiv Detail & Related papers (2025-08-14T18:44:13Z) - ShifCon: Enhancing Non-Dominant Language Capabilities with a Shift-based Multilingual Contrastive Framework [78.07201802874529]
ShifCon is a Shift-based multilingual Contrastive framework that aligns the internal forward process of other languages toward that of the dominant one.<n>Experiments demonstrate that our ShifCon framework significantly enhances the performance of non-dominant languages.
arXiv Detail & Related papers (2024-10-25T10:28:59Z) - Exploring the Impact of Data Quantity on ASR in Extremely Low-resource Languages [24.856817602140193]
This study focuses on two endangered Austronesian languages, Amis and Seediq.
We propose a novel data-selection scheme leveraging a multilingual corpus to augment the limited target language data.
arXiv Detail & Related papers (2024-09-13T14:35:47Z) - Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models [69.59613095232598]
We propose adaptation methods which integrate LoRA to existed SSL models to extend new language.<n>We also develop preservation strategies which include data combination and re-clustering to retain abilities on existed languages.
arXiv Detail & Related papers (2024-06-20T08:13:30Z) - An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios [76.11409260727459]
This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system.
We demonstrate that the similarity in phonetics between the pre-training and target languages, as well as the language category, affects the target language's adaptation performance.
arXiv Detail & Related papers (2024-06-13T08:16:52Z) - Generative linguistic representation for spoken language identification [17.9575874225144]
We explore the utilization of the decoder-based network from the Whisper model to extract linguistic features.
We devised two strategies - one based on the language embedding method and the other focusing on direct optimization of LID outputs.
We conducted experiments on the large-scale multilingual datasets MLS, VoxLingua107, and CommonVoice to test our approach.
arXiv Detail & Related papers (2023-12-18T06:40:24Z) - Geographic Adaptation of Pretrained Language Models [29.81557992080902]
We introduce geoadaptation, an intermediate training step that couples language modeling with geolocation prediction in a multi-task learning setup.
We show that the effectiveness of geoadaptation stems from its ability to geographically retrofit the representation space of the pretrained language models.
arXiv Detail & Related papers (2022-03-16T11:55:00Z) - From Masked Language Modeling to Translation: Non-English Auxiliary
Tasks Improve Zero-shot Spoken Language Understanding [24.149299722716155]
We introduce xSID, a new benchmark for cross-lingual Slot and Intent Detection in 13 languages from 6 language families, including a very low-resource dialect.
We propose a joint learning approach, with English SLU training data and non-English auxiliary tasks from raw text, syntax and translation for transfer.
Our results show that jointly learning the main tasks with masked language modeling is effective for slots, while machine translation transfer works best for intent classification.
arXiv Detail & Related papers (2021-05-15T23:51:11Z) - Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language
Model [58.27176041092891]
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
arXiv Detail & Related papers (2020-11-23T16:00:42Z) - Cross-Domain Adaptation of Spoken Language Identification for Related
Languages: The Curious Case of Slavic Languages [17.882477802269243]
We present a set of experiments to investigate the impact of domain mismatch on the performance of neural LID systems.
We show that out-of-domain speech samples severely hinder the performance of neural LID models.
We achieve relative accuracy improvements that range from 9% to 77% depending on the diversity of acoustic conditions in the source domain.
arXiv Detail & Related papers (2020-08-02T19:30:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.