Cross-Domain Adaptation of Spoken Language Identification for Related
Languages: The Curious Case of Slavic Languages
- URL: http://arxiv.org/abs/2008.00545v2
- Date: Fri, 7 Aug 2020 00:31:40 GMT
- Authors: Badr M. Abdullah, Tania Avgustinova, Bernd Möbius, Dietrich Klakow
- Abstract summary: We present a set of experiments to investigate the impact of domain mismatch on the performance of neural LID systems.
We show that out-of-domain speech samples severely hinder the performance of neural LID models.
We achieve relative accuracy improvements that range from 9% to 77% depending on the diversity of acoustic conditions in the source domain.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art spoken language identification (LID) systems, which are
based on end-to-end deep neural networks, have shown remarkable success not
only in discriminating between distant languages but also between
closely-related languages or even different spoken varieties of the same
language. However, it is still unclear to what extent neural LID models
generalize to speech samples with different acoustic conditions due to domain
shift. In this paper, we present a set of experiments to investigate the impact
of domain mismatch on the performance of neural LID systems for a subset of six
Slavic languages across two domains (read speech and radio broadcast) and
examine two low-level signal descriptors (spectral and cepstral features) for
this task. Our experiments show that (1) out-of-domain speech samples severely
hinder the performance of neural LID models, and (2) while both spectral and
cepstral features show comparable performance within-domain, spectral features
show more robustness under domain mismatch. Moreover, we apply unsupervised
domain adaptation to minimize the discrepancy between the two domains in our
study. We achieve relative accuracy improvements that range from 9% to 77%
depending on the diversity of acoustic conditions in the source domain.
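The two low-level descriptors compared in the abstract can be illustrated with a short extraction sketch. The frame sizes and filterbank settings below (16 kHz audio, 25 ms windows, 10 ms hop, 40 mel bands, 13 MFCCs) are common defaults, not the paper's actual configuration.

```python
# Sketch: extracting the two low-level descriptors compared in the paper,
# a spectral feature (log-mel spectrogram) and a cepstral feature (MFCC).
# Frame/filterbank settings are common defaults, not the paper's values.
import librosa

def extract_features(wav_path, sr=16000):
    y, sr = librosa.load(wav_path, sr=sr)
    # Spectral descriptor: log-compressed mel filterbank energies,
    # 25 ms windows with a 10 ms hop at 16 kHz.
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=400, hop_length=160, n_mels=40)
    log_mel = librosa.power_to_db(mel)            # shape (40, n_frames)
    # Cepstral descriptor: MFCCs, i.e. a DCT applied to the log-mel
    # energies, which decorrelates the filterbank channels.
    mfcc = librosa.feature.mfcc(
        y=y, sr=sr, n_fft=400, hop_length=160, n_mfcc=13)  # (13, n_frames)
    return log_mel, mfcc
```

One way to read the robustness result is that the extra DCT step in the cepstral pipeline compacts the representation at the cost of information that is useful under mismatched acoustic conditions, though the abstract only reports the empirical contrast.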
Related papers
- Generative linguistic representation for spoken language identification
We explore the utilization of the decoder-based network from the Whisper model to extract linguistic features.
We devised two strategies: one based on the language embedding method and the other focusing on direct optimization of LID outputs.
We conducted experiments on the large-scale multilingual datasets MLS, VoxLingua107, and CommonVoice to test our approach.
arXiv Detail & Related papers (2023-12-18T06:40:24Z)
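As a rough sketch of the decoder-based feature extraction described in the entry above: pool the hidden states of Whisper's decoder into an utterance-level vector via Hugging Face transformers. The checkpoint name, the single decoding step, and the mean pooling are illustrative assumptions, not the paper's exact strategy.

```python
# Sketch: pooling Whisper decoder hidden states into an utterance-level
# feature for LID. Checkpoint and pooling are assumptions, not the
# paper's exact strategy.
import torch
from transformers import WhisperModel, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-base")
model = WhisperModel.from_pretrained("openai/whisper-base").eval()

def decoder_features(audio, sampling_rate=16000):
    inputs = processor(audio, sampling_rate=sampling_rate,
                       return_tensors="pt")
    # Run one decoder step from the start-of-transcript token; the
    # decoder attends over the encoded audio via cross-attention.
    start = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        out = model(input_features=inputs.input_features,
                    decoder_input_ids=start)
    return out.last_hidden_state.mean(dim=1)  # (batch, hidden)
```

A small classifier over these pooled vectors would give an LID system; this pooling view is roughly in the spirit of the entry's embedding-based strategy, while its second strategy optimizes the LID outputs directly.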
- Quantifying the Dialect Gap and its Correlates Across Languages
This work lays the foundation for furthering dialectal NLP by documenting evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
- Investigating the Impact of Cross-lingual Acoustic-Phonetic Similarities on Multilingual Speech Recognition
A novel data-driven approach is proposed to investigate the cross-lingual acoustic-phonetic similarities.
Deep neural networks are trained as mapping networks to transform the distributions from different acoustic models into a directly comparable form.
A relative improvement of 8% over the monolingual counterpart is achieved.
arXiv Detail & Related papers (2022-07-07T15:55:41Z)
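The mapping-network idea in the entry above can be sketched as a small network trained to transform one acoustic model's output distribution into another's space. The layer sizes, dimensions, and KL objective below are illustrative guesses, since the summary does not specify the architecture.

```python
# Sketch: a mapping network that transforms posterior distributions from
# one acoustic model into the output space of another, trained with a
# KL-divergence objective on parallel data. Sizes/objective are assumptions.
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    def __init__(self, src_dim, tgt_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(src_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, tgt_dim),
        )

    def forward(self, src_posteriors):
        # Map source-model posteriors to log-probabilities over the
        # target model's units so the two become directly comparable.
        return self.net(src_posteriors).log_softmax(dim=-1)

mapper = MappingNetwork(src_dim=120, tgt_dim=96)
loss_fn = nn.KLDivLoss(reduction="batchmean")  # expects log-probs as input
src = torch.rand(8, 120).softmax(dim=-1)       # dummy source posteriors
tgt = torch.rand(8, 96).softmax(dim=-1)        # dummy target posteriors
loss = loss_fn(mapper(src), tgt)
loss.backward()
```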
- Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems.
This paper presents a cross-domain and cross-lingual A2A inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training.
Experiments conducted on three tasks suggested that incorporating the generated articulatory features consistently outperforms the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z)
- Is Attention always needed? A Case Study on Language Identification from Speech
The present study introduces a convolutional recurrent neural network (CRNN) based LID system, designed to operate on the Mel-frequency Cepstral Coefficient (MFCC) features of audio samples.
The LID model achieves accuracies ranging from 97% to 100% for languages that are linguistically similar.
arXiv Detail & Related papers (2021-10-05T16:38:57Z)
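A minimal sketch of the CRNN-over-MFCC pipeline from the entry above: a convolution over the (cepstral, time) plane, a recurrent layer over the resulting frame sequence, and a softmax over languages. The layer sizes, the single conv block, and the six-way output are placeholders, not the study's architecture.

```python
# Sketch: a CRNN language identifier over MFCC frames. Layer sizes and
# number of languages are placeholders, not the study's architecture.
import torch
import torch.nn as nn

class CRNNLid(nn.Module):
    def __init__(self, n_mfcc=13, n_langs=6):
        super().__init__()
        # Convolution over the (cepstral, time) plane finds local patterns.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2)),  # halves both axes
        )
        self.gru = nn.GRU(input_size=32 * (n_mfcc // 2),
                          hidden_size=64, batch_first=True)
        self.out = nn.Linear(64, n_langs)

    def forward(self, mfcc):
        # mfcc: (batch, n_mfcc, time) -> add a channel axis for Conv2d
        x = self.conv(mfcc.unsqueeze(1))      # (B, 32, n_mfcc/2, T/2)
        x = x.permute(0, 3, 1, 2).flatten(2)  # (B, T/2, 32 * n_mfcc/2)
        _, h = self.gru(x)                    # h: (1, B, 64)
        return self.out(h[-1])                # (B, n_langs) logits

model = CRNNLid()
logits = model(torch.randn(4, 13, 200))  # 4 utterances, 200 MFCC frames
```

The random tensor stands in for a real MFCC matrix such as the one produced by the feature-extraction sketch after the main abstract.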
- Do Acoustic Word Embeddings Capture Phonological Similarity? An Empirical Study
In this paper, we ask: does the distance in the acoustic embedding space correlate with phonological dissimilarity?
We train acoustic word embedding (AWE) models in controlled settings for two languages (German and Czech) and evaluate the embeddings on two tasks: word discrimination and phonological similarity.
Our experiments show that (1) the distance in the embedding space in the best cases only moderately correlates with phonological distance, and (2) improving the performance on the word discrimination task does not necessarily yield models that better reflect word phonological similarity.
arXiv Detail & Related papers (2021-06-16T10:47:56Z)
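The correlation question posed in the entry above reduces to a few lines: compare distances between AWE pairs with a phonological distance for the same word pairs. The dummy arrays, the cosine distance, and the Spearman correlation below are illustrative choices; the summary does not state the paper's exact measures.

```python
# Sketch: does distance in AWE space track phonological dissimilarity?
# Embeddings and distances below are dummy placeholders.
import numpy as np
from scipy.spatial.distance import cosine
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_pairs = 500
# Stand-ins for AWE pairs and their phonological distances (e.g. some
# edit distance between the words' phonemic transcriptions).
awe_a = rng.normal(size=(n_pairs, 128))
awe_b = rng.normal(size=(n_pairs, 128))
phon_dist = rng.uniform(size=n_pairs)

awe_dist = np.array([cosine(a, b) for a, b in zip(awe_a, awe_b)])
rho, pval = spearmanr(awe_dist, phon_dist)
print(f"Spearman rho = {rho:.3f} (p = {pval:.3g})")
```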
- SIGTYP 2021 Shared Task: Robust Spoken Language Identification
Many low-resource and endangered languages may have only a single speaker or come from domains that differ from the desired application scenario.
This year's shared task on robust spoken language identification sought to investigate just this scenario.
We see that domain and speaker mismatch proves very challenging for current methods, which can perform above 95% accuracy in-domain.
arXiv Detail & Related papers (2021-06-07T18:12:27Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
arXiv Detail & Related papers (2020-11-23T16:00:42Z)
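A simplified view of the decomposition idea in the entry above: project the pretrained representation into two parts, train the domain-specific part with a domain classifier, and push the other part toward domain invariance. For brevity, this sketch substitutes adversarial gradient reversal for the paper's mutual-information estimation; that substitution, and all sizes, are assumptions.

```python
# Sketch: decomposing a pretrained representation into domain-specific and
# domain-invariant parts. Gradient reversal stands in for the paper's
# mutual-information estimator; sizes are placeholders.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -grad  # flip gradients to train the projection adversarially

class Decomposer(nn.Module):
    def __init__(self, dim=768, part=256, n_domains=2):
        super().__init__()
        self.spec = nn.Linear(dim, part)   # domain-specific projection
        self.inv = nn.Linear(dim, part)    # domain-invariant projection
        self.dom_clf = nn.Linear(part, n_domains)

    def forward(self, h):
        z_spec, z_inv = self.spec(h), self.inv(h)
        # The specific part should predict the domain; the invariant part
        # is pushed (via reversed gradients) to hide it.
        logits_spec = self.dom_clf(z_spec)
        logits_inv = self.dom_clf(GradReverse.apply(z_inv))
        return z_spec, z_inv, logits_spec, logits_inv

model = Decomposer()
h = torch.randn(8, 768)                   # e.g. pooled encoder outputs
y_dom = torch.randint(0, 2, (8,))         # dummy domain labels
_, _, ls, li = model(h)
loss = nn.functional.cross_entropy(ls, y_dom) + \
       nn.functional.cross_entropy(li, y_dom)
loss.backward()
```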
- Unsupervised Domain Clusters in Pretrained Language Models
We show that massive pre-trained language models implicitly learn sentence representations that cluster by domains without supervision.
We propose domain data selection methods based on such models.
We evaluate our data selection methods for neural machine translation across five diverse domains.
arXiv Detail & Related papers (2020-04-05T06:22:16Z)
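The clustering claim in the entry above is easy to probe: embed sentences with a pretrained encoder, fit an unsupervised mixture model, and inspect whether clusters align with domains. The model name, the mean pooling, and the GMM settings below are assumptions for illustration.

```python
# Sketch: probe whether pretrained-LM sentence representations cluster by
# domain. Model choice, mean pooling and GMM settings are assumptions.
import torch
from sklearn.mixture import GaussianMixture
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed(sentences):
    batch = tok(sentences, padding=True, truncation=True,
                return_tensors="pt")
    with torch.no_grad():
        hidden = enc(**batch).last_hidden_state      # (B, T, 768)
    mask = batch["attention_mask"].unsqueeze(-1)     # zero out padding
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

sentences = [
    "The patient was administered 5 mg twice daily.",   # medical-ish
    "Adverse reactions include headache and nausea.",   # medical-ish
    "The defendant waived the right to appeal.",        # legal-ish
    "The parties agree to binding arbitration.",        # legal-ish
    "Great phone, the battery easily lasts two days.",  # review-ish
    "Arrived broken; the seller refused a refund.",     # review-ish
]
gmm = GaussianMixture(n_components=3, covariance_type="diag",
                      random_state=0).fit(embed(sentences))
print(gmm.predict(embed(sentences)))  # unsupervised cluster IDs
```

Data selection then follows naturally: keep the training sentences whose cluster matches the cluster of the target-domain data.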
- The Secret is in the Spectra: Predicting Cross-lingual Task Performance with Spectral Similarity Measures
We present a large-scale study focused on the correlations between monolingual embedding space similarity and task performance.
We introduce several isomorphism measures between two embedding spaces, based on the relevant statistics of their individual spectra.
We empirically show that language similarity scores derived from such spectral isomorphism measures are strongly associated with performance observed in different cross-lingual tasks.
arXiv Detail & Related papers (2020-01-30T00:09:53Z)
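A toy version of such a spectral measure: take the singular values of two mean-centered embedding matrices, normalize them, and compare the resulting spectra. The simple L1 gap below is an illustrative stand-in; the paper defines several more refined statistics.

```python
# Sketch: a toy (dis)similarity between two embedding spaces, compared via
# their normalized singular-value spectra. The L1 gap is an illustrative
# stand-in for the paper's spectral isomorphism measures.
import numpy as np

def normalized_spectrum(emb, k=100):
    emb = emb - emb.mean(axis=0)              # mean-center the space
    s = np.linalg.svd(emb, compute_uv=False)  # singular values, descending
    s = s[:k]
    return s / s.sum()                        # scale-invariant spectrum

def spectral_gap(emb_a, emb_b, k=100):
    sa, sb = normalized_spectrum(emb_a, k), normalized_spectrum(emb_b, k)
    m = min(len(sa), len(sb))
    return float(np.abs(sa[:m] - sb[:m]).sum())  # 0 = identical spectra

rng = np.random.default_rng(0)
en = rng.normal(size=(5000, 300))   # stand-in monolingual embeddings
de = rng.normal(size=(5000, 300))
print(f"spectral gap: {spectral_gap(en, de):.4f}")  # lower = more isomorphic
```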