SIGTYP 2021 Shared Task: Robust Spoken Language Identification
- URL: http://arxiv.org/abs/2106.03895v1
- Date: Mon, 7 Jun 2021 18:12:27 GMT
- Title: SIGTYP 2021 Shared Task: Robust Spoken Language Identification
- Authors: Elizabeth Salesky, Badr M. Abdullah, Sabrina J. Mielke, Elena
Klyachko, Oleg Serikov, Edoardo Ponti, Ritesh Kumar, Ryan Cotterell,
Ekaterina Vylomova
- Abstract summary: For many low-resource and endangered languages, available datasets may be single-speaker or cover domains different from the desired application scenarios.
This year's shared task on robust spoken language identification sought to investigate just this scenario.
We see that domain and speaker mismatch proves very challenging for current methods, which can perform above 95% accuracy in-domain.
- Score: 33.517587041976356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While language identification is a fundamental speech and language processing task, for many languages and language families it remains challenging.
For many low-resource and endangered languages this is in part due to resource
availability: where larger datasets exist, they may be single-speaker or have
different domains than desired application scenarios, creating a need for
domain- and speaker-invariant language identification systems. This year's
shared task on robust spoken language identification sought to investigate just
this scenario: systems were to be trained on largely single-speaker speech from
one domain, but evaluated on data in other domains recorded from speakers under
different recording circumstances, mimicking realistic low-resource scenarios.
We see that domain and speaker mismatch proves very challenging for current
methods, which can perform above 95% accuracy in-domain; domain adaptation can
address the mismatch to some degree, but these conditions merit further
investigation to make spoken language identification accessible in many
scenarios.
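To make the domain-adaptation remark concrete, below is a minimal sketch of one standard technique for this setting, domain-adversarial training with a gradient reversal layer. It illustrates the general approach only, not any submitted system; the encoder, feature sizes, label counts, and data are all placeholders.

```python
# Minimal domain-adversarial training sketch for spoken language ID.
# Illustrates the general gradient-reversal idea only; the encoder,
# feature sizes, label counts, and data below are all placeholders.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negated, scaled gradient backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

class DannLID(nn.Module):
    def __init__(self, feat_dim=512, n_langs=16, n_domains=2, lam=0.1):
        super().__init__()
        self.lam = lam
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        self.lang_head = nn.Linear(256, n_langs)      # task classifier
        self.domain_head = nn.Linear(256, n_domains)  # adversary

    def forward(self, x):
        h = self.encoder(x)
        # Reversed gradients push the encoder to discard domain cues
        # while the language head keeps it discriminative for LID.
        return self.lang_head(h), self.domain_head(GradReverse.apply(h, self.lam))

model = DannLID()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
ce = nn.CrossEntropyLoss()
x = torch.randn(8, 512)              # dummy utterance-level features
y_lang = torch.randint(0, 16, (8,))  # language labels
y_dom = torch.randint(0, 2, (8,))    # domain labels (e.g., read vs. radio)
lang_logits, dom_logits = model(x)
loss = ce(lang_logits, y_lang) + ce(dom_logits, y_dom)
opt.zero_grad(); loss.backward(); opt.step()
```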
Related papers
- Multilingual acoustic word embeddings for zero-resource languages [1.5229257192293204]
It specifically uses acoustic word embeddings (AWEs) -- fixed-dimensional representations of variable-duration speech segments.
The study introduces a new neural network that outperforms existing AWE models on zero-resource languages.
AWEs are applied to a keyword-spotting system for hate speech detection in Swahili radio broadcasts.
arXiv Detail & Related papers (2024-01-19T08:02:37Z)
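The AWE idea above reduces to mapping variable-duration segments onto vectors of one fixed size so they can be compared directly. A minimal sketch follows (not the cited paper's architecture): a single GRU whose final hidden state is the embedding, with placeholder dimensions.

```python
# Minimal acoustic word embedding (AWE) sketch: a GRU reads a
# variable-length sequence of acoustic frames; its final hidden state
# serves as a fixed-dimensional, length-independent segment embedding.
# Illustrative only; not the model from the cited paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AweEncoder(nn.Module):
    def __init__(self, n_mels=40, embed_dim=128):
        super().__init__()
        self.rnn = nn.GRU(n_mels, embed_dim, batch_first=True)

    def forward(self, frames):           # frames: (batch, time, n_mels)
        _, h_n = self.rnn(frames)        # h_n: (1, batch, embed_dim)
        return F.normalize(h_n[0], dim=-1)

enc = AweEncoder()
seg_a = enc(torch.randn(1, 57, 40))      # 57-frame segment
seg_b = enc(torch.randn(1, 103, 40))     # 103-frame segment
cos_sim = (seg_a * seg_b).sum()          # same shape, so comparable
```

A keyword spotter of the kind described can then embed a spoken query once and rank stored segment embeddings by this similarity.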
- Multimodal Modeling For Spoken Language Identification [57.94119986116947]
Spoken language identification refers to the task of automatically predicting the spoken language in a given utterance.
We propose MuSeLI, a Multimodal Spoken Language Identification method, which delves into the use of various metadata sources to enhance language identification.
arXiv Detail & Related papers (2023-09-19T12:21:39Z)
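One simple way to read "various metadata sources" is late fusion: concatenate an audio embedding with an embedding of the metadata before classification. The sketch below shows only that generic pattern, with stand-in encoders and dimensions; it is not MuSeLI's actual architecture.

```python
# Generic late-fusion sketch for multimodal language ID: concatenate an
# audio embedding with an embedding of textual metadata (e.g., a video
# title) and classify the result. Stand-in dimensions throughout; this
# is not the MuSeLI architecture.
import torch
import torch.nn as nn

class MultimodalLID(nn.Module):
    def __init__(self, audio_dim=512, text_dim=256, n_langs=100):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(audio_dim + text_dim, 256), nn.ReLU(),
            nn.Linear(256, n_langs))

    def forward(self, audio_emb, meta_emb):
        fused = torch.cat([audio_emb, meta_emb], dim=-1)  # late fusion
        return self.classifier(fused)

model = MultimodalLID()
logits = model(torch.randn(2, 512), torch.randn(2, 256))  # (2, 100)
```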
- Model Adaptation for ASR in low-resource Indian Languages [28.02064068964355]
Automatic speech recognition (ASR) performance has improved drastically in recent years, mainly enabled by self-supervised learning (SSL) based acoustic models like wav2vec2 and large-scale multi-lingual training like Whisper.
A huge challenge still exists for low-resource languages where the availability of both audio and text is limited.
Here, a variety of adaptation and fine-tuning techniques can be applied to overcome the low-resource nature of the data by utilising similar, well-resourced languages.
It could be the case that an abundance of acoustic data in a language reduces the need for large text-only corpora.
arXiv Detail & Related papers (2023-07-16T05:25:51Z)
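A common form such adaptation takes is partial fine-tuning: freeze the lower layers of a pretrained encoder and train only the upper layers plus a fresh output head on the target language. A minimal sketch under that assumption follows; the toy encoder stands in for a real pretrained model such as wav2vec2, whose actual fine-tuning API is not shown here.

```python
# Partial fine-tuning sketch for a low-resource target language: freeze
# a pretrained encoder's bottom layers and train the rest plus a new
# head. The toy encoder is a placeholder for a real pretrained model.
import torch
import torch.nn as nn

def adapt_for_target_language(encoder: nn.Module, n_frozen: int,
                              vocab_size: int) -> nn.Module:
    for layer in list(encoder.children())[:n_frozen]:  # freeze bottom layers
        for p in layer.parameters():
            p.requires_grad = False
    head = nn.Linear(256, vocab_size)  # new target-language head (dim assumed)
    return nn.Sequential(encoder, head)

pretrained = nn.Sequential(nn.Linear(80, 256), nn.ReLU(),
                           nn.Linear(256, 256), nn.ReLU())
model = adapt_for_target_language(pretrained, n_frozen=2, vocab_size=40)
trainable = [p for p in model.parameters() if p.requires_grad]
opt = torch.optim.Adam(trainable, lr=3e-5)  # optimise unfrozen params only
```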
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
- Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity [81.51206991542242]
Cross-lingual transfer offers a compelling way to help bridge the digital divide between well-resourced and low-resource languages.
Current cross-lingual algorithms have shown success in text-based tasks and speech-related tasks for some low-resource languages.
We propose a language similarity approach that can efficiently identify acoustic cross-lingual transfer pairs across hundreds of languages.
arXiv Detail & Related papers (2021-11-02T01:55:17Z)
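As a rough illustration of ranking acoustic transfer pairs (not the cited paper's exact similarity measure), one can average utterance embeddings per language and order candidate donor languages by cosine similarity to the target; the embeddings and language codes below are synthetic.

```python
# Rough sketch of ranking donor languages for acoustic transfer by
# cosine similarity between per-language mean utterance embeddings.
# Synthetic data; not the cited paper's exact similarity measure.
import numpy as np

def centroid(utt_embs: np.ndarray) -> np.ndarray:
    c = utt_embs.mean(axis=0)
    return c / np.linalg.norm(c)

def rank_donors(target_embs, donor_embs):
    t = centroid(target_embs)
    scores = {lang: float(centroid(e) @ t) for lang, e in donor_embs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

rng = np.random.default_rng(0)
target = rng.normal(size=(50, 64))                 # 50 utterances, dim 64
donors = {lang: rng.normal(size=(50, 64)) for lang in ("hi", "ta", "sw")}
print(rank_donors(target, donors))  # best-first list of (language, score)
```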
- Language ID Prediction from Speech Using Self-Attentive Pooling and 1D-Convolutions [0.0]
This memo describes NTR-TSU submission for SIGTYP 2021 Shared Task on predicting language IDs from speech.
For many low-resource and endangered languages, only single-speaker recordings may be available, creating a need for domain- and speaker-invariant language ID systems.
We show that a convolutional neural network with a Self-Attentive Pooling layer shows promising results for the language identification task.
arXiv Detail & Related papers (2021-04-24T16:41:17Z)
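Self-attentive pooling, as named in the entry above, turns a variable-length sequence of frame features into one utterance vector by learning per-frame weights. A minimal sketch with illustrative sizes (not the NTR-TSU system itself):

```python
# Self-attentive pooling sketch: score each frame, softmax over time,
# and take the weighted mean as the utterance-level representation.
# Sizes are illustrative, not those of the NTR-TSU submission.
import torch
import torch.nn as nn

class SelfAttentivePooling(nn.Module):
    def __init__(self, feat_dim: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)  # learned per-frame score

    def forward(self, frames):                           # (batch, time, dim)
        w = torch.softmax(self.score(frames), dim=1)     # weights over time
        return (w * frames).sum(dim=1)                   # (batch, dim)

pool = SelfAttentivePooling(feat_dim=256)
cnn_out = torch.randn(4, 200, 256)   # e.g., frames from a 1D-conv stack
utt_vec = pool(cnn_out)              # (4, 256), fed to a LID classifier
```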
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- Crossing the Conversational Chasm: A Primer on Multilingual Task-Oriented Dialogue Systems [51.328224222640614]
Current state-of-the-art task-oriented dialogue (ToD) models based on large pretrained neural language models are data-hungry.
Data acquisition for ToD use cases is expensive and tedious.
arXiv Detail & Related papers (2021-04-17T15:19:56Z)
- FDMT: A Benchmark Dataset for Fine-grained Domain Adaptation in Machine Translation [53.87731008029645]
We present a real-world fine-grained domain adaptation task in machine translation (FDMT).
The FDMT dataset consists of four sub-domains of information technology: autonomous vehicles, AI education, real-time networks, and smartphones.
We conduct quantitative experiments and deep analyses in this new setting, which benchmarks the fine-grained domain adaptation task.
arXiv Detail & Related papers (2020-12-31T17:15:09Z)
- Multi-Staged Cross-Lingual Acoustic Model Adaption for Robust Speech Recognition in Real-World Applications -- A Case Study on German Oral History Interviews [21.47857960919014]
We propose an approach that performs robust acoustic model adaptation to a target domain in a cross-lingual, multi-staged manner.
Our approach enables the exploitation of large-scale training data from other domains in both the same and other languages.
arXiv Detail & Related papers (2020-05-26T08:05:25Z)
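The multi-staged idea in the last entry can be pictured as a chain of fine-tuning passes, each warm-starting from the previous stage on data progressively closer to the target language and domain. The skeleton below fixes only that control flow; the datasets, stage schedule, and the body of train_stage are hypothetical.

```python
# Schematic multi-stage adaptation: each stage fine-tunes the model from
# the previous one on data closer to the target language and domain.
# The datasets, stage schedule, and train_stage body are hypothetical.
import torch.nn as nn

def train_stage(model: nn.Module, dataset_name: str, lr: float) -> nn.Module:
    # ... one ordinary supervised fine-tuning pass over `dataset_name`
    # at learning rate `lr` (omitted for brevity) ...
    return model

stages = [
    ("other-language broadcast data", 1e-4),   # large, far from target
    ("target-language broadcast data", 5e-5),  # right language, wrong domain
    ("small in-domain interview data", 1e-5),  # scarce, matches target
]
model = nn.Identity()                # placeholder pretrained acoustic model
for dataset_name, lr in stages:
    model = train_stage(model, dataset_name, lr)  # warm-start each stage
```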