Low-Resource Spoken Language Identification Using Self-Attentive Pooling
and Deep 1D Time-Channel Separable Convolutions
- URL: http://arxiv.org/abs/2106.00052v1
- Date: Mon, 31 May 2021 18:35:27 GMT
- Title: Low-Resource Spoken Language Identification Using Self-Attentive Pooling
and Deep 1D Time-Channel Separable Convolutions
- Authors: Roman Bedyakin, Nikolay Mikhaylovskiy
- Abstract summary: We show that a convolutional neural network with a Self-Attentive Pooling layer yields promising results for the language identification task in a low-resource setting.
We also substantiate the hypothesis that whenever a dataset is diverse enough that other classification factors, such as gender and age, are well averaged out, the confusion matrix of an LID system serves as a language similarity measure.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This memo describes the winning NTR/TSU submission for the Low Resource
ASR challenge at the Dialog2021 conference, language identification track.
Spoken Language Identification (LID) is an important step in a multilingual
Automated Speech Recognition (ASR) pipeline. Traditionally, the ASR task
requires large volumes of labeled data that are unattainable for most of the
world's languages, including most of the languages of Russia. In this memo, we
show that a convolutional neural network with a Self-Attentive Pooling layer
yields promising results for the language identification task in a low-resource
setting and sets a new state of the art (SOTA) on the Low Resource ASR challenge
dataset. Additionally, we compare the structure of the confusion matrices for
this dataset and the significantly more diverse VoxForge dataset, and we state
and substantiate the hypothesis that whenever a dataset is diverse enough that
other classification factors, such as gender and age, are well averaged out, the
confusion matrix of an LID system serves as a language similarity measure.
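The memo itself does not include code, but the pooling mechanism it names is well established: frame-level features from the convolutional encoder are collapsed into one utterance embedding by a learned softmax attention over time. As a rough illustrative sketch (the parameter names, dimensions, and random inputs below are assumptions, not the authors' implementation):

```python
import numpy as np

def self_attentive_pooling(H, W, b, v):
    """Collapse a (T, D) sequence of frame features into a single (D,)
    utterance embedding via learned attention weights over the T frames.

    H : (T, D) frame-level features from the convolutional encoder
    W : (D, A) projection matrix, b : (A,) bias, v : (A,) scoring vector
    (A is the attention hidden size; W, b, v are learned during training).
    """
    scores = np.tanh(H @ W + b) @ v                 # (T,) unnormalized scores
    scores = scores - scores.max()                  # stabilize the softmax
    alpha = np.exp(scores) / np.exp(scores).sum()   # (T,) attention weights
    return alpha @ H                                # (D,) weighted frame average

# Toy usage with random features and parameters
rng = np.random.default_rng(0)
T, D, A = 50, 64, 16                # frames, feature dim, attention dim
H = rng.standard_normal((T, D))
W = rng.standard_normal((D, A))
b = np.zeros(A)
v = rng.standard_normal(A)
emb = self_attentive_pooling(H, W, b, v)
print(emb.shape)                    # (64,)
```

Unlike plain temporal average pooling, the attention weights let the network emphasize the frames most informative for language identity, which matters when utterance lengths and speaking conditions vary.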
Related papers
- Enabling ASR for Low-Resource Languages: A Comprehensive Dataset Creation Approach [0.6445605125467574]
This study introduces a novel pipeline designed to generate ASR training datasets from audiobooks.
The common structure of these audiobooks poses a unique challenge due to the extensive length of audio segments.
We propose a method for effectively aligning audio with its corresponding text and segmenting it into lengths suitable for ASR training.
arXiv Detail & Related papers (2024-06-03T15:38:40Z)
- Efficient Spoken Language Recognition via Multilabel Classification [53.662747523872305]
We show that our models obtain competitive results while being orders of magnitude smaller and faster than current state-of-the-art methods.
Our multilabel strategy is more robust to unseen non-target languages compared to multiclass classification.
arXiv Detail & Related papers (2023-06-02T23:04:19Z)
- MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition [56.408324994409405]
The multiplexed routing network (MRN) trains a recognizer for each language that is currently seen.
MRN effectively reduces the reliance on older data and better combats catastrophic forgetting.
It outperforms existing general-purpose incremental learning (IL) methods by large margins.
arXiv Detail & Related papers (2023-05-24T06:03:34Z)
- XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages [105.54207724678767]
Data scarcity is a crucial issue for the development of highly multilingual NLP systems.
We propose XTREME-UP, a benchmark defined by its focus on the scarce-data scenario rather than zero-shot transfer.
XTREME-UP evaluates the capabilities of language models across 88 under-represented languages over 9 key user-centric technologies.
arXiv Detail & Related papers (2023-05-19T18:00:03Z)
- Adaptive Activation Network For Low Resource Multilingual Speech Recognition [30.460501537763736]
We introduce an adaptive activation network in the upper layers of the ASR model.
We also propose two approaches to train the model: (1) cross-lingual learning, replacing the activation function of the source language with that of the target language, and (2) multilingual learning.
Our experiments on the IARPA Babel datasets demonstrate that our approaches outperform from-scratch training and traditional bottleneck-feature-based methods.
arXiv Detail & Related papers (2022-05-28T04:02:59Z)
- Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from rich-resource languages to languages with low resources.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z)
- From Masked Language Modeling to Translation: Non-English Auxiliary Tasks Improve Zero-shot Spoken Language Understanding [24.149299722716155]
We introduce xSID, a new benchmark for cross-lingual Slot and Intent Detection in 13 languages from 6 language families, including a very low-resource dialect.
We propose a joint learning approach, with English SLU training data and non-English auxiliary tasks from raw text, syntax, and translation for transfer.
Our results show that jointly learning the main tasks with masked language modeling is effective for slots, while machine translation transfer works best for intent classification.
arXiv Detail & Related papers (2021-05-15T23:51:11Z)
- Language ID Prediction from Speech Using Self-Attentive Pooling and 1D-Convolutions [0.0]
This memo describes the NTR-TSU submission for the SIGTYP 2021 Shared Task on predicting language IDs from speech.
For many low-resource and endangered languages, only single-speaker recordings may be available, demanding domain- and speaker-invariant language ID systems.
We show that a convolutional neural network with a Self-Attentive Pooling layer yields promising results for the language identification task.
arXiv Detail & Related papers (2021-04-24T16:41:17Z)
- Multilingual and code-switching ASR challenges for low resource Indian languages [59.2906853285309]
We focus on building multilingual and code-switching ASR systems through two different subtasks covering a total of seven Indian languages.
We provide a total of 600 hours of transcribed speech data, comprising train and test sets, in these languages.
We also provide a baseline recipe for both tasks, with a WER of 30.73% and 32.45% on the test sets of the multilingual and code-switching subtasks, respectively.
arXiv Detail & Related papers (2021-04-01T03:37:01Z)
- Adversarial Meta Sampling for Multilingual Low-Resource Speech Recognition [159.9312272042253]
We develop a novel adversarial meta sampling (AMS) approach to improve multilingual meta-learning ASR (MML-ASR).
AMS adaptively determines the task sampling probability for each source language.
Experimental results on two multilingual datasets show significant performance improvements when applying AMS to MML-ASR.
arXiv Detail & Related papers (2020-12-22T09:33:14Z)
- Streaming End-to-End Bilingual ASR Systems with Joint Language Identification [19.09014345299161]
We introduce streaming, end-to-end, bilingual systems that perform both ASR and language identification.
The proposed method is applied to two language pairs: English-Spanish as spoken in the United States, and English-Hindi as spoken in India.
arXiv Detail & Related papers (2020-07-08T05:00:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.