Label Aware Speech Representation Learning For Language Identification
- URL: http://arxiv.org/abs/2306.04374v1
- Date: Wed, 7 Jun 2023 12:14:16 GMT
- Title: Label Aware Speech Representation Learning For Language Identification
- Authors: Shikhar Vashishth, Shikhar Bharadwaj, Sriram Ganapathy, Ankur Bapna,
Min Ma, Wei Han, Vera Axelrod, Partha Talukdar
- Abstract summary: We propose a novel framework of combining self-supervised representation learning with the language label information for the pre-training task.
This framework, termed Label Aware Speech Representation (LASR) learning, uses a triplet-based objective function to incorporate language labels along with the self-supervised loss function.
- Score: 49.197215416945596
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Speech representation learning approaches for non-semantic tasks such as
language recognition have either explored supervised embedding extraction
methods using a classifier model or self-supervised representation learning
approaches using raw data. In this paper, we propose a novel framework of
combining self-supervised representation learning with the language label
information for the pre-training task. This framework, termed Label Aware
Speech Representation (LASR) learning, uses a triplet-based objective function
to incorporate language labels along with the self-supervised loss function.
The speech representations are further fine-tuned for the downstream task. The
language recognition experiments are performed on two public datasets - FLEURS
and Dhwani. In these experiments, we illustrate that the proposed LASR
framework improves over the state-of-the-art systems on language
identification. We also report an analysis of the robustness of the LASR
approach to noisy/missing labels as well as its application to multi-lingual
speech recognition tasks.
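To make the combined objective concrete, below is a minimal PyTorch-style sketch of how a triplet loss over language labels might be added to a generic self-supervised loss during pre-training. The function name lasr_objective, the weighting hyper-parameter alpha, and the assumption that anchor/positive utterances share a language label while the negative comes from a different language are illustrative choices, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def lasr_objective(anchor_emb, positive_emb, negative_emb, ssl_loss,
                   alpha=1.0, margin=0.5):
    """Combine a self-supervised loss with a label-aware triplet loss (sketch).

    anchor_emb / positive_emb: pooled embeddings of utterances sharing a language label.
    negative_emb: pooled embeddings of utterances from a different language.
    ssl_loss: a pre-computed self-supervised loss term (e.g. contrastive or
              masked-prediction), assumed to come from the base pre-training task.
    alpha, margin: hypothetical hyper-parameters for illustration.
    """
    # Triplet term: pull same-language embeddings together, push other languages apart.
    triplet = F.triplet_margin_loss(anchor_emb, positive_emb, negative_emb, margin=margin)
    return ssl_loss + alpha * triplet

# Usage sketch with random tensors standing in for pooled encoder outputs.
a, p, n = (torch.randn(8, 256) for _ in range(3))
loss = lasr_objective(a, p, n, ssl_loss=torch.tensor(2.3), alpha=1.0)
print(loss.item())
```

The design choice here is simply additive: the self-supervised term preserves general-purpose representation learning, while the triplet term injects language-label structure into the embedding space before any downstream fine-tuning.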
Related papers
- Improving Continuous Sign Language Recognition with Cross-Lingual Signs [29.077175863743484]
We study the feasibility of utilizing multilingual sign language corpora to facilitate continuous sign language recognition.
We first build two sign language dictionaries containing isolated signs that appear in two datasets.
Then we identify the sign-to-sign mappings between two sign languages via a well-optimized isolated sign language recognition model.
arXiv Detail & Related papers (2023-08-21T15:58:47Z)
- MASR: Multi-label Aware Speech Representation [36.2978180342839]
We propose MASR, a Multi-label Aware Speech Representation learning framework.
MASR enables the inclusion of multiple external knowledge sources to enhance the utilization of meta-data information.
We show significant performance improvements for MASR over other established benchmarks.
arXiv Detail & Related papers (2023-07-20T16:09:57Z) - SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding
Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community.
There are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers.
Recent work has begun to introduce such benchmarks for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z) - LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between different languages at the frame level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z) - Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z) - UniSpeech-SAT: Universal Speech Representation Learning with Speaker
Aware Pre-Training [72.004873454347]
Two methods are introduced for enhancing the unsupervised speaker information extraction.
Experiment results on SUPERB benchmark show that the proposed system achieves state-of-the-art performance.
We scale up the training dataset to 94 thousand hours of public audio data and achieve further performance improvements.
arXiv Detail & Related papers (2021-10-12T05:43:30Z) - Cross-lingual Spoken Language Understanding with Regularized
Representation Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
arXiv Detail & Related papers (2020-09-30T08:56:53Z) - Learning not to Discriminate: Task Agnostic Learning for Improving
Monolingual and Code-switched Speech Recognition [12.354292498112347]
We present further improvements over our previous work by using domain adversarial learning to train task models.
Our proposed technique leads to reductions in Word Error Rates (WER) in monolingual and code-switched test sets across three language pairs.
arXiv Detail & Related papers (2020-06-09T13:45:30Z) - Towards Relevance and Sequence Modeling in Language Recognition [39.547398348702025]
We propose a neural network framework utilizing short-sequence information in language recognition.
A new model is proposed for incorporating relevance in language recognition, where parts of the speech data are weighted more based on their relevance to the language recognition task.
Experiments are performed using the language recognition task in NIST LRE 2017 Challenge using clean, noisy and multi-speaker speech data.
arXiv Detail & Related papers (2020-04-02T18:31:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.