Multilingual Speech Recognition using Knowledge Transfer across Learning
Processes
- URL: http://arxiv.org/abs/2110.07909v1
- Date: Fri, 15 Oct 2021 07:50:27 GMT
- Title: Multilingual Speech Recognition using Knowledge Transfer across Learning
Processes
- Authors: Rimita Lahiri, Kenichi Kumatani, Eric Sun and Yao Qian
- Abstract summary: Experimental results show that the best pre-training strategy yields a 3.55% relative reduction in overall WER.
A combination of LEAP and SSL yields a 3.51% relative reduction in overall WER when using language ID.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multilingual end-to-end (E2E) models have shown great potential for
expanding language coverage in automatic speech
recognition (ASR). In this paper, we aim to enhance multilingual ASR
performance in two ways: 1) studying the impact of feeding a one-hot vector
identifying the language, and 2) formulating the task with a meta-learning objective
combined with self-supervised learning (SSL). We associate every language with
a distinct task manifold and attempt to improve performance by transferring
knowledge across the learning processes themselves, rather than through
final model parameters. We employ this strategy on a dataset comprising 6
languages for an in-domain ASR task, by minimizing an objective related to
expected gradient path length. Experimental results show that the best
pre-training strategy yields a 3.55% relative reduction in overall WER. A
combination of LEAP and SSL yields a 3.51% relative reduction in overall WER when
using language ID.
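To make two ideas from the abstract concrete, here is a minimal Python sketch: appending a one-hot language-ID vector to acoustic features, and computing a relative WER reduction. The abstract does not say where the language vector enters the model, so frame-level concatenation to the input is an assumption. The language names, the 80-dimensional features, and the WER values are hypothetical and chosen only to illustrate the arithmetic; none of them come from the paper.

```python
import numpy as np

# Hypothetical inventory: the abstract mentions 6 languages but does not name them.
LANGUAGES = ["lang_a", "lang_b", "lang_c", "lang_d", "lang_e", "lang_f"]


def append_language_id(features: np.ndarray, language: str) -> np.ndarray:
    """Append a one-hot language vector to every frame of a (frames, dims) array.

    Assumption: the language ID is concatenated to the input features.
    """
    one_hot = np.zeros(len(LANGUAGES), dtype=features.dtype)
    one_hot[LANGUAGES.index(language)] = 1.0
    # Repeat the same one-hot vector for each frame, then join along the feature axis.
    tiled = np.tile(one_hot, (features.shape[0], 1))
    return np.concatenate([features, tiled], axis=1)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Four frames of 80-dimensional features (random placeholder data).
frames = np.random.default_rng(0).normal(size=(4, 80)).astype(np.float32)
augmented = append_language_id(frames, "lang_c")
print(augmented.shape)  # (4, 86)

# Illustrative WERs only: 20.00 -> 19.29 is a 3.55% relative reduction.
print(round(relative_wer_reduction(20.0, 19.29), 2))  # 3.55
```

A relative reduction is measured against the baseline error rate, so a 3.55% relative reduction is a much smaller change than a drop of 3.55 absolute WER points.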
Related papers
- Exploring the Impact of Data Quantity on ASR in Extremely Low-resource Languages [24.856817602140193]
This study focuses on two endangered Austronesian languages, Amis and Seediq.
We propose a novel data-selection scheme leveraging a multilingual corpus to augment the limited target language data.
arXiv Detail & Related papers (2024-09-13T14:35:47Z)
- Generative linguistic representation for spoken language identification [17.9575874225144]
We explore the utilization of the decoder-based network from the Whisper model to extract linguistic features.
We devised two strategies - one based on the language embedding method and the other focusing on direct optimization of LID outputs.
We conducted experiments on the large-scale multilingual datasets MLS, VoxLingua107, and CommonVoice to test our approach.
arXiv Detail & Related papers (2023-12-18T06:40:24Z)
- Efficient Spoken Language Recognition via Multilabel Classification [53.662747523872305]
We show that our models obtain competitive results while being orders of magnitude smaller and faster than current state-of-the-art methods.
Our multilabel strategy is more robust to unseen non-target languages compared to multiclass classification.
arXiv Detail & Related papers (2023-06-02T23:04:19Z)
- From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition [50.93943755401025]
We propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition.
We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement.
Our methods outperform existing ASR tuning architectures and their extension with self-supervised losses.
arXiv Detail & Related papers (2023-01-19T02:37:56Z)
- Adaptive Activation Network For Low Resource Multilingual Speech Recognition [30.460501537763736]
We introduce an adaptive activation network to the upper layers of the ASR model.
We also propose two approaches to train the model: (1) cross-lingual learning, replacing the activation function from source language to target language, and (2) multilingual learning.
Our experiments on IARPA Babel datasets demonstrate that our approaches outperform from-scratch training and traditional bottleneck-feature-based methods.
arXiv Detail & Related papers (2022-05-28T04:02:59Z)
- Persian Natural Language Inference: A Meta-learning approach [6.832341432995628]
This paper proposes a meta-learning approach for inferring natural language in Persian.
We evaluate the proposed method using four languages and an auxiliary task.
arXiv Detail & Related papers (2022-05-18T06:51:58Z)
- UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training [72.004873454347]
Two methods are introduced for enhancing the unsupervised speaker information extraction.
Experimental results on the SUPERB benchmark show that the proposed system achieves state-of-the-art performance.
We scale up the training dataset to 94 thousand hours of public audio data and achieve further performance improvement.
arXiv Detail & Related papers (2021-10-12T05:43:30Z)
- Towards Lifelong Learning of Multilingual Text-To-Speech Synthesis [87.75833205560406]
This work presents a lifelong learning approach to train a multilingual Text-To-Speech (TTS) system.
It does not require pooled data from all languages altogether, and thus alleviates the storage and computation burden.
arXiv Detail & Related papers (2021-10-09T07:00:38Z)
- Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual Summarization aims at producing a summary in the target language for an article in the source language.
We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks like translation and monolingual tasks like masked language models.
Our model achieves an improvement of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 scores over state-of-the-art results.
arXiv Detail & Related papers (2020-10-18T00:21:53Z)
- Cross-lingual Spoken Language Understanding with Regularized Representation Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
arXiv Detail & Related papers (2020-09-30T08:56:53Z)
- On Learning Universal Representations Across Languages [37.555675157198145]
We extend existing approaches to learn sentence-level representations and show the effectiveness on cross-lingual understanding and generation.
Specifically, we propose a Hierarchical Contrastive Learning (HiCTL) method to learn universal representations for parallel sentences distributed in one or multiple languages.
We conduct evaluations on two challenging cross-lingual tasks, XTREME and machine translation.
arXiv Detail & Related papers (2020-07-31T10:58:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.