Improving Massively Multilingual ASR With Auxiliary CTC Objectives
- URL: http://arxiv.org/abs/2302.12829v2
- Date: Mon, 27 Feb 2023 17:47:31 GMT
- Title: Improving Massively Multilingual ASR With Auxiliary CTC Objectives
- Authors: William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, Shinji
Watanabe
- Abstract summary: We introduce our work on improving performance on FLEURS, a 102-language open ASR benchmark.
We investigate techniques inspired by recent Connectionist Temporal Classification (CTC) studies to help the model handle the large number of languages.
Our state-of-the-art systems using self-supervised models with the Conformer architecture improve over the results of prior work on FLEURS by a relative 28.4% CER.
- Score: 40.10307386370194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual Automatic Speech Recognition (ASR) models have extended the
usability of speech technologies to a wide variety of languages. Given the large
number of languages these models must handle, however, a key to understanding their
imbalanced performance across different languages is to examine whether the model
actually knows which language it should transcribe. In this paper, we introduce
our work on improving performance on FLEURS, a 102-language open ASR benchmark,
by conditioning the entire model on language identity (LID). We investigate
techniques inspired by recent Connectionist Temporal Classification (CTC)
studies to help the model handle the large number of languages, conditioning on
the LID predictions of auxiliary tasks. Our experimental results demonstrate
the effectiveness of our technique over standard CTC/Attention-based hybrid
models. Furthermore, our state-of-the-art systems using self-supervised models
with the Conformer architecture improve over the results of prior work on
FLEURS by a relative 28.4% CER. Trained models and reproducible recipes are
available at https://github.com/espnet/espnet/tree/master/egs2/fleurs/asr1 .
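To make the auxiliary-CTC idea concrete, the sketch below shows one way to attach an intermediate CTC head whose LID-aware posteriors condition the remaining encoder layers, in the spirit of the self-conditioned/hierarchical CTC techniques the abstract refers to. This is a minimal illustration rather than the ESPnet recipe: the names (AuxCTCConditionedEncoder, num_lid_tokens, aux_layer, aux_weight) are assumptions made for the example, and a plain Transformer encoder stands in for the Conformer.

```python
import torch.nn as nn
import torch.nn.functional as F


class AuxCTCConditionedEncoder(nn.Module):
    """Encoder stack with an intermediate (auxiliary) CTC head whose posteriors,
    computed over a vocabulary that includes language-ID tokens, are fed back to
    condition the remaining layers (illustrative sketch, not the paper's code)."""

    def __init__(self, d_model=256, num_layers=12, vocab_size=500,
                 num_lid_tokens=102, aux_layer=6):
        super().__init__()
        # Output units: CTC blank + subword tokens + one token per language ID.
        self.output_size = 1 + vocab_size + num_lid_tokens
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
             for _ in range(num_layers)])
        self.aux_layer = aux_layer
        self.aux_head = nn.Linear(d_model, self.output_size)    # intermediate CTC head
        self.cond_proj = nn.Linear(self.output_size, d_model)   # feeds posteriors back
        self.final_head = nn.Linear(d_model, self.output_size)  # final CTC head

    def forward(self, x):  # x: (batch, time, d_model) acoustic features
        aux_logits = None
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if i + 1 == self.aux_layer:
                aux_logits = self.aux_head(x)
                # Condition the later layers on the auxiliary (LID-aware) posteriors.
                x = x + self.cond_proj(aux_logits.softmax(dim=-1))
        return self.final_head(x), aux_logits


def joint_ctc_loss(final_logits, aux_logits, targets, input_lens, target_lens,
                   aux_weight=0.3):
    """CTC on both heads; targets are assumed to start with an LID token so that
    each head must predict the language before the transcript."""
    def ctc(logits):
        log_probs = logits.log_softmax(-1).transpose(0, 1)  # (T, B, V) for ctc_loss
        return F.ctc_loss(log_probs, targets, input_lens, target_lens,
                          blank=0, zero_infinity=True)
    return (1 - aux_weight) * ctc(final_logits) + aux_weight * ctc(aux_logits)
```

In the paper's hybrid CTC/Attention setup an attention-decoder loss would be added on top of these CTC terms; only the CTC side is sketched here, and values such as aux_weight=0.3 are placeholders, not the paper's hyperparameters.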
Related papers
- ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets [106.7760874400261]
This paper presents ML-SUPERB 2.0, a new benchmark for evaluating pre-trained SSL and supervised speech models.
We find performance improvements over the setup of ML-SUPERB, but performance depends on the downstream model design.
Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches.
arXiv Detail & Related papers (2024-06-12T21:01:26Z)
- Efficient Compression of Multitask Multilingual Speech Models [0.0]
Our approach involves two key strategies: lightweight modular ASR fine-tuning of whisper-small using language-specific experts, and knowledge distillation from whisper-large-v2.
DistilWhisper bridges the performance gap in ASR for under-represented languages while retaining the advantages of multitask and multilingual capabilities.
arXiv Detail & Related papers (2024-05-02T03:11:59Z)
- Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR [44.949146169903074]
The heterogeneous nature and imbalanced data abundance of different languages may cause performance degradation.
Our proposed method brings 12.2% word error rate reduction on average and up to 37.5% on a single locale.
arXiv Detail & Related papers (2024-01-17T06:01:16Z)
- On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based Multilingual Model [49.81429697921861]
We study the interaction between parameter-efficient fine-tuning (PEFT) and cross-lingual tasks in multilingual autoregressive models.
We show that prompt tuning is more effective in enhancing the performance of low-resource languages than fine-tuning.
arXiv Detail & Related papers (2023-11-14T00:43:33Z)
- An Open Dataset and Model for Language Identification [84.15194457400253]
We present a LID model which achieves a macro-average F1 score of 0.93 and a false positive rate of 0.033 across 201 languages.
We make both the model and the dataset available to the research community.
arXiv Detail & Related papers (2023-05-23T08:43:42Z)
- EMS: Efficient and Effective Massively Multilingual Sentence Embedding Learning [38.928786416891924]
We introduce efficient and effective massively multilingual sentence embedding (EMS) using cross-lingual token-level reconstruction (XTR) and sentence-level contrastive learning as training objectives.
Compared with related studies, the proposed model can be efficiently trained using significantly fewer parallel sentences and GPU computation resources.
We release the codes for model training and the EMS pre-trained sentence embedding model, which supports 62 languages.
arXiv Detail & Related papers (2022-05-31T12:29:25Z)
- Incorporating Linguistic Knowledge for Abstractive Multi-document Summarization [20.572283625521784]
We develop a neural-network-based abstractive multi-document summarization (MDS) model.
We incorporate dependency information into a linguistic-guided attention mechanism.
With the help of these linguistic signals, sentence-level relations can be correctly captured.
arXiv Detail & Related papers (2021-09-23T08:13:35Z)
- Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
arXiv Detail & Related papers (2020-10-27T13:12:17Z)
- Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual Summarization aims at producing a summary in the target language for an article in the source language.
We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks like translation and monolingual tasks like masked language models.
Our model achieves improvements of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 points over state-of-the-art results.
arXiv Detail & Related papers (2020-10-18T00:21:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.