ML-SUPERB: Multilingual Speech Universal PERformance Benchmark
- URL: http://arxiv.org/abs/2305.10615v2
- Date: Fri, 11 Aug 2023 17:39:21 GMT
- Title: ML-SUPERB: Multilingual Speech Universal PERformance Benchmark
- Authors: Jiatong Shi, Dan Berrebbi, William Chen, Ho-Lam Chung, En-Pei Hu, Wei
Ping Huang, Xuankai Chang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee,
Shinji Watanabe
- Abstract summary: Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to benchmark the performance of Self-Supervised Learning (SSL) models on various speech processing tasks.
This paper presents multilingual SUPERB, covering 143 languages (ranging from high-resource to endangered), and considering both automatic speech recognition and language identification.
Similar to the SUPERB benchmark, we find speech SSL models can significantly improve performance compared to FBANK features.
- Score: 73.65853301350042
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard
to benchmark the performance of Self-Supervised Learning (SSL) models on
various speech processing tasks. However, SUPERB largely considers English
speech in its evaluation. This paper presents multilingual SUPERB (ML-SUPERB),
covering 143 languages (ranging from high-resource to endangered), and
considering both automatic speech recognition and language identification.
Following the concept of SUPERB, ML-SUPERB utilizes frozen SSL features and
employs a simple framework for multilingual tasks by learning a shallow
downstream model. Similar to the SUPERB benchmark, we find speech SSL models
can significantly improve performance compared to FBANK features. Furthermore,
we find that multilingual models do not always perform better than their
monolingual counterparts. We will release ML-SUPERB as a challenge with
organized datasets and reproducible training scripts for future multilingual
representation research.
Related papers
- ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets [106.7760874400261]
This paper presents ML-SUPERB2.0, which is a new benchmark for evaluating pre-trained SSL and supervised speech models.
We find performance improvements over the setup of ML-SUPERB, but performance depends on the downstream model design.
Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches.
arXiv Detail & Related papers (2024-06-12T21:01:26Z) - Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech
Models via Language-Specific Experts [14.999359332108767]
We propose DistilWhisper to bridge the performance gap in ASR for under-represented languages.
Our approach involves two key strategies: lightweight modular ASR fine-tuning of whisper-small using language-specific experts, and knowledge distillation from whisper-large-v2.
Results demonstrate that our approach is more effective than standard fine-tuning or LoRA adapters.
arXiv Detail & Related papers (2023-11-02T08:37:30Z) - The Interpreter Understands Your Meaning: End-to-end Spoken Language
Understanding Aided by Speech Translation [13.352795145385645]
Speech translation (ST) is a good means of pretraining speech models for end-to-end spoken language understanding.
We show that our models reach higher performance over baselines on monolingual and multilingual intent classification.
We also create new benchmark datasets for speech summarization and low-resource/zero-shot transfer from English to French or Spanish.
arXiv Detail & Related papers (2023-05-16T17:53:03Z) - Generalizing Multimodal Pre-training into Multilingual via Language
Acquisition [54.69707237195554]
English-based Vision-Language Pre-training has achieved great success in various downstream tasks.
Some efforts have been taken to generalize this success to non-English languages through Multilingual Vision-Language Pre-training.
We propose a textbfMultitextbfLingual textbfAcquisition (MLA) framework that can easily generalize a monolingual Vision-Language Pre-training model into multilingual.
arXiv Detail & Related papers (2022-05-29T08:53:22Z) - SUPERB: Speech processing Universal PERformance Benchmark [78.41287216481203]
Self-supervised learning (SSL) has proven vital for advancing research in natural language processing (NLP) and computer vision (CV)
SuperB is a leaderboard to benchmark the performance of a shared model across a wide range of speech processing tasks.
We present a simple framework to solve SUPERB tasks by learning task-specialized lightweight prediction heads on top of the frozen shared model.
arXiv Detail & Related papers (2021-05-03T17:51:09Z) - UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z) - Learning to Scale Multilingual Representations for Vision-Language Tasks [51.27839182889422]
The effectiveness of SMALR is demonstrated with ten diverse languages, over twice the number supported in vision-language tasks to date.
We evaluate on multilingual image-sentence retrieval and outperform prior work by 3-4% with less than 1/5th the training parameters compared to other word embedding methods.
arXiv Detail & Related papers (2020-04-09T01:03:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.