MiniSUPERB: Lightweight Benchmark for Self-supervised Speech Models
- URL: http://arxiv.org/abs/2305.19011v3
- Date: Tue, 14 Nov 2023 21:22:25 GMT
- Title: MiniSUPERB: Lightweight Benchmark for Self-supervised Speech Models
- Authors: Yu-Hsiang Wang, Huang-Yu Chen, Kai-Wei Chang, Winston Hsu, Hung-yi Lee
- Abstract summary: SUPERB was proposed to evaluate the generalizability of self-supervised learning (SSL) speech models across various tasks.
However, SUPERB incurs high computational costs due to its large datasets and diverse tasks.
We introduce MiniSUPERB, a lightweight benchmark that efficiently evaluates SSL speech models, achieving results comparable to SUPERB at a significantly lower computational cost.
- Score: 90.99663022952498
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: SUPERB was proposed to evaluate the generalizability of self-supervised
learning (SSL) speech models across various tasks. However, it incurs high
computational costs due to the large datasets and diverse tasks. In this paper,
we introduce MiniSUPERB, a lightweight benchmark that efficiently evaluates SSL
speech models, achieving results comparable to SUPERB at a significantly lower
computational cost. We carefully select representative tasks, sample datasets, and
extract model representations offline. Our approach achieves a Spearman's rank
correlation of 0.954 and 0.982 with SUPERB Paper and SUPERB Challenge,
respectively. Additionally, we reduce the computational cost by 97% in terms of
Multiply-ACcumulate operations (MACs). Furthermore, we evaluate SSL speech
models in few-shot scenarios and observe significant variations in their
performance. To our knowledge, this is the first study to examine both the
computational cost of the model itself and the cost of evaluating it on a
benchmark.
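The abstract reports Spearman's rank correlations of 0.954 and 0.982 between MiniSUPERB and the full SUPERB rankings. As a minimal sketch of what that metric measures, the snippet below computes Spearman's rho (no-ties formula) between two hypothetical per-model score lists; the scores are illustrative inventions, not numbers from the paper.

```python
# Sketch: Spearman's rank correlation between two benchmark score lists.
# The score values below are hypothetical, for illustration only.

def ranks(values):
    """Return the rank of each value (1 = highest), ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman's rho via the no-ties formula: 1 - 6*sum(d^2) / (n*(n^2-1))."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical scores for five models on the full vs. lightweight benchmark.
full_benchmark = [82.1, 79.4, 88.7, 75.0, 84.3]
mini_benchmark = [80.5, 78.9, 89.2, 74.1, 83.0]
print(round(spearman(full_benchmark, mini_benchmark), 3))  # identical rankings -> 1.0
```

A correlation near 1.0 means the lightweight benchmark orders the models almost identically to the full one, even if the absolute scores differ — which is the property MiniSUPERB's 0.954/0.982 figures claim.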
Related papers
- Efficient Training of Self-Supervised Speech Foundation Models on a Compute Budget [57.807614181024114]
This paper investigates how to efficiently train speech foundation models with self-supervised learning (SSL) under a limited compute budget.
We examine critical factors in SSL that impact the budget, including model architecture, model size, and data size.
arXiv Detail & Related papers (2024-09-09T10:36:42Z) - ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets [106.7760874400261]
This paper presents ML-SUPERB 2.0, a new benchmark for evaluating pre-trained SSL and supervised speech models.
We find performance improvements over the setup of ML-SUPERB, but performance depends on the downstream model design.
Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches.
arXiv Detail & Related papers (2024-06-12T21:01:26Z) - MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies [85.57899012821211]
Small Language Models (SLMs) are a resource-efficient alternative to Large Language Models (LLMs)
We introduce MiniCPM, specifically the 1.2B and 2.4B non-embedding parameter variants.
We also introduce MiniCPM family, including MiniCPM-DPO, MiniCPM-MoE and MiniCPM-128K.
arXiv Detail & Related papers (2024-04-09T15:36:50Z) - Model Extraction Attack against Self-supervised Speech Models [52.81330435990717]
Self-supervised learning (SSL) speech models generate meaningful representations of given clips.
Model extraction attack (MEA) often refers to an adversary stealing the functionality of the victim model with only query access.
We study the MEA problem against SSL speech models with a small number of queries.
arXiv Detail & Related papers (2022-11-29T09:28:05Z) - Application of Knowledge Distillation to Multi-task Speech Representation Learning [2.0908300719428228]
Speech representation learning models use large numbers of parameters; even the smallest version has 95 million parameters.
In this paper, we investigate the application of knowledge distillation to speech representation learning models followed by fine-tuning.
Our approach results in nearly 75% reduction in model size while suffering only 0.1% accuracy and 0.9% equal error rate degradation.
arXiv Detail & Related papers (2022-10-29T14:22:43Z) - SUPERB: Speech processing Universal PERformance Benchmark [78.41287216481203]
Self-supervised learning (SSL) has proven vital for advancing research in natural language processing (NLP) and computer vision (CV).
SUPERB is a leaderboard to benchmark the performance of a shared model across a wide range of speech processing tasks.
We present a simple framework to solve SUPERB tasks by learning task-specialized lightweight prediction heads on top of the frozen shared model.
arXiv Detail & Related papers (2021-05-03T17:51:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.