LeBenchmark: A Reproducible Framework for Assessing Self-Supervised
Representation Learning from Speech
- URL: http://arxiv.org/abs/2104.11462v1
- Date: Fri, 23 Apr 2021 08:27:09 GMT
- Title: LeBenchmark: A Reproducible Framework for Assessing Self-Supervised
Representation Learning from Speech
- Authors: Solène Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, Sina Alisamir, Ziyi Tong, Natalia Tomashenko, Marco Dinarelli, Titouan Parcollet, Alexandre Allauzen, Yannick Estève, Benjamin Lecouteux, François Portet, Solange Rossato, Fabien Ringeval, Didier Schwab and Laurent Besacier
- Abstract summary: Self-Supervised Learning (SSL) from large amounts of unlabeled data has been successfully explored for image and natural language processing. Recent works have also investigated SSL from speech.
We propose LeBenchmark: a reproducible framework for assessing SSL from speech.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Self-Supervised Learning (SSL) from large amounts of unlabeled data has been
successfully explored for image and natural language processing. Recent works have also
investigated SSL from speech, notably succeeding in improving performance on downstream
tasks such as automatic speech recognition (ASR). While these works suggest it is possible
to reduce dependence on labeled data for building efficient speech systems, their
evaluation was mostly made on ASR, using multiple heterogeneous experimental settings
(most of them for English). This makes it difficult to compare SSL approaches objectively
and to evaluate their impact on building speech systems. In this paper, we propose
LeBenchmark: a reproducible framework for assessing SSL from speech. It includes not only
ASR (high- and low-resource) tasks but also spoken language understanding, speech
translation and emotion recognition. We also target speech technologies in a language
other than English: French. SSL models of different sizes are trained on carefully sourced
and documented datasets. Experiments show that SSL is beneficial for most but not all
tasks, which confirms the need for exhaustive and reliable benchmarks to evaluate its real
impact. LeBenchmark is shared with the scientific community for reproducible research in
SSL from speech.
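To illustrate how the pretrained SSL models evaluated in such a benchmark are typically consumed by downstream tasks, here is a minimal feature-extraction sketch using the Hugging Face transformers library. The checkpoint name LeBenchmark/wav2vec2-FR-7K-large and the input file are illustrative assumptions; the actual released checkpoint names should be taken from the LeBenchmark distribution.

```python
# Minimal sketch: extracting frame-level SSL representations from a wav2vec 2.0
# checkpoint for use by a downstream head (ASR, SLU, ST, emotion recognition).
# The checkpoint name and audio file below are illustrative assumptions.
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

MODEL_ID = "LeBenchmark/wav2vec2-FR-7K-large"  # assumed checkpoint name

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID)
model = Wav2Vec2Model.from_pretrained(MODEL_ID)
model.eval()

# Load a French utterance and resample to the 16 kHz rate wav2vec 2.0 expects.
waveform, sample_rate = torchaudio.load("utterance.wav")  # hypothetical file
waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = feature_extractor(
    waveform.squeeze(0).numpy(), sampling_rate=16_000, return_tensors="pt"
)
with torch.no_grad():
    outputs = model(**inputs)

# (batch, frames, hidden_size): contextual features a downstream head trains on.
print(outputs.last_hidden_state.shape)
```

Downstream experiments in benchmarks of this kind typically either freeze these features or fine-tune the whole encoder; both regimes affect the conclusions one can draw about an SSL model.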
Related papers
- Performance Analysis of Speech Encoders for Low-Resource SLU and ASR in Tunisian Dialect
Speech encoders pretrained through self-supervised learning (SSL) have demonstrated remarkable performance in various downstream tasks.
This paper compares the effectiveness of SSL approaches in the context of the low-resource spoken Tunisian Arabic dialect.
arXiv Detail & Related papers (2024-07-05T14:21:36Z)
- SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?
Self-supervised learning (SSL) for speech representation has been successfully applied in various downstream tasks.
In this paper, we aim to clarify whether speech SSL techniques can capture linguistic knowledge well.
arXiv Detail & Related papers (2023-06-14T09:04:29Z)
- SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community.
There are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers.
Recent work has begun to introduce such benchmarks for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z)
- Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
We study which factors lead to the success of self-supervised learning on speaker-related tasks.
Our empirical results on the VoxCeleb-1 dataset suggest that the benefit of SSL to the speaker verification (SV) task comes from a combination of masked speech prediction loss, data scale, and model size (a toy sketch of a masked-prediction loss follows this list).
arXiv Detail & Related papers (2022-04-27T08:35:57Z)
- Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation
Self-Supervised Learning (SSL) models have been successfully applied in various deep learning-based speech tasks.
The quality of SSL representations depends heavily on the relatedness between the SSL training domain(s) and the target data domain.
We propose a learnable and interpretable framework to combine spectral features (SF) and SSL representations (a minimal fusion sketch also follows this list).
arXiv Detail & Related papers (2022-04-05T20:09:15Z)
- Analyzing the factors affecting usefulness of Self-Supervised Pre-trained Representations for Speech Recognition
Self-supervised learning (SSL) of high-level speech representations has been a popular approach to building Automatic Speech Recognition systems.
We study the effect of domain, language, dataset size, and other aspects of the upstream SSL pre-training data on the final performance of a low-resource downstream ASR task.
arXiv Detail & Related papers (2022-03-31T11:48:24Z)
- Audio Self-supervised Learning: A Survey
Self-Supervised Learning (SSL) aims to discover general representations from large-scale data without requiring human annotations.
Its success in the fields of computer vision and natural language processing has prompted its recent adoption in audio and speech processing.
arXiv Detail & Related papers (2022-03-02T15:58:29Z)
- UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training
Two methods are introduced to enhance unsupervised speaker information extraction.
Experimental results on the SUPERB benchmark show that the proposed system achieves state-of-the-art performance.
We scale the training dataset up to 94 thousand hours of public audio data and achieve further performance improvements.
arXiv Detail & Related papers (2021-10-12T05:43:30Z)
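As a companion to the speaker-recognition analysis above, which attributes part of SSL's benefit to a masked speech prediction loss, here is a toy sketch of such an objective. It is illustrative only (a simple regression variant), not the formulation of any specific paper; the stand-in encoder and dimensions are assumptions.

```python
# Toy sketch of a masked speech prediction objective: mask random frames,
# encode the corrupted sequence, and penalize errors on masked positions only.
# Illustrative regression variant; real systems (wav2vec 2.0, HuBERT) use
# contrastive or discrete-target losses and Transformer encoders.
import torch
import torch.nn as nn

def masked_prediction_loss(
    encoder: nn.Module,
    features: torch.Tensor,   # (batch, frames, dim) input speech features
    mask_prob: float = 0.15,
) -> torch.Tensor:
    mask = torch.rand(features.shape[:2]) < mask_prob  # (batch, frames)
    corrupted = features.clone()
    corrupted[mask] = 0.0                              # zero out masked frames
    predicted = encoder(corrupted)                     # same shape as input
    # Loss computed only where frames were masked.
    return nn.functional.l1_loss(predicted[mask], features[mask])

# Usage with a stand-in encoder (assumed shapes, for illustration only).
encoder = nn.Sequential(nn.Linear(80, 256), nn.GELU(), nn.Linear(256, 80))
feats = torch.randn(4, 100, 80)  # (batch, frames, filterbank bins)
print(masked_prediction_loss(encoder, feats))
```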
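And to make the learnable combination of spectral and SSL features from the entry above concrete, here is a minimal fusion sketch: both streams are projected to a common dimension and mixed with learned softmax weights, which stay inspectable as a simple interpretability signal. Module names and dimensions are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of learnably combining spectral features (SF) with SSL
# representations; assumes the two streams are already time-aligned.
import torch
import torch.nn as nn


class FeatureFusion(nn.Module):
    def __init__(self, sf_dim: int = 80, ssl_dim: int = 1024, out_dim: int = 256):
        super().__init__()
        self.sf_proj = nn.Linear(sf_dim, out_dim)    # e.g. 80-dim filterbanks
        self.ssl_proj = nn.Linear(ssl_dim, out_dim)  # e.g. wav2vec 2.0 states
        # One logit per stream; softmax yields interpretable mixing weights.
        self.logits = nn.Parameter(torch.zeros(2))

    def forward(self, sf: torch.Tensor, ssl: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.logits, dim=0)
        return w[0] * self.sf_proj(sf) + w[1] * self.ssl_proj(ssl)


fusion = FeatureFusion()
sf = torch.randn(4, 120, 80)     # (batch, frames, filterbank bins)
ssl = torch.randn(4, 120, 1024)  # (batch, frames, SSL hidden size)
print(fusion(sf, ssl).shape)     # torch.Size([4, 120, 256])
```

After training, inspecting the softmax weights gives a rough signal of how much each feature stream contributes, which is one simple way such a combination can be made interpretable.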