Speech Self-Supervised Representation Benchmarking: Are We Doing it
Right?
- URL: http://arxiv.org/abs/2306.00452v1
- Date: Thu, 1 Jun 2023 08:51:18 GMT
- Title: Speech Self-Supervised Representation Benchmarking: Are We Doing it
Right?
- Authors: Salah Zaiem, Youcef Kemiche, Titouan Parcollet, Slim Essid, Mirco
Ravanelli
- Abstract summary: Self-supervised learning (SSL) has recently allowed leveraging large datasets of unlabeled speech signals to reach impressive performance on speech tasks.
Benchmarking using limited decoders may cause a counterproductive increase in the sizes of the developed SSL models.
- Score: 24.354848095744536
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning (SSL) has recently allowed leveraging large datasets
of unlabeled speech signals to reach impressive performance on speech tasks
using only small amounts of annotated data. The high number of proposed
approaches fostered the need and rise of extended benchmarks that evaluate
their performance on a set of downstream tasks exploring various aspects of the
speech signal. However, while the number of considered tasks has been growing,
most benchmarks rely upon a single decoding architecture that maps the frozen
SSL representations to the downstream labels. This work investigates the robustness
of such benchmarking results to changes in the decoder architecture.
Interestingly, it appears that varying the architecture of the downstream
decoder leads to significant variations in the leaderboards of most tasks.
Concerningly, our study reveals that benchmarking using limited decoders may
cause a counterproductive increase in the sizes of the developed SSL models.
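To make the decoder-swapping setup concrete, here is a minimal sketch (an assumption on our part, written in PyTorch; the encoder, feature dimensions, and head architectures are illustrative placeholders rather than the paper's exact recipes) of probing frozen SSL representations with two different downstream heads, a frame-wise linear probe and a small BiLSTM decoder:

```python
# Minimal sketch (assumption: PyTorch; modules and sizes are illustrative,
# not the configurations evaluated in the paper).
import torch
import torch.nn as nn

class FrozenEncoder(nn.Module):
    """Stand-in for a pretrained SSL encoder (e.g. a wav2vec 2.0 / HuBERT-style model)."""
    def __init__(self, feat_dim: int = 768):
        super().__init__()
        # Placeholder layer; in practice a pretrained checkpoint would be loaded here.
        self.net = nn.Conv1d(1, feat_dim, kernel_size=400, stride=320)
        for p in self.parameters():  # frozen: only the downstream head is trained
            p.requires_grad = False

    def forward(self, wav):  # wav: (batch, samples)
        return self.net(wav.unsqueeze(1)).transpose(1, 2)  # (batch, frames, feat_dim)

class LinearProbe(nn.Module):
    """Limited decoder: frame-wise linear mapping to the downstream labels."""
    def __init__(self, feat_dim: int, n_classes: int):
        super().__init__()
        self.out = nn.Linear(feat_dim, n_classes)

    def forward(self, feats):
        return self.out(feats)

class BiLSTMHead(nn.Module):
    """Larger decoder: the kind of architecture change whose effect the paper studies."""
    def __init__(self, feat_dim: int, n_classes: int, hidden: int = 256):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, feats):
        h, _ = self.rnn(feats)
        return self.out(h)

if __name__ == "__main__":
    encoder = FrozenEncoder()
    wav = torch.randn(2, 16000)  # two dummy 1-second waveforms at 16 kHz
    feats = encoder(wav)
    for head in (LinearProbe(768, 40), BiLSTMHead(768, 40)):
        logits = head(feats)
        print(type(head).__name__, tuple(logits.shape))
```

In such a probe only the head parameters would be updated; the paper's observation is that task leaderboards obtained with a limited head of the first kind can reorder substantially when a larger head of the second kind is used instead.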
Related papers
- DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding [51.32965203977845]
We propose the use of discrete speech units (DSU) instead of continuous-valued speech encoder outputs.
The proposed model shows robust performance on speech inputs from seen/unseen domains and instruction-following capability in spoken question answering.
Our findings suggest that the ASR task and datasets are not crucial in instruction-tuning for spoken question answering tasks.
arXiv Detail & Related papers (2024-06-13T17:28:13Z)
- Memorization in Self-Supervised Learning Improves Downstream Generalization [49.42010047574022]
Self-supervised learning (SSL) has recently received significant attention due to its ability to train high-performance encoders purely on unlabeled data.
We propose SSLMem, a framework for defining memorization within SSL.
arXiv Detail & Related papers (2024-01-19T11:32:47Z)
- Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads [32.45539981205672]
Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach impressive performance with reduced amounts of annotated data.
This study examines how benchmarking results are affected by changes in the probing head architecture.
arXiv Detail & Related papers (2023-08-28T09:49:48Z)
- Deciphering the Projection Head: Representation Evaluation Design in Self-supervised Learning [6.375931203397043]
Self-supervised learning (SSL) aims to learn intrinsic features without labels.
The projection head always plays an important role in improving the performance of the downstream task.
We propose a Representation Evaluation Design (RED) in SSL models in which a shortcut connection between the representation and the projection vectors is built.
arXiv Detail & Related papers (2023-01-28T13:13:53Z)
- SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community.
There are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers.
Recent work has begun to introduce such benchmarks for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z)
- Joint Encoder-Decoder Self-Supervised Pre-training for ASR [0.0]
Self-supervised learning has shown tremendous success in various speech-related downstream tasks.
In this paper, we propose a new paradigm that exploits the power of a decoder during self-supervised learning.
arXiv Detail & Related papers (2022-06-09T12:45:29Z)
- LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech [63.84741259993937]
Self-Supervised Learning (SSL) using huge unlabeled data has been successfully explored for image and natural language processing.
Recent works also investigated SSL from speech.
We propose LeBenchmark: a reproducible framework for assessing SSL from speech.
arXiv Detail & Related papers (2021-04-23T08:27:09Z)
- Beyond Single Stage Encoder-Decoder Networks: Deep Decoders for Semantic Image Segmentation [56.44853893149365]
Single encoder-decoder methodologies for semantic segmentation are reaching their peak in terms of segmentation quality and efficiency per number of layers.
We propose a new architecture based on a decoder which uses a set of shallow networks for capturing more information content.
In order to further improve the architecture we introduce a weight function which aims to re-balance classes to increase the attention of the networks to under-represented objects.
arXiv Detail & Related papers (2020-07-19T18:44:34Z)
- Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding [59.48857453699463]
In sequence-to-sequence learning, the decoder relies on the attention mechanism to efficiently extract information from the encoder.
Recent work has proposed to use representations from different encoder layers for diversified levels of information.
We propose layer-wise multi-view decoding, where for each decoder layer, together with the representations from the last encoder layer, which serve as a global view, those from other encoder layers are supplemented for a stereoscopic view of the source sequences.
arXiv Detail & Related papers (2020-05-16T20:00:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.