Speech Self-Supervised Representations Benchmarking: a Case for Larger
Probing Heads
- URL: http://arxiv.org/abs/2308.14456v2
- Date: Wed, 21 Feb 2024 16:57:23 GMT
- Title: Speech Self-Supervised Representations Benchmarking: a Case for Larger
Probing Heads
- Authors: Salah Zaiem, Youcef Kemiche, Titouan Parcollet, Slim Essid, Mirco
Ravanelli
- Abstract summary: Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach impressive performance with reduced amounts of annotated data.
This study examines how benchmarking results are affected by changes in the probing head architecture.
- Score: 32.45539981205672
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning (SSL) leverages large datasets of unlabeled speech
to reach impressive performance with reduced amounts of annotated data. The
high number of proposed approaches fostered the emergence of comprehensive
benchmarks that evaluate their performance on a set of downstream tasks
exploring various aspects of the speech signal. However, while the number of
considered tasks has been growing, most proposals rely upon a single downstream
architecture that maps the frozen SSL representations to the task labels. This
study examines how benchmarking results are affected by changes in the probing
head architecture. Interestingly, we found that altering the downstream
architecture structure leads to significant fluctuations in the performance
ranking of the evaluated models. Against common practices in speech SSL
benchmarking, we evaluate larger-capacity probing heads, showing their impact
on performance, inference costs, generalization and multi-level feature
exploitation.
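To make the probing setup concrete, here is a minimal sketch, assuming a PyTorch implementation with hypothetical module names, dimensions, and a BiLSTM as one possible larger-capacity head (this is not the paper's released code): it contrasts the usual small linear probe with a higher-capacity head, both operating on frozen SSL frame-level features.

```python
import torch
import torch.nn as nn


# Frozen SSL encoder stand-in. In practice this would be a pretrained speech
# model (e.g. wav2vec 2.0 or HuBERT) whose weights are kept fixed; here a
# random projection of random frame features keeps the sketch self-contained.
class FrozenSSLEncoder(nn.Module):
    def __init__(self, feat_dim=768):
        super().__init__()
        self.feat_dim = feat_dim
        self.proj = nn.Linear(feat_dim, feat_dim)

    @torch.no_grad()
    def forward(self, wav):  # wav: (batch, samples)
        frames = wav.shape[1] // 320  # ~20 ms hop, as in common SSL encoders
        feats = torch.randn(wav.shape[0], frames, self.feat_dim)
        return self.proj(feats)  # (batch, frames, feat_dim)


# The small probing head most benchmarks rely on: mean-pool then one linear layer.
class LinearProbe(nn.Module):
    def __init__(self, feat_dim=768, n_classes=10):
        super().__init__()
        self.out = nn.Linear(feat_dim, n_classes)

    def forward(self, feats):
        return self.out(feats.mean(dim=1))


# A larger-capacity probing head: a BiLSTM context model before classification.
class BiLSTMProbe(nn.Module):
    def __init__(self, feat_dim=768, hidden=512, n_classes=10):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, feats):
        context, _ = self.rnn(feats)
        return self.out(context.mean(dim=1))


encoder = FrozenSSLEncoder()
for p in encoder.parameters():
    p.requires_grad = False  # the encoder stays frozen; only the head is trained

wav = torch.randn(4, 16000)            # a batch of 1-second waveforms at 16 kHz
feats = encoder(wav)                   # frozen representations
logits_small = LinearProbe()(feats)    # (4, 10)
logits_large = BiLSTMProbe()(feats)    # (4, 10)
```

In both cases only the head receives gradients; the ranking fluctuations reported in the abstract arise from changing the head architecture while the frozen encoder is left untouched.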
Related papers
- On the Worst Prompt Performance of Large Language Models [93.13542053835542]
Performance of large language models (LLMs) is acutely sensitive to the phrasing of prompts.
We introduce RobustAlpacaEval, a new benchmark that consists of semantically equivalent case-level queries.
Experiments on RobustAlpacaEval with ChatGPT and six open-source LLMs from the Llama, Mistral, and Gemma families uncover substantial variability in model performance.
arXiv Detail & Related papers (2024-06-08T13:40:38Z) - Enabling Natural Zero-Shot Prompting on Encoder Models via Statement-Tuning [55.265138447400744]
Statement-Tuning is a technique that models discriminative tasks as a set of finite statements and trains an encoder model to discriminate between the potential statements to determine the label.
Experimental results demonstrate that Statement-Tuning achieves competitive performance compared to state-of-the-art LLMs with significantly fewer parameters.
The study investigates the impact of several design choices on few-shot and zero-shot generalization, revealing that Statement-Tuning can achieve strong performance with modest training data.
arXiv Detail & Related papers (2024-04-19T14:05:03Z) - Speech Self-Supervised Representation Benchmarking: Are We Doing it
Right? [24.354848095744536]
Self-supervised learning (SSL) has recently allowed leveraging large datasets of unlabeled speech signals to reach impressive performance on speech tasks.
Benchmarking using limited decoders may cause a counterproductive increase in the sizes of the developed SSL models.
arXiv Detail & Related papers (2023-06-01T08:51:18Z) - Deciphering the Projection Head: Representation Evaluation
Self-supervised Learning [6.375931203397043]
Self-supervised learning (SSL) aims to learn intrinsic features without labels.
The projection head plays an important role in improving the performance of the downstream task.
We propose a Representation Evaluation Design (RED) in SSL models in which a shortcut connection between the representation and the projection vectors is built.
arXiv Detail & Related papers (2023-01-28T13:13:53Z) - SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding
Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community.
There are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers.
Recent work has begun to introduce such benchmarks for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z) - SUPERB: Speech processing Universal PERformance Benchmark [78.41287216481203]
Self-supervised learning (SSL) has proven vital for advancing research in natural language processing (NLP) and computer vision (CV).
SUPERB is a leaderboard to benchmark the performance of a shared model across a wide range of speech processing tasks.
We present a simple framework to solve SUPERB tasks by learning task-specialized lightweight prediction heads on top of the frozen shared model.
arXiv Detail & Related papers (2021-05-03T17:51:09Z) - Evaluating the Impact of a Hierarchical Discourse Representation on
Entity Coreference Resolution Performance [3.7277082975620797]
In this work, we leverage automatically constructed discourse parse trees within a neural approach.
We demonstrate a significant improvement on two benchmark entity coreference-resolution datasets.
arXiv Detail & Related papers (2021-04-20T19:14:57Z) - Towards Understanding Sample Variance in Visually Grounded Language
Generation: Evaluations and Observations [67.4375210552593]
We design experiments to understand an important but often ignored problem in visually grounded language generation.
Given that humans have different utilities and visual attention, how will the sample variance in multi-reference datasets affect the models' performance?
We show that it is of paramount importance to report variance in experiments; that human-generated references could vary drastically in different datasets/tasks, revealing the nature of each task.
arXiv Detail & Related papers (2020-10-07T20:45:14Z) - Learning What Makes a Difference from Counterfactual Examples and
Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z) - Towards Learning a Universal Non-Semantic Representation of Speech [18.54874934311111]
This paper proposes a benchmark for comparing speech representations on non-semantic tasks, and proposes a representation based on an unsupervised triplet-loss objective.
The proposed representation outperforms other representations on the benchmark, and even exceeds state-of-the-art performance on a number of transfer learning tasks.
arXiv Detail & Related papers (2020-02-25T21:38:24Z)