Towards Early Prediction of Self-Supervised Speech Model Performance
- URL: http://arxiv.org/abs/2501.05966v1
- Date: Fri, 10 Jan 2025 13:49:09 GMT
- Title: Towards Early Prediction of Self-Supervised Speech Model Performance
- Authors: Ryan Whetten, Lucas Maison, Titouan Parcollet, Marco Dinarelli, Yannick Estève
- Abstract summary: We propose efficient, unsupervised methods that give insight into the quality of the pre-training of SSL speech models.
Results show that, with only one hour of unlabeled audio, measures of cluster quality and rank correlate better with downstream performance than the pre-training loss does.
- Score: 23.934743358907895
- Abstract: In Self-Supervised Learning (SSL), pre-training and evaluation are resource intensive. In the speech domain, current indicators of the quality of SSL models during pre-training, such as the loss, do not correlate well with downstream performance. Consequently, it is often difficult to gauge the final downstream performance in a cost-efficient manner during pre-training. In this work, we propose efficient, unsupervised methods that give insight into the quality of the pre-training of SSL speech models, namely, measuring the cluster quality and the rank of the embeddings of the SSL model. Results show that, with only one hour of unlabeled audio, measures of cluster quality and rank correlate better with downstream performance than the pre-training loss does, reducing the need for GPU hours and labeled data in SSL model evaluation.
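The abstract names cluster quality and embedding rank as the early indicators but does not specify the exact estimators here, so the following is only a minimal sketch under assumptions: an effective rank computed from the entropy of normalized singular values and a k-means/silhouette cluster-quality score, using numpy and scikit-learn; the function names and parameter choices are illustrative rather than the authors' recipe.

```python
# Minimal sketch (assumptions, not the paper's exact recipe): two unsupervised
# indicators of SSL embedding quality -- effective rank and a k-means/silhouette
# cluster-quality score -- computed on frame-level embeddings extracted from a
# small amount (e.g., ~1 hour) of unlabeled audio.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def effective_rank(embeddings: np.ndarray) -> float:
    """Effective rank: exponential of the entropy of the normalized singular values."""
    s = np.linalg.svd(embeddings - embeddings.mean(axis=0), compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))


def cluster_quality(embeddings: np.ndarray, n_clusters: int = 100, seed: int = 0) -> float:
    """Cluster-quality proxy: silhouette score of a k-means partition of the embeddings."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(embeddings)
    # Subsample frames so the O(n^2) silhouette computation stays cheap.
    idx = np.random.default_rng(seed).choice(
        len(embeddings), size=min(5000, len(embeddings)), replace=False
    )
    return float(silhouette_score(embeddings[idx], labels[idx]))


# Stand-in usage: `feats` would be hidden states of shape (n_frames, dim) taken
# from an SSL speech model checkpoint; here random values keep the sketch runnable.
feats = np.random.default_rng(0).standard_normal((2000, 64))
print(effective_rank(feats), cluster_quality(feats, n_clusters=20))
```

In practice, `feats` would come from one or more layers of the SSL model evaluated on roughly an hour of unlabeled audio, and both values would be tracked across pre-training checkpoints.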
Related papers
- A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification [51.35500308126506]
Self-supervised learning (SSL) is a machine learning approach where the data itself provides supervision, eliminating the need for external labels.
We study how classification-based evaluation protocols for SSL correlate and how well they predict downstream performance on different dataset types.
arXiv Detail & Related papers (2024-07-16T23:17:36Z)
- A Self-Supervised Learning Pipeline for Demographically Fair Facial Attribute Classification [3.5092955099876266]
This paper proposes a fully self-supervised pipeline for demographically fair facial attribute classification.
We leverage completely unlabeled data pseudolabeled via pre-trained encoders, diverse data curation techniques, and meta-learning-based weighted contrastive learning.
arXiv Detail & Related papers (2024-07-14T07:11:57Z)
- Self-supervised visual learning in the low-data regime: a comparative evaluation [38.34785825702943]
Self-Supervised Learning (SSL) is a valuable and robust training methodology for contemporary Deep Neural Networks (DNNs).
It allows efficient representation learning from massive amounts of unlabeled training data.
However, it is not always feasible to collect and/or utilize very large pre-training datasets.
arXiv Detail & Related papers (2024-04-26T07:23:14Z)
- Evaluating Fairness in Self-supervised and Supervised Models for Sequential Data [10.626503137418636]
Self-supervised learning (SSL) has become the de facto training paradigm of large models.
This study explores the impact of pre-training and fine-tuning strategies on fairness.
arXiv Detail & Related papers (2024-01-03T09:31:43Z)
- Stable Distillation: Regularizing Continued Pre-training for Low-Resource Automatic Speech Recognition [54.9235160379917]
Stable Distillation is a simple and novel approach for SSL-based continued pre-training.
It boosts ASR performance in the target domain where both labeled and unlabeled data are limited.
arXiv Detail & Related papers (2023-12-20T06:02:12Z)
- Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a lightweight, black-box tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the harmful effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z)
- Progressive Feature Adjustment for Semi-supervised Learning from Pretrained Models [39.42802115580677]
Semi-supervised learning (SSL) can leverage both labeled and unlabeled data to build a predictive model.
Recent literature suggests that naively applying state-of-the-art SSL with a pretrained model fails to unleash the full potential of training data.
We propose to use pseudo-labels from the unlabeled data to update the feature extractor in a way that is less sensitive to incorrect labels.
arXiv Detail & Related papers (2023-09-09T01:57:14Z)
- Rethinking Self-Supervised Visual Representation Learning in Pre-training for 3D Human Pose and Shape Estimation [57.206129938611454]
Self-supervised representation learning (SSL) methods have outperformed ImageNet classification pre-training on vision tasks such as object detection.
We empirically study and analyze the effects of SSL and compare it with other pre-training alternatives for 3D human pose and shape estimation (3DHPSE).
Our observations challenge the naive application of current SSL pre-training to 3DHPSE and highlight the value of other data types for pre-training.
arXiv Detail & Related papers (2023-03-09T16:17:52Z)
- Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training [102.14558233502514]
Masked prediction pre-training has seen remarkable progress in self-supervised learning (SSL) for speech recognition.
We propose two supervision-guided codebook generation approaches to improve automatic speech recognition (ASR) performance.
arXiv Detail & Related papers (2022-06-21T06:08:30Z)
- Analyzing the factors affecting usefulness of Self-Supervised Pre-trained Representations for Speech Recognition [1.0705399532413615]
Self-supervised learning (SSL) of high-level speech representations has been a popular approach to building Automatic Speech Recognition (ASR) systems.
We study the effect of domain, language, dataset size, and other aspects of the upstream SSL pre-training data on the final performance of a low-resource downstream ASR task.
arXiv Detail & Related papers (2022-03-31T11:48:24Z)
- Self-Supervised Learning for speech recognition with Intermediate layer supervision [52.93758711230248]
We propose Intermediate Layer Supervision for Self-Supervised Learning (ILS-SSL).
ILS-SSL forces the model to concentrate on content information as much as possible by adding an additional SSL loss on the intermediate layers (a brief illustrative sketch follows this list).
Experiments on LibriSpeech test-other set show that our method outperforms HuBERT significantly.
arXiv Detail & Related papers (2021-12-16T10:45:05Z)
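As noted in the ILS-SSL entry above, the mechanism is an additional SSL loss applied to intermediate layers. The sketch below only illustrates that idea: the layer indices, the 0.5 weighting, and the toy cross-entropy objective over dummy discrete units are assumptions, not the paper's configuration.

```python
# Illustrative only: sum one SSL objective over the top layer and a few
# intermediate layers, following the intermediate-layer supervision idea above.
import torch
import torch.nn.functional as F


def combined_ssl_loss(hidden_states, targets, ssl_loss_fn,
                      intermediate_layers=(4, 8), weight=0.5):
    """hidden_states: list of per-layer tensors (batch, time, dim); targets: e.g. masked discrete units."""
    loss = ssl_loss_fn(hidden_states[-1], targets)        # standard top-layer SSL loss
    for layer in intermediate_layers:                      # auxiliary losses on intermediate layers
        loss = loss + weight * ssl_loss_fn(hidden_states[layer], targets)
    return loss


# Toy usage: random per-layer states and a linear projection to 32 dummy units.
states = [torch.randn(2, 50, 16) for _ in range(13)]
units = torch.randint(0, 32, (2, 50))
proj = torch.nn.Linear(16, 32)
ce = lambda h, y: F.cross_entropy(proj(h).transpose(1, 2), y)
print(combined_ssl_loss(states, units, ce))
```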