Conditional independence for pretext task selection in Self-supervised
speech representation learning
- URL: http://arxiv.org/abs/2104.07388v1
- Date: Thu, 15 Apr 2021 11:32:59 GMT
- Title: Conditional independence for pretext task selection in Self-supervised
speech representation learning
- Authors: Salah Zaiem, Titouan Parcollet, Slim Essid
- Abstract summary: Self-supervised learning (SSL) leverages unlabeled data to extract useful latent representations replacing traditional input features in the downstream task.
A common pretext task consists in pretraining a SSL model on pseudo-labels derived from the original signal.
This paper introduces a practical and theoretical framework to select relevant pseudo-labels with respect to a given downstream task.
- Score: 23.39079406674442
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Through solving pretext tasks, self-supervised learning (SSL) leverages
unlabeled data to extract useful latent representations replacing traditional
input features in the downstream task. A common pretext task consists in
pretraining a SSL model on pseudo-labels derived from the original signal. This
technique is particularly relevant for speech data where various meaningful
signal processing features may serve as pseudo-labels. However, the process of
selecting pseudo-labels, for speech or other types of data, remains mostly
unexplored and currently relies on observing the results on the final
downstream task. Nevertheless, this methodology is not sustainable at scale due
to substantial computational (hence carbon) costs. Thus, this paper introduces
a practical and theoretical framework to select relevant pseudo-labels with
respect to a given downstream task. More precisely, we propose a functional
estimator of the pseudo-label utility grounded in the conditional independence
theory, which does not require any training. The experiments conducted on
speaker recognition and automatic speech recognition validate our estimator,
showing a significant correlation between the performance observed on the
downstream task and the utility estimates obtained with our approach,
facilitating the prospection of relevant pseudo-labels for self-supervised
speech representation learning.
Related papers
- Semi-Supervised Cognitive State Classification from Speech with Multi-View Pseudo-Labeling [21.82879779173242]
The lack of labeled data is a common challenge in speech classification tasks.
We propose a Semi-Supervised Learning (SSL) framework, introducing a novel multi-view pseudo-labeling method.
We evaluate our SSL framework on emotion recognition and dementia detection tasks.
arXiv Detail & Related papers (2024-09-25T13:51:19Z) - Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z) - Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns weights based on the training dynamics of the classifiers to the distantly supervised labels.
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data.
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
arXiv Detail & Related papers (2024-06-20T18:35:47Z) - Semi-Supervised Learning of Semantic Correspondence with Pseudo-Labels [26.542718087103665]
SemiMatch is a semi-supervised solution for establishing dense correspondences across semantically similar images.
Our framework generates the pseudo-labels using the model's prediction itself between source and weakly-augmented target, and uses pseudo-labels to learn the model again between source and strongly-augmented target.
In experiments, SemiMatch achieves state-of-the-art performance on various benchmarks, especially on PF-Willow by a large margin.
arXiv Detail & Related papers (2022-03-30T03:52:50Z) - Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performances on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily-available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
arXiv Detail & Related papers (2022-02-15T02:14:33Z) - Pretext Tasks selection for multitask self-supervised speech
representation learning [23.39079406674442]
This paper introduces a method to select a group of pretext tasks among a set of candidates.
Experiments conducted on speaker recognition and automatic speech recognition validate our approach.
arXiv Detail & Related papers (2021-07-01T16:36:29Z) - Preliminary study on using vector quantization latent spaces for TTS/VC
systems with consistent performance [55.10864476206503]
We investigate the use of quantized vectors to model the latent linguistic embedding.
By enforcing different policies over the latent spaces in the training, we are able to obtain a latent linguistic embedding.
Our experiments show that the voice cloning system built with vector quantization has only a small degradation in terms of perceptive evaluations.
arXiv Detail & Related papers (2021-06-25T07:51:35Z) - Improving Classification through Weak Supervision in Context-specific
Conversational Agent Development for Teacher Education [1.215785021723604]
The effort required to develop an educational scenario specific conversational agent is time consuming.
Previous approaches to modeling annotations have relied on labeling thousands of examples and calculating inter-annotator agreement and majority votes.
We propose using a multi-task weak supervision method combined with active learning to address these concerns.
arXiv Detail & Related papers (2020-10-23T23:39:40Z) - Adaptive Self-training for Few-shot Neural Sequence Labeling [55.43109437200101]
We develop techniques to address the label scarcity challenge for neural sequence labeling models.
Self-training serves as an effective mechanism to learn from large amounts of unlabeled data.
meta-learning helps in adaptive sample re-weighting to mitigate error propagation from noisy pseudo-labels.
arXiv Detail & Related papers (2020-10-07T22:29:05Z) - Predicting What You Already Know Helps: Provable Self-Supervised
Learning [60.27658820909876]
Self-supervised representation learning solves auxiliary prediction tasks (known as pretext tasks) without requiring labeled data.
We show a mechanism exploiting the statistical connections between certain em reconstruction-based pretext tasks that guarantee to learn a good representation.
We prove the linear layer yields small approximation error even for complex ground truth function class.
arXiv Detail & Related papers (2020-08-03T17:56:13Z) - Self-Supervised Relational Reasoning for Representation Learning [5.076419064097733]
In self-supervised learning, a system is tasked with achieving a surrogate objective by defining alternative targets on unlabeled data.
We propose a novel self-supervised formulation of relational reasoning that allows a learner to bootstrap a signal from information implicit in unlabeled data.
We evaluate the proposed method following a rigorous experimental procedure, using standard datasets, protocols, and backbones.
arXiv Detail & Related papers (2020-06-10T14:24:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.