Contrastive Semi-supervised Learning for ASR
- URL: http://arxiv.org/abs/2103.05149v1
- Date: Tue, 9 Mar 2021 00:20:37 GMT
- Title: Contrastive Semi-supervised Learning for ASR
- Authors: Alex Xiao, Christian Fuegen, Abdelrahman Mohamed
- Abstract summary: We propose Contrastive Semi-supervised Learning (CSL) for pseudo-label pre-training of automatic speech recognition (ASR) models.
CSL eschews directly predicting teacher-generated pseudo-labels in favor of utilizing them to select positive and negative examples.
It reduces the WER by 8% compared to the standard Cross-Entropy pseudo-labeling (CE-PL) when 10hr of supervised data is used to annotate 75,000hr of videos.
- Score: 16.070972355201253
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pseudo-labeling is the most adopted method for pre-training automatic speech
recognition (ASR) models. However, its performance suffers from the supervised
teacher model's degrading quality in low-resource setups and under domain
transfer. Inspired by the successes of contrastive representation learning for
computer vision and speech applications, and more recently for supervised
learning of visual objects, we propose Contrastive Semi-supervised Learning
(CSL). CSL eschews directly predicting teacher-generated pseudo-labels in favor
of utilizing them to select positive and negative examples. In the challenging
task of transcribing public social media videos, using CSL reduces the WER by
8% compared to the standard Cross-Entropy pseudo-labeling (CE-PL) when 10hr of
supervised data is used to annotate 75,000hr of videos. The WER reduction jumps
to 19% under the ultra low-resource condition of using 1hr labels for teacher
supervision. CSL generalizes much better in out-of-domain conditions, showing
up to 17% WER reduction compared to the best CE-PL pre-trained model.
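As a concrete illustration of the mechanism described in the abstract, the sketch below shows how teacher-generated pseudo-labels can be used only to select positive and negative frames for a contrastive objective, rather than serving as cross-entropy targets. This is a minimal, hypothetical PyTorch rendering: frame-level student embeddings, per-frame pseudo-labels, cosine similarity, and the temperature `tau` are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def contrastive_pseudo_label_loss(embeddings, pseudo_labels, tau=0.1):
    """Sketch of a CSL-style objective: pseudo-labels select positives and
    negatives but are never predicted directly.

    embeddings:    (N, D) student frame representations
    pseudo_labels: (N,)   teacher-generated labels for the same frames
    tau:           temperature (illustrative value, not from the paper)
    """
    z = F.normalize(embeddings, dim=-1)                # unit-length embeddings
    sim = (z @ z.t()) / tau                            # (N, N) scaled cosine similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos_mask = (pseudo_labels.unsqueeze(0) == pseudo_labels.unsqueeze(1)) & ~self_mask

    # Softmax over all other frames; positives are frames sharing the anchor's pseudo-label.
    log_prob = F.log_softmax(sim.masked_fill(self_mask, float("-inf")), dim=-1)
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0)

    pos_counts = pos_mask.sum(-1)
    valid = pos_counts > 0                             # skip anchors with no positive
    return -(pos_log_prob.sum(-1)[valid] / pos_counts[valid]).mean()
```

The paper's actual positive/negative sampling, frame chunking, and loss details are more involved; this sketch only shows where the pseudo-labels enter the objective, i.e. as a grouping signal rather than as targets.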
Related papers
- A Self-Supervised Learning Pipeline for Demographically Fair Facial Attribute Classification [3.5092955099876266]
This paper proposes a fully self-supervised pipeline for demographically fair facial attribute classification.
We leverage completely unlabeled data pseudolabeled via pre-trained encoders, diverse data curation techniques, and meta-learning-based weighted contrastive learning.
arXiv Detail & Related papers (2024-07-14T07:11:57Z)
- Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns weights based on the training dynamics of the classifiers to the distantly supervised labels.
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data (a minimal weighting sketch appears after this related-papers list).
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
arXiv Detail & Related papers (2024-06-20T18:35:47Z)
- Towards Supervised Performance on Speaker Verification with Self-Supervised Learning by Leveraging Large-Scale ASR Models [0.0]
Speech representations from large-scale ASR models contain valuable speaker information.
We propose a framework to learn speaker representations in an SSL context by fine-tuning a pre-trained WavLM with a supervised loss.
Our method achieves 0.99% EER on VoxCeleb1-O, establishing the new state-of-the-art on self-supervised SV.
arXiv Detail & Related papers (2024-06-04T12:58:19Z)
- Reinforcement Learning-Guided Semi-Supervised Learning [20.599506122857328]
We propose a novel Reinforcement Learning Guided SSL method, RLGSSL, that formulates SSL as a one-armed bandit problem.
RLGSSL incorporates a carefully designed reward function that balances the use of labeled and unlabeled data to enhance generalization performance.
We demonstrate the effectiveness of RLGSSL through extensive experiments on several benchmark datasets and show that our approach achieves consistently superior performance compared to state-of-the-art SSL methods.
arXiv Detail & Related papers (2024-05-02T21:52:24Z)
- Evaluating Fairness in Self-supervised and Supervised Models for Sequential Data [10.626503137418636]
Self-supervised learning (SSL) has become the de facto training paradigm of large models.
This study explores the impact of pre-training and fine-tuning strategies on fairness.
arXiv Detail & Related papers (2024-01-03T09:31:43Z)
- On Higher Adversarial Susceptibility of Contrastive Self-Supervised Learning [104.00264962878956]
Contrastive self-supervised learning (CSL) has managed to match or surpass the performance of supervised learning in image and video classification.
It is still largely unknown if the nature of the representation induced by the two learning paradigms is similar.
We identify the uniform distribution of data representations over a unit hypersphere in the CSL representation space as the key contributor to this higher adversarial susceptibility.
We devise strategies that are simple, yet effective in improving model robustness with CSL training.
arXiv Detail & Related papers (2022-07-22T03:49:50Z)
- Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training [102.14558233502514]
Masked prediction pre-training has seen remarkable progress in self-supervised learning (SSL) for speech recognition.
We propose two supervision-guided codebook generation approaches to improve automatic speech recognition (ASR) performance.
arXiv Detail & Related papers (2022-06-21T06:08:30Z)
- Masked Unsupervised Self-training for Zero-shot Image Classification [98.23094305347709]
Masked Unsupervised Self-Training (MUST) is a new approach which leverages two different and complementary sources of supervision: pseudo-labels and raw images.
MUST improves upon CLIP by a large margin and narrows the performance gap between unsupervised and supervised classification.
arXiv Detail & Related papers (2022-06-07T02:03:06Z)
- Class-Aware Contrastive Semi-Supervised Learning [51.205844705156046]
We propose a general method named Class-aware Contrastive Semi-Supervised Learning (CCSSL) to improve pseudo-label quality and enhance the model's robustness in the real-world setting.
Our proposed CCSSL has significant performance improvements over the state-of-the-art SSL methods on the standard datasets CIFAR100 and STL10.
arXiv Detail & Related papers (2022-03-04T12:18:23Z)
- Exploiting Large-scale Teacher-Student Training for On-device Acoustic Models [15.237992590162593]
We present results from Alexa speech teams on semi-supervised learning (SSL) of acoustic models (AM).
We discuss SSL for AMs in a small footprint setting, showing that a smaller capacity model trained with 1 million hours of unsupervised data can outperform a baseline supervised system by 14.3% word error rate reduction (WERR).
We then switch to SSL using larger student models in low data regimes; while learning efficiency with unsupervised data is higher, student models may outperform teacher models in such a setting.
arXiv Detail & Related papers (2021-06-11T02:23:40Z)
- Task Aligned Generative Meta-learning for Zero-shot Learning [64.16125851588437]
We propose a Task-aligned Generative Meta-learning model for Zero-shot learning (TGMZ).
TGMZ mitigates the potentially biased training and enables meta-ZSL to accommodate real-world datasets containing diverse distributions.
Our comparisons with state-of-the-art algorithms show improvements of 2.1%, 3.0%, 2.5%, and 7.6% achieved by TGMZ on the AWA1, AWA2, CUB, and aPY datasets.
arXiv Detail & Related papers (2021-03-03T05:18:36Z)
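The co-training entry above describes replacing confidence-threshold filtering of distantly supervised labels with per-example importance weights. The following is a minimal, hypothetical sketch of that weighting idea; how the weights are actually derived from classifier training dynamics in that paper is not reproduced here, and the weights are simply assumed to be given.

```python
import torch
import torch.nn.functional as F

def weighted_pseudo_label_loss(logits, pseudo_labels, weights):
    """Weight every automatically labeled example instead of discarding those
    below an arbitrary confidence threshold.

    logits:        (N, C) model predictions
    pseudo_labels: (N,)   distantly supervised / pseudo labels
    weights:       (N,)   assumed importance weights (e.g. from training dynamics)
    """
    per_example = F.cross_entropy(logits, pseudo_labels, reduction="none")
    # Weighted average keeps low-confidence examples in play with reduced influence.
    return (weights * per_example).sum() / weights.sum().clamp_min(1e-8)
```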
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.