Self-training and Pre-training are Complementary for Speech Recognition
- URL: http://arxiv.org/abs/2010.11430v1
- Date: Thu, 22 Oct 2020 04:15:37 GMT
- Title: Self-training and Pre-training are Complementary for Speech Recognition
- Authors: Qiantong Xu, Alexei Baevski, Tatiana Likhomanenko, Paden Tomasello,
Alexis Conneau, Ronan Collobert, Gabriel Synnaeve, Michael Auli
- Abstract summary: Self-training and unsupervised pre-training have emerged as effective approaches to improve speech recognition systems using unlabeled data.
We show that pseudo-labeling and pre-training with wav2vec 2.0 are complementary in a variety of labeled data setups.
- Score: 64.85342993297677
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-training and unsupervised pre-training have emerged as effective
approaches to improve speech recognition systems using unlabeled data. However,
it is not clear whether they learn similar patterns or if they can be
effectively combined. In this paper, we show that pseudo-labeling and
pre-training with wav2vec 2.0 are complementary in a variety of labeled data
setups. Using just 10 minutes of labeled data from Libri-light as well as 53k
hours of unlabeled data from LibriVox achieves WERs of 3.0%/5.2% on the clean
and other test sets of Librispeech - rivaling the best published systems
trained on 960 hours of labeled data only a year ago. Training on all labeled
data of Librispeech achieves WERs of 1.5%/3.1%.
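The combination studied in the paper is roughly: pre-train wav2vec 2.0 on unlabeled audio, fine-tune it on the available labeled data, transcribe the unlabeled audio with the fine-tuned model to obtain pseudo-labels, and train a final model on the labeled plus pseudo-labeled data. Below is a minimal sketch of just the pseudo-labeling step; it assumes the publicly released Hugging Face `transformers` checkpoint `facebook/wav2vec2-base-960h` as a stand-in for a pre-trained and fine-tuned model, and is an illustration rather than the authors' own fairseq pipeline.
```python
# Illustrative pseudo-labeling step: a pre-trained + fine-tuned wav2vec 2.0 model
# transcribes unlabeled audio, and the transcripts become pseudo-labels for a final
# round of supervised training. The checkpoint and API (Hugging Face `transformers`)
# are assumptions, not the paper's own code.
import numpy as np
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h").eval()

def pseudo_label(waveform: np.ndarray, sampling_rate: int = 16000) -> str:
    """Transcribe one unlabeled utterance; the output serves as its pseudo-label."""
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(ids)[0]

# Example: label a 1-second dummy waveform. In practice this runs over the unlabeled
# LibriVox audio, and the (audio, pseudo-label) pairs are mixed with the labeled data
# to train the final model, optionally decoding with a language model instead of
# greedy argmax.
print(pseudo_label(np.zeros(16000, dtype=np.float32)))
```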
Related papers
- Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns weights based on the training dynamics of the classifiers to the distantly supervised labels.
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data.
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
arXiv Detail & Related papers (2024-06-20T18:35:47Z)
- Neighborhood-Regularized Self-Training for Learning with Few Labels [21.7848889781112]
One drawback of self-training is that it is vulnerable to the label noise from incorrect pseudo labels.
We develop a neighborhood-based sample selection approach to tackle the issue of noisy pseudo labels.
Our proposed data selection strategy reduces the noise of pseudo labels by 36.8% and saves 57.3% of the time when compared with the best baseline.
arXiv Detail & Related papers (2023-01-10T00:07:33Z)
- LST: Lexicon-Guided Self-Training for Few-Shot Text Classification [3.7277082975620806]
We introduce LST, a simple self-training method that uses a lexicon to guide the pseudo-labeling mechanism.
We demonstrate that this simple yet well-crafted lexical knowledge achieves 1.0-2.0% better performance on 30 labeled samples per class for five benchmark datasets.
arXiv Detail & Related papers (2022-02-05T14:33:12Z)
- Unsupervised Speech Recognition [55.864459085947345]
wav2vec-U, short for wav2vec Unsupervised, is a method to train speech recognition models without any labeled data.
We leverage self-supervised speech representations to segment unlabeled audio and learn a mapping from these representations to phonemes via adversarial training.
On the larger English Librispeech benchmark, wav2vec-U achieves a word error rate of 5.9 on test-other, rivaling some of the best published systems trained on 960 hours of labeled data from only two years ago.
arXiv Detail & Related papers (2021-05-24T04:10:47Z)
- Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition [97.44056170380726]
We employ a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech.
We carry out noisy student training with SpecAugment, using giant Conformer models pre-trained with wav2vec 2.0; a minimal SpecAugment sketch appears after this list.
We achieve word error rates (WERs) of 1.4%/2.6% on the LibriSpeech test/test-other sets, against the current state-of-the-art WERs of 1.7%/3.3%.
arXiv Detail & Related papers (2020-10-20T17:58:13Z)
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations [51.25118580050847]
We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods.
wav2vec 2.0 masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations, which are jointly learned; a minimal sketch of this objective appears after this list.
arXiv Detail & Related papers (2020-06-20T02:35:02Z)
- Improved Noisy Student Training for Automatic Speech Recognition [89.8397907990268]
"Noisy student training" is an iterative self-training method that leverages augmentation to improve network performance.
We find effective methods to filter, balance and augment the data generated in between self-training iterations.
We are able to improve upon the previous state-of-the-art clean/noisy test WERs achieved on LibriSpeech 100h (4.74%/12.20%) and LibriSpeech (1.9%/4.1%).
arXiv Detail & Related papers (2020-05-19T17:57:29Z)
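For the noisy student entry above ("Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition"), the SpecAugment step can be sketched with torchaudio's masking transforms. The mask parameters and spectrogram shape below are illustrative guesses, not that paper's configuration.
```python
# Illustrative SpecAugment-style masking with torchaudio; parameter values are
# guesses for illustration only, not the configuration used in the paper above.
import torch
import torchaudio.transforms as T

augment = torch.nn.Sequential(
    T.FrequencyMasking(freq_mask_param=27),  # zero out up to 27 consecutive mel bins
    T.TimeMasking(time_mask_param=100),      # zero out up to 100 consecutive frames
)

log_mel = torch.randn(1, 80, 1200)   # (channels, mel bins, frames) stand-in spectrogram
student_input = augment(log_mel)     # the student model trains on this corrupted view
```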
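For the wav2vec 2.0 entry above, the masked contrastive objective can be written down in a few lines. This is a simplified sketch: it omits the codebook diversity loss, the learned quantizer, and the real masking and negative-sampling machinery, and the shapes and temperature are illustrative.
```python
# Simplified wav2vec 2.0-style contrastive objective: for each masked time step, the
# transformer output must identify the true quantized latent among K distractors,
# using cosine similarity scaled by a temperature.
import torch
import torch.nn.functional as F

def contrastive_loss(context, targets, negatives, temperature=0.1):
    """
    context:   (B, T, D) transformer outputs at masked positions
    targets:   (B, T, D) true quantized latents for those positions
    negatives: (B, T, K, D) distractor latents sampled from other masked steps
    """
    # Candidates: the positive at index 0 followed by the K negatives -> (B, T, K+1, D)
    candidates = torch.cat([targets.unsqueeze(2), negatives], dim=2)
    # Cosine similarity between each context vector and every candidate
    sims = F.cosine_similarity(context.unsqueeze(2), candidates, dim=-1) / temperature
    # The correct candidate is always at index 0
    labels = torch.zeros(sims.shape[:2], dtype=torch.long, device=sims.device)
    return F.cross_entropy(sims.flatten(0, 1), labels.flatten())

# Toy usage with random tensors standing in for model outputs and quantized latents
B, T, K, D = 2, 50, 100, 256
loss = contrastive_loss(torch.randn(B, T, D), torch.randn(B, T, D), torch.randn(B, T, K, D))
print(loss.item())
```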
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.