Iterative Pseudo-Labeling for Speech Recognition
- URL: http://arxiv.org/abs/2005.09267v2
- Date: Thu, 27 Aug 2020 01:30:10 GMT
- Title: Iterative Pseudo-Labeling for Speech Recognition
- Authors: Qiantong Xu, Tatiana Likhomanenko, Jacob Kahn, Awni Hannun, Gabriel Synnaeve, Ronan Collobert
- Abstract summary: Pseudo-labeling has recently shown promise in end-to-end automatic speech recognition (ASR).
We study Iterative Pseudo-Labeling (IPL), a semi-supervised algorithm which efficiently performs multiple iterations of pseudo-labeling on unlabeled data.
- Score: 35.48685001317295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pseudo-labeling has recently shown promise in end-to-end automatic speech
recognition (ASR). We study Iterative Pseudo-Labeling (IPL), a semi-supervised
algorithm which efficiently performs multiple iterations of pseudo-labeling on
unlabeled data as the acoustic model evolves. In particular, IPL fine-tunes an
existing model at each iteration using both labeled data and a subset of
unlabeled data. We study the main components of IPL: decoding with a language
model and data augmentation. We then demonstrate the effectiveness of IPL by
achieving state-of-the-art word-error rates on the Librispeech test sets in both
standard and low-resource settings. We also study the effect of language models
trained on different corpora to show IPL can effectively utilize additional
text. Finally, we release a new large in-domain text corpus which does not
overlap with the Librispeech training transcriptions to foster research in
low-resource, semi-supervised ASR.
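As a rough illustration of the IPL loop described in the abstract, here is a minimal Python sketch; the training and decoding helpers and the hyperparameters are hypothetical placeholders, not the paper's actual wav2letter-based implementation.

```python
import random

# Hypothetical stand-ins: in the paper, the acoustic model is a large neural
# network, decoding uses beam search with an external language model, and
# augmentation follows SpecAugment-style policies.
def train(model, paired_data, augment=True):
    """Fine-tune the acoustic model on (audio, transcript) pairs (placeholder)."""
    return model

def decode_with_lm(model, audio_list, lm_weight=1.0):
    """Decode with a language model to produce pseudo-labels (placeholder)."""
    return ["<pseudo transcript>" for _ in audio_list]

def iterative_pseudo_labeling(model, labeled, unlabeled,
                              num_iters=5, unlabeled_fraction=0.4):
    # 1. Bootstrap the model on the labeled set only.
    model = train(model, labeled)
    for _ in range(num_iters):
        # 2. Pick a subset of the unlabeled audio for this iteration.
        subset = random.sample(unlabeled, int(unlabeled_fraction * len(unlabeled)))
        # 3. Re-generate pseudo-labels with the *current* model plus LM decoding.
        pseudo = list(zip(subset, decode_with_lm(model, subset)))
        # 4. Fine-tune the existing model (no re-initialization) on labeled data
        #    plus the freshly pseudo-labeled subset, with data augmentation.
        model = train(model, list(labeled) + pseudo, augment=True)
    return model
```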
Related papers
- Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns importance weights to the distantly supervised labels based on the training dynamics of the classifiers.
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data.
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
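A minimal sketch of the weighting idea, assuming per-epoch probabilities of the distant label are recorded as the training dynamics; the exact statistic and loss used in the paper may differ.

```python
import numpy as np

def importance_weights(prob_history):
    """prob_history: array of shape (num_epochs, num_examples) holding the
    probability a classifier assigned to the distant label at each epoch.
    Returns one soft weight per example in [0, 1] instead of a hard keep/drop."""
    return prob_history.mean(axis=0)

def weighted_nll(log_probs, labels, weights):
    """Per-example negative log-likelihood scaled by the importance weights."""
    nll = -log_probs[np.arange(len(labels)), labels]
    return float((weights * nll).mean())
```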
arXiv Detail & Related papers (2024-06-20T18:35:47Z)
- Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications [37.89857769906568]
We describe an approach to self-supervised representation learning from raw audio using a hidden unit clustering (HUC) framework.
The input to the model consists of audio samples that are windowed and processed with 1-D convolutional layers.
The HUC framework, which categorizes the representations into a small number of phoneme-like units, is used to train the model to learn semantically rich speech representations.
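A hedged sketch of the clustering step, with a random projection standing in for the 1-D convolutional encoder; the window sizes, cluster count, and helper names are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans

def frame(audio, win=400, hop=160):
    """Split raw audio into overlapping windows (at 16 kHz: 25 ms / 10 ms)."""
    idx = np.arange(0, len(audio) - win + 1, hop)
    return np.stack([audio[i:i + win] for i in idx])

def pseudo_units(audio, num_units=50, dim=64, seed=0):
    """Cluster frame representations into a small set of phoneme-like units."""
    rng = np.random.default_rng(seed)
    frames = frame(np.asarray(audio, dtype=np.float32))
    proj = rng.standard_normal((frames.shape[1], dim)).astype(np.float32)
    reps = frames @ proj  # stand-in for the learned conv encoder
    return KMeans(n_clusters=num_units, n_init=10).fit_predict(reps)
```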
arXiv Detail & Related papers (2023-07-14T13:02:10Z)
- Towards hate speech detection in low-resource languages: Comparing ASR to acoustic word embeddings on Wolof and Swahili [16.424308444697015]
We consider hate speech detection through keyword spotting on radio broadcasts.
One approach is to build an automatic speech recognition system for the target low-resource language.
We compare this to using acoustic word embedding models that map speech segments to a space where matching words have similar vectors.
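A minimal sketch of the embedding-matching route: segments and keywords are assumed to be already embedded by some acoustic word embedding model (not shown), and a segment is flagged when its cosine similarity to any keyword exceeds a threshold.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def spot_keywords(segment_embs, keyword_embs, threshold=0.7):
    """Return (segment_index, best_score) for segments matching any keyword."""
    hits = []
    for i, seg in enumerate(segment_embs):
        score = max(cosine(seg, kw) for kw in keyword_embs)
        if score >= threshold:
            hits.append((i, score))
    return hits
```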
arXiv Detail & Related papers (2023-06-01T07:25:10Z)
- SLICER: Learning universal audio representations using low-resource self-supervised pre-training [53.06337011259031]
We present a new Self-Supervised Learning approach to pre-train encoders on unlabeled audio data.
Our primary aim is to learn audio representations that can generalize across a large variety of speech and non-speech tasks.
arXiv Detail & Related papers (2022-11-02T23:45:33Z)
- Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages [58.43299730989809]
We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data.
We induce a pseudo language as a compact discrete representation, and formulate a self-supervised pseudo speech recognition task.
This process stands on its own, or can be applied as low-cost second-stage pre-training.
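A hedged sketch of how a pseudo transcript might be derived from quantized audio: frame features are assumed to have already been clustered into discrete ids (not shown), consecutive repeats are collapsed, and the ids are rendered as tokens; the paper additionally compresses these units (e.g. with subword modeling), which is omitted here.

```python
def collapse_repeats(cluster_ids):
    """[3, 3, 3, 7, 7, 1] -> [3, 7, 1]"""
    out = []
    for c in cluster_ids:
        if not out or c != out[-1]:
            out.append(c)
    return out

def pseudo_transcript(cluster_ids, prefix="unit_"):
    """Render collapsed cluster ids as pseudo-language tokens for the decoder."""
    return [f"{prefix}{c}" for c in collapse_repeats(cluster_ids)]
```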
arXiv Detail & Related papers (2022-05-02T17:59:02Z)
- A Complementary Joint Training Approach Using Unpaired Speech and Text for Low-Resource Automatic Speech Recognition [25.473191378558138]
We leverage unpaired data to train a general sequence-to-sequence model.
Inspired by the complementarity of speech-PseudoLabel pairs and SynthesizedAudio-text pairs, we propose a complementary joint training (CJT) method.
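A rough sketch of alternating over the two complementary pair types alongside real paired data; the batch construction shown here is an assumption, not the paper's recipe.

```python
import itertools

def cjt_batches(real_pairs, speech_pseudolabel_pairs, synthaudio_text_pairs):
    """Yield batches round-robin from the three complementary data sources."""
    sources = [real_pairs, speech_pseudolabel_pairs, synthaudio_text_pairs]
    for batch in itertools.chain.from_iterable(zip(*sources)):
        yield batch
```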
arXiv Detail & Related papers (2022-04-05T07:02:53Z)
- Speaker Embedding-aware Neural Diarization: a Novel Framework for Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR [10.261890123213622]
We propose an on-the-fly data augmentation method for automatic speech recognition (ASR).
Our method, called Aligned Data Augmentation (ADA) for ASR, replaces transcribed tokens and the speech representations in an aligned manner to generate training pairs.
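A hedged sketch of one aligned substitution, assuming token-level alignments of the form (token, (start_frame, end_frame)); the actual ADA policy for choosing replacements is not reproduced here.

```python
import random

def aligned_substitute(frames_a, tokens_a, frames_b, tokens_b, rng=random):
    """Swap one token (and its aligned frames) in utterance A with one from B,
    producing a new (frames, transcript) training pair."""
    i = rng.randrange(len(tokens_a))          # position to replace in A
    j = rng.randrange(len(tokens_b))          # donor token from B
    (sa, ea), (sb, eb) = tokens_a[i][1], tokens_b[j][1]
    new_frames = list(frames_a[:sa]) + list(frames_b[sb:eb]) + list(frames_a[ea:])
    new_tokens = ([t for t, _ in tokens_a[:i]] + [tokens_b[j][0]]
                  + [t for t, _ in tokens_a[i + 1:]])
    return new_frames, new_tokens
```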
arXiv Detail & Related papers (2021-04-03T13:00:00Z)
- SlimIPL: Language-Model-Free Iterative Pseudo-Labeling [32.39921686482643]
Iterative Pseudo-Labeling (IPL) continuously trains a single model using pseudo-labels iteratively re-generated as the model learns.
We call this approach Language-Model-Free IPL (slimIPL) and give a resultant training setup for low-resource settings with CTC-based models.
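A minimal sketch of the language-model-free variant: pseudo-labels come from the model's own greedy CTC decoding with no external LM; all helpers are placeholders, and slimIPL's dynamic cache of pseudo-labeled samples is omitted.

```python
def greedy_ctc_decode(model, audio):
    """Argmax over frames, collapse repeats, drop blanks (placeholder)."""
    return "<hypothesis>"

def slim_ipl_step(model, labeled_batch, unlabeled_batch, train_step):
    """One update: mix supervised data with freshly self-labeled audio."""
    pseudo = [(a, greedy_ctc_decode(model, a)) for a in unlabeled_batch]
    return train_step(model, list(labeled_batch) + pseudo)
```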
arXiv Detail & Related papers (2020-10-22T08:36:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.