Semi-supervised ASR by End-to-end Self-training
- URL: http://arxiv.org/abs/2001.09128v2
- Date: Thu, 30 Jul 2020 14:48:51 GMT
- Title: Semi-supervised ASR by End-to-end Self-training
- Authors: Yang Chen, Weiran Wang, Chao Wang
- Abstract summary: We propose a self-training method with an end-to-end system for semi-supervised ASR.
We iteratively generate pseudo-labels on a mini-batch of unsupervised utterances with the current model, and use the pseudo-labels to augment the supervised data for immediate model update.
Our method gives 14.4% relative WER improvement over a carefully-trained base system with data augmentation, reducing the performance gap between the base system and the oracle system by 50%.
- Score: 18.725686837244265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While deep learning based end-to-end automatic speech recognition (ASR)
systems have greatly simplified modeling pipelines, they suffer from the data
sparsity issue. In this work, we propose a self-training method with an
end-to-end system for semi-supervised ASR. Starting from a Connectionist
Temporal Classification (CTC) system trained on the supervised data, we
iteratively generate pseudo-labels on a mini-batch of unsupervised utterances
with the current model, and use the pseudo-labels to augment the supervised
data for immediate model update. Our method retains the simplicity of
end-to-end ASR systems, and can be seen as performing alternating optimization
over a well-defined learning objective. We also perform empirical
investigations of our method, regarding the effect of data augmentation,
decoding beam size for pseudo-label generation, and freshness of pseudo-labels.
On a commonly used semi-supervised ASR setting with the WSJ corpus, our method
gives 14.4% relative WER improvement over a carefully-trained base system with
data augmentation, reducing the performance gap between the base system and the
oracle system by 50%.
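As a concrete illustration of the alternating scheme described in the abstract, here is a minimal, hypothetical PyTorch sketch of one training step: pseudo-labels are generated for an unsupervised mini-batch with the current model and immediately mixed into a supervised CTC update. All names are illustrative, the model is assumed to return log-probabilities of shape (T, B, V), and greedy decoding stands in for the beam-search decoding (with a language model) studied in the paper.

```python
import torch
import torch.nn.functional as F

def greedy_ctc_decode(log_probs, blank=0):
    """Collapse repeats and drop blanks along the argmax path.

    log_probs: (T, B, V) log-probabilities from a CTC model. Returns a
    list of label tensors, one per utterance. Greedy decoding stands in
    for the beam-search decoding used in the paper.
    """
    paths = log_probs.argmax(dim=-1).t()  # (B, T)
    labels = []
    for path in paths:
        seq, prev = [], blank
        for p in path.tolist():
            if p != blank and p != prev:
                seq.append(p)
            prev = p
        labels.append(torch.tensor(seq, dtype=torch.long))
    return labels

def self_training_step(model, optimizer, sup_batch, unsup_feats, unsup_lens):
    """One alternating step: pseudo-label a mini-batch, then update."""
    # 1) Generate fresh pseudo-labels for the unsupervised mini-batch
    #    with the current model (no gradients needed here).
    model.eval()
    with torch.no_grad():
        pl = greedy_ctc_decode(model(unsup_feats))

    # 2) Treat the pseudo-labels as ground truth and take one gradient
    #    step on the supervised batch plus the pseudo-labeled batch.
    model.train()
    feats, lens, targets, target_lens = sup_batch
    loss = F.ctc_loss(model(feats), targets, lens, target_lens,
                      zero_infinity=True)
    pl_targets = torch.cat(pl)
    pl_lens = torch.tensor([len(t) for t in pl], dtype=torch.long)
    loss = loss + F.ctc_loss(model(unsup_feats), pl_targets,
                             unsup_lens, pl_lens, zero_infinity=True)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Regenerating pseudo-labels inside every step, rather than once per epoch, is what the abstract refers to as keeping the pseudo-labels "fresh".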
Related papers
- Incremental Self-training for Semi-supervised Learning [56.57057576885672]
Incremental self-training (IST) is simple yet effective and fits existing self-training-based semi-supervised learning methods.
We verify the proposed IST on five datasets and two types of backbone, effectively improving the recognition accuracy and learning speed.
arXiv Detail & Related papers (2024-04-14T05:02:00Z)
- Self-Supervised Representation Learning from Temporal Ordering of Automated Driving Sequences [49.91741677556553]
We propose TempO, a temporal ordering pretext task for pre-training region-level feature representations for perception tasks.
We embed each frame by an unordered set of proposal feature vectors, a representation that is natural for object detection or tracking systems.
Extensive evaluations on the BDD100K, nuImages, and MOT17 datasets show that our TempO pre-training approach outperforms single-frame self-supervised learning methods.
arXiv Detail & Related papers (2023-02-17T18:18:27Z)
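TempO's exact architecture is not given in the summary above; the sketch below shows a generic temporal-ordering pretext head in the same spirit, with hypothetical shapes: each frame is an unordered set of proposal features, pooled permutation-invariantly, and a small head learns to predict whether one frame precedes another.

```python
import torch
import torch.nn as nn

class PairwiseOrderingHead(nn.Module):
    """Toy temporal-ordering pretext head (TempO-like in spirit only).

    Each frame is summarized by mean-pooling an unordered set of
    proposal feature vectors (permutation-invariant), and the head
    predicts whether frame A precedes frame B in the sequence.
    """

    def __init__(self, feat_dim=256, hidden=128):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, props_a, props_b):
        # props_*: (batch, num_proposals, feat_dim) per-frame proposal sets
        fa = props_a.mean(dim=1)  # order-agnostic pooling over the set
        fb = props_b.mean(dim=1)
        return self.head(torch.cat([fa, fb], dim=-1)).squeeze(-1)

# Self-supervised signal: label 1 if frame A truly precedes frame B.
head = PairwiseOrderingHead()
a, b = torch.randn(8, 100, 256), torch.randn(8, 100, 256)
labels = torch.randint(0, 2, (8,)).float()
loss = nn.functional.binary_cross_entropy_with_logits(head(a, b), labels)
```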
- Prompt-driven efficient Open-set Semi-supervised Learning [52.30303262499391]
Open-set semi-supervised learning (OSSL), which investigates the more practical scenario where out-of-distribution (OOD) samples appear only in the unlabeled data, has attracted growing interest.
We propose a prompt-driven efficient OSSL framework, called OpenPrompt, which can propagate class information from labeled to unlabeled data with only a small number of trainable parameters.
arXiv Detail & Related papers (2022-09-28T16:25:08Z)
- Self-Contrastive Learning based Semi-Supervised Radio Modulation Classification [6.089994098441994]
This paper presents a semi-supervised learning framework for automatic modulation classification (AMC).
By carefully utilizing unlabeled signal data with a self-supervised contrastive-learning pre-training step, our framework achieves higher performance given smaller amounts of labeled data.
We evaluate the performance of our semi-supervised framework on a public dataset.
arXiv Detail & Related papers (2022-03-29T22:21:14Z)
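The summary above does not specify the contrastive objective; a common choice for this kind of self-supervised pre-training is the NT-Xent loss over two augmented views of each signal, sketched below (the encoder and augmentations are assumed, not taken from the paper).

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss over two augmented views of a batch.

    z1, z2: (B, D) embeddings of the same signals under two random
    augmentations (e.g., time shifts or rotations of the I/Q samples).
    Each embedding's positive is its counterpart in the other view;
    all remaining embeddings in the batch act as negatives.
    """
    B = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2B, D)
    sim = z @ z.t() / temperature                       # cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # exclude self-pairs
    targets = (torch.arange(2 * B, device=z.device) + B) % (2 * B)
    return F.cross_entropy(sim, targets)
```

After pre-training with such a loss on unlabeled signals, the encoder is fine-tuned on the small labeled subset.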
- Class-Aware Contrastive Semi-Supervised Learning [51.205844705156046]
We propose a general method named Class-aware Contrastive Semi-Supervised Learning (CCSSL) to improve pseudo-label quality and enhance the model's robustness in real-world settings.
CCSSL achieves significant performance improvements over state-of-the-art SSL methods on the standard datasets CIFAR100 and STL10.
arXiv Detail & Related papers (2022-03-04T12:18:23Z)
- A Deep-Learning Intelligent System Incorporating Data Augmentation for Short-Term Voltage Stability Assessment of Power Systems [9.299576471941753]
This paper proposes a novel deep-learning intelligent system incorporating data augmentation for short-term voltage stability assessment (STVSA) of power systems.
Due to the unavailability of reliable quantitative criteria to judge the stability status for a specific power system, semi-supervised cluster learning is leveraged to obtain labeled samples.
Conditional least-squares generative adversarial network (LSGAN)-based data augmentation is introduced to expand the original dataset.
arXiv Detail & Related papers (2021-12-05T11:40:54Z)
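For reference, the least-squares GAN objectives are simple to state. The sketch below gives the standard LSGAN losses; in the conditional variant used above, the class label (here, the stability status) is additionally fed to both generator and discriminator. This is the generic formulation, not the paper's exact network.

```python
import torch

def lsgan_d_loss(d_real, d_fake):
    """LSGAN discriminator loss: push real scores to 1, fake scores to 0."""
    return 0.5 * ((d_real - 1.0).pow(2).mean() + d_fake.pow(2).mean())

def lsgan_g_loss(d_fake):
    """LSGAN generator loss: push fake scores toward 1 to fool D."""
    return 0.5 * (d_fake - 1.0).pow(2).mean()

# In the conditional variant, both networks also receive the class label
# (here, the stability status): d_real = D(x, y), d_fake = D(G(z, y), y).
```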
- Semi-Supervised Object Detection with Adaptive Class-Rebalancing Self-Training [5.874575666947381]
This study delves into semi-supervised object detection to improve detector performance with additional unlabeled data.
We propose a novel two-stage filtering algorithm to generate accurate pseudo-labels.
Our method achieves satisfactory improvements on MS-COCO and VOC benchmarks.
arXiv Detail & Related papers (2021-07-11T12:14:42Z)
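The details of the two-stage filter are not in the summary above; the sketch below shows one plausible shape for such a filter, combining a global confidence threshold with a per-class relaxation for rare classes (all thresholds and the relaxation rule are illustrative assumptions).

```python
import torch

def filter_pseudo_labels(scores, classes, base_thresh=0.9, class_freq=None):
    """Illustrative two-stage pseudo-label filter (not the paper's exact rule).

    Stage 1 keeps detections above a global confidence threshold;
    stage 2 lowers the effective threshold for rare classes so that
    frequent classes do not dominate the pseudo-labels.
    """
    keep = scores >= base_thresh                       # stage 1: confidence
    if class_freq is not None:
        # Rare classes (low frequency) get a relaxed threshold,
        # down to half of the base threshold at frequency 0.
        relax = 1.0 - 0.5 * (1.0 - class_freq[classes])
        keep = keep | (scores >= base_thresh * relax)  # stage 2: rebalance
    return keep

# Example: class 1 is rare, so its 0.80-confidence detection survives.
scores = torch.tensor([0.95, 0.80, 0.40])
classes = torch.tensor([0, 1, 1])
mask = filter_pseudo_labels(scores, classes,
                            class_freq=torch.tensor([0.9, 0.1]))
```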
- Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition [55.362258027878966]
We present momentum pseudo-labeling (MPL) as a simple yet effective strategy for semi-supervised speech recognition.
MPL consists of a pair of online and offline models that interact and learn from each other, inspired by the mean teacher method.
The experimental results demonstrate that MPL effectively improves over the base model and is scalable to different semi-supervised scenarios.
arXiv Detail & Related papers (2021-06-16T16:24:55Z)
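The mean-teacher-style interaction in MPL centers on a momentum (exponential moving average) update of the offline model. A minimal sketch of that update, with parameter names assumed: the offline model produces the pseudo-labels, the online model trains on them, and the update below keeps the offline model a slowly moving average of the online one.

```python
import torch

@torch.no_grad()
def momentum_update(offline_model, online_model, alpha=0.999):
    """EMA update of the offline (teacher) weights from the online
    (student) weights; the offline model then produces more stable
    pseudo-labels for the online model to train on.
    """
    for p_off, p_on in zip(offline_model.parameters(),
                           online_model.parameters()):
        p_off.mul_(alpha).add_(p_on, alpha=1.0 - alpha)
```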
- ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-supervised Continual Learning [52.831894583501395]
Continual learning typically assumes the incoming data are fully labeled, which may not hold in real applications.
We propose deep Online Replay with Discriminator Consistency (ORDisCo) to interdependently learn a classifier with a conditional generative adversarial network (GAN).
We show that ORDisCo achieves significant performance improvements on various benchmark datasets for semi-supervised continual learning (SSCL).
arXiv Detail & Related papers (2021-01-02T09:04:14Z)
- Incremental Learning for End-to-End Automatic Speech Recognition [41.297106772785206]
We propose an incremental learning method for end-to-end Automatic Speech Recognition (ASR).
We design a novel explainability-based knowledge distillation for ASR models, which is combined with a response-based knowledge distillation to maintain the original model's predictions and the "reason" for the predictions.
Results on a multi-stage sequential training task show that our method outperforms existing ones in mitigating forgetting.
arXiv Detail & Related papers (2020-05-11T08:18:08Z)
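The response-based component of such a distillation setup is standard and easy to sketch; the explainability-based term is specific to that paper and not reproduced here. A minimal PyTorch version, with the temperature T assumed:

```python
import torch
import torch.nn.functional as F

def response_kd_loss(student_logits, teacher_logits, T=2.0):
    """Response-based knowledge distillation: make the new (student)
    model's output distribution match the old (teacher) model's, so
    predictions on previously learned data are preserved.
    """
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * T * T
```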
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.