Combining Unsupervised and Text Augmented Semi-Supervised Learning for
Low Resourced Autoregressive Speech Recognition
- URL: http://arxiv.org/abs/2110.15836v1
- Date: Fri, 29 Oct 2021 14:59:18 GMT
- Title: Combining Unsupervised and Text Augmented Semi-Supervised Learning for
Low Resourced Autoregressive Speech Recognition
- Authors: Chak-Fai Li, Francis Keith, William Hartmann, Matthew Snover
- Abstract summary: We pretrain state-of-the-art Conformer models in an unsupervised manner.
Additional text data is incorporated through external language models.
Final performance is an additional 2% absolute better when using CTC-based decoding for semi-supervised training.
- Score: 7.067186994804316
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in unsupervised representation learning have demonstrated the
impact of pretraining on large amounts of read speech. We adapt these
techniques for domain adaptation in low-resource -- both in terms of data and
compute -- conversational and broadcast domains. Moving beyond CTC, we pretrain
state-of-the-art Conformer models in an unsupervised manner. While the
unsupervised approach outperforms traditional semi-supervised training, the
techniques are complementary. Combining them yields a 5% absolute improvement
in WER, averaged over all conditions, compared to semi-supervised
training alone. Additional text data is incorporated through external language
models. By using CTC-based decoding, we are better able to take advantage of
the additional text data. When the CTC-decoded system is used as a
transcription model, the Conformer model incorporates knowledge from the
language model through semi-supervised training more effectively than through
shallow fusion. Final performance is an additional 2% absolute better when
using CTC-based decoding for semi-supervised training rather than shallow fusion.
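For readers less familiar with the two decoding strategies contrasted above, the toy sketch below illustrates shallow fusion versus CTC-based decoding used to produce pseudo-labels for semi-supervised training. It is a minimal illustration in plain numpy, not the authors' implementation; all function names, weights, and scores are hypothetical placeholders, and a real system would run prefix beam search with the external language model rather than greedy decoding.

```python
# Minimal numpy sketch (not the authors' code) contrasting the two ways the
# external language model enters the pipeline described above.
import numpy as np


def shallow_fusion_score(am_logprob: float, lm_logprob: float, lm_weight: float = 0.3) -> float:
    """Shallow fusion: interpolate acoustic-model and LM log-probabilities
    for each hypothesis during autoregressive beam search."""
    return am_logprob + lm_weight * lm_logprob


def ctc_greedy_decode(ctc_logprobs: np.ndarray, blank: int = 0) -> list:
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks.
    A real system would use prefix beam search with the external LM; greedy
    decoding is shown only to keep the sketch short."""
    best = ctc_logprobs.argmax(axis=-1)
    out, prev = [], blank
    for token in best:
        if token != blank and token != prev:
            out.append(int(token))
        prev = token
    return out


if __name__ == "__main__":
    frames, vocab = 6, 5                         # toy dimensions
    rng = np.random.default_rng(0)
    fake_ctc_logprobs = np.log(rng.dirichlet(np.ones(vocab), size=frames))
    # Semi-supervised use: the CTC(+LM)-decoded hypothesis becomes a pseudo-label
    # on untranscribed audio, and the Conformer is fine-tuned on it, which is how
    # LM knowledge reaches the acoustic model without fusion at test time.
    print("pseudo-label token ids:", ctc_greedy_decode(fake_ctc_logprobs))
    print("shallow-fusion score:  ", shallow_fusion_score(-12.4, -20.1))
```

The design point mirrored here is that shallow fusion only reshapes hypothesis scores at decode time, whereas CTC-based pseudo-labeling bakes the language model's knowledge into the transcripts on which the Conformer is subsequently trained.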
Related papers
- Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation [105.23631749213729]
We propose a novel method for unsupervised pre-training in low-data regimes.
Inspired by the recently successful prompting technique, we introduce a new method, Unsupervised Pre-training with Language-Vision Prompts.
We show that our method can converge faster and perform better than CNN-based models in low-data regimes.
arXiv Detail & Related papers (2024-05-22T06:48:43Z) - Expedited Training of Visual Conditioned Language Generation via
Redundancy Reduction [61.16125290912494]
$\text{EVL}_\text{Gen}$ is a framework designed for the pre-training of visually conditioned language generation models.
We show that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance.
arXiv Detail & Related papers (2023-10-05T03:40:06Z) - Scalable Learning of Latent Language Structure With Logical Offline
Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z) - Pre-training for Speech Translation: CTC Meets Optimal Transport [29.807861658249923]
We show that the connectionist temporal classification (CTC) loss can reduce the modality gap by design.
We propose a novel pre-training method combining CTC and optimal transport to further reduce this gap.
Our method pre-trains a Siamese-like model composed of two encoders, one for acoustic inputs and the other for textual inputs, such that they produce representations that are close to each other in the Wasserstein space (a toy sketch of such an alignment loss appears after this list).
arXiv Detail & Related papers (2023-01-27T14:03:09Z) - Efficient Speech Translation with Pre-trained Models [13.107314023500349]
We investigate efficient strategies to build cascaded and end-to-end speech translation systems based on pre-trained models.
While the end-to-end models show superior translation performance to cascaded ones, their application is limited by the need for additional end-to-end training data.
arXiv Detail & Related papers (2022-11-09T15:07:06Z) - Improving Deliberation by Text-Only and Semi-Supervised Training [42.942428288428836]
We propose incorporating text-only and semi-supervised training into an attention-based deliberation model.
We achieve 4%-12% WER reduction for various tasks compared to the baseline deliberation.
We show that the deliberation model also achieves a positive human side-by-side evaluation.
arXiv Detail & Related papers (2022-06-29T15:30:44Z) - Supervision-Guided Codebooks for Masked Prediction in Speech
Pre-training [102.14558233502514]
Masked prediction pre-training has seen remarkable progress in self-supervised learning (SSL) for speech recognition.
We propose two supervision-guided codebook generation approaches to improve automatic speech recognition (ASR) performance.
arXiv Detail & Related papers (2022-06-21T06:08:30Z) - Progressive Class Semantic Matching for Semi-supervised Text
Classification [26.794533973357403]
We investigate the marriage between semi-supervised learning and a pre-trained language model.
By means of extensive experiments, we show that our method can bring remarkable improvement to baselines.
arXiv Detail & Related papers (2022-05-20T13:59:03Z) - Neural Semi-supervised Learning for Text Classification Under
Large-Scale Pretraining [51.19885385587916]
We conduct studies on semi-supervised learning in the task of text classification under the context of large-scale LM pretraining.
Our work marks an initial step in understanding the behavior of semi-supervised learning models under the context of large-scale pretraining.
arXiv Detail & Related papers (2020-11-17T13:39:05Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
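As a companion to the "Pre-training for Speech Translation: CTC Meets Optimal Transport" entry above, the sketch below shows one simple way an optimal-transport alignment loss between acoustic and textual encoder outputs can be computed, using entropy-regularised Sinkhorn iterations in plain numpy. This is an illustrative assumption rather than that paper's actual objective; the embeddings, dimensions, regularisation strength, and iteration count are arbitrary.

```python
# Toy Sinkhorn sketch (an assumption, not the paper's implementation) of aligning
# acoustic and textual encoder outputs under an entropy-regularised OT cost.
import numpy as np


def sinkhorn_distance(X: np.ndarray, Y: np.ndarray, reg: float = 0.1, n_iter: int = 100) -> float:
    """Entropy-regularised optimal-transport cost between two embedding sets.
    X: (n, d) acoustic frame representations; Y: (m, d) text token representations."""
    n, m = X.shape[0], Y.shape[0]
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    C = C / C.max()                                      # normalise for numerical stability
    K = np.exp(-C / reg)                                 # Gibbs kernel
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)      # uniform marginals
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):                              # Sinkhorn iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                      # transport plan
    return float((P * C).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech_repr = rng.normal(size=(40, 16))              # e.g. 40 acoustic frames
    text_repr = rng.normal(size=(10, 16))                # e.g. 10 subword tokens
    print("OT alignment loss:", sinkhorn_distance(speech_repr, text_repr))
```

Uniform marginals treat every frame and token as equally weighted; in a Siamese set-up this loss would be backpropagated into both encoders so their representations move closer in Wasserstein distance.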
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.