Biased Self-supervised learning for ASR
- URL: http://arxiv.org/abs/2211.02536v1
- Date: Fri, 4 Nov 2022 15:57:59 GMT
- Title: Biased Self-supervised learning for ASR
- Authors: Florian L. Kreyssig, Yangyang Shi, Jinxi Guo, Leda Sari, Abdelrahman Mohamed, Philip C. Woodland
- Abstract summary: This paper proposes a method to bias self-supervised learning towards a specific task.
The core idea is to slightly finetune the model that is used to obtain the target sequence.
For the streaming models, the pre-training approach yields a reduction in word error rate of 44.1%.
- Score: 31.701098864180256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning via masked prediction pre-training (MPPT) has shown
impressive performance on a range of speech-processing tasks. This paper
proposes a method to bias self-supervised learning towards a specific task. The
core idea is to slightly finetune the model that is used to obtain the target
sequence. This leads to better performance and a substantial increase in
training speed. Furthermore, this paper proposes a variant of MPPT that allows
low-footprint streaming models to be trained effectively by computing the MPPT
loss on masked and unmasked frames. These approaches are evaluated for
automatic speech recognition on the Librispeech corpus, where 100 hours of data
served as the labelled data and 860 hours as the unlabelled data. The biased
training outperforms the unbiased training by 15.5% after 250k updates and
23.8% after 100k updates on test-other. For the streaming models, the
pre-training approach yields a reduction in word error rate of 44.1%.
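
To make the two ideas concrete, below is a minimal PyTorch-style sketch of a masked-prediction loss computed on both masked and unmasked frames, as the streaming variant requires. The `student`/`teacher` interfaces and the crude zero-masking are hypothetical stand-ins, not the paper's exact implementation; the "biasing" corresponds to the teacher having been slightly finetuned on the target task beforehand.

```python
import torch
import torch.nn.functional as F

def mppt_loss(student, teacher, feats, mask, unmasked_weight=1.0):
    """Masked prediction pre-training (MPPT) loss sketch.

    feats: (B, T, D) input features; mask: (B, T) bool, True = masked frame.
    The teacher (assumed slightly finetuned on the target task, i.e. "biased")
    provides discrete frame-level targets; the student predicts them from
    masked inputs.
    """
    with torch.no_grad():
        # Hypothetical: the teacher assigns each frame one of K discrete
        # targets, e.g. by quantising its output representations.
        targets = teacher(feats).argmax(dim=-1)                 # (B, T)

    masked_feats = feats.masked_fill(mask.unsqueeze(-1), 0.0)   # crude masking
    logits = student(masked_feats)                              # (B, T, K)
    per_frame = F.cross_entropy(
        logits.transpose(1, 2), targets, reduction="none")      # (B, T)

    masked_loss = per_frame[mask].mean()
    # Streaming variant: also supervise unmasked frames, giving the
    # low-footprint streaming model a training signal on every frame.
    unmasked_loss = per_frame[~mask].mean()
    return masked_loss + unmasked_weight * unmasked_loss
```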
Related papers
- Supervised Learning without Backpropagation using Spike-Timing-Dependent Plasticity for Image Recognition [3.087000217989688]
This study introduces a novel supervised learning approach for spiking neural networks that does not rely on traditional backpropagation.
Instead, it employs spike-timing-dependent plasticity (STDP) within a supervised framework for image recognition tasks.
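As a rough illustration of the STDP idea (a generic pair-based rule, not necessarily the paper's exact one), a synapse is strengthened when the presynaptic spike precedes the postsynaptic spike and weakened otherwise:

```python
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based STDP: potentiate if pre fires before post, else depress.

    t_pre, t_post: spike times in ms; a_plus/a_minus/tau are illustrative
    constants, not values from the cited paper.
    """
    dt = t_post - t_pre
    if dt >= 0:   # pre before post -> long-term potentiation
        w += a_plus * np.exp(-dt / tau)
    else:         # post before pre -> long-term depression
        w -= a_minus * np.exp(dt / tau)
    return float(np.clip(w, 0.0, 1.0))
```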
arXiv Detail & Related papers (2024-10-21T21:32:17Z)
- Grad-Instructor: Universal Backpropagation with Explainable Evaluation Neural Networks for Meta-learning and AutoML [0.0]
An Evaluation Neural Network (ENN) is trained via deep reinforcement learning to predict the performance of the target network.
The ENN then works as an additional evaluation function during backpropagation.
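One speculative reading of how an evaluation network could act as an extra objective during backpropagation (the ENN interface and the combination rule here are assumptions, not the paper's formulation):

```python
import torch

def training_step(model, enn, x, y, optimizer, task_loss_fn, lam=0.1):
    """One update where a frozen Evaluation Neural Network (ENN) scores the
    target network's outputs and contributes to the backpropagated loss.
    The ENN maps outputs to a predicted-performance scalar (assumed)."""
    optimizer.zero_grad()
    outputs = model(x)
    task_loss = task_loss_fn(outputs, y)
    predicted_perf = enn(outputs).mean()      # higher = better, by assumption
    loss = task_loss - lam * predicted_perf   # reward outputs the ENN rates highly
    loss.backward()
    optimizer.step()
    return loss.item()
```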
arXiv Detail & Related papers (2024-06-15T08:37:51Z)
- Text Quality-Based Pruning for Efficient Training of Language Models [66.66259229732121]
We propose a novel method for numerically evaluating text quality in large unlabelled NLP datasets.
This text-quality metric provides a framework for identifying and eliminating low-quality text instances.
Experimental results over multiple models and datasets demonstrate the efficacy of this approach.
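A minimal sketch of quality-based pruning, assuming some scalar quality score is available per document (the scoring function below is a placeholder, not the paper's metric):

```python
def prune_by_quality(docs, score_fn, keep_fraction=0.8):
    """Keep the top `keep_fraction` of documents by a text-quality score.

    score_fn: placeholder for a numeric quality metric over a document.
    """
    scored = sorted(docs, key=score_fn, reverse=True)
    n_keep = max(1, int(len(scored) * keep_fraction))
    return scored[:n_keep]

# Toy usage with a trivial stand-in score (longer = "higher quality"):
corpus = ["a", "a longer sentence.", "the quick brown fox jumps."]
print(prune_by_quality(corpus, score_fn=len, keep_fraction=0.67))
```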
arXiv Detail & Related papers (2024-04-26T18:01:25Z)
- Incremental Self-training for Semi-supervised Learning [56.57057576885672]
IST is simple yet effective and fits existing self-training-based semi-supervised learning methods.
We verify the proposed IST on five datasets and two types of backbone, effectively improving the recognition accuracy and learning speed.
arXiv Detail & Related papers (2024-04-14T05:02:00Z)
- Revisiting the Power of Prompt for Visual Tuning [50.11465784194896]
This study explores how the correlation between prompts and patch tokens evolves over the course of training.
Inspired by the observation that the prompt tokens tend to share high mutual information with patch tokens, we propose initializing prompts with downstream token prototypes.
Our method significantly advances the adaptation of self-supervised pre-trained models, achieving task performance gains of at least 10% and up to 30%.
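A rough sketch of prototype-based prompt initialisation, assuming prompts live in the same embedding space as patch tokens (the k-means clustering choice is ours, not necessarily the paper's):

```python
import torch

def init_prompts_from_prototypes(patch_tokens, num_prompts):
    """Initialise prompt tokens from downstream patch-token prototypes.

    patch_tokens: (N, D) patch embeddings collected from downstream data.
    Runs a few Lloyd iterations (an assumed choice) to get `num_prompts`
    prototypes, which become the initial prompt parameters.
    """
    n, d = patch_tokens.shape
    centers = patch_tokens[torch.randperm(n)[:num_prompts]].clone()
    for _ in range(10):  # simple k-means
        assign = torch.cdist(patch_tokens, centers).argmin(dim=1)
        for k in range(num_prompts):
            members = patch_tokens[assign == k]
            if len(members) > 0:
                centers[k] = members.mean(dim=0)
    return torch.nn.Parameter(centers)  # trainable, prototype-initialised
```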
arXiv Detail & Related papers (2024-02-04T07:49:02Z)
- KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training [2.8804804517897935]
We propose a method for hiding the least-important samples during the training of deep neural networks.
We adaptively find samples to exclude in a given epoch based on their contribution to the overall learning process.
Our method can reduce total training time by up to 22% while impacting accuracy by only 0.4% compared to the baseline.
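One plausible instantiation of adaptive sample hiding, using per-sample loss as the (assumed) importance signal:

```python
import numpy as np

def visible_indices(per_sample_loss, hide_fraction=0.2):
    """Select the samples to train on this epoch, hiding the
    `hide_fraction` with the lowest loss (assumed least important).

    per_sample_loss: array of losses recorded in the previous epoch."""
    n = len(per_sample_loss)
    n_hide = int(n * hide_fraction)
    order = np.argsort(per_sample_loss)   # ascending: easiest first
    return np.sort(order[n_hide:])        # keep the harder samples

losses = np.array([0.05, 1.2, 0.4, 0.01, 0.9])
print(visible_indices(losses, hide_fraction=0.4))  # hides the two easiest
```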
arXiv Detail & Related papers (2023-10-16T06:19:29Z)
- Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models [107.05966685291067]
We propose test-time prompt tuning (TPT) to learn adaptive prompts on the fly with a single test sample.
TPT improves the zero-shot top-1 accuracy of CLIP by 3.6% on average.
In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data.
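A minimal sketch of a test-time prompt-tuning loop: minimise the entropy of the averaged prediction over augmented views of the single test image (the `model` and `augment` interfaces are assumptions):

```python
import torch

def tpt_step(model, prompt, image, augment, n_views=8, lr=5e-3, steps=1):
    """Tune only the prompt on one test sample by entropy minimisation.

    model(prompt, images) -> (n_views, num_classes) logits; `augment`
    produces one random view of `image`. Interfaces are assumed."""
    prompt = prompt.clone().requires_grad_(True)
    opt = torch.optim.AdamW([prompt], lr=lr)
    for _ in range(steps):
        views = torch.stack([augment(image) for _ in range(n_views)])
        probs = model(prompt, views).softmax(dim=-1).mean(dim=0)  # avg views
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
        opt.zero_grad()
        entropy.backward()
        opt.step()
    return prompt.detach()
```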
arXiv Detail & Related papers (2022-09-15T17:55:11Z)
- FedNST: Federated Noisy Student Training for Automatic Speech Recognition [8.277567852741242]
Federated Learning (FL) enables training state-of-the-art Automatic Speech Recognition (ASR) models on user devices (clients) in distributed systems.
A key challenge facing the practical adoption of FL for ASR is obtaining ground-truth labels on the clients.
A promising alternative is using semi-/self-supervised learning approaches to leverage unlabelled user data.
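A high-level sketch of one noisy-student round on a client, pseudo-labelling local audio with a server-provided teacher (the function boundaries here are our assumptions, not the paper's API):

```python
def fednst_client_round(teacher, student, unlabelled_audio, train_fn, asr_decode):
    """One client update in federated noisy student training (sketch).

    teacher: current global ASR model used to pseudo-label local audio;
    asr_decode / train_fn: placeholders for decoding and noisy training.
    Returns locally updated student weights for server-side aggregation."""
    pseudo_labelled = [(utt, asr_decode(teacher, utt)) for utt in unlabelled_audio]
    # Train the student with input noise/augmentation on the pseudo-labels.
    train_fn(student, pseudo_labelled, augment=True)
    return student.state_dict()
```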
arXiv Detail & Related papers (2022-06-06T16:18:45Z)
- MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge [72.16021611888165]
This paper proposes a novel Memory-Economic Sparse Training (MEST) framework targeting accurate and fast execution on edge devices.
The proposed MEST framework consists of enhancements by Elastic Mutation (EM) and Soft Memory Bound (&S).
Our results suggest that unforgettable examples can be identified in-situ even during the dynamic exploration of sparsity masks.
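Elastic Mutation resembles drop-and-grow dynamic sparse training; a generic sketch (not MEST's exact rule) that mutates a 2-D sparsity mask between epochs:

```python
import torch

def mutate_mask(weight, mask, mutate_fraction=0.05):
    """Drop the smallest-magnitude active weights and grow the same number
    of currently inactive connections at random positions (a generic
    drop-and-grow rule, not necessarily MEST's)."""
    active = mask.nonzero(as_tuple=False)
    n_mut = max(1, int(len(active) * mutate_fraction))

    # Drop: smallest-magnitude active weights (row-major order matches nonzero).
    mags = weight[mask.bool()].abs()
    drop = active[mags.argsort()[:n_mut]]
    mask[drop[:, 0], drop[:, 1]] = 0

    # Grow: random currently-inactive positions.
    inactive = (mask == 0).nonzero(as_tuple=False)
    grow = inactive[torch.randperm(len(inactive))[:n_mut]]
    mask[grow[:, 0], grow[:, 1]] = 1
    return mask
```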
arXiv Detail & Related papers (2021-10-26T21:15:17Z)
- Uncertainty-aware Self-training for Text Classification with Few Labels [54.13279574908808]
We study self-training as one of the earliest semi-supervised learning approaches to reduce the annotation bottleneck.
We propose an approach to improve self-training by incorporating uncertainty estimates of the underlying neural network.
We show that our methods, leveraging only 20-30 labeled samples per class per task for training and validation, can perform within 3% of fully supervised pre-trained language models.
arXiv Detail & Related papers (2020-06-27T08:13:58Z)
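A minimal sketch of uncertainty-aware pseudo-label selection via Monte-Carlo dropout (the variance threshold and selection rule are illustrative assumptions):

```python
import torch

def select_confident_pseudolabels(model, x_unlabelled, n_passes=10, max_var=0.05):
    """Pseudo-label unlabelled inputs, keeping only low-uncertainty ones.

    Uncertainty = variance of predicted class probabilities across
    stochastic forward passes (MC dropout); the threshold is illustrative."""
    model.train()  # keep dropout active for stochastic forward passes
    with torch.no_grad():
        probs = torch.stack([model(x_unlabelled).softmax(-1)
                             for _ in range(n_passes)])       # (P, N, C)
    mean_probs = probs.mean(dim=0)                            # (N, C)
    var = probs.var(dim=0).max(dim=-1).values                 # (N,)
    keep = var < max_var
    return x_unlabelled[keep], mean_probs.argmax(dim=-1)[keep]
```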