Noisy student-teacher training for robust keyword spotting
- URL: http://arxiv.org/abs/2106.01604v1
- Date: Thu, 3 Jun 2021 05:36:18 GMT
- Title: Noisy student-teacher training for robust keyword spotting
- Authors: Hyun-Jin Park, Pai Zhu, Ignacio Lopez Moreno, Niranjan Subrahmanya
- Abstract summary: We propose self-training with a noisy student-teacher approach for streaming keyword spotting.
The proposed method applies aggressive data augmentation to the inputs of both student and teacher.
Experiments show that the proposed self-training with noisy student-teacher training improves accuracy on some challenging test conditions by as much as 60%.
- Score: 13.264760485020757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose self-training with a noisy student-teacher approach for
streaming keyword spotting, which can utilize large-scale unlabeled data and
aggressive data augmentation. The proposed method applies aggressive data
augmentation (spectral augmentation) to the inputs of both student and teacher
and utilizes unlabeled data at scale, which significantly boosts the accuracy
of the student under challenging conditions. Such aggressive augmentation
usually degrades model performance when used in supervised training with
hard-labeled data. Experiments show that aggressive spectral augmentation
degrades the accuracy of the baseline supervised training method, while the
proposed self-training with noisy student-teacher training improves accuracy
on some challenging test conditions by as much as 60%.
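The abstract describes the recipe only at a high level. Below is a minimal sketch of that idea, assuming PyTorch, (batch, time, freq) log-mel feature tensors, and `student`/`teacher` keyword-spotting models supplied by the caller; the masking sizes and the KL-divergence distillation loss are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of noisy student-teacher self-training on unlabeled audio features.
# Only the high-level idea (aggressive spectral augmentation on BOTH the
# teacher and student inputs, with the student trained to match the teacher's
# soft outputs) comes from the abstract; everything else is assumed.
import torch
import torch.nn.functional as F


def spec_augment(features, time_mask=20, freq_mask=10):
    """Aggressive spectral augmentation: zero out random time and
    frequency stripes of a (batch, time, freq) feature tensor."""
    x = features.clone()
    b, t, f = x.shape
    for i in range(b):
        t0 = torch.randint(0, max(1, t - time_mask), (1,)).item()
        f0 = torch.randint(0, max(1, f - freq_mask), (1,)).item()
        x[i, t0:t0 + time_mask, :] = 0.0   # time mask
        x[i, :, f0:f0 + freq_mask] = 0.0   # frequency mask
    return x


def student_teacher_step(student, teacher, unlabeled_batch, optimizer):
    """One self-training step on a batch of unlabeled audio features."""
    teacher.eval()
    with torch.no_grad():
        # The teacher also sees an aggressively augmented view of the input.
        soft_targets = F.softmax(teacher(spec_augment(unlabeled_batch)), dim=-1)

    student.train()
    student_logits = student(spec_augment(unlabeled_batch))
    # Student is trained to match the teacher's soft posterior
    # (an assumed distillation-style consistency loss).
    loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                    soft_targets, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```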
Related papers
- Noisy Self-Training with Data Augmentations for Offensive and Hate Speech Detection Tasks [3.703767478524629]
"Noisy" self-training approaches incorporate data augmentation techniques to ensure prediction consistency and increase robustness against adversarial attacks.
We evaluate our experiments on two offensive/hate-speech datasets and demonstrate that (i) self-training consistently improves performance regardless of model size, resulting in up to +1.5% F1-macro on both datasets, and (ii) noisy self-training with textual data augmentations, despite being successfully applied in similar settings, decreases performance on offensive and hate-speech domains when compared to the default method, even with state-of-the-art augmentations such as backtranslation.
arXiv Detail & Related papers (2023-07-31T12:35:54Z)
- Mitigating Label Noise through Data Ambiguation [9.51828574518325]
Large models with high expressive power are prone to memorizing incorrect labels, thereby harming generalization performance.
In this paper, we suggest addressing the shortcomings of both methodologies by "ambiguating" the target information.
More precisely, we leverage the framework of so-called superset learning to construct set-valued targets based on a confidence threshold.
arXiv Detail & Related papers (2023-05-23T07:29:08Z)
- Enhancing Self-Training Methods [0.0]
Semi-supervised learning approaches train on small sets of labeled data along with large sets of unlabeled data.
Self-training is a semi-supervised teacher-student approach that often suffers from the problem of "confirmation bias".
arXiv Detail & Related papers (2023-01-18T03:56:17Z)
- On-the-fly Denoising for Data Augmentation in Natural Language Understanding [101.46848743193358]
We propose an on-the-fly denoising technique for data augmentation that learns from soft augmented labels provided by an organic teacher model trained on the cleaner original data.
Our method can be applied to general augmentation techniques and consistently improve the performance on both text classification and question-answering tasks.
arXiv Detail & Related papers (2022-12-20T18:58:33Z)
- Boosting Facial Expression Recognition by A Semi-Supervised Progressive Teacher [54.50747989860957]
We propose a semi-supervised learning algorithm named Progressive Teacher (PT) to utilize reliable FER datasets as well as large-scale unlabeled expression images for effective training.
Experiments on widely-used databases RAF-DB and FERPlus validate the effectiveness of our method, which achieves state-of-the-art performance with accuracy of 89.57% on RAF-DB.
arXiv Detail & Related papers (2022-05-28T07:47:53Z)
- Dynamic Supervisor for Cross-dataset Object Detection [52.95818230087297]
Cross-dataset training in object detection tasks is complicated because the inconsistency in the category range across datasets transforms fully supervised learning into semi-supervised learning.
We propose a dynamic supervisor framework that updates the annotations multiple times through multiple-updated submodels trained using hard and soft labels.
In the final generated annotations, both recall and precision improve significantly through the integration of hard-label training with soft-label training.
arXiv Detail & Related papers (2022-04-01T03:18:46Z)
- Investigating a Baseline Of Self Supervised Learning Towards Reducing Labeling Costs For Image Classification [0.0]
The study uses the kaggle.com cats-vs-dogs dataset, MNIST, and Fashion-MNIST to investigate the self-supervised learning task.
Results show that the pretext process in self-supervised learning improves accuracy by around 15% on the downstream classification task.
arXiv Detail & Related papers (2021-08-17T06:43:05Z)
- Self-Tuning for Data-Efficient Deep Learning [75.34320911480008]
Self-Tuning is a novel approach to enable data-efficient deep learning.
It unifies the exploration of labeled and unlabeled data and the transfer of a pre-trained model.
It outperforms its SSL and TL counterparts on five tasks by sharp margins.
arXiv Detail & Related papers (2021-02-25T14:56:19Z)
- Self-training Improves Pre-training for Natural Language Understanding [63.78927366363178]
We study self-training as another way to leverage unlabeled data through semi-supervised learning.
We introduce SentAugment, a data augmentation method which computes task-specific query embeddings from labeled data.
Our approach leads to scalable and effective self-training with improvements of up to 2.6% on standard text classification benchmarks.
arXiv Detail & Related papers (2020-10-05T17:52:25Z)
- Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)