Noise-Robust Keyword Spotting through Self-supervised Pretraining
- URL: http://arxiv.org/abs/2403.18560v1
- Date: Wed, 27 Mar 2024 13:42:14 GMT
- Title: Noise-Robust Keyword Spotting through Self-supervised Pretraining
- Authors: Jacob Mørk, Holger Severin Bovbjerg, Gergely Kiss, Zheng-Hua Tan,
- Abstract summary: Self-supervised learning (SSL) has been shown to increase accuracy in clean conditions.
This paper explores how SSL pretraining can be used to enhance the robustness of KWS models in noisy conditions.
- Score: 11.90089857382705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Voice assistants are now widely available, and to activate them a keyword spotting (KWS) algorithm is used. Modern KWS systems are mainly trained using supervised learning methods and require a large amount of labelled data to achieve a good performance. Leveraging unlabelled data through self-supervised learning (SSL) has been shown to increase the accuracy in clean conditions. This paper explores how SSL pretraining such as Data2Vec can be used to enhance the robustness of KWS models in noisy conditions, which is under-explored. Models of three different sizes are pretrained using different pretraining approaches and then fine-tuned for KWS. These models are then tested and compared to models trained using two baseline supervised learning methods, one being standard training using clean data and the other one being multi-style training (MTR). The results show that pretraining and fine-tuning on clean data is superior to supervised learning on clean data across all testing conditions, and superior to supervised MTR for testing conditions of SNR above 5 dB. This indicates that pretraining alone can increase the model's robustness. Finally, it is found that using noisy data for pretraining models, especially with the Data2Vec-denoising approach, significantly enhances the robustness of KWS models in noisy conditions.
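Both the MTR baseline and the noisy-data pretraining setups described above hinge on corrupting clean speech with noise at a controlled signal-to-noise ratio. The snippet below is a minimal, illustrative sketch of that mixing step, assuming NumPy arrays of raw waveform samples; the function name and the placeholder signals are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix a noise segment into a clean utterance at a target SNR (in dB)."""
    # Tile or crop the noise so it covers the whole utterance.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[: len(speech)]

    speech_power = np.mean(speech ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12

    # Scale the noise so that 10 * log10(speech_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

# Example: corrupt a placeholder 1-second, 16 kHz keyword utterance at 5 dB SNR.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
babble = rng.standard_normal(8000)
noisy = mix_at_snr(clean, babble, snr_db=5.0)
```

In an MTR setup this mixing would be applied to training examples over a range of SNRs, while for noisy pretraining the corrupted waveform would be used as model input during the SSL stage; the exact SNR ranges, noise types, and model details are those of the paper, not this sketch.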
Related papers
- Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation [105.23631749213729]
We propose a novel method for unsupervised pre-training in low-data regimes.
Inspired by the recently successful prompting technique, we introduce a new method, Unsupervised Pre-training with Language-Vision Prompts.
We show that our method can converge faster and perform better than CNN-based models in low-data regimes.
arXiv Detail & Related papers (2024-05-22T06:48:43Z) - Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z) - Conditional Online Learning for Keyword Spotting [0.0]
This work investigates a simple but effective online continual learning method that updates a keyword spotter on-device via SGD as new data becomes available.
Experiments demonstrate that, compared to a naive online learning implementation, conditional model updates gated by the model's performance on a small hold-out set drawn from the training distribution mitigate catastrophic forgetting.
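A minimal sketch of such a conditional update is shown below. Gating on hold-out accuracy is one plausible reading of the abstract's criterion; the PyTorch helper, its name, and the rollback-on-regression logic are illustrative assumptions rather than the paper's exact procedure.

```python
import copy
import torch
import torch.nn.functional as F

def conditional_update(model, optimizer, batch, holdout_loader, device="cpu"):
    """Apply one SGD step on newly arrived data, but keep it only if accuracy
    on a small hold-out set does not degrade (hypothetical gating criterion)."""

    def holdout_accuracy(m):
        m.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for x, y in holdout_loader:
                pred = m(x.to(device)).argmax(dim=-1)
                correct += (pred == y.to(device)).sum().item()
                total += y.numel()
        return correct / max(total, 1)

    # Snapshot weights and hold-out accuracy before the candidate update.
    before_state = copy.deepcopy(model.state_dict())
    before_acc = holdout_accuracy(model)

    # Candidate SGD step on the newly arrived batch.
    model.train()
    x, y = batch
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x.to(device)), y.to(device))
    loss.backward()
    optimizer.step()

    # Roll back if hold-out accuracy dropped (plain SGD keeps no optimizer
    # state, so restoring the model weights fully undoes the step).
    if holdout_accuracy(model) < before_acc:
        model.load_state_dict(before_state)
```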
arXiv Detail & Related papers (2023-05-19T15:46:31Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - Exploring Representation Learning for Small-Footprint Keyword Spotting [11.586285744728068]
The main challenges of KWS are limited labelled data and constrained on-device resources.
To address these challenges, we explore representation learning for KWS through self-supervised contrastive learning and self-training with a pretrained model.
Experiments on the Speech Commands dataset show that the self-training WVC module and the self-supervised LGCSiam module significantly improve accuracy.
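The paper's WVC and LGCSiam modules are not spelled out in this summary; as a stand-in, the sketch below shows a generic NT-Xent contrastive objective of the kind such self-supervised pretraining typically builds on, applied to two augmented views of the same batch of utterance embeddings. The function name and temperature value are assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Generic contrastive (NT-Xent) loss between two views, each of shape [batch, dim]."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)            # [2B, dim]
    sim = z @ z.t() / temperature             # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))         # a view is never its own positive
    batch = z1.size(0)
    # For row i, the positive is the other augmented view of the same utterance.
    targets = torch.cat([torch.arange(batch) + batch, torch.arange(batch)])
    return F.cross_entropy(sim, targets.to(sim.device))
```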
arXiv Detail & Related papers (2023-03-20T07:09:26Z) - Improving Label-Deficient Keyword Spotting Through Self-Supervised Pretraining [18.19207291891767]
Keyword Spotting (KWS) models are becoming increasingly integrated into various systems, e.g. voice assistants.
KWS models typically rely on a large amount of labelled data, limiting their application to situations where such data is available.
Self-supervised Learning (SSL) methods can mitigate such a reliance by leveraging readily-available unlabelled data.
arXiv Detail & Related papers (2022-10-04T15:56:27Z) - Task-Customized Self-Supervised Pre-training with Scalable Dynamic Routing [76.78772372631623]
A common practice for self-supervised pre-training is to use as much data as possible.
For a specific downstream task, however, including irrelevant data in pre-training may degrade the downstream performance.
At the same time, it is burdensome and often infeasible to curate a customized pre-training dataset for every downstream task.
arXiv Detail & Related papers (2022-05-26T10:49:43Z) - SSSE: Efficiently Erasing Samples from Trained Machine Learning Models [103.43466657962242]
We propose an efficient and effective algorithm, SSSE, for sample erasure.
In certain cases SSSE can erase samples almost as well as the optimal, yet impractical, gold standard of training a new model from scratch with only the permitted data.
arXiv Detail & Related papers (2021-07-08T14:17:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.