Improving Label-Deficient Keyword Spotting Through Self-Supervised
Pretraining
- URL: http://arxiv.org/abs/2210.01703v3
- Date: Wed, 24 May 2023 12:17:31 GMT
- Title: Improving Label-Deficient Keyword Spotting Through Self-Supervised
Pretraining
- Authors: Holger Severin Bovbjerg, Zheng-Hua Tan
- Abstract summary: Keyword Spotting (KWS) models are becoming increasingly integrated into various systems, e.g. voice assistants.
KWS models typically rely on a large amount of labelled data, limiting their application to situations where such data is available.
Self-supervised Learning (SSL) methods can mitigate such a reliance by leveraging readily-available unlabelled data.
- Score: 18.19207291891767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Keyword Spotting (KWS) models are becoming increasingly integrated into
various systems, e.g. voice assistants. To achieve satisfactory performance,
these models typically rely on a large amount of labelled data, limiting their
application to situations where such data is available. Self-supervised
Learning (SSL) methods can mitigate such a reliance by leveraging
readily-available unlabelled data. SSL methods for speech have, however,
primarily been studied on large models, which is not ideal, as compact KWS
models are generally required. This paper explores the effectiveness of SSL on small
models for KWS and establishes that SSL can enhance the performance of small
KWS models when labelled data is scarce. We pretrain three compact
transformer-based KWS models using Data2Vec, and fine-tune them on a
label-deficient setup of the Google Speech Commands data set. It is found that
Data2Vec pretraining leads to a significant increase in accuracy, with
label-deficient scenarios showing an improvement of 8.22% to 11.18% in absolute
accuracy.
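
As a rough illustration of the setup described in the abstract, the sketch below shows one Data2Vec-style pretraining step for a compact transformer KWS encoder (a student regresses the averaged top-layer representations of an EMA teacher at masked frames) and the attachment of a keyword classifier for fine-tuning. The model sizes, the zero-masking scheme, the number of averaged teacher layers and all hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch: minimal Data2Vec-style pretraining for a small transformer KWS
# encoder, plus fine-tuning with a keyword classifier. All sizes and
# hyperparameters are illustrative assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyKWSEncoder(nn.Module):
    """Small transformer encoder over log-mel frames (assumed 40-dim)."""
    def __init__(self, n_mels=40, dim=96, depth=4, heads=4):
        super().__init__()
        self.proj = nn.Linear(n_mels, dim)
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True)
        self.blocks = nn.ModuleList(copy.deepcopy(layer) for _ in range(depth))

    def forward(self, x):
        """Return the hidden states of every block: list of (B, T, dim)."""
        h = self.proj(x)
        states = []
        for blk in self.blocks:
            h = blk(h)
            states.append(h)
        return states

def data2vec_step(student, teacher, feats, mask_ratio=0.5, top_k=3):
    """One self-supervised step: the student sees masked frames and regresses
    the teacher's averaged top-k layer representations of the clean input."""
    B, T, _ = feats.shape
    mask = torch.rand(B, T, device=feats.device) < mask_ratio   # (B, T) bool
    masked = feats.masked_fill(mask.unsqueeze(-1), 0.0)         # simple zero masking

    with torch.no_grad():
        t_states = teacher(feats)                               # clean input
        target = torch.stack(t_states[-top_k:]).mean(0)         # average top-k layers
        target = F.layer_norm(target, target.shape[-1:])        # stabilise targets

    pred = student(masked)[-1]                                  # student top layer
    return F.smooth_l1_loss(pred[mask], target[mask])           # masked positions only

@torch.no_grad()
def ema_update(student, teacher, decay=0.999):
    """Teacher weights are an exponential moving average of the student."""
    for p_s, p_t in zip(student.parameters(), teacher.parameters()):
        p_t.mul_(decay).add_(p_s, alpha=1.0 - decay)

# --- illustrative usage -----------------------------------------------------
student = TinyKWSEncoder()
teacher = copy.deepcopy(student).eval()
teacher.requires_grad_(False)                 # teacher is never trained directly
opt = torch.optim.AdamW(student.parameters(), lr=1e-4)

unlabelled = torch.randn(8, 98, 40)           # stand-in batch of unlabelled log-mels
loss = data2vec_step(student, teacher, unlabelled)
loss.backward()
opt.step(); opt.zero_grad()
ema_update(student, teacher)

# Fine-tuning on the small labelled subset: attach a classifier head
# (number of Speech Commands keyword classes assumed to be 12 here).
classifier = nn.Linear(96, 12)
labelled, labels = torch.randn(8, 98, 40), torch.randint(0, 12, (8,))
logits = classifier(student(labelled)[-1].mean(dim=1))   # mean-pool over time
ce_loss = F.cross_entropy(logits, labels)                # train classifier + encoder jointly
```

The pretraining step above uses no labels at all; in the label-deficient setting studied in the paper, only a small fraction of the Speech Commands training set would carry labels for the fine-tuning stage.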
Related papers
- Disentangled Training with Adversarial Examples For Robust Small-footprint Keyword Spotting [18.456711824241978]
We propose datasource-aware disentangled learning with adversarial examples to improve KWS robustness.
Experimental results demonstrate that the proposed learning strategy improves false reject rate by 40.31% at 1% false accept rate.
Our best-performing system achieves 98.06% accuracy on the Google Speech Commands V1 dataset.
arXiv Detail & Related papers (2024-08-23T20:03:51Z)
- A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification [51.35500308126506]
Self-supervised learning (SSL) is a machine learning approach where the data itself provides supervision, eliminating the need for external labels.
We study how classification-based evaluation protocols for SSL correlate and how well they predict downstream performance on different dataset types.
arXiv Detail & Related papers (2024-07-16T23:17:36Z)
- Noise-Robust Keyword Spotting through Self-supervised Pretraining [11.90089857382705]
Self-supervised learning has been shown to increase the accuracy in clean conditions.
This paper explores how SSL pretraining can be used to enhance the robustness of KWS models in noisy conditions.
arXiv Detail & Related papers (2024-03-27T13:42:14Z)
- Improving Small Footprint Few-shot Keyword Spotting with Supervision on Auxiliary Data [19.075820340282934]
We propose a framework that uses easily collectible, unlabeled reading speech data as an auxiliary source.
We then adopt multi-task learning to help the model enhance its representation power using the out-of-domain auxiliary data.
arXiv Detail & Related papers (2023-08-31T07:29:42Z)
- Exploring Representation Learning for Small-Footprint Keyword Spotting [11.586285744728068]
The main challenges of KWS are limited labeled data and limited available device resources.
To address these challenges, we explore representation learning for KWS via self-supervised contrastive learning and self-training with a pretrained model.
Experiments on speech commands dataset show that the self-training WVC module and the self-supervised LGCSiam module significantly improve accuracy.
arXiv Detail & Related papers (2023-03-20T07:09:26Z)
- Exploring Efficient-tuning Methods in Self-supervised Speech Models [53.633222197712875]
Self-supervised learning can learn powerful representations for different speech tasks.
In downstream tasks, the parameters of the SSL model are frozen, and only the adapters are trained; a minimal sketch of this adapter idea appears after this list.
We show that performance parity can be achieved with over 90% parameter reduction.
arXiv Detail & Related papers (2022-10-10T11:08:12Z)
- Open-Set Semi-Supervised Learning for 3D Point Cloud Understanding [62.17020485045456]
It is commonly assumed in semi-supervised learning (SSL) that the unlabeled data are drawn from the same distribution as that of the labeled ones.
We propose to selectively utilize unlabeled data through sample weighting, so that only conducive unlabeled data would be prioritized.
arXiv Detail & Related papers (2022-05-02T16:09:17Z)
- DATA: Domain-Aware and Task-Aware Pre-training [94.62676913928831]
We present DATA, a simple yet effective neural architecture search (NAS) approach specialized for self-supervised learning (SSL).
Our method achieves promising results across a wide range of computation costs on downstream tasks, including image classification, object detection and semantic segmentation.
arXiv Detail & Related papers (2022-03-17T02:38:49Z)
- SSSE: Efficiently Erasing Samples from Trained Machine Learning Models [103.43466657962242]
We propose an efficient and effective algorithm, SSSE, for sample erasure.
In certain cases SSSE can erase samples almost as well as the optimal, yet impractical, gold standard of training a new model from scratch with only the permitted data.
arXiv Detail & Related papers (2021-07-08T14:17:24Z)
- Self-Tuning for Data-Efficient Deep Learning [75.34320911480008]
Self-Tuning is a novel approach to enable data-efficient deep learning.
It unifies the exploration of labeled and unlabeled data and the transfer of a pre-trained model.
It outperforms its SSL and transfer learning (TL) counterparts on five tasks by sharp margins.
arXiv Detail & Related papers (2021-02-25T14:56:19Z)
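
For the "Exploring Efficient-tuning Methods in Self-supervised Speech Models" entry above, the following is a hedged sketch of the general bottleneck-adapter idea it refers to: the pretrained backbone is frozen and only small adapters plus the task head are trained. It reuses the TinyKWSEncoder class from the sketch under the abstract; the bottleneck width, the number of keyword classes, and the one-adapter-per-block placement are illustrative assumptions rather than that paper's exact method.

```python
# Hedged sketch of adapter-based efficient tuning: freeze the pretrained SSL
# encoder and train only small bottleneck adapters plus the keyword classifier.
# Assumes the TinyKWSEncoder class defined in the earlier sketch.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual."""
    def __init__(self, dim=96, bottleneck=16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)          # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))

class AdapterKWS(nn.Module):
    """Frozen pretrained encoder + one adapter per block + keyword classifier."""
    def __init__(self, encoder, n_classes=12, dim=96):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False             # freeze the SSL backbone
        self.adapters = nn.ModuleList(Adapter(dim) for _ in encoder.blocks)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):
        h = self.encoder.proj(x)
        for blk, ada in zip(self.encoder.blocks, self.adapters):
            h = ada(blk(h))                     # adapter after each frozen block
        return self.head(h.mean(dim=1))         # mean-pool over time

# Only adapter and head parameters reach the optimiser.
model = AdapterKWS(TinyKWSEncoder())
trainable = [p for p in model.parameters() if p.requires_grad]
opt = torch.optim.AdamW(trainable, lr=1e-3)
```

Because only the adapter and head parameters are passed to the optimiser, the trainable parameter count stays a small fraction of the backbone, which is the kind of reduction the entry above reports.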