Improving Label-Deficient Keyword Spotting Through Self-Supervised
Pretraining
- URL: http://arxiv.org/abs/2210.01703v3
- Date: Wed, 24 May 2023 12:17:31 GMT
- Title: Improving Label-Deficient Keyword Spotting Through Self-Supervised
Pretraining
- Authors: Holger Severin Bovbjerg, Zheng-Hua Tan
- Abstract summary: Keyword Spotting (KWS) models are becoming increasingly integrated into various systems, e.g. voice assistants.
KWS models typically rely on a large amount of labelled data, limiting their application to situations where such data is available.
Self-supervised Learning (SSL) methods can mitigate such a reliance by leveraging readily-available unlabelled data.
- Score: 18.19207291891767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Keyword Spotting (KWS) models are becoming increasingly integrated into
various systems, e.g. voice assistants. To achieve satisfactory performance,
these models typically rely on a large amount of labelled data, limiting their
application to situations where such data is available. Self-supervised
Learning (SSL) methods can mitigate such a reliance by leveraging
readily-available unlabelled data. SSL methods for speech have, however,
primarily been studied on large models, which is not ideal, as compact KWS
models are generally required. This paper explores the effectiveness of SSL on small
models for KWS and establishes that SSL can enhance the performance of small
KWS models when labelled data is scarce. We pretrain three compact
transformer-based KWS models using Data2Vec, and fine-tune them on a
label-deficient setup of the Google Speech Commands data set. It is found that
Data2Vec pretraining leads to a significant increase in accuracy, with
label-deficient scenarios showing an improvement of 8.22% to 11.18% in absolute
accuracy.
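
As a rough illustration of the setup described in the abstract, the sketch below shows one Data2Vec-style pretraining step for a compact transformer KWS encoder (a student regresses the averaged top-layer representations of an EMA teacher at masked frames) and the attachment of a keyword classifier for fine-tuning. The model sizes, the zero-masking scheme, the number of averaged teacher layers and all hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch: minimal Data2Vec-style pretraining for a small transformer KWS
# encoder, plus fine-tuning with a keyword classifier. All sizes and
# hyperparameters are illustrative assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyKWSEncoder(nn.Module):
    """Small transformer encoder over log-mel frames (assumed 40-dim)."""
    def __init__(self, n_mels=40, dim=96, depth=4, heads=4):
        super().__init__()
        self.proj = nn.Linear(n_mels, dim)
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True)
        self.blocks = nn.ModuleList(copy.deepcopy(layer) for _ in range(depth))

    def forward(self, x):
        """Return the hidden states of every block: list of (B, T, dim)."""
        h = self.proj(x)
        states = []
        for blk in self.blocks:
            h = blk(h)
            states.append(h)
        return states

def data2vec_step(student, teacher, feats, mask_ratio=0.5, top_k=3):
    """One self-supervised step: the student sees masked frames and regresses
    the teacher's averaged top-k layer representations of the clean input."""
    B, T, _ = feats.shape
    mask = torch.rand(B, T, device=feats.device) < mask_ratio   # (B, T) bool
    masked = feats.masked_fill(mask.unsqueeze(-1), 0.0)         # simple zero masking

    with torch.no_grad():
        t_states = teacher(feats)                               # clean input
        target = torch.stack(t_states[-top_k:]).mean(0)         # average top-k layers
        target = F.layer_norm(target, target.shape[-1:])        # stabilise targets

    pred = student(masked)[-1]                                  # student top layer
    return F.smooth_l1_loss(pred[mask], target[mask])           # masked positions only

@torch.no_grad()
def ema_update(student, teacher, decay=0.999):
    """Teacher weights are an exponential moving average of the student."""
    for p_s, p_t in zip(student.parameters(), teacher.parameters()):
        p_t.mul_(decay).add_(p_s, alpha=1.0 - decay)

# --- illustrative usage -----------------------------------------------------
student = TinyKWSEncoder()
teacher = copy.deepcopy(student).eval()
teacher.requires_grad_(False)                 # teacher is never trained directly
opt = torch.optim.AdamW(student.parameters(), lr=1e-4)

unlabelled = torch.randn(8, 98, 40)           # stand-in batch of unlabelled log-mels
loss = data2vec_step(student, teacher, unlabelled)
loss.backward()
opt.step(); opt.zero_grad()
ema_update(student, teacher)

# Fine-tuning on the small labelled subset: attach a classifier head
# (number of Speech Commands keyword classes assumed to be 12 here).
classifier = nn.Linear(96, 12)
labelled, labels = torch.randn(8, 98, 40), torch.randint(0, 12, (8,))
logits = classifier(student(labelled)[-1].mean(dim=1))   # mean-pool over time
ce_loss = F.cross_entropy(logits, labels)                # train classifier + encoder jointly
```

The pretraining step above uses no labels at all; in the label-deficient setting studied in the paper, only a small fraction of the Speech Commands training set would carry labels for the fine-tuning stage.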
Related papers
- Disentangled Training with Adversarial Examples For Robust Small-footprint Keyword Spotting [18.456711824241978]
We propose datasource-aware disentangled learning with adversarial examples to improve KWS robustness.
Experimental results demonstrate that the proposed learning strategy improves false reject rate by 40.31% at 1% false accept rate.
Our best-performing system achieves 98.06% accuracy on the Google Speech Commands V1 dataset.
arXiv Detail & Related papers (2024-08-23T20:03:51Z)
- A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification [51.35500308126506]
Self-supervised learning (SSL) is a machine learning approach where the data itself provides supervision, eliminating the need for external labels.
We study how classification-based evaluation protocols for SSL correlate and how well they predict downstream performance on different dataset types.
arXiv Detail & Related papers (2024-07-16T23:17:36Z)
- Noise-Robust Keyword Spotting through Self-supervised Pretraining [11.90089857382705]
Self-supervised learning has been shown to increase the accuracy in clean conditions.
This paper explores how SSL pretraining can be used to enhance the robustness of KWS models in noisy conditions.
arXiv Detail & Related papers (2024-03-27T13:42:14Z)
- Improving Small Footprint Few-shot Keyword Spotting with Supervision on Auxiliary Data [19.075820340282934]
We propose a framework that uses easily collectible, unlabeled reading speech data as an auxiliary source.
We then adopt multi-task learning to help the model enhance its representation power using the out-of-domain auxiliary data.
arXiv Detail & Related papers (2023-08-31T07:29:42Z)
- Exploring Representation Learning for Small-Footprint Keyword Spotting [11.586285744728068]
The main challenges of KWS are limited labeled data and limited available device resources.
To address these challenges, we explore representation learning for KWS via self-supervised contrastive learning and self-training with a pretrained model.
Experiments on speech commands dataset show that the self-training WVC module and the self-supervised LGCSiam module significantly improve accuracy.
arXiv Detail & Related papers (2023-03-20T07:09:26Z)
- Exploring Efficient-tuning Methods in Self-supervised Speech Models [53.633222197712875]
Self-supervised learning can learn powerful representations for different speech tasks.
In downstream tasks, the parameters of the SSL model are frozen, and only the adapters are trained; a minimal sketch of this adapter idea appears after this list.
We show that performance parity can be achieved with over 90% parameter reduction.
arXiv Detail & Related papers (2022-10-10T11:08:12Z)
- Open-Set Semi-Supervised Learning for 3D Point Cloud Understanding [62.17020485045456]
It is commonly assumed in semi-supervised learning (SSL) that the unlabeled data are drawn from the same distribution as that of the labeled ones.
We propose to selectively utilize unlabeled data through sample weighting, so that only conducive unlabeled data would be prioritized.
arXiv Detail & Related papers (2022-05-02T16:09:17Z)
- DATA: Domain-Aware and Task-Aware Pre-training [94.62676913928831]
We present DATA, a simple yet effective neural architecture search (NAS) approach specialized for self-supervised learning (SSL).
Our method achieves promising results across a wide range of computation costs on downstream tasks, including image classification, object detection and semantic segmentation.
arXiv Detail & Related papers (2022-03-17T02:38:49Z)
- SSSE: Efficiently Erasing Samples from Trained Machine Learning Models [103.43466657962242]
We propose an efficient and effective algorithm, SSSE, for sample erasure.
In certain cases SSSE can erase samples almost as well as the optimal, yet impractical, gold standard of training a new model from scratch with only the permitted data.
arXiv Detail & Related papers (2021-07-08T14:17:24Z)
- Self-Tuning for Data-Efficient Deep Learning [75.34320911480008]
Self-Tuning is a novel approach to enable data-efficient deep learning.
It unifies the exploration of labeled and unlabeled data and the transfer of a pre-trained model.
It outperforms its SSL and transfer learning (TL) counterparts on five tasks by sharp margins.
arXiv Detail & Related papers (2021-02-25T14:56:19Z)
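
For the "Exploring Efficient-tuning Methods in Self-supervised Speech Models" entry above, the following is a hedged sketch of the general bottleneck-adapter idea it refers to: the pretrained backbone is frozen and only small adapters plus the task head are trained. It reuses the TinyKWSEncoder class from the sketch under the abstract; the bottleneck width, the number of keyword classes, and the one-adapter-per-block placement are illustrative assumptions rather than that paper's exact method.

```python
# Hedged sketch of adapter-based efficient tuning: freeze the pretrained SSL
# encoder and train only small bottleneck adapters plus the keyword classifier.
# Assumes the TinyKWSEncoder class defined in the earlier sketch.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual."""
    def __init__(self, dim=96, bottleneck=16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)          # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))

class AdapterKWS(nn.Module):
    """Frozen pretrained encoder + one adapter per block + keyword classifier."""
    def __init__(self, encoder, n_classes=12, dim=96):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False             # freeze the SSL backbone
        self.adapters = nn.ModuleList(Adapter(dim) for _ in encoder.blocks)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):
        h = self.encoder.proj(x)
        for blk, ada in zip(self.encoder.blocks, self.adapters):
            h = ada(blk(h))                     # adapter after each frozen block
        return self.head(h.mean(dim=1))         # mean-pool over time

# Only adapter and head parameters reach the optimiser.
model = AdapterKWS(TinyKWSEncoder())
trainable = [p for p in model.parameters() if p.requires_grad]
opt = torch.optim.AdamW(trainable, lr=1e-3)
```

Because only the adapter and head parameters are passed to the optimiser, the trainable parameter count stays a small fraction of the backbone, which is the kind of reduction the entry above reports.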