Few-Shot Open-Set Learning for On-Device Customization of KeyWord
Spotting Systems
- URL: http://arxiv.org/abs/2306.02161v1
- Date: Sat, 3 Jun 2023 17:10:33 GMT
- Title: Few-Shot Open-Set Learning for On-Device Customization of KeyWord
Spotting Systems
- Authors: Manuele Rusci and Tinne Tuytelaars
- Abstract summary: This paper investigates few-shot learning methods for open-set KWS classification by combining a deep feature encoder with a prototype-based classifier.
With user-defined keywords from 10 classes of the Google Speech Command dataset, our study reports an accuracy of up to 76% in a 10-shot scenario.
- Score: 41.24728444810133
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A personalized KeyWord Spotting (KWS) pipeline typically requires the
training of a Deep Learning model on a large set of user-defined speech
utterances, preventing fast customization directly applied on-device. To fill
this gap, this paper investigates few-shot learning methods for open-set KWS
classification by combining a deep feature encoder with a prototype-based
classifier. With user-defined keywords from 10 classes of the Google Speech
Command dataset, our study reports an accuracy of up to 76% in a 10-shot
scenario while the false acceptance rate of unknown data is kept to 5%. In the
analyzed settings, the usage of the triplet loss to train an encoder with
normalized output features performs better than the prototypical networks
jointly trained with a generator of dummy unknown-class prototypes. This design
is also more effective than encoders trained on a classification problem and
features fewer parameters than other iso-accuracy approaches.
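The abstract's core mechanism — averaging a few normalized support embeddings per keyword into class prototypes, then classifying queries by nearest prototype and rejecting low-similarity inputs as unknown — can be sketched as below. This is a minimal illustration, not the authors' implementation; the encoder is abstracted away (inputs are assumed to be precomputed embeddings), and the `threshold`, `l2_normalize`, and function names are assumptions for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """L2-normalize embeddings so cosine similarity reduces to a dot product."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def build_prototypes(support_embeddings, support_labels, num_classes):
    """Average the normalized support embeddings of each class into one prototype."""
    protos = np.stack([
        support_embeddings[support_labels == c].mean(axis=0)
        for c in range(num_classes)
    ])
    return l2_normalize(protos)

def classify_open_set(query_embeddings, prototypes, threshold=0.7, unknown_label=-1):
    """Nearest-prototype classification with open-set rejection: queries whose
    best cosine similarity falls below `threshold` are labeled unknown."""
    q = l2_normalize(query_embeddings)
    sims = q @ prototypes.T            # cosine similarities, shape (N, C)
    pred = sims.argmax(axis=1)
    pred[sims.max(axis=1) < threshold] = unknown_label
    return pred
```

On-device customization then amounts to recomputing the prototypes from a handful of user recordings, with no gradient updates; the rejection threshold trades off accuracy on known keywords against the false acceptance rate on unknown speech.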
Related papers
- A Fresh Take on Stale Embeddings: Improving Dense Retriever Training with Corrector Networks [81.2624272756733]
In dense retrieval, deep encoders provide embeddings for both inputs and targets.
We train a small parametric corrector network that adjusts stale cached target embeddings.
Our approach matches state-of-the-art results even when no target embedding updates are made during training.
arXiv Detail & Related papers (2024-09-03T13:29:13Z)
- Optimizing Multi-Stuttered Speech Classification: Leveraging Whisper's Encoder for Efficient Parameter Reduction in Automated Assessment [0.14999444543328289]
This research study unveils the contribution of the last encoder layer in the identification of disfluencies in stuttered speech.
This has led to a computationally efficient approach with 83.7% fewer parameters to train, making the proposed approach more adaptable to various dialects and languages.
arXiv Detail & Related papers (2024-06-09T13:42:51Z)
- Improved Out-of-Scope Intent Classification with Dual Encoding and Threshold-based Re-Classification [6.975902383951604]
Current methodologies face difficulties with the unpredictable distribution of outliers.
We present Dual Encoding for Threshold-Based Re-Classification (DETER) to address these challenges.
Our model outperforms previous benchmarks, improving the F1 score by up to 13% for known intents and 5% for unknown intents.
arXiv Detail & Related papers (2024-05-30T11:46:42Z) - Free-text Keystroke Authentication using Transformers: A Comparative
Study of Architectures and Loss Functions [1.0152838128195467]
Keystroke biometrics is a promising approach for user identification and verification, leveraging the unique patterns in individuals' typing behavior.
We propose a Transformer-based network that employs self-attention to extract informative features from keystroke sequences.
Our model surpasses the previous state-of-the-art in free-text keystroke authentication.
arXiv Detail & Related papers (2023-10-18T00:34:26Z) - Few-Shot Specific Emitter Identification via Deep Metric Ensemble
Learning [26.581059299453663]
We propose a novel FS-SEI method for aircraft identification via automatic dependent surveillance-broadcast (ADS-B) signals.
Specifically, the proposed method consists of feature embedding and classification.
Simulation results show that if the number of samples per category is more than 5, the average accuracy of our proposed method is higher than 98%.
arXiv Detail & Related papers (2022-07-14T01:09:22Z) - Prototypical Classifier for Robust Class-Imbalanced Learning [64.96088324684683]
We propose Prototypical, which does not require fitting additional parameters given the embedding network.
Prototypical produces balanced and comparable predictions for all classes even though the training set is class-imbalanced.
We test our method on the CIFAR-10LT, CIFAR-100LT and Webvision datasets, observing that Prototypical obtains substantial improvements compared with the state of the art.
arXiv Detail & Related papers (2021-10-22T01:55:01Z) - Discriminative Nearest Neighbor Few-Shot Intent Detection by
Transferring Natural Language Inference [150.07326223077405]
Few-shot learning is attracting much attention to mitigate data scarcity.
We present a discriminative nearest neighbor classification with deep self-attention.
We propose to boost the discriminative ability by transferring a natural language inference (NLI) model.
arXiv Detail & Related papers (2020-10-25T00:39:32Z) - Uncertainty-aware Self-training for Text Classification with Few Labels [54.13279574908808]
We study self-training as one of the earliest semi-supervised learning approaches to reduce the annotation bottleneck.
We propose an approach to improve self-training by incorporating uncertainty estimates of the underlying neural network.
We show that our methods, leveraging only 20-30 labeled samples per class per task for training and validation, can perform within 3% of fully supervised pre-trained language models.
arXiv Detail & Related papers (2020-06-27T08:13:58Z) - Ensemble Wrapper Subsampling for Deep Modulation Classification [70.91089216571035]
Subsampling of received wireless signals is important for relaxing hardware requirements as well as the computational cost of signal processing algorithms.
We propose a subsampling technique to facilitate the use of deep learning for automatic modulation classification in wireless communication systems.
arXiv Detail & Related papers (2020-05-10T06:11:13Z) - Small-Footprint Open-Vocabulary Keyword Spotting with Quantized LSTM
Networks [3.8382752162527933]
In this paper, we focus on an open-vocabulary keyword spotting method, allowing the user to define their own keywords without having to retrain the whole model.
We describe the different design choices leading to a fast and small-footprint system, able to run on tiny devices, for any arbitrary set of user-defined keywords.
arXiv Detail & Related papers (2020-02-25T13:27:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.