Improving Small Footprint Few-shot Keyword Spotting with Supervision on Auxiliary Data
- URL: http://arxiv.org/abs/2309.00647v1
- Date: Thu, 31 Aug 2023 07:29:42 GMT
- Title: Improving Small Footprint Few-shot Keyword Spotting with Supervision on Auxiliary Data
- Authors: Seunghan Yang, Byeonggeun Kim, Kyuhong Shim, Simyung Chang
- Abstract summary: We propose a framework that uses easily collectible, unlabeled reading speech data as an auxiliary source.
We then adopt multi-task learning, which helps the model enhance its representation power using out-of-domain auxiliary data.
- Score: 19.075820340282934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Few-shot keyword spotting (FS-KWS) models usually require large-scale
annotated datasets to generalize to unseen target keywords. However, existing
KWS datasets are limited in scale, and gathering keyword-like labeled data is a
costly undertaking. To mitigate this issue, we propose a framework that uses
easily collectible, unlabeled reading speech data as an auxiliary source.
Self-supervised learning has been widely adopted for learning representations
from unlabeled data; however, it is known to be suitable for large models with
enough capacity and is not practical for training a small footprint FS-KWS
model. Instead, we automatically annotate and filter the data to construct a
keyword-like dataset, LibriWord, enabling supervision on auxiliary data. We
then adopt multi-task learning, which helps the model enhance its
representation power using out-of-domain auxiliary data. Our method notably
improves performance over competitive methods on the FS-KWS benchmark.
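To make the multi-task setup concrete, here is a minimal PyTorch-style sketch of the objective the abstract describes: a small shared encoder feeds two heads, one for few-shot keyword classification over episodes and one for ordinary supervised word classification on the auxiliary LibriWord data, with the two losses combined by a weighting factor. All module names, dimensions, the prototypical-loss choice, and `lambda_aux` are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskKWS(nn.Module):
    # Hypothetical small-footprint model: a shared encoder plus an
    # auxiliary word-classification head for LibriWord supervision.
    def __init__(self, feat_dim=40, embed_dim=64, num_aux_words=1000):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(feat_dim, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        self.aux_head = nn.Linear(embed_dim, num_aux_words)

    def forward(self, x):          # x: (batch, feat_dim, time)
        return self.encoder(x)     # (batch, embed_dim)

def prototypical_loss(support, support_y, query, query_y):
    # Standard prototypical-network loss for the few-shot episodes
    # (one common choice for FS-KWS; the paper's exact head may differ).
    classes = support_y.unique()                       # sorted class ids
    protos = torch.stack([support[support_y == c].mean(0) for c in classes])
    dists = torch.cdist(query, protos)                 # (n_query, n_way)
    targets = torch.bucketize(query_y, classes)        # ids -> 0..n_way-1
    return F.cross_entropy(-dists, targets)

def training_step(model, kws_episode, aux_batch, lambda_aux=0.5):
    s_x, s_y, q_x, q_y = kws_episode
    loss_kws = prototypical_loss(model(s_x), s_y, model(q_x), q_y)
    a_x, a_y = aux_batch                               # LibriWord batch
    loss_aux = F.cross_entropy(model.aux_head(model(a_x)), a_y)
    return loss_kws + lambda_aux * loss_aux            # joint objective
```

The automatic annotation and filtering that construct LibriWord happen upstream of this loop and are omitted here.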
Related papers
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
Reinforcement Learning with Generative Models for Compact Support Sets [10.041289551532804]
We propose a framework utilizing reinforcement learning as a control for foundation models.
Our framework produced excellent results, increasing classification accuracy by significant margins at no additional labelling or data cost.
arXiv Detail & Related papers (2024-04-25T02:48:16Z)
AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks.
We propose AnnoLLM, which adopts a two-step, explain-then-annotate approach (a rough sketch follows this entry).
We build the first conversation-based information retrieval dataset employing AnnoLLM.
arXiv Detail & Related papers (2023-03-29T17:03:21Z)
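As a rough illustration of AnnoLLM's explain-then-annotate idea from the entry above, the sketch below first asks an LLM to explain the gold label of each demonstration, then reuses those self-generated explanations as reasoning demonstrations when annotating new text. `call_llm` is a hypothetical stand-in for any chat-completion client; the prompts are illustrative, not the paper's.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical placeholder: plug in a real LLM client here.
    raise NotImplementedError

def explain_then_annotate(example: str, demos: list[tuple[str, str]]) -> str:
    # Step 1 (explain): have the LLM justify each gold-labeled demonstration.
    explanations = [
        call_llm(f"Text: {text}\nLabel: {label}\n"
                 f"Explain briefly why this label is correct.")
        for text, label in demos
    ]
    # Step 2 (annotate): use the explanations as reasoning demonstrations.
    context = "\n\n".join(
        f"Text: {t}\nReasoning: {e}\nLabel: {l}"
        for (t, l), e in zip(demos, explanations)
    )
    return call_llm(f"{context}\n\nText: {example}\nReasoning:")
```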
M-Tuning: Prompt Tuning with Mitigated Label Bias in Open-Set Scenarios [103.6153593636399]
We propose a vision-language prompt tuning method with mitigated label bias (M-Tuning).
It introduces open words from WordNet to extend the prompt texts beyond closed-set label words, so that prompts are tuned in a simulated open-set scenario (a rough sketch follows this entry).
Our method achieves the best performance on datasets with various scales, and extensive ablation studies also validate its effectiveness.
arXiv Detail & Related papers (2023-03-09T09:05:47Z)
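To illustrate the open-word idea in the M-Tuning entry above, the sketch below pads a closed label set with nouns drawn from WordNet, producing the enlarged word list over which prompts would be tuned. It uses NLTK's WordNet interface; the sampling scheme and count are assumptions, not the paper's procedure.

```python
import random
from nltk.corpus import wordnet as wn   # requires nltk.download("wordnet")

def extend_label_words(closed_labels, num_open=100, seed=0):
    # Gather candidate open words from WordNet noun lemmas.
    candidates = {lemma.name().replace("_", " ")
                  for synset in wn.all_synsets("n")
                  for lemma in synset.lemmas()}
    candidates -= set(closed_labels)
    random.seed(seed)
    open_words = random.sample(sorted(candidates), num_open)
    # Prompts are then tuned over closed-set labels plus these open words,
    # simulating an open-set scenario.
    return list(closed_labels) + open_words
```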
Improving Label-Deficient Keyword Spotting Through Self-Supervised Pretraining [18.19207291891767]
Keyword Spotting (KWS) models are becoming increasingly integrated into various systems, e.g., voice assistants.
KWS models typically rely on a large amount of labelled data, limiting their application to situations where such data is available.
Self-supervised Learning (SSL) methods can mitigate such a reliance by leveraging readily-available unlabelled data.
arXiv Detail & Related papers (2022-10-04T15:56:27Z)
Assisted Text Annotation Using Active Learning to Achieve High Quality with Little Effort [9.379650501033465]
We propose a tool that enables researchers to create large, high-quality, annotated datasets with only a few manual annotations.
We combine an active learning (AL) approach with a pre-trained language model to semi-automatically identify annotation categories.
Our preliminary results show that employing AL sharply reduces the number of annotations needed to correctly classify even complex and subtle frames (a minimal sketch of such a loop follows this entry).
arXiv Detail & Related papers (2021-12-15T13:14:58Z)
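Below is a minimal sketch of the kind of loop the entry above describes: a lightweight classifier over pre-computed text embeddings is retrained each round, and the least-confident pool items are sent to the human annotator. The embedding source, classifier, and query strategy are stand-ins, not the paper's tool.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_loop(X, oracle_label, seed_idx, rounds=10, batch=5):
    """X: (n, d) text embeddings, e.g. from a pre-trained language model.
    oracle_label: callable idx -> label, i.e. the human annotator."""
    labeled = set(seed_idx)
    y = {i: oracle_label(i) for i in labeled}
    clf = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        idx = sorted(labeled)
        clf.fit(X[idx], [y[i] for i in idx])
        # Uncertainty sampling: query the pool items with the lowest
        # maximum class probability.
        pool = np.array([i for i in range(len(X)) if i not in labeled])
        conf = clf.predict_proba(X[pool]).max(axis=1)
        for i in pool[np.argsort(conf)[:batch]]:
            y[int(i)] = oracle_label(int(i))
            labeled.add(int(i))
    return clf, y
```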
On the Use of External Data for Spoken Named Entity Recognition [40.93448412171246]
Recent advances in self-supervised speech representations have made it feasible to consider learning models with limited labeled data.
We draw on a variety of approaches, including self-training, knowledge distillation, and transfer learning, and consider their applicability to both end-to-end models and pipeline approaches.
arXiv Detail & Related papers (2021-12-14T18:49:26Z)
Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information and have proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z)
Self-Supervision based Task-Specific Image Collection Summarization [3.115375810642661]
We propose a novel approach to task-specific image corpus summarization using semantic information and self-supervision.
Our method uses a classification-based Wasserstein generative adversarial network (WGAN) as a feature generating network.
The model then generates a summary at inference time by using K-means clustering in the semantic embedding space (a minimal sketch of this selection step follows this entry).
arXiv Detail & Related papers (2020-12-19T10:58:04Z)
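To illustrate the inference-time step in the summarization entry above, here is a minimal sketch of summary selection via K-means in an embedding space: cluster the image features and keep the image nearest each centroid. How the embeddings are produced (the paper uses a classification-based WGAN feature network) is abstracted away, and all names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def summarize(embeddings, k=10, seed=0):
    """embeddings: (n_images, d) semantic features; returns k image indices."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(embeddings)
    summary = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(
            embeddings[members] - km.cluster_centers_[c], axis=1)
        summary.append(int(members[np.argmin(dists)]))  # medoid-like pick
    return sorted(summary)
```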
Social Adaptive Module for Weakly-supervised Group Activity Recognition [143.68241396839062]
This paper presents a new task named weakly-supervised group activity recognition (GAR).
It differs from conventional GAR tasks in that only video-level labels are available, yet the important persons within each frame are not provided even in the training data.
This makes it easier to collect and annotate a large-scale NBA dataset, which in turn raises new challenges for GAR.
arXiv Detail & Related papers (2020-07-18T16:40:55Z)
Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework preserves the relations between samples well.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.