Rainbow Keywords: Efficient Incremental Learning for Online Spoken
Keyword Spotting
- URL: http://arxiv.org/abs/2203.16361v1
- Date: Wed, 30 Mar 2022 14:39:21 GMT
- Title: Rainbow Keywords: Efficient Incremental Learning for Online Spoken
Keyword Spotting
- Authors: Yang Xiao and Nana Hou and Eng Siong Chng
- Abstract summary: We propose a novel diversity-aware incremental learning method named Rainbow Keywords (RK).
As a result, the RK approach can incrementally learn new tasks without forgetting prior knowledge.
Experimental results show that the proposed RK approach achieves a 4.2% absolute improvement in average accuracy over the best baseline on the Google Speech Commands dataset while requiring less memory.
- Score: 29.65294592309984
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Catastrophic forgetting is a thorny challenge when updating keyword spotting
(KWS) models after deployment. The problem becomes even more challenging when KWS
models must run on edge devices with limited memory. To alleviate this issue, we
propose a novel diversity-aware incremental learning method named Rainbow Keywords
(RK). Specifically, the proposed RK approach introduces a diversity-aware sampler
that selects a diverse set from historical and incoming keywords by calculating
classification uncertainty. As a result, the RK approach can incrementally learn
new tasks without forgetting prior knowledge. In addition, the RK approach employs
data augmentation and a knowledge distillation loss function for efficient memory
management on the edge device. Experimental results show that the proposed RK
approach achieves a 4.2% absolute improvement in average accuracy over the best
baseline on the Google Speech Commands dataset while requiring less memory. The
scripts are available on GitHub.
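The paper's actual sampler lives in its GitHub scripts; as a rough illustration of the idea described above, the following is a minimal sketch of uncertainty-based diverse exemplar selection. The function name and the use of 1 minus the maximum softmax probability as the uncertainty measure are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def select_diverse_exemplars(probs, memory_size):
    """Pick a diverse exemplar set by classification uncertainty.

    probs: (n_samples, n_classes) array of softmax outputs, e.g.
    averaged over augmented views of each keyword utterance.
    Returns indices of `memory_size` samples spread evenly from the
    most certain to the least certain predictions.
    """
    # Uncertainty as 1 - max predicted probability (higher = harder sample).
    uncertainty = 1.0 - probs.max(axis=1)
    # Sort samples from most certain to least certain.
    order = np.argsort(uncertainty)
    # Take evenly spaced picks along the sorted uncertainty axis so the
    # memory covers easy, medium, and hard keywords rather than one extreme.
    picks = np.linspace(0, len(order) - 1, num=memory_size).round().astype(int)
    return order[picks]
```

Spreading exemplars across the uncertainty spectrum, rather than keeping only the hardest or a random subset, is what makes the rehearsal memory "diverse" in spirit here.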
Related papers
- Multitaper mel-spectrograms for keyword spotting [42.82842124247846]
This paper investigates the use of the multitaper technique to create improved features for KWS.
Experimental results confirm the advantages of the proposed features.
arXiv Detail & Related papers (2024-07-05T17:18:25Z) - Hessian Aware Low-Rank Perturbation for Order-Robust Continual Learning [19.850893012601638]
Continual learning aims to learn a series of tasks sequentially without forgetting the knowledge acquired from the previous ones.
We propose the Hessian Aware Low-Rank Perturbation algorithm for continual learning.
arXiv Detail & Related papers (2023-11-26T01:44:01Z) - Pink: Unveiling the Power of Referential Comprehension for Multi-modal
LLMs [49.88461345825586]
This paper proposes a new framework to enhance the fine-grained image understanding abilities of MLLMs.
We present a new method for constructing the instruction tuning dataset at a low cost by leveraging annotations in existing datasets.
We show that our model exhibits a 5.2% accuracy improvement over Qwen-VL and surpasses the accuracy of Kosmos-2 by 24.7%.
arXiv Detail & Related papers (2023-10-01T05:53:15Z) - Open-vocabulary Keyword-spotting with Adaptive Instance Normalization [18.250276540068047]
We propose AdaKWS, a novel method for keyword spotting in which a text encoder is trained to output keyword-conditioned normalization parameters.
We show significant improvements over recent keyword spotting and ASR baselines.
arXiv Detail & Related papers (2023-09-13T13:49:42Z) - CrowdCLIP: Unsupervised Crowd Counting via Vision-Language Model [60.30099369475092]
Supervised crowd counting relies heavily on costly manual labeling.
We propose a novel unsupervised framework for crowd counting, named CrowdCLIP.
CrowdCLIP achieves superior performance compared to previous unsupervised state-of-the-art counting methods.
arXiv Detail & Related papers (2023-04-09T12:56:54Z) - Noise-Robust Dense Retrieval via Contrastive Alignment Post Training [89.29256833403167]
Contrastive Alignment POst Training (CAPOT) is a highly efficient finetuning method that improves model robustness without requiring index regeneration.
CAPOT enables robust retrieval by freezing the document encoder while the query encoder learns to align noisy queries with their unaltered root.
We evaluate CAPOT on noisy variants of MSMARCO, Natural Questions, and Trivia QA passage retrieval, finding that CAPOT has a similar impact as data augmentation with none of its overhead.
arXiv Detail & Related papers (2023-04-06T22:16:53Z) - M-Tuning: Prompt Tuning with Mitigated Label Bias in Open-Set Scenarios [103.6153593636399]
We propose a vision-language prompt tuning method with mitigated label bias (M-Tuning)
It introduces open words from WordNet to extend the prompt texts beyond the closed-set label words, so that prompts are tuned in a simulated open-set scenario.
Our method achieves the best performance on datasets with various scales, and extensive ablation studies also validate its effectiveness.
arXiv Detail & Related papers (2023-03-09T09:05:47Z) - CODA-Prompt: COntinual Decomposed Attention-based Prompting for
Rehearsal-Free Continual Learning [30.676509834338884]
Computer vision models suffer from a phenomenon known as catastrophic forgetting when learning novel concepts from continuously shifting training data.
We propose prompting approaches as an alternative to data-rehearsal.
We show that we outperform the current SOTA method DualPrompt on established benchmarks by as much as 4.5% in average final accuracy.
arXiv Detail & Related papers (2022-11-23T18:57:11Z) - Discrete Key-Value Bottleneck [95.61236311369821]
Deep neural networks perform well on classification tasks where data streams are i.i.d. and labeled data is abundant.
One powerful approach that has addressed this challenge involves pre-training of large encoders on volumes of readily available data, followed by task-specific tuning.
Given a new task, however, updating the weights of these encoders is challenging as a large number of weights needs to be fine-tuned, and as a result, they forget information about the previous tasks.
We propose a model architecture to address this issue, building upon a discrete bottleneck containing pairs of separate and learnable key-value codes.
arXiv Detail & Related papers (2022-07-22T17:52:30Z) - RGB-D Saliency Detection via Cascaded Mutual Information Minimization [122.8879596830581]
Existing RGB-D saliency detection models do not explicitly encourage RGB and depth to achieve effective multi-modal learning.
We introduce a novel multi-stage cascaded learning framework via mutual information minimization to "explicitly" model the multi-modal information between the RGB image and depth data.
arXiv Detail & Related papers (2021-09-15T12:31:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.