Related papers: Adaptive Noise Resilient Keyword Spotting Using One-Shot Learning

URL: http://arxiv.org/abs/2505.09304v1
Date: Wed, 14 May 2025 11:39:47 GMT
Title: Adaptive Noise Resilient Keyword Spotting Using One-Shot Learning
Authors: Luciano Sebastian Martinez-Rau, Quynh Nguyen Phuong Vu, Yuxuan Zhang, Bengt Oelmann, Sebastian Bader
Abstract summary: Keyword spotting (KWS) is a key component of smart devices, enabling efficient and intuitive audio interaction. KWS systems often suffer performance degradation under real-world operating conditions. This study proposes a low computational approach for continuous noise adaptation of pretrained neural networks used for KWS classification.
Score: 5.967661928760498
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Keyword spotting (KWS) is a key component of smart devices, enabling efficient and intuitive audio interaction. However, standard KWS systems deployed on embedded devices often suffer performance degradation under real-world operating conditions. Resilient KWS systems address this issue by enabling dynamic adaptation, with applications such as adding or replacing keywords, adjusting to specific users, and improving noise robustness. However, deploying resilient, standalone KWS systems with low latency on resource-constrained devices remains challenging due to limited memory and computational resources. This study proposes a low computational approach for continuous noise adaptation of pretrained neural networks used for KWS classification, requiring only 1-shot learning and one epoch. The proposed method was assessed using two pretrained models and three real-world noise sources at signal-to-noise ratios (SNRs) ranging from 24 to -3 dB. The adapted models consistently outperformed the pretrained models across all scenarios, especially at SNR $\leq$ 18 dB, achieving accuracy improvements of 4.9% to 46.0%. These results highlight the efficacy of the proposed methodology while being lightweight enough for deployment on resource-constrained devices.
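
To make the SNR range in the abstract concrete, the following is a minimal sketch of how a clean utterance can be mixed with a noise recording at a chosen signal-to-noise ratio. It is an illustration only, not the authors' pipeline: the function names `power`, `mix_at_snr`, and `measured_snr_db`, the synthetic tone standing in for speech, and the random noise are all assumptions made for this example. It relies on the standard definition SNR(dB) = 10 log10(P_speech / P_noise).

```python
import math
import random


def power(signal):
    """Mean power of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power equals `snr_db`, then add it.

    Returns the noisy mixture and the scaled noise (hypothetical helper).
    """
    # Loop the noise so it covers the whole utterance.
    looped = [noise[i % len(noise)] for i in range(len(speech))]
    # From SNR_dB = 10*log10(P_s / (g^2 * P_n)), solve for the gain g.
    gain = math.sqrt(power(speech) / (power(looped) * 10.0 ** (snr_db / 10.0)))
    scaled = [gain * n for n in looped]
    noisy = [s + n for s, n in zip(speech, scaled)]
    return noisy, scaled


def measured_snr_db(speech, scaled_noise):
    """SNR in dB between the clean signal and the noise that was added."""
    return 10.0 * math.log10(power(speech) / power(scaled_noise))


# Stand-in signals (assumptions): a 440 Hz tone as "speech", white noise as "noise".
random.seed(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [random.gauss(0.0, 1.0) for _ in range(8000)]

for target in (24, 18, 6, -3):
    noisy, scaled = mix_at_snr(speech, noise, target)
    print(target, round(measured_snr_db(speech, scaled), 6))
```

At -3 dB the added noise carries roughly twice the power of the speech, while at 24 dB it carries about 0.4% of it, which gives a sense of how wide the evaluated range is.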

Related papers

Reliable Few-shot Learning under Dual Noises [166.53173694689693]
We propose DEnoised Task Adaptation (DETA++) for reliable few-shot learning. DETA++ employs a memory bank to store and refine clean regions for each inner-task class, based on which a Local Nearest Centroid Classifier (LocalNCC) is devised to yield noise-robust predictions on query samples. Extensive experiments demonstrate the effectiveness and flexibility of DETA++.
arXiv Detail & Related papers (2025-06-19T14:05:57Z)
Machine Unlearning for Robust DNNs: Attribution-Guided Partitioning and Neuron Pruning in Noisy Environments [5.8166742412657895]
Deep neural networks (DNNs) have achieved remarkable success across diverse domains, but their performance can be severely degraded by noisy or corrupted training data. We propose a novel framework that integrates attribution-guided data partitioning, discriminative neuron pruning, and targeted fine-tuning to mitigate the impact of noisy samples. Our framework achieves approximately a 10% absolute accuracy improvement over standard retraining on CIFAR-10 with injected label noise.
arXiv Detail & Related papers (2025-06-13T09:37:11Z)
E-BATS: Efficient Backpropagation-Free Test-Time Adaptation for Speech Foundation Models [11.696474872520808]
Speech Foundation Models encounter significant performance degradation when deployed in real-world scenarios involving acoustic domain shifts. Test-time adaptation (TTA) has emerged as a viable strategy to address such domain shifts at inference time without requiring access to source data or labels. E-BATS is the first Efficient BAckpropagation-free TTA framework designed explicitly for speech foundation models.
arXiv Detail & Related papers (2025-06-08T10:33:37Z)
Meta-Learning-Based Delayless Subband Adaptive Filter using Complex Self-Attention for Active Noise Control [11.118668841431562]
We reformulate the active noise control problem as a meta-learning problem. We propose a meta-learning-based delayless subband adaptive filter with deep neural networks. Our model achieves superior noise reduction performance compared to traditional methods.
arXiv Detail & Related papers (2024-12-27T05:51:40Z)
D4AM: A General Denoising Framework for Downstream Acoustic Models [45.04967351760919]
Speech enhancement (SE) can be used as a front-end strategy to aid automatic speech recognition (ASR) systems. Existing training objectives of SE methods are not fully effective at integrating speech-text and noisy-clean paired data for training toward unseen ASR systems. We propose a general denoising framework, D4AM, for various downstream acoustic models.
arXiv Detail & Related papers (2023-11-28T08:27:27Z)
Improve Noise Tolerance of Robust Loss via Noise-Awareness [60.34670515595074]
We propose a meta-learning method capable of adaptively learning a hyperparameter prediction function, called Noise-Aware-Robust-Loss-Adjuster (NARL-Adjuster for brevity). Four SOTA robust loss functions are integrated with our algorithm, and comprehensive experiments substantiate the general availability and effectiveness of the proposed method in both its noise tolerance and performance.
arXiv Detail & Related papers (2023-01-18T04:54:58Z)
Audio-Visual Efficient Conformer for Robust Speech Recognition [91.3755431537592]
We propose to improve the noise robustness of the recently proposed Efficient Conformer Connectionist Temporal Classification architecture by processing both audio and visual modalities. Our experiments show that using audio and visual modalities allows better recognition of speech in the presence of environmental noise and significantly accelerates training, reaching lower WER with 4 times fewer training steps.
arXiv Detail & Related papers (2023-01-04T05:36:56Z)
Efficient acoustic feature transformation in mismatched environments using a Guided-GAN [1.495380389108477]
We propose a new framework to improve automatic speech recognition systems in resource-scarce environments. We use a generative adversarial network (GAN) operating on acoustic input features to enhance the features of mismatched data. With less than one hour of data, an ASR system trained on good-quality data and evaluated on mismatched audio is improved by between 11.5% and 19.7% relative word error rate (WER).
arXiv Detail & Related papers (2022-10-03T05:33:28Z)
Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction [109.44933866397123]
Noise robustness is essential for deploying automatic speech recognition systems in real-world environments. We employ a noise-robust representation learned by a refined self-supervised framework for noisy speech recognition. We achieve comparable performance to the best supervised approach reported with only 16% of labeled data.
arXiv Detail & Related papers (2021-10-28T20:39:02Z)
LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models. We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity. Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
arXiv Detail & Related papers (2021-10-08T17:03:34Z)
Bayesian Learning for Deep Neural Network Adaptation [57.70991105736059]
A key task for speech recognition systems is to reduce the mismatch between training and evaluation data that is often attributable to speaker differences. Model-based speaker adaptation approaches often require sufficient amounts of target speaker data to ensure robustness. This paper proposes a full Bayesian learning based DNN speaker adaptation framework to model speaker-dependent (SD) parameter uncertainty.
arXiv Detail & Related papers (2020-12-14T12:30:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.