Behavior of Keyword Spotting Networks Under Noisy Conditions
- URL: http://arxiv.org/abs/2109.07930v1
- Date: Wed, 15 Sep 2021 10:02:34 GMT
- Title: Behavior of Keyword Spotting Networks Under Noisy Conditions
- Authors: Anwesh Mohanty, Adrian Frischknecht, Christoph Gerum and Oliver
Bringmann
- Abstract summary: Keyword spotting (KWS) is becoming a ubiquitous need with the advancement of artificial intelligence and smart devices.
Recent work in this field has focused on several different architectures to achieve good results on datasets with low to moderate noise.
We present an extensive comparison between state-of-the-art KWS networks under various noisy conditions.
- Score: 1.5425424751424208
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Keyword spotting (KWS) is becoming a ubiquitous need with the advancement in
artificial intelligence and smart devices. Recent work in this field has
focused on several different architectures to achieve good results on datasets
with low to moderate noise. However, the performance of these models
deteriorates under high noise conditions as shown by our experiments. In our
paper, we present an extensive comparison between state-of-the-art KWS networks
under various noisy conditions. We also suggest adaptive batch normalization as
a technique to improve the performance of the networks when the noise files are
unknown during the training phase. The results of such high noise
characterization enable future work in developing models that perform better in
the aforementioned conditions.
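Adaptive batch normalization, as suggested in the abstract, re-estimates the batch-normalization statistics on the (noisy) target distribution at test time while keeping the affine parameters learned during training. A minimal NumPy sketch under that reading (function names, shapes, and the synthetic noisy features are illustrative, not taken from the paper):

```python
import numpy as np

def adapt_bn_stats(features):
    """Re-estimate BatchNorm statistics from (noisy) target-domain features.

    Adaptive batch normalization keeps the learned affine parameters
    (gamma, beta) but replaces the running mean/variance collected on
    clean training data with statistics computed on the noisy test
    distribution, so activations stay well-scaled under domain shift.
    """
    return features.mean(axis=0), features.var(axis=0)

def apply_bn(features, mean, var, gamma, beta, eps=1e-5):
    # Standard BatchNorm transform: normalize, then scale and shift.
    return gamma * (features - mean) / np.sqrt(var + eps) + beta

# Clean-training statistics no longer match features shifted by noise:
rng = np.random.default_rng(0)
noisy = rng.normal(loc=3.0, scale=2.0, size=(256, 4))  # hypothetical noisy features

gamma, beta = np.ones(4), np.zeros(4)          # affine parameters kept from training
adapted_mean, adapted_var = adapt_bn_stats(noisy)
out = apply_bn(noisy, adapted_mean, adapted_var, gamma, beta)
# After adaptation, the normalized activations are again zero-mean, unit-variance.
```

In practice this corresponds to recomputing each BatchNorm layer's running statistics with forward passes over unlabeled noisy data before evaluation, leaving all learned weights untouched.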
Related papers
- Noise-Robust Target-Speaker Voice Activity Detection Through Self-Supervised Pretraining [21.26555178371168]
Target-Speaker Voice Activity Detection (TS-VAD) is the task of detecting the presence of speech from a known target-speaker in an audio frame.
Deep neural network-based models have shown good performance in this task.
We propose a causal, Self-Supervised Learning (SSL) pretraining framework to enhance TS-VAD performance in noisy conditions.
arXiv Detail & Related papers (2025-01-06T18:00:14Z)
- Towards Robust Transcription: Exploring Noise Injection Strategies for Training Data Augmentation [55.752737615873464]
This study investigates the impact of white noise at various Signal-to-Noise Ratio (SNR) levels on state-of-the-art APT models.
We hope this research provides valuable insights as preliminary work toward developing transcription models that maintain consistent performance across a range of acoustic conditions.
arXiv Detail & Related papers (2024-10-18T02:31:36Z)
- Improved Noise Schedule for Diffusion Training [51.849746576387375]
We propose a novel approach to design the noise schedule for enhancing the training of diffusion models.
We empirically demonstrate the superiority of our noise schedule over the standard cosine schedule.
arXiv Detail & Related papers (2024-07-03T17:34:55Z)
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first to comprehensively analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z)
- Noise-aware Speech Enhancement using Diffusion Probabilistic Model [35.17225451626734]
We propose a noise-aware speech enhancement (NASE) approach that extracts noise-specific information to guide the reverse process in the diffusion model.
NASE is shown to be a plug-and-play module that can be generalized to any diffusion-based SE model.
arXiv Detail & Related papers (2023-07-16T12:46:11Z)
- SparseVSR: Lightweight and Noise Robust Visual Speech Recognition [100.43280310123784]
We generate a lightweight model that achieves higher performance than its dense model equivalent.
Our results confirm that sparse networks are more resistant to noise than dense networks.
arXiv Detail & Related papers (2023-07-10T13:34:13Z)
- Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), dominate this space: GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z)
- Variational Autoencoder for Speech Enhancement with a Noise-Aware Encoder [30.318947721658862]
We propose to include noise information in the training phase by using a noise-aware encoder trained on noisy-clean speech pairs.
We show that our proposed noise-aware VAE outperforms the standard VAE in terms of overall distortion without increasing the number of model parameters.
arXiv Detail & Related papers (2021-02-17T11:40:42Z)
- Dynamic Layer Customization for Noise Robust Speech Emotion Recognition in Heterogeneous Condition Training [16.807298318504156]
We show that we can improve performance by dynamically routing samples to specialized feature encoders for each noise condition.
We extend these improvements to the multimodal setting by dynamically routing samples to maintain temporal ordering.
arXiv Detail & Related papers (2020-10-21T18:07:32Z)
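Several of the papers above corrupt training or test audio with white noise at controlled Signal-to-Noise Ratio (SNR) levels. A minimal sketch of that mixing step (the function name and the 440 Hz tone example are illustrative, not from any listed paper):

```python
import numpy as np

def mix_at_snr(signal, noise, snr_db):
    """Scale noise so the mixture hits a target signal-to-noise ratio.

    SNR(dB) = 10 * log10(P_signal / P_noise), so the noise is rescaled
    to have power P_signal / 10**(snr_db / 10) before being added.
    """
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    target_p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    return signal + noise * np.sqrt(target_p_noise / p_noise)

# Example: a 440 Hz tone corrupted with white noise at 10 dB SNR.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
rng = np.random.default_rng(0)
white = rng.standard_normal(sr)
mixture = mix_at_snr(tone, white, snr_db=10.0)

# The residual (mixture - tone) carries exactly the target noise power,
# so the achieved SNR matches the requested 10 dB.
achieved_snr = 10 * np.log10(np.mean(tone ** 2) / np.mean((mixture - tone) ** 2))
```

Sweeping `snr_db` from high to low (e.g. 20 dB down to -5 dB) is the usual way such studies characterize how quickly a model's accuracy degrades as noise dominates the signal.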
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.