Behavior of Keyword Spotting Networks Under Noisy Conditions
- URL: http://arxiv.org/abs/2109.07930v1
- Date: Wed, 15 Sep 2021 10:02:34 GMT
- Title: Behavior of Keyword Spotting Networks Under Noisy Conditions
- Authors: Anwesh Mohanty, Adrian Frischknecht, Christoph Gerum and Oliver
Bringmann
- Abstract summary: Keyword spotting (KWS) is becoming a ubiquitous need with the advancement of artificial intelligence and smart devices.
Recent work in this field has focused on several different architectures to achieve good results on datasets with low to moderate noise.
We present an extensive comparison between state-of-the-art KWS networks under various noisy conditions.
- Score: 1.5425424751424208
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Keyword spotting (KWS) is becoming a ubiquitous need with the advancement in
artificial intelligence and smart devices. Recent work in this field has
focused on several different architectures to achieve good results on datasets
with low to moderate noise. However, the performance of these models
deteriorates under high noise conditions as shown by our experiments. In our
paper, we present an extensive comparison between state-of-the-art KWS networks
under various noisy conditions. We also suggest adaptive batch normalization as
a technique to improve the performance of the networks when the noise files are
unknown during the training phase. The results of such high noise
characterization enable future work in developing models that perform better in
the aforementioned conditions.
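Adaptive batch normalization, as suggested in the abstract, re-estimates the batch-normalization statistics on the (noisy) target distribution at test time while keeping the affine parameters learned during training. A minimal NumPy sketch under that reading (function names, shapes, and the synthetic noisy features are illustrative, not taken from the paper):

```python
import numpy as np

def adapt_bn_stats(features):
    """Re-estimate BatchNorm statistics from (noisy) target-domain features.

    Adaptive batch normalization keeps the learned affine parameters
    (gamma, beta) but replaces the running mean/variance collected on
    clean training data with statistics computed on the noisy test
    distribution, so activations stay well-scaled under domain shift.
    """
    return features.mean(axis=0), features.var(axis=0)

def apply_bn(features, mean, var, gamma, beta, eps=1e-5):
    # Standard BatchNorm transform: normalize, then scale and shift.
    return gamma * (features - mean) / np.sqrt(var + eps) + beta

# Clean-training statistics no longer match features shifted by noise:
rng = np.random.default_rng(0)
noisy = rng.normal(loc=3.0, scale=2.0, size=(256, 4))  # hypothetical noisy features

gamma, beta = np.ones(4), np.zeros(4)          # affine parameters kept from training
adapted_mean, adapted_var = adapt_bn_stats(noisy)
out = apply_bn(noisy, adapted_mean, adapted_var, gamma, beta)
# After adaptation, the normalized activations are again zero-mean, unit-variance.
```

In practice this corresponds to recomputing each BatchNorm layer's running statistics with forward passes over unlabeled noisy data before evaluation, leaving all learned weights untouched.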
Related papers
- Noise-Robust Target-Speaker Voice Activity Detection Through Self-Supervised Pretraining [21.26555178371168]
Target-Speaker Voice Activity Detection (TS-VAD) is the task of detecting the presence of speech from a known target-speaker in an audio frame.
Deep neural network-based models have shown good performance in this task.
We propose a causal, Self-Supervised Learning (SSL) pretraining framework to enhance TS-VAD performance in noisy conditions.
arXiv Detail & Related papers (2025-01-06T18:00:14Z)
- Towards Robust Transcription: Exploring Noise Injection Strategies for Training Data Augmentation [55.752737615873464]
This study investigates the impact of white noise at various Signal-to-Noise Ratio (SNR) levels on state-of-the-art APT models.
We hope this research provides valuable insights as preliminary work toward developing transcription models that maintain consistent performance across a range of acoustic conditions.
arXiv Detail & Related papers (2024-10-18T02:31:36Z)
- Improved Noise Schedule for Diffusion Training [51.849746576387375]
We propose a novel approach to design the noise schedule for enhancing the training of diffusion models.
We empirically demonstrate the superiority of our noise schedule over the standard cosine schedule.
arXiv Detail & Related papers (2024-07-03T17:34:55Z)
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first to comprehensively analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z)
- Noise-aware Speech Enhancement using Diffusion Probabilistic Model [35.17225451626734]
We propose a noise-aware speech enhancement (NASE) approach that extracts noise-specific information to guide the reverse process in the diffusion model.
NASE is shown to be a plug-and-play module that can be generalized to any diffusion-based SE model.
arXiv Detail & Related papers (2023-07-16T12:46:11Z)
- SparseVSR: Lightweight and Noise Robust Visual Speech Recognition [100.43280310123784]
We generate a lightweight model that achieves higher performance than its dense model equivalent.
Our results confirm that sparse networks are more resistant to noise than dense networks.
arXiv Detail & Related papers (2023-07-10T13:34:13Z)
- Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), dominate this space: GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z)
- Variational Autoencoder for Speech Enhancement with a Noise-Aware Encoder [30.318947721658862]
We propose to include noise information in the training phase by using a noise-aware encoder trained on noisy-clean speech pairs.
We show that our proposed noise-aware VAE outperforms the standard VAE in terms of overall distortion without increasing the number of model parameters.
arXiv Detail & Related papers (2021-02-17T11:40:42Z)
- Dynamic Layer Customization for Noise Robust Speech Emotion Recognition in Heterogeneous Condition Training [16.807298318504156]
We show that we can improve performance by dynamically routing samples to specialized feature encoders for each noise condition.
We extend these improvements to the multimodal setting by dynamically routing samples to maintain temporal ordering.
arXiv Detail & Related papers (2020-10-21T18:07:32Z)
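Several of the papers above corrupt training or test audio with white noise at controlled Signal-to-Noise Ratio (SNR) levels. A minimal sketch of that mixing step (the function name and the 440 Hz tone example are illustrative, not from any listed paper):

```python
import numpy as np

def mix_at_snr(signal, noise, snr_db):
    """Scale noise so the mixture hits a target signal-to-noise ratio.

    SNR(dB) = 10 * log10(P_signal / P_noise), so the noise is rescaled
    to have power P_signal / 10**(snr_db / 10) before being added.
    """
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    target_p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    return signal + noise * np.sqrt(target_p_noise / p_noise)

# Example: a 440 Hz tone corrupted with white noise at 10 dB SNR.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
rng = np.random.default_rng(0)
white = rng.standard_normal(sr)
mixture = mix_at_snr(tone, white, snr_db=10.0)

# The residual (mixture - tone) carries exactly the target noise power,
# so the achieved SNR matches the requested 10 dB.
achieved_snr = 10 * np.log10(np.mean(tone ** 2) / np.mean((mixture - tone) ** 2))
```

Sweeping `snr_db` from high to low (e.g. 20 dB down to -5 dB) is the usual way such studies characterize how quickly a model's accuracy degrades as noise dominates the signal.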
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.