Behavior of Keyword Spotting Networks Under Noisy Conditions
- URL: http://arxiv.org/abs/2109.07930v1
- Date: Wed, 15 Sep 2021 10:02:34 GMT
- Title: Behavior of Keyword Spotting Networks Under Noisy Conditions
- Authors: Anwesh Mohanty, Adrian Frischknecht, Christoph Gerum and Oliver
Bringmann
- Abstract summary: Keyword spotting (KWS) is becoming a ubiquitous need with the advancement of artificial intelligence and smart devices.
Recent work in this field has focused on several different architectures to achieve good results on datasets with low to moderate noise.
We present an extensive comparison between state-of-the-art KWS networks under various noisy conditions.
- Score: 1.5425424751424208
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Keyword spotting (KWS) is becoming a ubiquitous need with the
advancement of artificial intelligence and smart devices. Recent work in this field has
focused on several different architectures to achieve good results on datasets
with low to moderate noise. However, the performance of these models
deteriorates under high noise conditions as shown by our experiments. In our
paper, we present an extensive comparison between state-of-the-art KWS networks
under various noisy conditions. We also suggest adaptive batch normalization as
a technique to improve the performance of the networks when the noise files are
unknown during the training phase. The results of such high noise
characterization enable future work in developing models that perform better in
the aforementioned conditions.
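The adaptive batch normalization idea mentioned in the abstract can be sketched as follows. This is a minimal illustration for a single normalization layer (assuming NumPy feature matrices and already-trained `gamma`/`beta` parameters), not the authors' implementation:

```python
import numpy as np

def adapt_bn_stats(noisy_features):
    """Re-estimate BatchNorm statistics from unlabeled noisy data.

    Adaptive batch normalization keeps the trained affine weights
    (gamma, beta) but replaces the running mean/variance with statistics
    computed on the new (noisy) domain, so the specific noise files do
    not need to be known during the training phase.
    """
    mean = noisy_features.mean(axis=0)
    var = noisy_features.var(axis=0)
    return mean, var

def batch_norm(x, mean, var, gamma, beta, eps=1e-5):
    # Standard BatchNorm transform, applied with the adapted statistics
    return gamma * (x - mean) / np.sqrt(var + eps) + beta
```

In practice the adapted statistics would be computed layer by layer over a batch of noisy test utterances while all learned parameters stay frozen.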
Related papers
- Towards Robust Transcription: Exploring Noise Injection Strategies for Training Data Augmentation [55.752737615873464]
This study investigates the impact of white noise at various Signal-to-Noise Ratio (SNR) levels on state-of-the-art APT models.
We hope this research provides valuable insights as preliminary work toward developing transcription models that maintain consistent performance across a range of acoustic conditions.
arXiv Detail & Related papers (2024-10-18T02:31:36Z) - Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - Understanding and Mitigating the Label Noise in Pre-training on
Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a lightweight black-box tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z) - Machine learning of network inference enhancement from noisy measurements [13.0533106097336]
Inferring networks from observed time series data offers a window into the interconnections among nodes.
Network inference models, when dealing with real-world open cases, experience a sharp decline in performance.
We present an elegant and efficient model-agnostic framework tailored to amplify the capabilities of model-based and model-free network inference models.
arXiv Detail & Related papers (2023-09-05T08:51:40Z) - Noise-aware Speech Enhancement using Diffusion Probabilistic Model [35.17225451626734]
We propose a noise-aware speech enhancement (NASE) approach that extracts noise-specific information to guide the reverse process in the diffusion model.
NASE is shown to be a plug-and-play module that can be generalized to any diffusion SE models.
arXiv Detail & Related papers (2023-07-16T12:46:11Z) - SparseVSR: Lightweight and Noise Robust Visual Speech Recognition [100.43280310123784]
We generate a lightweight model that achieves higher performance than its dense model equivalent.
Our results confirm that sparse networks are more resistant to noise than dense networks.
arXiv Detail & Related papers (2023-07-10T13:34:13Z) - Realistic Noise Synthesis with Diffusion Models [68.48859665320828]
Deep image denoising models often rely on large amounts of training data to achieve high-quality performance.
We propose a novel method that synthesizes realistic noise using diffusion models, namely Realistic Noise Synthesize Diffusor (RNSD).
RNSD can incorporate guided multiscale content, so that more realistic noise with spatial correlations can be generated at multiple frequencies.
arXiv Detail & Related papers (2023-05-23T12:56:01Z) - Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), each have known drawbacks:
GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z) - Variational Autoencoder for Speech Enhancement with a Noise-Aware
Encoder [30.318947721658862]
We propose to include noise information in the training phase by using a noise-aware encoder trained on noisy-clean speech pairs.
We show that our proposed noise-aware VAE outperforms the standard VAE in terms of overall distortion without increasing the number of model parameters.
arXiv Detail & Related papers (2021-02-17T11:40:42Z) - Dynamic Layer Customization for Noise Robust Speech Emotion Recognition
in Heterogeneous Condition Training [16.807298318504156]
We show that we can improve performance by dynamically routing samples to specialized feature encoders for each noise condition.
We extend these improvements to the multimodal setting by dynamically routing samples to maintain temporal ordering.
arXiv Detail & Related papers (2020-10-21T18:07:32Z)
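Several of the papers above characterize robustness by mixing noise into clean audio at controlled Signal-to-Noise Ratio (SNR) levels. A minimal sketch of such SNR-controlled noise injection (a hypothetical helper, assuming NumPy arrays of clean speech and noise with matching lengths) is:

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so the mixture `clean + noise` has the requested SNR in dB."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR (dB) = 10 * log10(clean_power / noise_power), solved for the noise scale
    target_noise_power = clean_power / (10.0 ** (snr_db / 10.0))
    scale = np.sqrt(target_noise_power / noise_power)
    return clean + scale * noise
```

Sweeping `snr_db` from high (clean-like) to low or negative values is the usual way to build the kind of noise-level characterization these papers report.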
This list is automatically generated from the titles and abstracts of the papers in this site.