Variational Autoencoder for Speech Enhancement with a Noise-Aware Encoder
- URL: http://arxiv.org/abs/2102.08706v1
- Date: Wed, 17 Feb 2021 11:40:42 GMT
- Title: Variational Autoencoder for Speech Enhancement with a Noise-Aware Encoder
- Authors: Huajian Fang, Guillaume Carbajal, Stefan Wermter, Timo Gerkmann
- Abstract summary: We propose to include noise information in the training phase by using a noise-aware encoder trained on noisy-clean speech pairs.
We show that our proposed noise-aware VAE outperforms the standard VAE in terms of overall distortion without increasing the number of model parameters.
- Score: 30.318947721658862
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, a generative variational autoencoder (VAE) has been proposed for
speech enhancement to model speech statistics. However, this approach only uses
clean speech in the training phase, making the estimation particularly
sensitive to the presence of noise, especially at low signal-to-noise ratios (SNRs).
To increase the robustness of the VAE, we propose to include noise information
in the training phase by using a noise-aware encoder trained on noisy-clean
speech pairs. We evaluate our approach on real recordings of different noisy
environments and acoustic conditions using two different noise datasets. We
show that our proposed noise-aware VAE outperforms the standard VAE in terms of
overall distortion without increasing the number of model parameters. At the
same time, we demonstrate that our model is capable of generalizing to unseen
noise conditions better than a supervised feedforward deep neural network
(DNN). Furthermore, we demonstrate the robustness of the model performance to a
reduction of the noisy-clean speech training data size.
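The core idea, an encoder fed noisy speech features while the decoder models clean speech, can be illustrated with a minimal PyTorch sketch. All specifics below (513-bin power spectra, layer sizes, a single hidden layer, an Itakura-Saito-style reconstruction term as is common in VAE-based speech enhancement) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a noise-aware VAE for speech enhancement (assumptions:
# 513-bin STFT power-spectrum frames; layer sizes and names are illustrative).
import torch
import torch.nn as nn


class NoiseAwareVAE(nn.Module):
    def __init__(self, n_freq=513, n_latent=16, n_hidden=128):
        super().__init__()
        # Encoder sees NOISY speech features, so it is trained to be robust to noise.
        self.enc = nn.Sequential(nn.Linear(n_freq, n_hidden), nn.Tanh())
        self.enc_mu = nn.Linear(n_hidden, n_latent)
        self.enc_logvar = nn.Linear(n_hidden, n_latent)
        # Decoder generates the variance (power spectrum) of CLEAN speech.
        self.dec = nn.Sequential(nn.Linear(n_latent, n_hidden), nn.Tanh())
        self.dec_logvar = nn.Linear(n_hidden, n_freq)

    def encode(self, noisy_power):
        h = self.enc(torch.log(noisy_power + 1e-8))
        return self.enc_mu(h), self.enc_logvar(h)

    def decode(self, z):
        # Log-variance of the clean speech spectrum.
        return self.dec_logvar(self.dec(z))

    def forward(self, noisy_power, clean_power):
        mu, logvar = self.encode(noisy_power)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
        clean_logvar = self.decode(z)
        # Itakura-Saito-style reconstruction against the CLEAN target (constants dropped),
        # plus the usual KL divergence to the standard normal prior.
        recon = (clean_power / torch.exp(clean_logvar) + clean_logvar).sum(dim=-1)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1)
        return (recon + kl).mean()


# One training step on a hypothetical batch of noisy-clean power-spectrum pairs.
model = NoiseAwareVAE()
noisy = torch.rand(32, 513) + 1e-3
clean = torch.rand(32, 513) + 1e-3
loss = model(noisy, clean)
loss.backward()
```

In this sketch the only change from a standard speech-enhancement VAE is the encoder input: it receives the noisy spectrum while the reconstruction term is computed against the clean spectrum, so no extra parameters are introduced.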
Related papers
- Large Language Models are Efficient Learners of Noise-Robust Speech Recognition [65.95847272465124]
Recent advances in large language models (LLMs) have promoted generative error correction (GER) for automatic speech recognition (ASR).
In this work, we extend the benchmark to noisy conditions and investigate whether we can teach LLMs to perform denoising for GER.
Experiments on various recent LLMs demonstrate that our approach achieves a new breakthrough, with up to a 53.9% correction improvement in terms of word error rate.
arXiv Detail & Related papers (2024-01-19T01:29:27Z)
- Bring the Noise: Introducing Noise Robustness to Pretrained Automatic Speech Recognition [13.53738829631595]
We propose a novel method to extract denoising capabilities that can be applied to any encoder-decoder architecture.
We train our pre-processor on the Noisy Speech Database (NSD) to reconstruct denoised spectrograms from noisy inputs.
We show that the Cleancoder is able to filter noise from speech and that it improves the total Word Error Rate (WER) of the downstream model in noisy conditions.
arXiv Detail & Related papers (2023-09-05T11:34:21Z)
- DiffSED: Sound Event Detection with Denoising Diffusion [70.18051526555512]
We reformulate the SED problem by taking a generative learning perspective.
Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process.
During training, our model learns to reverse the noising process by converting noisy latent queries to their ground-truth versions.
arXiv Detail & Related papers (2023-08-14T17:29:41Z)
- Noise-aware Speech Enhancement using Diffusion Probabilistic Model [35.17225451626734]
We propose a noise-aware speech enhancement (NASE) approach that extracts noise-specific information to guide the reverse process in the diffusion model.
NASE is shown to be a plug-and-play module that can be generalized to any diffusion-based SE model.
arXiv Detail & Related papers (2023-07-16T12:46:11Z)
- Unsupervised speech enhancement with deep dynamical generative speech and noise models [26.051535142743166]
This work builds on previous work on unsupervised speech enhancement that uses a dynamical variational autoencoder (DVAE) as the clean speech model and non-negative matrix factorization (NMF) as the noise model.
We propose to replace the NMF noise model with a deep dynamical generative model (DDGM) depending either on the DVAE latent variables, or on the noisy observations, or on both.
arXiv Detail & Related papers (2023-06-13T14:52:35Z)
- Latent Class-Conditional Noise Model [54.56899309997246]
We introduce a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework.
We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us to efficiently infer the latent true labels.
Our approach safeguards the stable update of the noise transition, avoiding the arbitrary tuning from a mini-batch of samples required by previous methods.
arXiv Detail & Related papers (2023-02-19T15:24:37Z)
- NLIP: Noise-robust Language-Image Pre-training [95.13287735264937]
We propose a principled Noise-robust Language-Image Pre-training framework (NLIP) to stabilize pre-training via two schemes: noise-harmonization and noise-completion.
Our NLIP can alleviate the common noise effects during image-text pre-training in a more efficient way.
arXiv Detail & Related papers (2022-12-14T08:19:30Z)
- NASTAR: Noise Adaptive Speech Enhancement with Target-Conditional Resampling [34.565077865854484]
We propose noise adaptive speech enhancement with target-conditional resampling (NASTAR).
NASTAR uses a feedback mechanism to simulate adaptive training data via a noise extractor and a retrieval model.
Experimental results show that NASTAR can effectively use one noisy speech sample to adapt an SE model to a target condition.
arXiv Detail & Related papers (2022-06-18T00:15:48Z)
- Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction [109.44933866397123]
Noise robustness is essential for deploying automatic speech recognition systems in real-world environments.
We employ a noise-robust representation learned by a refined self-supervised framework for noisy speech recognition.
We achieve performance comparable to the best reported supervised approach using only 16% of the labeled data.
arXiv Detail & Related papers (2021-10-28T20:39:02Z)
- Adaptive noise imitation for image denoising [58.21456707617451]
We develop a new adaptive noise imitation (ADANI) algorithm that can synthesize noisy data from naturally noisy images.
To produce realistic noise, a noise generator takes unpaired noisy/clean images as input, where the noisy image is a guide for noise generation.
Coupling the noisy data output from ADANI with the corresponding ground-truth, a denoising CNN is then trained in a fully-supervised manner.
arXiv Detail & Related papers (2020-11-30T02:49:36Z)