Simultaneous Denoising and Dereverberation Using Deep Embedding Features
- URL: http://arxiv.org/abs/2004.02420v1
- Date: Mon, 6 Apr 2020 06:34:01 GMT
- Title: Simultaneous Denoising and Dereverberation Using Deep Embedding Features
- Authors: Cunhang Fan and Jianhua Tao and Bin Liu and Jiangyan Yi and Zhengqi
Wen
- Abstract summary: We propose a joint training method for simultaneous speech denoising and dereverberation using deep embedding features.
At the denoising stage, the DC network is leveraged to extract noise-free deep embedding features.
At the dereverberation stage, instead of using the unsupervised K-means clustering algorithm, another neural network is utilized to estimate the anechoic speech.
- Score: 64.58693911070228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monaural speech dereverberation is a very challenging task because
no spatial cues can be used. When additive noise is also present, the task
becomes even more challenging. In this paper, we propose a joint training
method for simultaneous speech denoising and dereverberation using deep
embedding features, which is based on deep clustering (DC). DC is a
state-of-the-art method for speech separation that combines embedding learning
with K-means clustering. Our proposed method contains two stages: denoising
and dereverberation. At the denoising stage, the DC network is leveraged to
extract noise-free deep embedding features. These embedding features are
generated from the anechoic speech and residual reverberation signals. They
represent the inferred spectral masking patterns of the desired signals and
are therefore discriminative features. At the dereverberation stage, instead
of using the unsupervised K-means clustering algorithm, another supervised
neural network is utilized to estimate the anechoic speech from these deep
embedding features. Finally, the denoising and dereverberation stages are
optimized with the joint training method. Experimental results show that the
proposed method outperforms the WPE and BLSTM baselines, especially in low
SNR conditions.
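To make the two-stage pipeline concrete, the following PyTorch-style sketch shows one way the pieces could fit together. The layer sizes, BLSTM depth, embedding dimension, and loss weighting below are illustrative assumptions, not the authors' exact configuration.

    import torch
    import torch.nn as nn

    class DenoisingDC(nn.Module):
        """Denoising stage: a deep-clustering-style BLSTM that maps each
        time-frequency bin of the noisy, reverberant magnitude spectrogram
        to a unit-norm embedding vector (hypothetical sizes)."""
        def __init__(self, n_freq=129, emb_dim=20, hidden=300):
            super().__init__()
            self.blstm = nn.LSTM(n_freq, hidden, num_layers=2,
                                 batch_first=True, bidirectional=True)
            self.proj = nn.Linear(2 * hidden, n_freq * emb_dim)
            self.emb_dim = emb_dim

        def forward(self, noisy_mag):                 # (B, T, F)
            h, _ = self.blstm(noisy_mag)
            emb = self.proj(h)                        # (B, T, F * D)
            B, T, _ = emb.shape
            emb = emb.view(B, T, -1, self.emb_dim)    # (B, T, F, D)
            return nn.functional.normalize(emb, dim=-1)

    class DereverbNet(nn.Module):
        """Dereverberation stage: a supervised network that predicts a mask
        for the anechoic speech from the deep embedding features, replacing
        the unsupervised K-means step of standard deep clustering."""
        def __init__(self, n_freq=129, emb_dim=20, hidden=300):
            super().__init__()
            self.blstm = nn.LSTM(n_freq * emb_dim, hidden, num_layers=2,
                                 batch_first=True, bidirectional=True)
            self.mask = nn.Sequential(nn.Linear(2 * hidden, n_freq), nn.Sigmoid())

        def forward(self, emb, noisy_mag):            # emb: (B, T, F, D)
            B, T, F_, D = emb.shape
            h, _ = self.blstm(emb.reshape(B, T, F_ * D))
            return self.mask(h) * noisy_mag           # estimated anechoic magnitude

    def joint_loss(emb, ibm, est_anechoic, ref_anechoic, alpha=0.5):
        """Joint objective: deep-clustering affinity loss on the embeddings
        plus an MSE reconstruction loss on the dereverberated output."""
        B, T, F_, D = emb.shape
        V = emb.reshape(B, T * F_, D)                 # embeddings
        Y = ibm.reshape(B, T * F_, -1)                # ideal binary mask labels
        dc = ((V.transpose(1, 2) @ V) ** 2).sum() \
             - 2 * ((V.transpose(1, 2) @ Y) ** 2).sum() \
             + ((Y.transpose(1, 2) @ Y) ** 2).sum()
        mse = nn.functional.mse_loss(est_anechoic, ref_anechoic)
        return alpha * dc / (B * T * F_) + (1 - alpha) * mse

In this reading, the deep clustering affinity loss keeps the embedding space discriminative while the mask MSE drives the dereverberation network; backpropagating their weighted sum through both stages is what the abstract refers to as joint training.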
Related papers
- Effective Noise-aware Data Simulation for Domain-adaptive Speech Enhancement Leveraging Dynamic Stochastic Perturbation [25.410770364140856]
Cross-domain speech enhancement (SE) is often faced with severe challenges due to the scarcity of noise and background information in an unseen target domain.
This study puts forward a novel data simulation method to address this issue, leveraging noise-extractive techniques and generative adversarial networks (GANs).
We introduce the notion of dynamic perturbation, which can inject controlled perturbations into the noise embeddings during inference.
arXiv Detail & Related papers (2024-09-03T02:29:01Z)
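A minimal sketch of the dynamic-perturbation step described in the entry above; the Gaussian form of the perturbation, the random scaling, and the names below are assumptions for illustration, not the paper's exact formulation.

    import torch

    def perturb_noise_embedding(z_noise: torch.Tensor, max_scale: float = 0.1) -> torch.Tensor:
        """Inject a controlled, randomly scaled perturbation into a noise
        embedding at inference time, so each simulated utterance sees a
        slightly different noise condition (illustrative only)."""
        sigma = max_scale * torch.rand(1).item()            # draw a perturbation strength
        return z_noise + sigma * torch.randn_like(z_noise)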
- High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models [56.00939852727501]
Minimally-supervised speech synthesis decouples TTS by combining two types of discrete speech representations.
A non-autoregressive framework enhances controllability, and a duration diffusion model enables diversified prosodic expression.
arXiv Detail & Related papers (2023-09-27T09:27:03Z)
- Diffusion-based speech enhancement with a weighted generative-supervised learning loss [0.0]
Diffusion-based generative models have recently gained attention in speech enhancement (SE).
We propose augmenting the original diffusion training objective with a mean squared error (MSE) loss, measuring the discrepancy between estimated enhanced speech and ground-truth clean speech.
arXiv Detail & Related papers (2023-09-19T09:13:35Z)
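The weighted generative-supervised objective in the entry above can be pictured as a plain sum of the two terms; the score-matching form of the diffusion loss and the weight `lam` below are illustrative assumptions.

    import torch.nn.functional as F

    def weighted_generative_supervised_loss(score_pred, score_target,
                                            enhanced, clean, lam=0.5):
        """Illustrative objective: the usual diffusion (score-matching) loss
        plus an MSE term between the estimated enhanced speech and the
        ground-truth clean speech; `lam` is a hypothetical trade-off weight."""
        diffusion_term = F.mse_loss(score_pred, score_target)   # generative term
        supervised_term = F.mse_loss(enhanced, clean)           # supervised term
        return diffusion_term + lam * supervised_term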
- Continuous Modeling of the Denoising Process for Speech Enhancement Based on Deep Learning [61.787485727134424]
We use a state variable to indicate the denoising process.
A UNet-like neural network learns to estimate every state variable sampled from the continuous denoising process.
Experimental results indicate that preserving a small amount of noise in the clean target benefits speech enhancement.
arXiv Detail & Related papers (2023-09-17T13:27:11Z)
- Exploring Efficient Asymmetric Blind-Spots for Self-Supervised Denoising in Real-World Scenarios [44.31657750561106]
Noise in real-world scenarios is often spatially correlated, which causes many self-supervised algorithms to perform poorly.
We propose Asymmetric Tunable Blind-Spot Network (AT-BSN), where the blind-spot size can be freely adjusted.
We show that our method achieves state-of-the-art performance and is superior to other self-supervised algorithms in terms of computational overhead and visual quality.
arXiv Detail & Related papers (2023-03-29T15:19:01Z)
- Latent Class-Conditional Noise Model [54.56899309997246]
We introduce a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework.
We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us to efficiently infer the latent true labels.
Our approach safeguards the stable update of the noise transition, avoiding the arbitrary tuning from a mini-batch of samples required by previous methods.
arXiv Detail & Related papers (2023-02-19T15:24:37Z) - NLIP: Noise-robust Language-Image Pre-training [95.13287735264937]
We propose a principled Noise-robust Language-Image Pre-training framework (NLIP) to stabilize pre-training via two schemes: noise-harmonization and noise-completion.
Our NLIP can alleviate the common noise effects during image-text pre-training in a more efficient way.
arXiv Detail & Related papers (2022-12-14T08:19:30Z) - Distribution Conditional Denoising: A Flexible Discriminative Image
- Distribution Conditional Denoising: A Flexible Discriminative Image Denoiser [0.0]
A flexible discriminative image denoiser is introduced in which multi-task learning methods are applied to a denoising FCN based on U-Net.
It has been shown that this conditional training method can generalise a fixed-noise-level U-Net denoiser to a variety of noise levels.
arXiv Detail & Related papers (2020-11-24T21:27:18Z) - Sparse Mixture of Local Experts for Efficient Speech Enhancement [19.645016575334786]
We investigate a deep learning approach for speech denoising through an efficient ensemble of specialist neural networks.
By splitting up the speech denoising task into non-overlapping subproblems, we are able to improve denoising performance while also reducing computational complexity.
Our findings demonstrate that a fine-tuned ensemble network is able to exceed the speech denoising capabilities of a generalist network.
arXiv Detail & Related papers (2020-05-16T23:23:22Z) - ADRN: Attention-based Deep Residual Network for Hyperspectral Image
- ADRN: Attention-based Deep Residual Network for Hyperspectral Image Denoising [52.01041506447195]
We propose an attention-based deep residual network to learn a mapping from noisy HSI to the clean one.
Experimental results demonstrate that our proposed ADRN scheme outperforms the state-of-the-art methods both in quantitative and visual evaluations.
arXiv Detail & Related papers (2020-03-04T08:36:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.