A Two-stage Complex Network using Cycle-consistent Generative
Adversarial Networks for Speech Enhancement
- URL: http://arxiv.org/abs/2109.02011v1
- Date: Sun, 5 Sep 2021 07:09:10 GMT
- Title: A Two-stage Complex Network using Cycle-consistent Generative
Adversarial Networks for Speech Enhancement
- Authors: Guochen Yu, Yutian Wang, Hui Wang, Qin Zhang, Chengshi Zheng
- Abstract summary: We propose a novel two-stage denoising system that combines a CycleGAN-based magnitude enhancing network and a complex spectral refining network.
Experimental results on two public datasets demonstrate that the proposed approach consistently surpasses previous one-stage CycleGANs and other state-of-the-art SE systems.
- Score: 7.676549056780494
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cycle-consistent generative adversarial networks (CycleGANs) have shown
promising performance for speech enhancement (SE). One intractable
shortcoming of these CycleGAN-based SE systems, however, is that the noise components
propagate throughout the cycle and cannot be completely eliminated.
Additionally, conventional CycleGAN-based SE systems only estimate the spectral
magnitude and leave the phase unaltered. Motivated by the multi-stage learning
concept, we propose a novel two-stage denoising system that combines a
CycleGAN-based magnitude enhancing network and a subsequent complex spectral
refining network. Specifically, in the first stage, a
CycleGAN-based model is responsible for only estimating magnitude, which is
subsequently coupled with the original noisy phase to obtain a coarsely
enhanced complex spectrum. After that, the second stage is applied to further
suppress the residual noise components and estimate the clean phase by a
complex spectral mapping network, which is a pure complex-valued network
composed of complex 2D convolution/deconvolution and complex temporal-frequency
attention blocks. Experimental results on two public datasets demonstrate that
the proposed approach consistently surpasses previous one-stage CycleGANs and
other state-of-the-art SE systems in terms of various evaluation metrics,
especially in background noise suppression.
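The two operations named in the abstract can be illustrated with a minimal NumPy sketch: coupling a stage-one magnitude estimate with the noisy phase, and building a complex convolution from real-valued convolutions. This is not the authors' implementation. The function names, the array shapes, the 1D (rather than 2D) convolution, and the "halve the magnitude" stand-in for the CycleGAN are all assumptions made for illustration.

```python
import numpy as np


def couple_magnitude_with_noisy_phase(enhanced_mag, noisy_spec):
    # Stage one (sketch): keep the estimated magnitude and copy the phase
    # of the noisy spectrum to form a coarse complex spectrum.
    noisy_phase = np.angle(noisy_spec)
    return enhanced_mag * np.exp(1j * noisy_phase)


def complex_conv1d(x, w):
    # Complex convolution assembled from four real convolutions:
    # (x_r + j x_i) * (w_r + j w_i)
    #   = (x_r * w_r - x_i * w_i) + j (x_r * w_i + x_i * w_r)
    # 1D is used here only to keep the sketch short.
    real = np.convolve(x.real, w.real) - np.convolve(x.imag, w.imag)
    imag = np.convolve(x.real, w.imag) + np.convolve(x.imag, w.real)
    return real + 1j * imag


rng = np.random.default_rng(0)

# Hypothetical noisy spectrum with shape (frequency bins, frames).
noisy_spec = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))

# Stand-in for the stage-one network: simply halve the noisy magnitude.
enhanced_mag = 0.5 * np.abs(noisy_spec)

# Coarse spectrum: enhanced magnitude, unchanged noisy phase.
coarse_spec = couple_magnitude_with_noisy_phase(enhanced_mag, noisy_spec)

# Hypothetical complex kernel applied along one frequency bin's frames.
kernel = rng.standard_normal(3) + 1j * rng.standard_normal(3)
refined_row = complex_conv1d(coarse_spec[0], kernel)
```

Because the coarse spectrum still carries the noisy phase, a second stage that operates on real and imaginary parts jointly, as the complex convolution above does, is what allows the phase to change at all.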
Related papers
- A neural network-supported two-stage algorithm for lightweight
dereverberation on hearing devices [13.49645012479288]
A two-stage lightweight online dereverberation algorithm for hearing devices is presented in this paper.
The approach combines a multi-channel multi-frame linear filter with a single-channel single-frame post-filter.
Both components rely on power spectral density (PSD) estimates provided by deep neural networks (DNNs).
arXiv Detail & Related papers (2022-04-06T11:08:28Z)
- Wider or Deeper Neural Network Architecture for Acoustic Scene
Classification with Mismatched Recording Devices [59.86658316440461]
We present a robust and low-complexity system for Acoustic Scene Classification (ASC).
We first construct an ASC baseline system in which a novel inception-residual-based network architecture is proposed to deal with the mismatched recording device issue.
To further improve the performance but still satisfy the low complexity model, we apply two techniques: ensemble of multiple spectrograms and channel reduction.
arXiv Detail & Related papers (2022-03-23T10:27:41Z)
- Amplitude-Phase Recombination: Rethinking Robustness of Convolutional
Neural Networks in Frequency Domain [31.182376196295365]
CNNs tend to converge at a local optimum that is closely related to the high-frequency components of the training images.
A new perspective on data augmentation is designed by recombining the phase spectrum of the current image and the amplitude spectrum of the distracter image.
arXiv Detail & Related papers (2021-08-19T04:04:41Z)
- Two-Stage Self-Supervised Cycle-Consistency Network for Reconstruction
of Thin-Slice MR Images [62.4428833931443]
The thick-slice magnetic resonance (MR) images are often structurally blurred in coronal and sagittal views.
Deep learning has shown great potential to reconstruct high-resolution (HR) thin-slice MR images from low-resolution (LR) thick-slice images.
We propose a novel Two-stage Self-supervised Cycle-consistency Network (TSCNet) for MR slice reconstruction.
arXiv Detail & Related papers (2021-06-29T13:29:18Z)
- Accurate and Robust Deep Learning Framework for Solving Wave-Based
Inverse Problems in the Super-Resolution Regime [1.933681537640272]
We propose an end-to-end deep learning framework that comprehensively solves the inverse wave scattering problem across all length scales.
Our framework consists of the newly introduced wide-band butterfly network coupled with a simple training procedure that dynamically injects noise during training.
arXiv Detail & Related papers (2021-06-02T13:30:28Z)
- Cycle-free CycleGAN using Invertible Generator for Unsupervised Low-Dose
CT Denoising [33.79188588182528]
CycleGAN provides high-performance, ultra-fast denoising for low-dose X-ray computed tomography (CT) images.
CycleGAN requires two generators and two discriminators to enforce cycle consistency.
We present a novel cycle-free CycleGAN architecture, which consists of a single generator and a single discriminator but still guarantees cycle consistency.
arXiv Detail & Related papers (2021-04-17T13:23:36Z)
- Conditioning Trick for Training Stable GANs [70.15099665710336]
We propose a conditioning trick, called difference departure from normality, applied on the generator network in response to instability issues during GAN training.
We force the generator to get closer to the departure from normality function of real samples computed in the spectral domain of Schur decomposition.
arXiv Detail & Related papers (2020-10-12T16:50:22Z)
- Improving Stability of LS-GANs for Audio and Speech Signals [70.15099665710336]
We show that encoding departure from normality computed in this vector space into the generator optimization formulation helps to craft more comprehensive spectrograms.
We demonstrate the effectiveness of binding this metric for enhancing stability in training with less mode collapse compared to baseline GANs.
arXiv Detail & Related papers (2020-08-12T17:41:25Z)
- Identity Enhanced Residual Image Denoising [61.75610647978973]
We learn a fully-convolutional network model that consists of a Chain of Identity Mapping Modules and residual on the residual architecture for image denoising.
The proposed network produces remarkably higher numerical accuracy and better visual image quality than the classical state-of-the-art and CNN algorithms.
arXiv Detail & Related papers (2020-04-26T04:52:22Z)
- Simultaneous Denoising and Dereverberation Using Deep Embedding Features [64.58693911070228]
We propose a joint training method for simultaneous speech denoising and dereverberation using deep embedding features.
At the denoising stage, the deep clustering (DC) network is leveraged to extract noise-free deep embedding features.
At the dereverberation stage, instead of using the unsupervised K-means clustering algorithm, another neural network is utilized to estimate the anechoic speech.
arXiv Detail & Related papers (2020-04-06T06:34:01Z)