Improving GANs for Speech Enhancement
- URL: http://arxiv.org/abs/2001.05532v3
- Date: Sat, 12 Sep 2020 23:48:06 GMT
- Title: Improving GANs for Speech Enhancement
- Authors: Huy Phan and Ian V. McLoughlin and Lam Pham and Oliver Y. Chén and
Philipp Koch and Maarten De Vos and Alfred Mertins
- Abstract summary: We propose to use multiple generators chained to perform multi-stage enhancement mapping.
We demonstrate that the proposed multi-stage enhancement approach outperforms the one-stage SEGAN baseline.
- Score: 19.836041050328102
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative adversarial networks (GAN) have recently been shown to be
efficient for speech enhancement. However, most, if not all, existing speech
enhancement GANs (SEGAN) make use of a single generator to perform one-stage
enhancement mapping. In this work, we propose to use multiple generators that
are chained to perform multi-stage enhancement mapping, which gradually refines
the noisy input signals in a stage-wise fashion. Furthermore, we study two
scenarios: (1) the generators share their parameters and (2) the generators'
parameters are independent. The former constrains the generators to learn a
common mapping that is iteratively applied at all enhancement stages and
results in a small model footprint. In contrast, the latter allows the
generators to flexibly learn different enhancement mappings at different stages
of the network at the cost of an increased model size. We demonstrate that the
proposed multi-stage enhancement approach outperforms the one-stage SEGAN
baseline, where the independent generators lead to more favorable results than
the tied generators. The source code is available at
http://github.com/pquochuy/idsegan.
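The two chaining scenarios from the abstract can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation (which is at the linked repository): each generator is stood in for by a simple linear map, whereas a real SEGAN generator is a convolutional encoder-decoder. The names `make_generator` and `multi_stage_enhance` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_generator(dim, rng):
    """One hypothetical enhancement stage, modeled as a learned linear map.

    A near-identity matrix stands in for a trained generator purely to
    illustrate the chaining; it is NOT the actual SEGAN architecture."""
    W = np.eye(dim) + 0.01 * rng.standard_normal((dim, dim))
    return lambda x, W=W: x @ W

def multi_stage_enhance(x, generators):
    """Apply the chain: each stage refines the previous stage's output."""
    for g in generators:
        x = g(x)
    return x

dim, n_stages = 8, 3

# Scenario 1: tied parameters -- one shared generator applied iteratively,
# giving a small model footprint.
shared = make_generator(dim, rng)
tied_chain = [shared] * n_stages

# Scenario 2: independent parameters -- a distinct generator per stage,
# more flexible at the cost of a larger model.
indep_chain = [make_generator(dim, rng) for _ in range(n_stages)]

noisy = rng.standard_normal(dim)
enhanced_tied = multi_stage_enhance(noisy, tied_chain)    # shape (8,)
enhanced_indep = multi_stage_enhance(noisy, indep_chain)  # shape (8,)
```

In the tied case the list holds the same callable three times, so only one set of parameters exists; in the independent case each stage owns its own parameters, mirroring the trade-off the abstract describes.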
Related papers
- Improving Out-of-Distribution Robustness of Classifiers via Generative
Interpolation [56.620403243640396]
Deep neural networks achieve superior performance for learning from independent and identically distributed (i.i.d.) data.
However, their performance deteriorates significantly when handling out-of-distribution (OoD) data.
We develop a simple yet effective method called Generative Interpolation to fuse generative models trained from multiple domains for synthesizing diverse OoD samples.
arXiv Detail & Related papers (2023-07-23T03:53:53Z)
- Learning Probabilistic Models from Generator Latent Spaces with Hat EBM [81.35199221254763]
This work proposes a method for using any generator network as the foundation of an Energy-Based Model (EBM)
Experiments show strong performance of the proposed method on (1) unconditional ImageNet synthesis at 128x128 resolution, (2) refining the output of existing generators, and (3) learning EBMs that incorporate non-probabilistic generators.
arXiv Detail & Related papers (2022-10-29T03:55:34Z)
- Structural Prior Guided Generative Adversarial Transformers for Low-Light Image Enhancement [51.22694467126883]
We propose an effective Structural Prior guided Generative Adversarial Transformer (SPGAT) to solve low-light image enhancement.
The generator is based on a U-shaped Transformer which is used to explore non-local information for better clear image restoration.
To generate more realistic images, we develop a new structural prior guided adversarial learning method by building the skip connections between the generator and discriminators.
arXiv Detail & Related papers (2022-07-16T04:05:40Z)
- Toward Spatially Unbiased Generative Models [19.269719158344508]
Recent image generation models show remarkable generation performance.
However, they mirror strong location preference in datasets, which we call spatial bias.
We argue that the generators rely on their implicit positional encoding to render spatial content.
arXiv Detail & Related papers (2021-08-03T04:13:03Z)
- Total Generate: Cycle in Cycle Generative Adversarial Networks for Generating Human Faces, Hands, Bodies, and Natural Scenes [76.83075646527521]
We propose a Cycle in Cycle Generative Adversarial Network (C2GAN) for generating human faces, hands, bodies, and natural scenes.
Our proposed C2GAN is a cross-modal model exploring the joint exploitation of the input image data and guidance data in an interactive manner.
arXiv Detail & Related papers (2021-06-21T06:20:16Z)
- Combining Transformer Generators with Convolutional Discriminators [9.83490307808789]
The recently proposed TransGAN is the first GAN that uses only transformer-based architectures.
TransGAN requires data augmentation, an auxiliary super-resolution task during training, and a masking prior to guide the self-attention mechanism.
We evaluate our approach by conducting a benchmark of well-known CNN discriminators, ablate the size of the transformer-based generator, and show that combining both architectural elements into a hybrid model leads to better results.
arXiv Detail & Related papers (2021-05-21T07:56:59Z)
- Slimmable Generative Adversarial Networks [54.61774365777226]
Generative adversarial networks (GANs) have achieved remarkable progress in recent years, but the continuously growing scale of models makes them challenging to deploy widely in practical applications.
In this paper, we introduce slimmable GANs, which can flexibly switch the width of the generator to accommodate various quality-efficiency trade-offs at runtime.
arXiv Detail & Related papers (2020-12-10T13:35:22Z)
- Remote sensing image fusion based on Bayesian GAN [9.852262451235472]
We build a two-stream generator network with PAN and MS images as input, which consists of three parts: feature extraction, feature fusion and image reconstruction.
We leverage Markov discriminator to enhance the ability of generator to reconstruct the fusion image, so that the result image can retain more details.
Experiments on QuickBird and WorldView datasets show that the model proposed in this paper can effectively fuse PAN and MS images.
arXiv Detail & Related papers (2020-09-20T16:15:51Z)
- Unconditional Audio Generation with Generative Adversarial Networks and Cycle Regularization [48.55126268721948]
We present a generative adversarial network (GAN)-based model for unconditional generation of the mel-spectrograms of singing voices.
We employ a hierarchical architecture in the generator to induce some structure in the temporal dimension.
We evaluate the performance of the new model not only for generating singing voices, but also for generating speech voices.
arXiv Detail & Related papers (2020-05-18T08:35:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.