Conditioning Trick for Training Stable GANs
- URL: http://arxiv.org/abs/2010.05844v1
- Date: Mon, 12 Oct 2020 16:50:22 GMT
- Title: Conditioning Trick for Training Stable GANs
- Authors: Mohammad Esmaeilpour, Raymel Alfonso Sallo, Olivier St-Georges,
Patrick Cardinal, Alessandro Lameiras Koerich
- Abstract summary: We propose a conditioning trick, called difference departure from normality, applied on the generator network in response to instability issues during GAN training.
We force the generator to get closer to the departure from normality function of real samples computed in the spectral domain of Schur decomposition.
- Score: 70.15099665710336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we propose a conditioning trick, called difference departure
from normality, applied on the generator network in response to instability
issues during GAN training. We force the generator to get closer to the
departure from normality function of real samples computed in the spectral
domain of Schur decomposition. This binding makes the generator amenable to
truncation and does not limit exploring all the possible modes. We slightly
modify the BigGAN architecture incorporating residual network for synthesizing
2D representations of audio signals which enables reconstructing high quality
sounds with some preserved phase information. Additionally, the proposed
conditional training scenario makes a trade-off between fidelity and variety
for the generated spectrograms. The experimental results on UrbanSound8k and
ESC-50 environmental sound datasets and the Mozilla common voice dataset have
shown that the proposed GAN configuration with the conditioning trick
remarkably outperforms baseline architectures, according to three objective
metrics: inception score, Frechet inception distance, and signal-to-noise
ratio.
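The key quantity in the abstract is a departure-from-normality measure computed via the Schur decomposition. The paper's exact "difference departure from normality" function is not spelled out in this summary; the sketch below computes Henrici's classical departure from normality, one standard form such a spectral-domain measure takes (function and variable names are illustrative):

```python
import numpy as np
from scipy.linalg import schur

def departure_from_normality(a):
    """Henrici's departure from normality of a square matrix `a`."""
    # Complex Schur form: a = z @ t @ z.conj().T with t upper triangular;
    # the diagonal of t carries the eigenvalues of a.
    t, _ = schur(np.asarray(a, dtype=complex), output="complex")
    # The strictly upper-triangular part of t is the non-normal residue:
    # its Frobenius norm equals sqrt(||a||_F**2 - sum_i |lambda_i|**2)
    # and vanishes exactly when a is a normal matrix.
    return float(np.linalg.norm(t - np.diag(np.diag(t))))

sym = np.array([[2.0, 1.0], [1.0, 3.0]])    # symmetric, hence normal
shift = np.array([[0.0, 1.0], [0.0, 0.0]])  # nilpotent shift, far from normal
```

In the paper's setting such a scalar, computed on real samples' 2D representations, conditions the generator's optimization; the matrices above are just arbitrary square arrays for illustration.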
Related papers
- SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and
Music Synthesis [0.0]
We introduce SpecDiff-GAN, a neural vocoder based on HiFi-GAN.
We show the merits of our proposed model for speech and music synthesis on several datasets.
arXiv Detail & Related papers (2024-01-30T09:17:57Z)
- cDVGAN: One Flexible Model for Multi-class Gravitational Wave Signal and Glitch Generation [0.7853804618032806]
We present a novel conditional model in the Generative Adversarial Network framework for simulating multiple classes of time-domain observations.
Our proposed cDVGAN outperforms 4 different baseline GAN models in replicating the features of the three classes.
Our experiments show that training convolutional neural networks with our cDVGAN-generated data improves the detection of samples embedded in detector noise.
arXiv Detail & Related papers (2024-01-29T17:59:26Z)
- High Fidelity Neural Audio Compression [92.4812002532009]
We introduce a state-of-the-art real-time, high-fidelity audio codec leveraging neural networks.
It consists of a streaming encoder-decoder architecture with a quantized latent space trained in an end-to-end fashion.
We simplify and speed-up the training by using a single multiscale spectrogram adversary.
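A "multiscale spectrogram adversary" scores magnitude spectrograms computed at several resolutions at once. A minimal sketch of the multiscale inputs such an adversary would consume (window sizes and hop ratio are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def stft_mag(x, win, hop):
    # Frame the signal, apply a Hann window, take |rFFT| of each frame.
    window = np.hanning(win)
    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack([x[i * hop : i * hop + win] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))

def multiscale_spectrograms(x, scales=(256, 512, 1024)):
    # One magnitude spectrogram per window size; a single discriminator
    # scoring all of them sees both fine temporal and fine spectral detail.
    return [stft_mag(x, win, win // 4) for win in scales]

x = np.sin(2 * np.pi * 440 * np.arange(8192) / 16000)  # 440 Hz test tone
specs = multiscale_spectrograms(x)
```

Each scale trades temporal for frequency resolution, which is why combining them in one adversary is effective for audio.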
arXiv Detail & Related papers (2022-10-24T17:52:02Z)
- Simpler is better: spectral regularization and up-sampling techniques for variational autoencoders [1.2234742322758418]
Characterization of the spectral behavior of generative models based on neural networks remains an open issue.
Recent research has focused heavily on generative adversarial networks and the high-frequency discrepancies between real and generated images.
We propose a simple 2D Fourier transform-based spectral regularization loss for Variational Autoencoders (VAEs).
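One way such a 2D Fourier-based spectral penalty can be written, as a sketch assuming log-magnitude spectra compared with a mean-squared error (the paper's exact formulation may differ):

```python
import numpy as np

def spectral_loss(real, fake):
    # Compare log-magnitude 2D spectra of real vs. generated images;
    # this penalizes the high-frequency mismatch that generative models
    # are known to exhibit.
    f_real = np.log1p(np.abs(np.fft.fft2(real)))
    f_fake = np.log1p(np.abs(np.fft.fft2(fake)))
    return float(np.mean((f_real - f_fake) ** 2))

img = np.arange(16.0).reshape(4, 4)  # toy "image" for illustration
```

In training, this term would be added to the usual reconstruction-plus-KL objective with a weighting hyperparameter.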
arXiv Detail & Related papers (2022-01-19T11:49:57Z)
- PILOT: Introducing Transformers for Probabilistic Sound Event Localization [107.78964411642401]
This paper introduces a novel transformer-based sound event localization framework, where temporal dependencies in the received multi-channel audio signals are captured via self-attention mechanisms.
The framework is evaluated on three publicly available multi-source sound event localization datasets and compared against state-of-the-art methods in terms of localization error and event detection accuracy.
arXiv Detail & Related papers (2021-06-07T18:29:19Z)
- Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a gap between clean data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space.
Experiments on the widely-used Snips dataset and a large-scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on a real-world (noisy) corpus but also enhances robustness, that is, it produces high-quality results in a noisy environment.
arXiv Detail & Related papers (2021-04-13T17:54:33Z)
- Axial Residual Networks for CycleGAN-based Voice Conversion [0.0]
We propose a novel architecture and improved training objectives for non-parallel voice conversion.
Our proposed CycleGAN-based model performs a shape-preserving transformation directly on a high frequency-resolution magnitude spectrogram.
We demonstrate via experiments that our proposed model outperforms Scyclone and shows a comparable or better performance to that of CycleGAN-VC2 even without employing a neural vocoder.
arXiv Detail & Related papers (2021-02-16T10:55:35Z)
- Improving Stability of LS-GANs for Audio and Speech Signals [70.15099665710336]
We show that encoding departure from normality computed in this vector space into the generator optimization formulation helps to craft more comprehensive spectrograms.
We demonstrate the effectiveness of binding this metric for enhancing stability in training with less mode collapse compared to baseline GANs.
arXiv Detail & Related papers (2020-08-12T17:41:25Z)
- Noise Homogenization via Multi-Channel Wavelet Filtering for High-Fidelity Sample Generation in GANs [47.92719758687014]
We propose a novel multi-channel wavelet-based filtering method for Generative Adversarial Networks (GANs).
By embedding a wavelet deconvolution layer in the generator, the resultant GAN, called WaveletGAN, takes advantage of the wavelet deconvolution to learn a multi-channel filtering.
We conducted benchmark experiments on the Fashion-MNIST, KMNIST and SVHN datasets through an open GAN benchmark tool.
arXiv Detail & Related papers (2020-05-14T03:40:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.