Conditioning Trick for Training Stable GANs
- URL: http://arxiv.org/abs/2010.05844v1
- Date: Mon, 12 Oct 2020 16:50:22 GMT
- Title: Conditioning Trick for Training Stable GANs
- Authors: Mohammad Esmaeilpour, Raymel Alfonso Sallo, Olivier St-Georges,
Patrick Cardinal, Alessandro Lameiras Koerich
- Abstract summary: We propose a conditioning trick, called difference departure from normality, applied on the generator network in response to instability issues during GAN training.
We force the generator to get closer to the departure from normality function of real samples computed in the spectral domain of Schur decomposition.
- Score: 70.15099665710336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we propose a conditioning trick, called difference departure
from normality, applied on the generator network in response to instability
issues during GAN training. We force the generator to get closer to the
departure from normality function of real samples computed in the spectral
domain of Schur decomposition. This binding makes the generator amenable to
truncation and does not limit exploring all the possible modes. We slightly
modify the BigGAN architecture incorporating residual network for synthesizing
2D representations of audio signals which enables reconstructing high quality
sounds with some preserved phase information. Additionally, the proposed
conditional training scenario makes a trade-off between fidelity and variety
for the generated spectrograms. The experimental results on UrbanSound8k and
ESC-50 environmental sound datasets and the Mozilla common voice dataset have
shown that the proposed GAN configuration with the conditioning trick
remarkably outperforms baseline architectures, according to three objective
metrics: inception score, Frechet inception distance, and signal-to-noise
ratio.
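The key quantity in the abstract is a departure-from-normality measure computed via the Schur decomposition. The paper's exact "difference departure from normality" function is not spelled out in this summary; the sketch below computes Henrici's classical departure from normality, one standard form such a spectral-domain measure takes (function and variable names are illustrative):

```python
import numpy as np
from scipy.linalg import schur

def departure_from_normality(a):
    """Henrici's departure from normality of a square matrix `a`."""
    # Complex Schur form: a = z @ t @ z.conj().T with t upper triangular;
    # the diagonal of t carries the eigenvalues of a.
    t, _ = schur(np.asarray(a, dtype=complex), output="complex")
    # The strictly upper-triangular part of t is the non-normal residue:
    # its Frobenius norm equals sqrt(||a||_F**2 - sum_i |lambda_i|**2)
    # and vanishes exactly when a is a normal matrix.
    return float(np.linalg.norm(t - np.diag(np.diag(t))))

sym = np.array([[2.0, 1.0], [1.0, 3.0]])    # symmetric, hence normal
shift = np.array([[0.0, 1.0], [0.0, 0.0]])  # nilpotent shift, far from normal
```

In the paper's setting such a scalar, computed on real samples' 2D representations, conditions the generator's optimization; the matrices above are just arbitrary square arrays for illustration.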
Related papers
- SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and
Music Synthesis [0.0]
We introduce SpecDiff-GAN, a neural vocoder based on HiFi-GAN.
We show the merits of our proposed model for speech and music synthesis on several datasets.
arXiv Detail & Related papers (2024-01-30T09:17:57Z)
- cDVGAN: One Flexible Model for Multi-class Gravitational Wave Signal and Glitch Generation [0.7853804618032806]
We present a novel conditional model in the Generative Adversarial Network framework for simulating multiple classes of time-domain observations.
Our proposed cDVGAN outperforms 4 different baseline GAN models in replicating the features of the three classes.
Our experiments show that training convolutional neural networks with our cDVGAN-generated data improves the detection of samples embedded in detector noise.
arXiv Detail & Related papers (2024-01-29T17:59:26Z)
- High Fidelity Neural Audio Compression [92.4812002532009]
We introduce a state-of-the-art real-time, high-fidelity audio codec leveraging neural networks.
It consists of a streaming encoder-decoder architecture with a quantized latent space trained in an end-to-end fashion.
We simplify and speed-up the training by using a single multiscale spectrogram adversary.
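A "multiscale spectrogram adversary" scores magnitude spectrograms computed at several resolutions at once. A minimal sketch of the multiscale inputs such an adversary would consume (window sizes and hop ratio are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def stft_mag(x, win, hop):
    # Frame the signal, apply a Hann window, take |rFFT| of each frame.
    window = np.hanning(win)
    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack([x[i * hop : i * hop + win] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))

def multiscale_spectrograms(x, scales=(256, 512, 1024)):
    # One magnitude spectrogram per window size; a single discriminator
    # scoring all of them sees both fine temporal and fine spectral detail.
    return [stft_mag(x, win, win // 4) for win in scales]

x = np.sin(2 * np.pi * 440 * np.arange(8192) / 16000)  # 440 Hz test tone
specs = multiscale_spectrograms(x)
```

Each scale trades temporal for frequency resolution, which is why combining them in one adversary is effective for audio.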
arXiv Detail & Related papers (2022-10-24T17:52:02Z)
- Simpler is better: spectral regularization and up-sampling techniques for variational autoencoders [1.2234742322758418]
Characterization of the spectral behavior of generative models based on neural networks remains an open issue.
Recent research has focused heavily on generative adversarial networks and the high-frequency discrepancies between real and generated images.
We propose a simple 2D Fourier transform-based spectral regularization loss for Variational Autoencoders (VAEs).
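One way such a 2D Fourier-based spectral penalty can be written, as a sketch assuming log-magnitude spectra compared with a mean-squared error (the paper's exact formulation may differ):

```python
import numpy as np

def spectral_loss(real, fake):
    # Compare log-magnitude 2D spectra of real vs. generated images;
    # this penalizes the high-frequency mismatch that generative models
    # are known to exhibit.
    f_real = np.log1p(np.abs(np.fft.fft2(real)))
    f_fake = np.log1p(np.abs(np.fft.fft2(fake)))
    return float(np.mean((f_real - f_fake) ** 2))

img = np.arange(16.0).reshape(4, 4)  # toy "image" for illustration
```

In training, this term would be added to the usual reconstruction-plus-KL objective with a weighting hyperparameter.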
arXiv Detail & Related papers (2022-01-19T11:49:57Z)
- PILOT: Introducing Transformers for Probabilistic Sound Event Localization [107.78964411642401]
This paper introduces a novel transformer-based sound event localization framework, where temporal dependencies in the received multi-channel audio signals are captured via self-attention mechanisms.
The framework is evaluated on three publicly available multi-source sound event localization datasets and compared against state-of-the-art methods in terms of localization error and event detection accuracy.
arXiv Detail & Related papers (2021-06-07T18:29:19Z)
- Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a gap between clean data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space.
Experiments on the widely-used Snips dataset and a large-scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on a real-world (noisy) corpus but also enhances robustness, that is, it produces high-quality results in a noisy environment.
arXiv Detail & Related papers (2021-04-13T17:54:33Z)
- Axial Residual Networks for CycleGAN-based Voice Conversion [0.0]
We propose a novel architecture and improved training objectives for non-parallel voice conversion.
Our proposed CycleGAN-based model performs a shape-preserving transformation directly on a high frequency-resolution magnitude spectrogram.
We demonstrate via experiments that our proposed model outperforms Scyclone and shows a comparable or better performance to that of CycleGAN-VC2 even without employing a neural vocoder.
arXiv Detail & Related papers (2021-02-16T10:55:35Z)
- Improving Stability of LS-GANs for Audio and Speech Signals [70.15099665710336]
We show that encoding departure from normality computed in this vector space into the generator optimization formulation helps to craft more comprehensive spectrograms.
We demonstrate the effectiveness of binding this metric for enhancing stability in training with less mode collapse compared to baseline GANs.
arXiv Detail & Related papers (2020-08-12T17:41:25Z)
- Noise Homogenization via Multi-Channel Wavelet Filtering for High-Fidelity Sample Generation in GANs [47.92719758687014]
We propose a novel multi-channel wavelet-based filtering method for Generative Adversarial Networks (GANs).
By embedding a wavelet deconvolution layer in the generator, the resultant GAN, called WaveletGAN, takes advantage of the wavelet deconvolution to learn a multi-channel filtering.
We conducted benchmark experiments on the Fashion-MNIST, KMNIST and SVHN datasets through an open GAN benchmark tool.
arXiv Detail & Related papers (2020-05-14T03:40:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.