Improve GAN-based Neural Vocoder using Pointwise Relativistic LeastSquare GAN
- URL: http://arxiv.org/abs/2103.14245v2
- Date: Mon, 29 Mar 2021 03:00:21 GMT
- Title: Improve GAN-based Neural Vocoder using Pointwise Relativistic LeastSquare GAN
- Authors: Congyi Wang, Yu Chen, Bin Wang, Yi Shi
- Abstract summary: We introduce a novel variant of the LSGAN framework in the context of waveform synthesis, named Pointwise Relativistic LSGAN (PRLSGAN).
PRLSGAN is a general-purpose framework that can be combined with any GAN-based neural vocoder to enhance its generation quality.
- Score: 9.595035978417322
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: GAN-based neural vocoders, such as Parallel WaveGAN and MelGAN have attracted
great interest due to their lightweight and parallel structures, enabling them
to generate high-fidelity waveforms in real time. In this paper,
inspired by Relativistic GAN, we introduce a novel variant of the LSGAN
framework under the context of waveform synthesis, named Pointwise Relativistic
LSGAN (PRLSGAN). In this approach, we take the realism score distribution into
consideration and combine the original MSE loss with the proposed pointwise
relative discrepancy loss to make it harder for the generator to fool
the discriminator, leading to improved generation quality. Moreover, PRLSGAN is
a general-purpose framework that can be combined with any GAN-based neural
vocoder to enhance its generation quality. Experiments have shown a consistent
performance boost based on Parallel WaveGAN and MelGAN, demonstrating the
effectiveness and strong generalization ability of our proposed PRLSGAN neural
vocoders.
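The combined objective described in the abstract can be sketched as follows. The paper's exact formulation is not reproduced here, so the margin of 1, the per-point pairing of real and fake discriminator scores, and the weight `alpha` are illustrative assumptions, not the authors' published loss:

```python
from statistics import mean

def lsgan_d_loss(real_scores, fake_scores):
    # Original LSGAN MSE objective: push real scores to 1, fake scores to 0.
    return mean((s - 1.0) ** 2 for s in real_scores) + \
           mean(s ** 2 for s in fake_scores)

def pointwise_relative_d_loss(real_scores, fake_scores):
    # Pointwise relative discrepancy: each real score should exceed its
    # paired fake score by a margin, evaluated per point rather than
    # against a batch average as in the standard Relativistic GAN.
    return mean((r - f - 1.0) ** 2 for r, f in zip(real_scores, fake_scores))

def prlsgan_d_loss(real_scores, fake_scores, alpha=0.5):
    # Combined discriminator objective (alpha is a hypothetical weight).
    return lsgan_d_loss(real_scores, fake_scores) + \
           alpha * pointwise_relative_d_loss(real_scores, fake_scores)
```

Because the relative term is computed per sample point, the generator cannot satisfy it merely by matching the batch-average score of real data, which is what makes fooling the discriminator harder.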
Related papers
- Hallmarks of Optimization Trajectories in Neural Networks: Directional Exploration and Redundancy [75.15685966213832]
We analyze the rich directional structure of optimization trajectories represented by their pointwise parameters.
We show that training only the scalar batch-norm parameters, starting partway into training, matches the performance of training the entire network.
arXiv Detail & Related papers (2024-03-12T07:32:47Z)
- Radio Generation Using Generative Adversarial Networks with An Unrolled Design [18.049453261384013]
We develop a novel GAN framework for radio generation called "Radio GAN".
The first is learning based on sampling points, which aims to model an underlying sampling distribution of radio signals.
The second is an unrolled generator design, combined with an estimated pure signal distribution as a prior, which can greatly reduce learning difficulty.
arXiv Detail & Related papers (2023-06-24T07:47:22Z)
- WOLONet: Wave Outlooker for Efficient and High Fidelity Speech Synthesis [4.689359813220365]
We propose an effective and lightweight neural vocoder called WOLONet.
In this paper, we develop a novel lightweight block that uses a location-variable, channel-independent, and depthwise dynamic convolutional kernel with sinusoidally activated dynamic kernel weights.
The results show that our WOLONet achieves the best generation quality while requiring fewer parameters than the two neural SOTA vocoders, HiFiGAN and UnivNet.
arXiv Detail & Related papers (2022-06-20T17:58:52Z)
- Hierarchical Spherical CNNs with Lifting-based Adaptive Wavelets for Pooling and Unpooling [101.72318949104627]
We propose a novel framework of hierarchical convolutional neural networks (HS-CNNs) with a lifting structure to learn adaptive spherical wavelets for pooling and unpooling.
LiftHS-CNN ensures a more efficient hierarchical feature learning for both image- and pixel-level tasks.
arXiv Detail & Related papers (2022-05-31T07:23:42Z)
- Revisiting GANs by Best-Response Constraint: Perspective, Methodology, and Application [49.66088514485446]
Best-Response Constraint (BRC) is a general learning framework to explicitly formulate the potential dependency of the generator on the discriminator.
We show that, despite their different motivations and formulations, a wide variety of existing GANs can all be uniformly improved by our flexible BRC methodology.
arXiv Detail & Related papers (2022-05-20T12:42:41Z)
- Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation [32.839539624717546]
This paper introduces a unified source-filter network with a harmonic-plus-noise source excitation generation mechanism.
The modified uSFGAN significantly improves the sound quality of the basic uSFGAN while maintaining the voice controllability.
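The harmonic-plus-noise excitation idea can be illustrated with a toy single-sinusoid generator. The actual uSFGAN excitation uses learned components; the single harmonic, the unvoiced convention `f0 <= 0`, and `noise_gain` are assumptions for illustration:

```python
import math
import random

def harmonic_plus_noise_excitation(f0, sample_rate, n_samples,
                                   noise_gain=0.1, seed=0):
    # Harmonic part: a sinusoid tracking f0 (silent for unvoiced frames,
    # signalled here by f0 <= 0). Noise part: uniform white noise.
    rng = random.Random(seed)
    phase = 0.0
    excitation = []
    for _ in range(n_samples):
        harmonic = math.sin(phase) if f0 > 0 else 0.0
        noise = noise_gain * (2.0 * rng.random() - 1.0)
        excitation.append(harmonic + noise)
        if f0 > 0:
            phase += 2.0 * math.pi * f0 / sample_rate
    return excitation
```

Splitting the excitation into a pitch-locked harmonic part and a stochastic noise part is what gives source-filter vocoders their explicit pitch controllability.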
arXiv Detail & Related papers (2022-05-12T12:41:15Z)
- Unified Source-Filter GAN: Unified Source-filter Network Based On Factorization of Quasi-Periodic Parallel WaveGAN [36.12470085926042]
We propose a unified approach to data-driven source-filter modeling using a single neural network for developing a neural vocoder.
Our proposed network called unified source-filter generative adversarial networks (uSFGAN) is developed by factorizing quasi-periodic parallel WaveGAN.
Experiments demonstrate that uSFGAN outperforms conventional neural vocoders, such as QPPWG and NSF in both speech quality and pitch controllability.
arXiv Detail & Related papers (2021-04-10T02:38:26Z)
- Time-domain Speech Enhancement with Generative Adversarial Learning [53.74228907273269]
This paper proposes a new framework called Time-domain Speech Enhancement Generative Adversarial Network (TSEGAN).
TSEGAN is an extension of the generative adversarial network (GAN) in time-domain with metric evaluation to mitigate the scaling problem.
In addition, we provide a new method based on objective function mapping for the theoretical analysis of the performance of Metric GAN.
arXiv Detail & Related papers (2021-03-30T08:09:49Z)
- Unpaired Image Enhancement with Quality-Attention Generative Adversarial Network [92.01145655155374]
We propose a quality attention generative adversarial network (QAGAN) trained on unpaired data.
The key novelty of the proposed QAGAN lies in the quality attention module (QAM) injected into the generator.
Our proposed method achieves better performance in both objective and subjective evaluations.
arXiv Detail & Related papers (2020-12-30T05:57:20Z)
- StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization [9.866072912049031]
StyleMelGAN is a lightweight neural vocoder allowing synthesis of high-fidelity speech with low computational complexity.
StyleMelGAN employs temporal adaptive normalization to style a low-dimensional noise vector with the acoustic features of the target speech.
The highly parallelizable speech generation is several times faster than real-time on CPUs and GPUs.
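The temporal adaptive normalization mechanism can be sketched for a single channel as follows. In StyleMelGAN the per-timestep scale and shift come from convolutions over the acoustic features; here `gamma` and `beta` are passed in directly as an assumption:

```python
from statistics import mean

def temporal_adaptive_norm(activations, gamma, beta, eps=1e-5):
    # Normalize one channel over time, then apply a per-timestep scale
    # (gamma) and shift (beta) derived from the conditioning features,
    # imprinting the target speech's style onto the noise activations.
    mu = mean(activations)
    var = mean((a - mu) ** 2 for a in activations)
    std = (var + eps) ** 0.5
    return [g * ((a - mu) / std) + b
            for a, g, b in zip(activations, gamma, beta)]
```

Because the modulation varies over time, the acoustic features can steer the styled noise differently at every sample, unlike a single global scale and shift.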
arXiv Detail & Related papers (2020-11-03T08:28:47Z)
- Improving Stability of LS-GANs for Audio and Speech Signals [70.15099665710336]
We show that encoding departure from normality computed in this vector space into the generator optimization formulation helps to craft more comprehensive spectrograms.
We demonstrate the effectiveness of binding this metric for enhancing stability in training with less mode collapse compared to baseline GANs.
arXiv Detail & Related papers (2020-08-12T17:41:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.