Unified Source-Filter GAN: Unified Source-filter Network Based On
Factorization of Quasi-Periodic Parallel WaveGAN
- URL: http://arxiv.org/abs/2104.04668v2
- Date: Tue, 13 Apr 2021 03:14:06 GMT
- Title: Unified Source-Filter GAN: Unified Source-filter Network Based On
Factorization of Quasi-Periodic Parallel WaveGAN
- Authors: Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda
- Abstract summary: We propose a unified approach to data-driven source-filter modeling using a single neural network for developing a neural vocoder.
Our proposed network, called the unified source-filter generative adversarial network (uSFGAN), is developed by factorizing the quasi-periodic parallel WaveGAN.
Experiments demonstrate that uSFGAN outperforms conventional neural vocoders, such as QPPWG and NSF, in both speech quality and pitch controllability.
- Score: 36.12470085926042
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a unified approach to data-driven source-filter modeling using a
single neural network for developing a neural vocoder capable of generating
high-quality synthetic speech waveforms while retaining the flexibility of the
source-filter model to control voice characteristics. Our proposed
network, called the unified source-filter generative adversarial network (uSFGAN),
is developed by factorizing quasi-periodic parallel WaveGAN (QPPWG), one of the
neural vocoders based on a single neural network, into a source excitation
generation network and a vocal tract resonance filtering network by
additionally introducing a regularization loss. Moreover, inspired by the neural
source-filter (NSF) model, only a sinusoidal waveform is additionally used as the
simplest clue to generate a periodic source excitation waveform while
minimizing the effect of approximations in the source-filter model. The
experimental results demonstrate that uSFGAN outperforms conventional neural
vocoders, such as QPPWG and NSF, in both speech quality and pitch
controllability.
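The abstract describes feeding the excitation network only a sinusoidal waveform derived from F0, in the style of NSF. The sketch below shows one common way such an excitation signal is built; it is a minimal illustration, not the paper's implementation, and the frame hop, noise level, and unvoiced handling are assumptions.

```python
import numpy as np

def sine_excitation(f0, sr=16000, hop=80, noise_std=0.003, seed=0):
    """Generate an NSF-style sinusoidal source excitation from frame-level F0.

    f0: frame-level fundamental frequency in Hz (0 marks unvoiced frames).
    Returns a waveform of length len(f0) * hop.
    """
    rng = np.random.default_rng(seed)
    # Upsample frame-level F0 to sample level by simple repetition.
    f0_up = np.repeat(np.asarray(f0, dtype=np.float64), hop)
    # Integrate the instantaneous frequency to get phase, then take the sine.
    phase = 2.0 * np.pi * np.cumsum(f0_up / sr)
    e = np.sin(phase)
    # Voiced regions carry the sinusoid; unvoiced regions only weak noise.
    voiced = f0_up > 0
    e = np.where(voiced, e, 0.0) + noise_std * rng.standard_normal(len(e))
    return e

# 100 voiced frames at 200 Hz followed by 20 unvoiced frames.
f0 = np.array([200.0] * 100 + [0.0] * 20)
exc = sine_excitation(f0)
```

A network like the source excitation generator in uSFGAN would then transform this simple periodic clue into a richer excitation, which the downstream resonance-filtering network shapes into speech.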
Related papers
- PeriodGrad: Towards Pitch-Controllable Neural Vocoder Based on a
Diffusion Probabilistic Model [12.292092677396349]
This paper presents a neural vocoder based on a denoising diffusion probabilistic model (DDPM).
Our model aims to accurately capture the periodic structure of speech waveforms by incorporating explicit periodic signals.
Experimental results show that our model improves sound quality and provides better pitch control than conventional DDPM-based neural vocoders.
arXiv Detail & Related papers (2024-02-22T16:47:15Z)
- Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural
Vocoder [29.219277429553788]
We introduce the source-filter theory into HiFi-GAN to achieve high voice quality and pitch controllability.
Our proposed method outperforms HiFi-GAN and uSFGAN in voice quality and synthesis speed on a single CPU for singing voice generation.
Unlike the uSFGAN vocoder, the proposed method can be easily adopted or integrated into real-time applications and end-to-end systems.
arXiv Detail & Related papers (2022-10-27T15:19:09Z)
- Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation
Generation [32.839539624717546]
This paper introduces a unified source-filter network with a harmonic-plus-noise source excitation generation mechanism.
The modified uSFGAN significantly improves the sound quality of the basic uSFGAN while maintaining the voice controllability.
arXiv Detail & Related papers (2022-05-12T12:41:15Z)
- SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with
Adaptive Noise Spectral Shaping [51.698273019061645]
SpecGrad adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram.
It is processed in the time-frequency domain to keep the computational cost almost the same as that of conventional DDPM-based neural vocoders.
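The core idea in this blurb, shaping a noise signal's spectral envelope in the time-frequency domain, can be sketched generically. The example below filters white noise through an STFT so its envelope follows a target; the fixed low-pass envelope is a hypothetical stand-in for SpecGrad's envelope derived from the conditioning log-mel spectrogram, and scipy is assumed to be available.

```python
import numpy as np
from scipy.signal import stft, istft

def shape_noise(noise, envelope, sr=16000, nperseg=256):
    """Filter noise so its spectral envelope follows a target.

    envelope: desired magnitude response, one value per STFT frequency bin
              (length nperseg // 2 + 1). SpecGrad uses a time-varying
              envelope per frame; a fixed one keeps this sketch short.
    """
    _, _, Z = stft(noise, fs=sr, nperseg=nperseg)
    # Multiply every frame by the target envelope (broadcast over time).
    Z_shaped = Z * envelope[:, None]
    _, y = istft(Z_shaped, fs=sr, nperseg=nperseg)
    return y

rng = np.random.default_rng(0)
noise = rng.standard_normal(16000)
# Hypothetical envelope: attenuate bins above ~2 kHz to a tenth.
freqs = np.fft.rfftfreq(256, d=1 / 16000)
env = np.where(freqs < 2000, 1.0, 0.1)
shaped = shape_noise(noise, env)
```

Because the shaping is a per-frame multiplication in the STFT domain, it adds only a small overhead per diffusion step, which is consistent with the blurb's claim about computational cost.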
arXiv Detail & Related papers (2022-03-31T02:08:27Z)
- Deep Learning for the Benes Filter [91.3755431537592]
We present a new numerical method based on the mesh-free neural network representation of the density of the solution of the Benes model.
We discuss the role of nonlinearity in the filtering model equations for the choice of the domain of the neural network.
arXiv Detail & Related papers (2022-03-09T14:08:38Z)
- NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband
Excitation for Noise-Controllable Waveform Generation [67.96138567288197]
We propose a novel neural vocoder named NeuralDPS which can retain high speech quality and acquire high synthesis efficiency and noise controllability.
It generates waveforms at least 280 times faster than the WaveNet vocoder.
It is also 28% faster than WaveGAN in synthesis efficiency on a single core.
arXiv Detail & Related papers (2022-03-05T08:15:29Z)
- Frequency-bin entanglement from domain-engineered down-conversion [101.18253437732933]
We present a single-pass source of discrete frequency-bin entanglement which does not use filtering or a resonant cavity.
We use a domain-engineered nonlinear crystal to generate an eight-mode frequency-bin entangled source at telecommunication wavelengths.
arXiv Detail & Related papers (2022-01-18T19:00:29Z)
- Improve GAN-based Neural Vocoder using Pointwise Relativistic
LeastSquare GAN [9.595035978417322]
We introduce a novel variant of the LSGAN framework in the context of waveform synthesis, named Pointwise Relativistic LSGAN (PRLSGAN).
PRLSGAN is a general-purposed framework that can be combined with any GAN-based neural vocoder to enhance its generation quality.
arXiv Detail & Related papers (2021-03-26T03:35:22Z)
- Noise Homogenization via Multi-Channel Wavelet Filtering for
High-Fidelity Sample Generation in GANs [47.92719758687014]
We propose a novel multi-channel wavelet-based filtering method for Generative Adversarial Networks (GANs).
When a wavelet deconvolution layer is embedded in the generator, the resultant GAN, called WaveletGAN, takes advantage of the wavelet deconvolution to learn multi-channel filtering.
We conducted benchmark experiments on the Fashion-MNIST, KMNIST and SVHN datasets through an open GAN benchmark tool.
arXiv Detail & Related papers (2020-05-14T03:40:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides (including all listed content) and is not responsible for any consequences of its use.