Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation
- URL: http://arxiv.org/abs/2205.06053v1
- Date: Thu, 12 May 2022 12:41:15 GMT
- Title: Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation
- Authors: Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda
- Abstract summary: This paper introduces a unified source-filter network with a harmonic-plus-noise source excitation generation mechanism.
The modified uSFGAN significantly improves the sound quality of the basic uSFGAN while maintaining the voice controllability.
- Score: 32.839539624717546
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces a unified source-filter network with a
harmonic-plus-noise source excitation generation mechanism. In our previous
work, we proposed unified Source-Filter GAN (uSFGAN) for developing a
high-fidelity neural vocoder with flexible voice controllability using a
unified source-filter neural network architecture. However, the capability of
uSFGAN to model the aperiodic source excitation signal is insufficient, and
there is still a gap in sound quality between the natural and generated speech.
To improve the source excitation modeling and generated sound quality, a new
source excitation generation network separately generating periodic and
aperiodic components is proposed. The advanced adversarial training procedure
of HiFiGAN is also adopted to replace that of Parallel WaveGAN used in the
original uSFGAN. Both objective and subjective evaluation results show that the
modified uSFGAN significantly improves the sound quality of the basic uSFGAN
while maintaining the voice controllability.
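To make the harmonic-plus-noise idea concrete, here is a minimal NumPy sketch of such an excitation signal: a sum of F0 harmonics for the periodic component plus Gaussian noise for the aperiodic one. The sample rate, harmonic count, and noise scale below are illustrative assumptions; in the modified uSFGAN both components are produced by learned networks, not by this closed-form construction.

```python
import numpy as np

def harmonic_plus_noise_excitation(f0, sr=24000, n_harmonics=8, noise_scale=0.03):
    """Build a toy excitation from a per-sample F0 contour (Hz, 0 = unvoiced)."""
    voiced = f0 > 0
    # Instantaneous phase: integrate F0 over time.
    phase = 2.0 * np.pi * np.cumsum(f0) / sr
    harmonic = np.zeros_like(f0)
    for k in range(1, n_harmonics + 1):
        # Skip harmonics above the Nyquist frequency to avoid aliasing.
        harmonic += np.where(k * f0 < sr / 2, np.sin(k * phase), 0.0)
    harmonic = harmonic * voiced / n_harmonics          # periodic component
    noise = np.random.randn(len(f0))                    # aperiodic component
    aperiodic = np.where(voiced, noise_scale * noise, noise)
    return harmonic + aperiodic

# Example: 100 ms voiced at 200 Hz followed by 100 ms unvoiced, at 24 kHz.
f0 = np.concatenate([np.full(2400, 200.0), np.zeros(2400)])
excitation = harmonic_plus_noise_excitation(f0)
```

Summing the two streams mirrors the classic harmonic-plus-noise model of speech, in which voiced excitation is quasi-periodic and unvoiced excitation is noise-like.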
Related papers
- Radio Generation Using Generative Adversarial Networks with An Unrolled Design [18.049453261384013]
We develop a novel GAN framework for radio generation called "Radio GAN", built on two key techniques.
The first is learning based on sampling points, which aims to model an underlying sampling distribution of radio signals.
The second is an unrolled generator design, combined with an estimated pure-signal distribution as a prior, which can greatly reduce learning difficulty.
arXiv Detail & Related papers (2023-06-24T07:47:22Z)
- SeqDiffuSeq: Text Diffusion with Encoder-Decoder Transformers [50.90457644954857]
In this work, we apply diffusion models to approach sequence-to-sequence text generation.
We propose SeqDiffuSeq, a text diffusion model for sequence-to-sequence generation.
Experimental results show good performance on sequence-to-sequence generation in terms of both text quality and inference time.
arXiv Detail & Related papers (2022-12-20T15:16:24Z)
- Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder [29.219277429553788]
We introduce the source-filter theory into HiFi-GAN to achieve high voice quality and pitch controllability.
Our proposed method outperforms HiFi-GAN and uSFGAN on a singing voice generation task in both voice quality and synthesis speed on a single CPU.
Unlike the uSFGAN vocoder, the proposed method can be easily adopted/integrated in real-time applications and end-to-end systems.
arXiv Detail & Related papers (2022-10-27T15:19:09Z)
- NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation [67.96138567288197]
We propose a novel neural vocoder named NeuralDPS which retains high speech quality while achieving high synthesis efficiency and noise controllability.
It generates waveforms at least 280 times faster than the WaveNet vocoder.
It is also 28% faster than WaveGAN on a single CPU core.
arXiv Detail & Related papers (2022-03-05T08:15:29Z)
- Nonlinear Transform Source-Channel Coding for Semantic Communications [7.81628437543759]
We propose a new class of highly efficient deep joint source-channel coding methods that can closely adapt to the source distribution under the nonlinear transform.
Our model incorporates the nonlinear transform as a strong prior to effectively extract the source semantic features.
Notably, the proposed NTSCC method can potentially support future semantic communications due to its vigorous content-aware ability.
arXiv Detail & Related papers (2021-12-21T03:30:46Z)
- A Study on Speech Enhancement Based on Diffusion Probabilistic Model [63.38586161802788]
We propose a diffusion probabilistic model-based speech enhancement (DiffuSE) model that aims to recover clean speech signals from noisy signals.
The experimental results show that DiffuSE yields performance that is comparable to related audio generative models on the standardized Voice Bank corpus task.
arXiv Detail & Related papers (2021-07-25T19:23:18Z)
- Unified Source-Filter GAN: Unified Source-filter Network Based On Factorization of Quasi-Periodic Parallel WaveGAN [36.12470085926042]
We propose a unified approach to data-driven source-filter modeling using a single neural network for developing a neural vocoder.
Our proposed network, called unified source-filter generative adversarial networks (uSFGAN), is developed by factorizing quasi-periodic parallel WaveGAN; a minimal sketch of this factorization follows this entry.
Experiments demonstrate that uSFGAN outperforms conventional neural vocoders, such as QPPWG and NSF, in both speech quality and pitch controllability.
arXiv Detail & Related papers (2021-04-10T02:38:26Z)
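As referenced in the entry above, here is a deliberately tiny PyTorch sketch of the source-filter factorization idea behind uSFGAN: one sub-network generates an excitation from pitch-derived inputs, and a second shapes it into a waveform conditioned on spectral features. The module names, layer sizes, and conditioning scheme are invented for illustration; the actual model factorizes a quasi-periodic Parallel WaveGAN and regularizes the intermediate excitation.

```python
import torch
import torch.nn as nn

class ToySourceFilterVocoder(nn.Module):
    """Illustrative two-stage (source, then filter) waveform generator."""

    def __init__(self, spectral_dim=80, channels=64):
        super().__init__()
        # Source network: sine/F0 inputs -> excitation signal.
        self.source_net = nn.Sequential(
            nn.Conv1d(2, channels, kernel_size=3, padding=1),
            nn.Tanh(),
            nn.Conv1d(channels, 1, kernel_size=3, padding=1),
        )
        # Filter network: excitation + spectral features -> waveform.
        self.filter_net = nn.Sequential(
            nn.Conv1d(1 + spectral_dim, channels, kernel_size=3, padding=1),
            nn.Tanh(),
            nn.Conv1d(channels, 1, kernel_size=3, padding=1),
        )

    def forward(self, sine, f0, spectral):
        # sine, f0: (B, 1, T); spectral: (B, spectral_dim, T),
        # all upsampled to the waveform rate beforehand.
        excitation = self.source_net(torch.cat([sine, f0], dim=1))
        waveform = self.filter_net(torch.cat([excitation, spectral], dim=1))
        # uSFGAN additionally regularizes the excitation; returning it
        # here makes that kind of auxiliary loss possible.
        return waveform, excitation
```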
- Time-domain Speech Enhancement with Generative Adversarial Learning [53.74228907273269]
This paper proposes a new framework called Time-domain Speech Enhancement Generative Adversarial Network (TSEGAN).
TSEGAN is an extension of the generative adversarial network (GAN) in the time domain with metric evaluation to mitigate the scaling problem.
In addition, we provide a new method based on objective function mapping for the theoretical analysis of the performance of Metric GAN.
arXiv Detail & Related papers (2021-03-30T08:09:49Z)
- Improve GAN-based Neural Vocoder using Pointwise Relativistic LeastSquare GAN [9.595035978417322]
We introduce a novel variant of the LSGAN framework in the context of waveform synthesis, named Pointwise Relativistic LSGAN (PRLSGAN); a sketch of the underlying relativistic loss follows this entry.
PRLSGAN is a general-purpose framework that can be combined with any GAN-based neural vocoder to enhance its generation quality.
arXiv Detail & Related papers (2021-03-26T03:35:22Z)
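As referenced in the entry above, the following is a minimal PyTorch sketch of the standard relativistic average least-squares GAN objective that PRLSGAN builds on; the pointwise weighting that gives PRLSGAN its name is specific to the paper and is not reproduced here.

```python
import torch

def d_loss(d_real, d_fake):
    """Relativistic average LSGAN discriminator loss.

    d_real / d_fake: discriminator scores for real and generated batches
    (d_fake computed from detached generator outputs during the D step).
    Real samples are pushed to score 1 above the average fake score,
    and fake samples 1 below the average real score.
    """
    return ((d_real - d_fake.mean() - 1.0) ** 2).mean() + \
           ((d_fake - d_real.mean() + 1.0) ** 2).mean()

def g_loss(d_real, d_fake):
    """Generator loss: the same objective with real/fake roles swapped."""
    return ((d_fake - d_real.mean() - 1.0) ** 2).mean() + \
           ((d_real - d_fake.mean() + 1.0) ** 2).mean()
```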
- Unpaired Image Enhancement with Quality-Attention Generative Adversarial Network [92.01145655155374]
We propose a quality attention generative adversarial network (QAGAN) trained on unpaired data.
The key novelty of the proposed QAGAN lies in the quality attention module (QAM) injected into the generator.
Our proposed method achieves better performance in both objective and subjective evaluations.
arXiv Detail & Related papers (2020-12-30T05:57:20Z)