JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis
- URL: http://arxiv.org/abs/2406.06111v1
- Date: Mon, 10 Jun 2024 08:51:04 GMT
- Title: JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis
- Authors: Hyunjae Cho, Junhyeok Lee, Wonbin Jung,
- Abstract summary: Non-autoregressive GAN-based neural vocoders often suffer from audible artifacts such as tonal artifacts in their generated results.
We propose JenGAN, a new training strategy that involves stacking shifted low-pass filters to ensure the shift-equivariant property.
In our experimental evaluation, JenGAN consistently enhances the performance of vocoder models, yielding significantly superior scores across the majority of evaluation metrics.
- Score: 7.786188453649591
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-autoregressive GAN-based neural vocoders are widely used due to their fast inference speed and high perceptual quality. However, they often suffer from audible artifacts such as tonal artifacts in their generated results. Therefore, we propose JenGAN, a new training strategy that involves stacking shifted low-pass filters to ensure the shift-equivariant property. This method helps prevent aliasing and reduce artifacts while preserving the model structure used during inference. In our experimental evaluation, JenGAN consistently enhances the performance of vocoder models, yielding significantly superior scores across the majority of evaluation metrics.
Related papers
- Domain Generalization Guided by Gradient Signal to Noise Ratio of
Parameters [69.24377241408851]
Overfitting to the source domain is a common issue in gradient-based training of deep neural networks.
We propose to base the selection on gradient-signal-to-noise ratio (GSNR) of network's parameters.
arXiv Detail & Related papers (2023-10-11T10:21:34Z) - DuDGAN: Improving Class-Conditional GANs via Dual-Diffusion [2.458437232470188]
Class-conditional image generation using generative adversarial networks (GANs) has been investigated through various techniques.
We propose a novel approach for class-conditional image generation using GANs called DuDGAN, which incorporates a dual diffusion-based noise injection process.
Our method outperforms state-of-the-art conditional GAN models for image generation in terms of performance.
arXiv Detail & Related papers (2023-05-24T07:59:44Z) - LD-GAN: Low-Dimensional Generative Adversarial Network for Spectral
Image Generation with Variance Regularization [72.4394510913927]
Deep learning methods are state-of-the-art for spectral image (SI) computational tasks.
GANs enable diverse augmentation by learning and sampling from the data distribution.
GAN-based SI generation is challenging since the high-dimensionality nature of this kind of data hinders the convergence of the GAN training yielding to suboptimal generation.
We propose a statistical regularization to control the low-dimensional representation variance for the autoencoder training and to achieve high diversity of samples generated with the GAN.
arXiv Detail & Related papers (2023-04-29T00:25:02Z) - DiffusionAD: Norm-guided One-step Denoising Diffusion for Anomaly
Detection [89.49600182243306]
We reformulate the reconstruction process using a diffusion model into a noise-to-norm paradigm.
We propose a rapid one-step denoising paradigm, significantly faster than the traditional iterative denoising in diffusion models.
The segmentation sub-network predicts pixel-level anomaly scores using the input image and its anomaly-free restoration.
arXiv Detail & Related papers (2023-03-15T16:14:06Z) - Score-Guided Intermediate Layer Optimization: Fast Langevin Mixing for
Inverse Problem [97.64313409741614]
We prove fast mixing and characterize the stationary distribution of the Langevin Algorithm for inverting random weighted DNN generators.
We propose to do posterior sampling in the latent space of a pre-trained generative model.
arXiv Detail & Related papers (2022-06-18T03:47:37Z) - Orthogonal Features Based EEG Signals Denoising Using Fractional and
Compressed One-Dimensional CNN AutoEncoder [3.8580784887142774]
This paper presents a fractional one-dimensional convolutional neural network (CNN) autoencoder for denoising the Electroencephalogram (EEG) signals.
EEG signals often get contaminated with noise during the recording process, mostly due to muscle artifacts (MA)
arXiv Detail & Related papers (2021-04-16T13:58:05Z) - Time-domain Speech Enhancement with Generative Adversarial Learning [53.74228907273269]
This paper proposes a new framework called Time-domain Speech Enhancement Generative Adversarial Network (TSEGAN)
TSEGAN is an extension of the generative adversarial network (GAN) in time-domain with metric evaluation to mitigate the scaling problem.
In addition, we provide a new method based on objective function mapping for the theoretical analysis of the performance of Metric GAN.
arXiv Detail & Related papers (2021-03-30T08:09:49Z) - Low-Complexity Models for Acoustic Scene Classification Based on
Receptive Field Regularization and Frequency Damping [7.0349768355860895]
We investigate and compare several well-known methods to reduce the number of parameters in neural networks.
We show that we can achieve high-performing low-complexity models by applying specific restrictions on the Receptive Field.
We propose a filter-damping technique for regularizing the RF of models, without altering their architecture.
arXiv Detail & Related papers (2020-11-05T16:34:11Z) - Top-k Training of GANs: Improving GAN Performance by Throwing Away Bad
Samples [67.11669996924671]
We introduce a simple (one line of code) modification to the Generative Adversarial Network (GAN) training algorithm.
When updating the generator parameters, we zero out the gradient contributions from the elements of the batch that the critic scores as least realistic'
We show that this top-k update' procedure is a generally applicable improvement.
arXiv Detail & Related papers (2020-02-14T19:27:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.