Related papers: JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis

URL: http://arxiv.org/abs/2406.06111v1
Date: Mon, 10 Jun 2024 08:51:04 GMT
Title: JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis
Authors: Hyunjae Cho, Junhyeok Lee, Wonbin Jung
Abstract summary: Non-autoregressive GAN-based neural vocoders often suffer from audible artifacts such as tonal artifacts in their generated results. We propose JenGAN, a new training strategy that involves stacking shifted low-pass filters to ensure the shift-equivariant property. In our experimental evaluation, JenGAN consistently enhances the performance of vocoder models, yielding significantly superior scores across the majority of evaluation metrics.
Score: 7.786188453649591
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Non-autoregressive GAN-based neural vocoders are widely used due to their fast inference speed and high perceptual quality. However, they often suffer from audible artifacts such as tonal artifacts in their generated results. Therefore, we propose JenGAN, a new training strategy that involves stacking shifted low-pass filters to ensure the shift-equivariant property. This method helps prevent aliasing and reduce artifacts while preserving the model structure used during inference. In our experimental evaluation, JenGAN consistently enhances the performance of vocoder models, yielding significantly superior scores across the majority of evaluation metrics.
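The shift-equivariant property mentioned in the abstract means that shifting a system's input shifts its output by the same amount. The following is a minimal sketch of that property only, not the JenGAN training procedure: it assumes a hypothetical 3-tap circular moving-average filter (`circular_lowpass`) and a naive stride-2 downsampler (`downsample2`), neither of which comes from the paper. The filter commutes with a one-sample shift, whereas the downsampler does not.

```python
import numpy as np

def circular_lowpass(x):
    # Hypothetical 3-tap moving average with wrap-around boundaries.
    return (np.roll(x, 1) + x + np.roll(x, -1)) / 3.0

def downsample2(x):
    # Naive downsampling: keep every second sample, no anti-aliasing.
    return x[::2]

x = np.arange(8, dtype=float) ** 2
shift = 1

# Filtering a shifted signal equals shifting the filtered signal.
filter_commutes = np.allclose(
    circular_lowpass(np.roll(x, shift)),
    np.roll(circular_lowpass(x), shift),
)

# Downsampling a shifted signal is not any shift of the downsampled signal.
down = downsample2(x)
shifted_then_down = downsample2(np.roll(x, shift))
downsample_commutes = any(
    np.allclose(shifted_then_down, np.roll(down, s)) for s in range(down.size)
)

print(filter_commutes, downsample_commutes)
```

The second check fails because a one-sample shift moves the downsampler onto the odd-indexed samples, which is the kind of shift sensitivity that anti-aliasing filters are meant to reduce.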

Related papers

Robust VAEs via Generating Process of Noise Augmented Data [9.366139389037489]
This paper introduces a novel framework that enhances robustness by regularizing the latent space divergence between original and noise-augmented data. Our empirical evaluations demonstrate that this approach, termed Robust Augmented Variational Auto-ENcoder (RAVEN), yields superior performance in resisting adversarial inputs.
arXiv Detail & Related papers (2024-07-26T09:55:34Z)
Domain Generalization Guided by Gradient Signal to Noise Ratio of Parameters [69.24377241408851]
Overfitting to the source domain is a common issue in gradient-based training of deep neural networks. We propose to base the selection on the gradient-signal-to-noise ratio (GSNR) of the network's parameters.
arXiv Detail & Related papers (2023-10-11T10:21:34Z)
DuDGAN: Improving Class-Conditional GANs via Dual-Diffusion [2.458437232470188]
Class-conditional image generation using generative adversarial networks (GANs) has been investigated through various techniques. We propose a novel approach for class-conditional image generation using GANs called DuDGAN, which incorporates a dual diffusion-based noise injection process. Our method outperforms state-of-the-art conditional GAN models for image generation in terms of performance.
arXiv Detail & Related papers (2023-05-24T07:59:44Z)
LD-GAN: Low-Dimensional Generative Adversarial Network for Spectral Image Generation with Variance Regularization [72.4394510913927]
Deep learning methods are state-of-the-art for spectral image (SI) computational tasks. GANs enable diverse augmentation by learning and sampling from the data distribution. GAN-based SI generation is challenging since the high-dimensional nature of this kind of data hinders the convergence of GAN training, leading to suboptimal generation. We propose a statistical regularization to control the low-dimensional representation variance for the autoencoder training and to achieve high diversity of samples generated with the GAN.
arXiv Detail & Related papers (2023-04-29T00:25:02Z)
DiffusionAD: Norm-guided One-step Denoising Diffusion for Anomaly Detection [89.49600182243306]
We reformulate the reconstruction process using a diffusion model into a noise-to-norm paradigm. We propose a rapid one-step denoising paradigm, significantly faster than the traditional iterative denoising in diffusion models. The segmentation sub-network predicts pixel-level anomaly scores using the input image and its anomaly-free restoration.
arXiv Detail & Related papers (2023-03-15T16:14:06Z)
Score-Guided Intermediate Layer Optimization: Fast Langevin Mixing for Inverse Problem [97.64313409741614]
We prove fast mixing and characterize the stationary distribution of the Langevin Algorithm for inverting random weighted DNN generators. We propose to do posterior sampling in the latent space of a pre-trained generative model.
arXiv Detail & Related papers (2022-06-18T03:47:37Z)
Orthogonal Features Based EEG Signals Denoising Using Fractional and Compressed One-Dimensional CNN AutoEncoder [3.8580784887142774]
This paper presents a fractional one-dimensional convolutional neural network (CNN) autoencoder for denoising electroencephalogram (EEG) signals. EEG signals often get contaminated with noise during the recording process, mostly due to muscle artifacts (MA).
arXiv Detail & Related papers (2021-04-16T13:58:05Z)
Time-domain Speech Enhancement with Generative Adversarial Learning [53.74228907273269]
This paper proposes a new framework called Time-domain Speech Enhancement Generative Adversarial Network (TSEGAN). TSEGAN is an extension of the generative adversarial network (GAN) in the time domain with metric evaluation to mitigate the scaling problem. In addition, we provide a new method based on objective function mapping for the theoretical analysis of the performance of Metric GAN.
arXiv Detail & Related papers (2021-03-30T08:09:49Z)
Low-Complexity Models for Acoustic Scene Classification Based on Receptive Field Regularization and Frequency Damping [7.0349768355860895]
We investigate and compare several well-known methods to reduce the number of parameters in neural networks. We show that we can achieve high-performing low-complexity models by applying specific restrictions on the Receptive Field. We propose a filter-damping technique for regularizing the RF of models, without altering their architecture.
arXiv Detail & Related papers (2020-11-05T16:34:11Z)
Top-k Training of GANs: Improving GAN Performance by Throwing Away Bad Samples [67.11669996924671]
We introduce a simple (one line of code) modification to the Generative Adversarial Network (GAN) training algorithm. When updating the generator parameters, we zero out the gradient contributions from the elements of the batch that the critic scores as 'least realistic'. We show that this 'top-k update' procedure is a generally applicable improvement.
arXiv Detail & Related papers (2020-02-14T19:27:50Z)
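
The top-k update described in the last entry can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the function names `top_k_mask` and `top_k_generator_loss` are hypothetical, and the loss is assumed to be a Wasserstein-style negative mean critic score. Samples outside the top k get a mask value of zero, so they contribute nothing to the loss or its gradient.

```python
import numpy as np

def top_k_mask(critic_scores, k):
    """Return a 0/1 mask keeping the k samples the critic scores highest."""
    scores = np.asarray(critic_scores, dtype=float)
    if not 1 <= k <= scores.size:
        raise ValueError("k must be between 1 and the batch size")
    keep = np.argsort(scores)[-k:]  # indices of the k largest scores
    mask = np.zeros_like(scores)
    mask[keep] = 1.0
    return mask

def top_k_generator_loss(critic_scores, k):
    """Assumed loss form: negative mean critic score over kept samples only."""
    scores = np.asarray(critic_scores, dtype=float)
    mask = top_k_mask(scores, k)
    # Masked-out samples are multiplied by zero, so they add no gradient.
    return -np.sum(mask * scores) / k

batch_scores = [0.1, 0.9, -0.3, 0.5]  # hypothetical critic outputs
mask = top_k_mask(batch_scores, k=2)
loss = top_k_generator_loss(batch_scores, k=2)
print(mask, loss)
```

In an autodiff framework the same effect would come from selecting the top-k critic outputs before averaging, which is consistent with the "one line of code" description.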