Improving Stability of LS-GANs for Audio and Speech Signals
- URL: http://arxiv.org/abs/2008.05454v1
- Date: Wed, 12 Aug 2020 17:41:25 GMT
- Title: Improving Stability of LS-GANs for Audio and Speech Signals
- Authors: Mohammad Esmaeilpour, Raymel Alfonso Sallo, Olivier St-Georges,
Patrick Cardinal, Alessandro Lameiras Koerich
- Abstract summary: We show that encoding departure from normality computed in this vector space into the generator optimization formulation helps to craft more comprehensive spectrograms.
We demonstrate the effectiveness of binding this metric for enhancing stability in training with less mode collapse compared to baseline GANs.
- Score: 70.15099665710336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we address the instability issue of generative adversarial
network (GAN) by proposing a new similarity metric in unitary space of Schur
decomposition for 2D representations of audio and speech signals. We show that
encoding departure from normality computed in this vector space into the
generator optimization formulation helps to craft more comprehensive
spectrograms. We demonstrate the effectiveness of binding this metric for
enhancing stability in training with less mode collapse compared to baseline
GANs. Experimental results on subsets of UrbanSound8k and Mozilla common voice
datasets have shown considerable improvements on the quality of the generated
samples measured by the Fr\'echet inception distance. Moreover, reconstructed
signals from these samples, have achieved higher signal to noise ratio compared
to regular LS-GANs.
Related papers
- Hybrid Convolutional and Attention Network for Hyperspectral Image Denoising [54.110544509099526]
Hyperspectral image (HSI) denoising is critical for the effective analysis and interpretation of hyperspectral data.
We propose a hybrid convolution and attention network (HCANet) to enhance HSI denoising.
Experimental results on mainstream HSI datasets demonstrate the rationality and effectiveness of the proposed HCANet.
arXiv Detail & Related papers (2024-03-15T07:18:43Z) - SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and
Music Synthesis [0.0]
We introduce SpecDiff-GAN, a neural vocoder based on HiFi-GAN.
We show the merits of our proposed model for speech and music synthesis on several datasets.
arXiv Detail & Related papers (2024-01-30T09:17:57Z) - Unsupervised Harmonic Parameter Estimation Using Differentiable DSP and
Spectral Optimal Transport [0.0]
We propose a spectral loss function inspired by optimal transportation theory that minimizes the displacement of spectral energy.
We validate this approach through an unsupervised autoencoding task that fits a harmonic template to harmonic signals.
We jointly estimate the fundamental frequency and amplitudes of harmonics using a lightweight encoder and reconstruct the signals using a differentiable harmonic synthesizer.
arXiv Detail & Related papers (2023-12-22T08:10:30Z) - DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification [55.306583814017046]
We present a novel difficulty-aware semantic augmentation (DASA) approach for speaker verification.
DASA generates diversified training samples in speaker embedding space with negligible extra computing cost.
The best result achieves a 14.6% relative reduction in EER metric on CN-Celeb evaluation set.
arXiv Detail & Related papers (2023-10-18T17:07:05Z) - Speech enhancement with frequency domain auto-regressive modeling [34.55703785405481]
Speech applications in far-field real world settings often deal with signals that are corrupted by reverberation.
We propose a unified framework of speech dereverberation for improving the speech quality and the automatic speech recognition (ASR) performance.
arXiv Detail & Related papers (2023-09-24T03:25:51Z) - Hyperspectral Image Denoising via Self-Modulating Convolutional Neural
Networks [15.700048595212051]
We introduce a self-modulating convolutional neural network which utilizes correlated spectral and spatial information.
At the core of the model lies a novel block, which allows the network to transform the features in an adaptive manner based on the adjacent spectral data.
Experimental analysis on both synthetic and real data shows that the proposed SM-CNN outperforms other state-of-the-art HSI denoising methods.
arXiv Detail & Related papers (2023-09-15T06:57:43Z) - Spectral Enhanced Rectangle Transformer for Hyperspectral Image
Denoising [64.11157141177208]
We propose a spectral enhanced rectangle Transformer to model the spatial and spectral correlation in hyperspectral images.
For the former, we exploit the rectangle self-attention horizontally and vertically to capture the non-local similarity in the spatial domain.
For the latter, we design a spectral enhancement module that is capable of extracting global underlying low-rank property of spatial-spectral cubes to suppress noise.
arXiv Detail & Related papers (2023-04-03T09:42:13Z) - Time-domain Speech Enhancement with Generative Adversarial Learning [53.74228907273269]
This paper proposes a new framework called Time-domain Speech Enhancement Generative Adversarial Network (TSEGAN)
TSEGAN is an extension of the generative adversarial network (GAN) in time-domain with metric evaluation to mitigate the scaling problem.
In addition, we provide a new method based on objective function mapping for the theoretical analysis of the performance of Metric GAN.
arXiv Detail & Related papers (2021-03-30T08:09:49Z) - Conditioning Trick for Training Stable GANs [70.15099665710336]
We propose a conditioning trick, called difference departure from normality, applied on the generator network in response to instability issues during GAN training.
We force the generator to get closer to the departure from normality function of real samples computed in the spectral domain of Schur decomposition.
arXiv Detail & Related papers (2020-10-12T16:50:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.