Unsupervised Harmonic Parameter Estimation Using Differentiable DSP and
Spectral Optimal Transport
- URL: http://arxiv.org/abs/2312.14507v3
- Date: Mon, 15 Jan 2024 10:41:32 GMT
- Title: Unsupervised Harmonic Parameter Estimation Using Differentiable DSP and
Spectral Optimal Transport
- Authors: Bernardo Torres (S2A, IDS), Geoffroy Peeters (S2A, IDS), Gaël
Richard (S2A, IDS)
- Abstract summary: We propose a spectral loss function inspired by optimal transportation theory that minimizes the displacement of spectral energy.
We validate this approach through an unsupervised autoencoding task that fits a harmonic template to harmonic signals.
We jointly estimate the fundamental frequency and amplitudes of harmonics using a lightweight encoder and reconstruct the signals using a differentiable harmonic synthesizer.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In neural audio signal processing, pitch conditioning has been used to
enhance the performance of synthesizers. However, jointly training pitch
estimators and synthesizers is a challenge when using standard audio-to-audio
reconstruction loss, leading to reliance on external pitch trackers. To address
this issue, we propose using a spectral loss function inspired by optimal
transportation theory that minimizes the displacement of spectral energy. We
validate this approach through an unsupervised autoencoding task that fits a
harmonic template to harmonic signals. We jointly estimate the fundamental
frequency and amplitudes of harmonics using a lightweight encoder and
reconstruct the signals using a differentiable harmonic synthesizer. The
proposed approach offers a promising direction for improving unsupervised
parameter estimation in neural audio applications.
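The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the closed-form 1-D Wasserstein-1 distance (the area between two cumulative distributions) as a stand-in for the paper's transport-inspired loss, and the names `harmonic_synth`, `magnitude_spectrum`, and `spectral_w1` are hypothetical helpers written for this example. A three-harmonic template is synthesized at several candidate fundamentals and compared against a 440 Hz target.

```python
import cmath
import math


def harmonic_synth(f0, amps, sr, n):
    """Sum of sinusoids at integer multiples of f0; harmonics at or above Nyquist are dropped."""
    out = [0.0] * n
    for k, a in enumerate(amps, start=1):
        if k * f0 >= sr / 2:
            break
        w = 2.0 * math.pi * k * f0 / sr
        for t in range(n):
            out[t] += a * math.sin(w * t)
    return out


def magnitude_spectrum(x):
    """Hann-windowed magnitude spectrum via a naive DFT, bins 0..n/2."""
    n = len(x)
    xw = [v * 0.5 * (1.0 - math.cos(2.0 * math.pi * t / n)) for t, v in enumerate(x)]
    mags = []
    for b in range(n // 2 + 1):
        acc = sum(xw[t] * cmath.exp(-2j * math.pi * b * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectral_w1(mag_a, mag_b, bin_hz):
    """1-D Wasserstein-1 distance between two spectra normalized to unit mass.

    Assumption for illustration: in one dimension this equals the area
    between the two cumulative distributions, measured here in Hz.
    """
    total_a, total_b = sum(mag_a), sum(mag_b)
    cdf_a = cdf_b = 0.0
    dist = 0.0
    for a, b in zip(mag_a, mag_b):
        cdf_a += a / total_a
        cdf_b += b / total_b
        dist += abs(cdf_a - cdf_b) * bin_hz
    return dist


if __name__ == "__main__":
    sr, n = 8000, 256          # illustrative sample rate and frame length
    amps = [1.0, 0.5, 0.25]    # illustrative harmonic amplitudes
    target = magnitude_spectrum(harmonic_synth(440.0, amps, sr, n))
    for f0 in (440.0, 450.0, 600.0):
        cand = magnitude_spectrum(harmonic_synth(f0, amps, sr, n))
        print(f0, round(spectral_w1(target, cand, sr / n), 2))
```

In this sketch the distance is zero when the candidate matches the target and is larger for the 600 Hz candidate than for the 450 Hz one, which is the sense in which such a loss measures the displacement of spectral energy. In a trainable setting the same computation would be written in an automatic-differentiation framework so that gradients reach the fundamental frequency and harmonic amplitudes.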
Related papers
- Synthetic Wave-Geometric Impulse Responses for Improved Speech
Dereverberation [69.1351513309953]
We show that accurately simulating the low-frequency components of Room Impulse Responses (RIRs) is important to achieving good dereverberation.
We demonstrate that speech dereverberation models trained on hybrid synthetic RIRs outperform models trained on RIRs generated by prior geometric ray tracing methods.
arXiv Detail & Related papers (2022-12-10T20:15:23Z)
- Differentiable WORLD Synthesizer-based Neural Vocoder With Application
To End-To-End Audio Style Transfer [6.29475963948119]
We propose a differentiable WORLD synthesizer and demonstrate its use in end-to-end audio style transfer tasks.
Our baseline differentiable synthesizer has no model parameters, yet it yields adequate quality synthesis.
An alternative differentiable approach considers extraction of the source spectrum directly, which can improve naturalness.
arXiv Detail & Related papers (2022-08-15T15:48:36Z)
- Blind Equalization and Channel Estimation in Coherent Optical
Communications Using Variational Autoencoders [1.7188280334580193]
We investigate the potential of adaptive blind equalizers based on variational inference for carrier recovery in optical communications.
We generalize the concept of variational autoencoder (VAE) equalizers to higher order modulation formats.
arXiv Detail & Related papers (2022-04-25T16:46:03Z)
- Iterative Adaptive Spectroscopy of Short Signals [0.1338174941551702]
We develop an adaptive frequency sensing protocol based on Ramsey interferometry.
High precision is achieved by enhancing the Ramsey sequence to prepare with high fidelity both the sensing and readout state.
arXiv Detail & Related papers (2022-04-10T18:07:50Z)
- SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with
Adaptive Noise Spectral Shaping [51.698273019061645]
SpecGrad adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram.
It is processed in the time-frequency domain to keep the computational cost almost the same as the conventional DDPM-based neural vocoders.
arXiv Detail & Related papers (2022-03-31T02:08:27Z)
- Differentiable Digital Signal Processing Mixture Model for Synthesis
Parameter Extraction from Mixture of Harmonic Sounds [29.012177604120048]
A differentiable digital signal processing (DDSP) autoencoder is a musical sound synthesizer that combines a deep neural network (DNN) and spectral modeling synthesis.
It allows us to flexibly edit sounds by changing the fundamental frequency, timbre feature, and loudness (synthesis parameters) extracted from an input sound.
It is designed for a monophonic harmonic sound and cannot handle mixtures of harmonic sounds.
arXiv Detail & Related papers (2022-02-01T03:38:49Z)
- Adaptive Low-Pass Filtering using Sliding Window Gaussian Processes [71.23286211775084]
We propose an adaptive low-pass filter based on Gaussian process regression.
We show that the estimation error of the proposed method is uniformly bounded.
arXiv Detail & Related papers (2021-11-05T17:06:59Z)
- Conditioning Trick for Training Stable GANs [70.15099665710336]
We propose a conditioning trick, called difference departure from normality, applied on the generator network in response to instability issues during GAN training.
We force the generator to get closer to the departure from normality function of real samples computed in the spectral domain of Schur decomposition.
arXiv Detail & Related papers (2020-10-12T16:50:22Z)
- Improving Stability of LS-GANs for Audio and Speech Signals [70.15099665710336]
We show that encoding departure from normality computed in this vector space into the generator optimization formulation helps to craft more comprehensive spectrograms.
We demonstrate the effectiveness of binding this metric for enhancing stability in training with less mode collapse compared to baseline GANs.
arXiv Detail & Related papers (2020-08-12T17:41:25Z)
- VaPar Synth -- A Variational Parametric Model for Audio Synthesis [78.3405844354125]
We present VaPar Synth - a Variational Parametric Synthesizer which utilizes a conditional variational autoencoder (CVAE) trained on a suitable parametric representation.
We demonstrate our proposed model's capabilities via the reconstruction and generation of instrumental tones with flexible control over their pitch.
arXiv Detail & Related papers (2020-03-30T16:05:47Z)