Modulation Discovery with Differentiable Digital Signal Processing
- URL: http://arxiv.org/abs/2510.06204v1
- Date: Tue, 07 Oct 2025 17:56:24 GMT
- Title: Modulation Discovery with Differentiable Digital Signal Processing
- Authors: Christopher Mitcheltree, Hao Hao Tan, Joshua D. Reiss,
- Abstract summary: We propose a neural sound-matching approach that leverages modulation extraction, constrained control signal parameterizations, and differentiable digital signal processing.<n>We make our code and audio samples available and provide the trainedP synths in a VST plugin.
- Score: 3.428276755022932
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modulations are a critical part of sound design and music production, enabling the creation of complex and evolving audio. Modern synthesizers provide envelopes, low frequency oscillators (LFOs), and more parameter automation tools that allow users to modulate the output with ease. However, determining the modulation signals used to create a sound is difficult, and existing sound-matching / parameter estimation systems are often uninterpretable black boxes or predict high-dimensional framewise parameter values without considering the shape, structure, and routing of the underlying modulation curves. We propose a neural sound-matching approach that leverages modulation extraction, constrained control signal parameterizations, and differentiable digital signal processing (DDSP) to discover the modulations present in a sound. We demonstrate the effectiveness of our approach on highly modulated synthetic and real audio samples, its applicability to different DDSP synth architectures, and investigate the trade-off it incurs between interpretability and sound-matching accuracy. We make our code and audio samples available and provide the trained DDSP synths in a VST plugin.
Related papers
- Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt [50.25271407721519]
We propose Prompt-Singer, the first SVS method that enables attribute controlling on singer gender, vocal range and volume with natural language.<n>We adopt a model architecture based on a decoder-only transformer with a multi-scale hierarchy, and design a range-melody decoupled pitch representation.<n>Experiments show that our model achieves favorable controlling ability and audio quality.
arXiv Detail & Related papers (2024-03-18T13:39:05Z) - DiffMoog: a Differentiable Modular Synthesizer for Sound Matching [48.33168531500444]
DiffMoog is a differentiable modular synthesizer with a comprehensive set of modules typically found in commercial instruments.
Being differentiable, it allows integration into neural networks, enabling automated sound matching.
We introduce an open-source platform that comprises DiffMoog and an end-to-end sound matching framework.
arXiv Detail & Related papers (2024-01-23T08:59:21Z) - Differentiable WORLD Synthesizer-based Neural Vocoder With Application
To End-To-End Audio Style Transfer [6.29475963948119]
We propose a differentiable WORLD synthesizer and demonstrate its use in end-to-end audio style transfer tasks.
Our baseline differentiable synthesizer has no model parameters, yet it yields adequate quality synthesis.
An alternative differentiable approach considers extraction of the source spectrum directly, which can improve naturalness.
arXiv Detail & Related papers (2022-08-15T15:48:36Z) - DDX7: Differentiable FM Synthesis of Musical Instrument Sounds [7.829520196474829]
Differentiable Digital Signal Processing (DDSP) has enabled nuanced audio rendering by Deep Neural Networks (DNNs)
We present Differentiable DX7 (DDX7), a lightweight architecture for neural FM resynthesis of musical instrument sounds.
arXiv Detail & Related papers (2022-08-12T08:39:45Z) - SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with
Adaptive Noise Spectral Shaping [51.698273019061645]
SpecGrad adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram.
It is processed in the time-frequency domain to keep the computational cost almost the same as the conventional DDPM-based neural vocoders.
arXiv Detail & Related papers (2022-03-31T02:08:27Z) - NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband
Excitation for Noise-Controllable Waveform Generation [67.96138567288197]
We propose a novel neural vocoder named NeuralDPS which can retain high speech quality and acquire high synthesis efficiency and noise controllability.
It generates waveforms at least 280 times faster than the WaveNet vocoder.
It is also 28% faster than WaveGAN's synthesis efficiency on a single core.
arXiv Detail & Related papers (2022-03-05T08:15:29Z) - Differentiable Digital Signal Processing Mixture Model for Synthesis
Parameter Extraction from Mixture of Harmonic Sounds [29.012177604120048]
A differentiable digital signal processing (DDSP) autoencoder is a musical sound that combines a deep neural network (DNN) and spectral modeling synthesis.
It allows us to flexibly edit sounds by changing the fundamental frequency, timbre feature, and loudness (synthesis parameters) extracted from an input sound.
It is designed for a monophonic harmonic sound and cannot handle mixtures of sounds harmonic.
arXiv Detail & Related papers (2022-02-01T03:38:49Z) - Hierarchical Timbre-Painting and Articulation Generation [92.59388372914265]
We present a fast and high-fidelity method for music generation, based on specified f0 and loudness.
The synthesized audio mimics the timbre and articulation of a target instrument.
arXiv Detail & Related papers (2020-08-30T05:27:39Z) - VaPar Synth -- A Variational Parametric Model for Audio Synthesis [78.3405844354125]
We present VaPar Synth - a Variational Parametric Synthesizer which utilizes a conditional variational autoencoder (CVAE) trained on a suitable parametric representation.
We demonstrate our proposed model's capabilities via the reconstruction and generation of instrumental tones with flexible control over their pitch.
arXiv Detail & Related papers (2020-03-30T16:05:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.