Sound2Synth: Interpreting Sound via FM Synthesizer Parameters Estimation
- URL: http://arxiv.org/abs/2205.03043v1
- Date: Fri, 6 May 2022 06:55:29 GMT
- Title: Sound2Synth: Interpreting Sound via FM Synthesizer Parameters Estimation
- Authors: Zui Chen, Yansen Jing, Shengcheng Yuan, Yifei Xu, Jian Wu and Hang Zhao
- Abstract summary: Estimating the parameter configuration that best restores a sound's timbre is an important yet complicated problem.
We proposed a multi-modal deep-learning-based pipeline, Sound2Synth, together with a network structure, Prime-Dilated Convolution (PDC), specially designed to solve this problem.
Our method achieved not only SOTA but also the first real-world applicable results on the Dexed synthesizer, a popular FM synthesizer.
- Score: 19.13182347908491
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A synthesizer is a type of electronic musical instrument that is now widely
used in modern music production and sound design. Each parameter configuration
of a synthesizer produces a unique timbre and can be viewed as a unique
instrument. Estimating the parameter configuration that best restores a sound's
timbre, i.e., the synthesizer parameters estimation problem, is an important yet
complicated problem. We proposed a multi-modal deep-learning-based pipeline,
Sound2Synth, together with a network structure, Prime-Dilated Convolution (PDC),
specially designed to solve this problem. Our method achieved not only SOTA but
also the first real-world applicable results on the Dexed synthesizer, a popular
FM synthesizer.
Related papers
- Synthesizer Sound Matching Using Audio Spectrogram Transformers [2.5944208050492183]
We introduce a synthesizer sound matching model based on the Audio Spectrogram Transformer.
We show that this model can reconstruct parameters of samples generated from a set of 16 parameters.
We also provide audio examples demonstrating the out-of-domain model performance in emulating vocal imitations.
arXiv Detail & Related papers (2024-07-23T16:58:14Z)
- Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt [50.25271407721519]
We propose Prompt-Singer, the first SVS method that enables attribute controlling on singer gender, vocal range and volume with natural language.
We adopt a model architecture based on a decoder-only transformer with a multi-scale hierarchy, and design a range-melody decoupled pitch representation.
Experiments show that our model achieves favorable controlling ability and audio quality.
arXiv Detail & Related papers (2024-03-18T13:39:05Z)
- DiffMoog: a Differentiable Modular Synthesizer for Sound Matching [48.33168531500444]
DiffMoog is a differentiable modular synthesizer with a comprehensive set of modules typically found in commercial instruments.
Being differentiable, it allows integration into neural networks, enabling automated sound matching.
We introduce an open-source platform that comprises DiffMoog and an end-to-end sound matching framework.
arXiv Detail & Related papers (2024-01-23T08:59:21Z)
- NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis based on Frequency Modulation [38.00669627261736]
We propose NAS-FM, which adopts neural architecture search (NAS) to build a differentiable frequency modulation (FM) synthesizer.
Tunable synthesizers with interpretable controls can be developed automatically from sounds without any prior expert knowledge.
arXiv Detail & Related papers (2023-05-22T09:46:10Z)
- Synthesizer Preset Interpolation using Transformer Auto-Encoders [4.213427823201119]
We introduce a bimodal auto-encoder neural network, which simultaneously processes presets using multi-head attention blocks, and audio using convolutions.
This model has been tested on a popular frequency modulation synthesizer with more than one hundred parameters.
After training, the proposed model can be integrated into commercial synthesizers for live or sound design tasks.
arXiv Detail & Related papers (2022-10-27T15:20:18Z)
- DDX7: Differentiable FM Synthesis of Musical Instrument Sounds [7.829520196474829]
Differentiable Digital Signal Processing (DDSP) has enabled nuanced audio rendering by Deep Neural Networks (DNNs).
We present Differentiable DX7 (DDX7), a lightweight architecture for neural FM resynthesis of musical instrument sounds.
arXiv Detail & Related papers (2022-08-12T08:39:45Z)
- NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation [67.96138567288197]
We propose a novel neural vocoder named NeuralDPS which can retain high speech quality and acquire high synthesis efficiency and noise controllability.
It generates waveforms at least 280 times faster than the WaveNet vocoder.
Its synthesis is also 28% faster than WaveGAN's on a single core.
arXiv Detail & Related papers (2022-03-05T08:15:29Z)
- DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis [53.19363127760314]
DiffSinger is a parameterized Markov chain which iteratively converts the noise into mel-spectrogram conditioned on the music score.
The evaluations conducted on the Chinese singing dataset demonstrate that DiffSinger outperforms state-of-the-art SVS work with a notable margin.
arXiv Detail & Related papers (2021-05-06T05:21:42Z)
- One Billion Audio Sounds from GPU-enabled Modular Synthesis [5.5022962399775945]
synth1B1, a multi-modal audio corpus consisting of 1 billion 4-second synthesized sounds, is 100x larger than any audio dataset in the literature.
synth1B1 samples are deterministically generated on-the-fly 16200x faster than real-time (714MHz) on a single GPU.
arXiv Detail & Related papers (2021-04-27T00:38:52Z)
- Synthesizer: Rethinking Self-Attention in Transformer Models [93.08171885200922]
Dot product self-attention is known to be central and indispensable to state-of-the-art Transformer models.
This paper investigates the true importance and contribution of the dot product-based self-attention mechanism on the performance of Transformer models.
arXiv Detail & Related papers (2020-05-02T08:16:19Z)
- VaPar Synth -- A Variational Parametric Model for Audio Synthesis [78.3405844354125]
We present VaPar Synth - a Variational Parametric Synthesizer which utilizes a conditional variational autoencoder (CVAE) trained on a suitable parametric representation.
We demonstrate our proposed model's capabilities via the reconstruction and generation of instrumental tones with flexible control over their pitch.
arXiv Detail & Related papers (2020-03-30T16:05:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.