NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound
Synthesis based on Frequency Modulation
- URL: http://arxiv.org/abs/2305.12868v1
- Date: Mon, 22 May 2023 09:46:10 GMT
- Title: NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound
Synthesis based on Frequency Modulation
- Authors: Zhen Ye, Wei Xue, Xu Tan, Qifeng Liu, Yike Guo
- Abstract summary: We propose "NAS-FM", which adopts neural architecture search (NAS) to build a differentiable frequency modulation (FM) synthesizer.
Tunable synthesizers with interpretable controls can be developed automatically from sounds without any prior expert knowledge.
- Score: 38.00669627261736
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Developing digital sound synthesizers is crucial to the music industry as it
provides a low-cost way to produce high-quality sounds with rich timbres.
Existing traditional synthesizers often require substantial expertise to
determine the overall framework of a synthesizer and the parameters of
submodules. Since expert knowledge is hard to acquire, it hinders the
flexibility to quickly design and tune digital synthesizers for diverse sounds.
In this paper, we propose "NAS-FM", which adopts neural architecture search
(NAS) to build a differentiable frequency modulation (FM) synthesizer. Tunable
synthesizers with interpretable controls can be developed automatically from
sounds without any prior expert knowledge and manual operating costs. In
detail, we train a supernet with a specifically designed search space,
including predicting the envelopes of carriers and modulators with different
frequency ratios. An evolutionary search algorithm with adaptive oscillator
size is then developed to find the optimal relationship between oscillators and
the frequency ratio of FM. Extensive experiments on recordings of different
instrument sounds show that our algorithm can build a synthesizer fully
automatically, achieving better results than handcrafted synthesizers. Audio
samples are available at https://nas-fm.github.io/.
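For orientation, the kind of building block the abstract refers to (a carrier oscillator whose phase is driven by a modulator at a chosen frequency ratio, each shaped by an envelope) can be sketched as follows. This is a minimal illustration, not the NAS-FM implementation: the function name `fm_tone`, its parameters, and the fixed exponential-decay envelope are assumptions made here, whereas the paper predicts envelopes with a network and searches over oscillator connections and ratios. FM is written as phase modulation, as is common in digital FM synthesizers.

```python
import math

def fm_tone(f0, ratio_c, ratio_m, index, duration=0.01, sample_rate=16000):
    """Render one carrier modulated by one modulator (hypothetical helper).

    f0       -- fundamental frequency in Hz
    ratio_c  -- carrier frequency as a multiple of f0
    ratio_m  -- modulator frequency as a multiple of f0
    index    -- modulation index (peak phase deviation in radians)
    """
    n = round(duration * sample_rate)
    samples = []
    for i in range(n):
        t = i / sample_rate
        # Assumed envelope: a simple exponential decay shared by both oscillators.
        env = math.exp(-3.0 * t / duration)
        # The modulator output is added to the carrier's phase.
        modulator = index * env * math.sin(2.0 * math.pi * ratio_m * f0 * t)
        carrier = env * math.sin(2.0 * math.pi * ratio_c * f0 * t + modulator)
        samples.append(carrier)
    return samples

# Example: a 440 Hz carrier with a modulator one octave above it.
tone = fm_tone(440.0, ratio_c=1.0, ratio_m=2.0, index=2.0)
```

With `index=0` the modulator has no effect and the output reduces to an enveloped sine at `ratio_c * f0`; raising `index` spreads energy into sidebands spaced by the modulator frequency, which is what makes the frequency ratio an interpretable timbre control.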
Related papers
- Robust AI-Synthesized Speech Detection Using Feature Decomposition Learning and Synthesizer Feature Augmentation [52.0893266767733]
We propose a robust deepfake speech detection method that employs feature decomposition to learn synthesizer-independent content features.
To enhance the model's robustness to different synthesizer characteristics, we propose a synthesizer feature augmentation strategy.
arXiv Detail & Related papers (2024-11-14T03:57:21Z)
- Synthesizer Sound Matching Using Audio Spectrogram Transformers [2.5944208050492183]
We introduce a synthesizer sound matching model based on the Audio Spectrogram Transformer.
We show that this model can reconstruct parameters of samples generated from a set of 16 parameters.
We also provide audio examples demonstrating the out-of-domain model performance in emulating vocal imitations.
arXiv Detail & Related papers (2024-07-23T16:58:14Z)
- DiffMoog: a Differentiable Modular Synthesizer for Sound Matching [48.33168531500444]
DiffMoog is a differentiable modular synthesizer with a comprehensive set of modules typically found in commercial instruments.
Being differentiable, it allows integration into neural networks, enabling automated sound matching.
We introduce an open-source platform that comprises DiffMoog and an end-to-end sound matching framework.
arXiv Detail & Related papers (2024-01-23T08:59:21Z)
- Synthesizer Preset Interpolation using Transformer Auto-Encoders [4.213427823201119]
We introduce a bimodal auto-encoder neural network, which simultaneously processes presets using multi-head attention blocks, and audio using convolutions.
This model has been tested on a popular frequency modulation synthesizer with more than one hundred parameters.
After training, the proposed model can be integrated into commercial synthesizers for live or sound design tasks.
arXiv Detail & Related papers (2022-10-27T15:20:18Z)
- DDX7: Differentiable FM Synthesis of Musical Instrument Sounds [7.829520196474829]
Differentiable Digital Signal Processing (DDSP) has enabled nuanced audio rendering by Deep Neural Networks (DNNs).
We present Differentiable DX7 (DDX7), a lightweight architecture for neural FM resynthesis of musical instrument sounds.
arXiv Detail & Related papers (2022-08-12T08:39:45Z)
- Multi-instrument Music Synthesis with Spectrogram Diffusion [19.81982315173444]
We focus on a middle ground of neural synthesizers that can generate audio from MIDI sequences with arbitrary combinations of instruments in realtime.
We use a simple two-stage process: MIDI to spectrograms with an encoder-decoder Transformer, then spectrograms to audio with a generative adversarial network (GAN) spectrogram inverter.
We find this to be a promising first step towards interactive and expressive neural synthesis for arbitrary combinations of instruments and notes.
arXiv Detail & Related papers (2022-06-11T03:26:15Z)
- Sound2Synth: Interpreting Sound via FM Synthesizer Parameters Estimation [19.13182347908491]
Estimating the parameter configuration that best restores a sound's timbre is an important yet complicated problem.
We propose a multi-modal deep-learning-based pipeline, Sound2Synth, together with a network structure, Prime-Dilated Convolution (PDC), specially designed to solve this problem.
Our method achieves not only state-of-the-art results but also the first real-world applicable results on the Dexed synthesizer, a popular FM synthesizer.
arXiv Detail & Related papers (2022-05-06T06:55:29Z)
- NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation [67.96138567288197]
We propose a novel neural vocoder named NeuralDPS which can retain high speech quality and acquire high synthesis efficiency and noise controllability.
It generates waveforms at least 280 times faster than the WaveNet vocoder.
Its synthesis is also 28% faster than WaveGAN's on a single core.
arXiv Detail & Related papers (2022-03-05T08:15:29Z)
- Synthesizer: Rethinking Self-Attention in Transformer Models [93.08171885200922]
The dot product self-attention is known to be central and indispensable to state-of-the-art Transformer models.
This paper investigates the true importance and contribution of the dot product-based self-attention mechanism on the performance of Transformer models.
arXiv Detail & Related papers (2020-05-02T08:16:19Z)
- VaPar Synth -- A Variational Parametric Model for Audio Synthesis [78.3405844354125]
We present VaPar Synth - a Variational Parametric Synthesizer which utilizes a conditional variational autoencoder (CVAE) trained on a suitable parametric representation.
We demonstrate our proposed model's capabilities via the reconstruction and generation of instrumental tones with flexible control over their pitch.
arXiv Detail & Related papers (2020-03-30T16:05:47Z)