SynthScribe: Deep Multimodal Tools for Synthesizer Sound Retrieval and
Exploration
- URL: http://arxiv.org/abs/2312.04690v2
- Date: Tue, 20 Feb 2024 20:18:31 GMT
- Title: SynthScribe: Deep Multimodal Tools for Synthesizer Sound Retrieval and
Exploration
- Authors: Stephen Brade, Bryan Wang, Mauricio Sousa, Gregory Lee Newsome, Sageev
Oore, Tovi Grossman
- Abstract summary: We implement a full-stack system that uses multimodal deep learning to let users express their intentions at a much higher level.
We implement features that address a number of difficulties, namely: 1) searching through existing sounds; 2) creating completely new sounds; and 3) making meaningful modifications to a given sound.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Synthesizers are powerful tools that allow musicians to create dynamic and
original sounds. Existing commercial interfaces for synthesizers typically
require musicians to interact with complex low-level parameters or to manage
large libraries of premade sounds. To address these challenges, we implement
SynthScribe -- a full-stack system that uses multimodal deep learning to let
users express their intentions at a much higher level. We implement features
that address a number of difficulties, namely: 1) searching through existing
sounds; 2) creating completely new sounds; and 3) making meaningful
modifications to a given sound. This is achieved with three main features: a
multimodal search engine for a large library of synthesizer sounds; a
user-centered genetic algorithm by which completely new sounds can be created
and selected given the user's preferences; and a sound editing support feature
which highlights and gives examples for key control parameters with respect
to a text- or audio-based query. The results of our user studies show that
SynthScribe can reliably retrieve and modify sounds while also affording the
ability to create completely new sounds that expand a musician's creative
horizons.
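To make the first feature concrete, multimodal search over a sound library can be pictured as nearest-neighbor retrieval in a joint text-audio embedding space. The sketch below is a minimal illustration, not SynthScribe's actual implementation: embed_text and embed_audio are hypothetical stand-ins for a pretrained joint encoder (e.g., a CLAP-style model).

```python
import numpy as np

def embed_text(query: str) -> np.ndarray:
    """Hypothetical stand-in for a pretrained text encoder that maps a
    prompt to a unit vector in a shared text-audio space."""
    rng = np.random.default_rng(abs(hash(query)) % 2**32)
    v = rng.standard_normal(512)
    return v / np.linalg.norm(v)

def embed_audio(waveform: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the matching audio encoder."""
    rng = np.random.default_rng(int(np.abs(waveform).sum() * 1e6) % 2**32)
    v = rng.standard_normal(512)
    return v / np.linalg.norm(v)

def search_library(query: str, library_embeddings: np.ndarray, k: int = 5):
    """Rank precomputed sound embeddings by cosine similarity to the query."""
    q = embed_text(query)
    scores = library_embeddings @ q          # rows are unit-normalized
    top = np.argsort(scores)[::-1][:k]
    return top, scores[top]

# Usage: embed every preset's rendered audio once, then retrieve the
# closest presets for a natural-language prompt.
library = np.stack([embed_audio(np.random.randn(16000)) for _ in range(1000)])
indices, scores = search_library("warm vintage pad", library)
```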
Related papers
- OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
We introduce Omni-modal Sound Separation (OmniSep), a novel framework capable of isolating clean soundtracks based on omni-modal queries.
We introduce the Query-Mixup strategy, which blends query features from different modalities during training.
We further enhance this flexibility by allowing queries to influence sound separation positively or negatively, facilitating the retention or removal of specific sounds.
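A minimal sketch of the Query-Mixup idea as described above: query embeddings from two modality encoders are blended with a random weight during training, so the separation model learns to accept mixed-modality queries. The encoders and the mixing distribution here are assumptions, not the paper's exact recipe.

```python
import numpy as np

def query_mixup(q_text: np.ndarray, q_audio: np.ndarray,
                rng: np.random.Generator) -> np.ndarray:
    """Blend query embeddings from two modalities with a random weight."""
    lam = rng.beta(2.0, 2.0)                  # mixing coefficient in (0, 1)
    q = lam * q_text + (1.0 - lam) * q_audio
    return q / np.linalg.norm(q)              # keep the query unit-norm

rng = np.random.default_rng(0)
q_text = rng.standard_normal(256); q_text /= np.linalg.norm(q_text)
q_audio = rng.standard_normal(256); q_audio /= np.linalg.norm(q_audio)
mixed = query_mixup(q_text, q_audio, rng)     # fed to the separator as a query
```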
arXiv Detail & Related papers (2024-10-28T17:58:15Z)
- Synthesizer Sound Matching Using Audio Spectrogram Transformers
We introduce a synthesizer sound matching model based on the Audio Spectrogram Transformer.
We show that this model can reconstruct parameters of samples generated from a set of 16 parameters.
We also provide audio examples demonstrating the out-of-domain model performance in emulating vocal imitations.
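The sound-matching setup described here reduces to regressing a fixed set of 16 synthesizer parameters from an input spectrogram. The PyTorch sketch below is an illustrative stand-in, not the paper's model: a small convolutional regressor replaces the Audio Spectrogram Transformer.

```python
import torch
import torch.nn as nn

class ParamRegressor(nn.Module):
    """Toy stand-in for the AST: maps a (1, mel, time) spectrogram
    to 16 normalized synthesizer parameters in [0, 1]."""
    def __init__(self, n_params: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_params), nn.Sigmoid(),
        )

    def forward(self, spec):
        return self.net(spec)

model = ParamRegressor()
spec = torch.randn(8, 1, 128, 64)   # batch of mel spectrograms
target = torch.rand(8, 16)          # ground-truth parameter settings
loss = nn.functional.mse_loss(model(spec), target)
loss.backward()                     # train by parameter-space regression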
arXiv Detail & Related papers (2024-07-23T16:58:14Z)
- Creative Text-to-Audio Generation via Synthesizer Programming
We propose a text-to-audio generation method that leverages a virtual modular sound synthesizer with only 78 parameters.
Our method, CTAG, iteratively updates a synthesizer's parameters to produce high-quality audio renderings of text prompts.
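The loop described above — iteratively updating a small set of synthesizer parameters so the rendered audio better matches a text prompt — fits a simple black-box optimization sketch. render and text_audio_similarity are hypothetical stand-ins for the 78-parameter synthesizer and a pretrained text-audio scoring model; CTAG's actual optimizer may differ.

```python
import numpy as np

def render(params: np.ndarray) -> np.ndarray:
    """Hypothetical: render audio from 78 normalized synth parameters."""
    t = np.linspace(0, 1, 16000)
    return np.sin(2 * np.pi * (200 + 800 * params[0]) * t) * params[1]

def text_audio_similarity(prompt: str, audio: np.ndarray) -> float:
    """Hypothetical: joint-embedding score of prompt vs. audio."""
    return -abs(np.abs(audio).mean() - 0.3)   # placeholder objective

def optimize(prompt: str, n_params=78, iters=200, sigma=0.05, seed=0):
    rng = np.random.default_rng(seed)
    best = rng.random(n_params)
    best_score = text_audio_similarity(prompt, render(best))
    for _ in range(iters):                    # simple (1+1) evolution strategy
        cand = np.clip(best + sigma * rng.standard_normal(n_params), 0, 1)
        score = text_audio_similarity(prompt, render(cand))
        if score > best_score:
            best, best_score = cand, score
    return best, best_score

params, score = optimize("a buzzing swarm of bees")
```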
arXiv Detail & Related papers (2024-06-01T04:08:31Z)
- Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt
We propose Prompt-Singer, the first singing-voice-synthesis (SVS) method that enables natural-language control over singer gender, vocal range, and volume.
We adopt a model architecture based on a decoder-only transformer with a multi-scale hierarchy, and design a range-melody decoupled pitch representation.
Experiments show that our model achieves favorable controllability and audio quality.
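The "range-melody decoupled pitch representation" suggests separating a melody's contour from the singer's register. The following is a loose illustration of that idea only — the paper's exact scheme may differ.

```python
import numpy as np

def decouple_pitch(midi_pitches: np.ndarray):
    """Illustrative split of pitch into (range, contour); this is a
    sketch of the idea, not the paper's actual representation."""
    center = int(round(midi_pitches.mean()))   # coarse vocal-range token
    contour = midi_pitches - center            # range-invariant melody shape
    return center, contour

melody = np.array([60, 62, 64, 65, 67, 65, 64, 62])  # C major fragment
center, contour = decouple_pitch(melody)
# Re-coupling with a different center transposes the same melody into
# another vocal range, e.g. contour + 72.
```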
arXiv Detail & Related papers (2024-03-18T13:39:05Z)
- DiffMoog: a Differentiable Modular Synthesizer for Sound Matching
DiffMoog is a differentiable modular synthesizer with a comprehensive set of modules typically found in commercial instruments.
Being differentiable, it allows integration into neural networks, enabling automated sound matching.
We introduce an open-source platform that comprises DiffMoog and an end-to-end sound matching framework.
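Differentiability is what enables the automated sound matching: synthesizer parameters can be optimized by gradient descent through the renderer itself. A minimal PyTorch sketch of that principle, using a single sine oscillator rather than DiffMoog's module set:

```python
import torch

def render(freq: torch.Tensor, amp: torch.Tensor, sr=16000, dur=0.5):
    """Differentiable sine oscillator (a stand-in for a full module graph)."""
    t = torch.arange(int(sr * dur)) / sr
    return amp * torch.sin(2 * torch.pi * freq * t)

target = render(torch.tensor(440.0), torch.tensor(0.8))  # sound to match

freq = torch.tensor(300.0, requires_grad=True)
amp = torch.tensor(0.5, requires_grad=True)
opt = torch.optim.Adam([freq, amp], lr=1.0)

for step in range(500):
    opt.zero_grad()
    loss = torch.mean((render(freq, amp) - target) ** 2)
    loss.backward()   # gradients flow through the synthesizer; real systems
    opt.step()        # typically use spectral losses rather than waveform MSE
```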
arXiv Detail & Related papers (2024-01-23T08:59:21Z)
- Make-A-Voice: Unified Voice Synthesis With Discrete Representation
Make-A-Voice is a unified framework for synthesizing and manipulating voice signals from discrete representations.
We show that Make-A-Voice exhibits superior audio quality and style similarity compared with competitive baseline models.
arXiv Detail & Related papers (2023-05-30T17:59:26Z)
- NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis based on Frequency Modulation
We propose NAS-FM, which adopts neural architecture search (NAS) to build a differentiable frequency modulation (FM) synthesizer.
Tunable synthesizers with interpretable controls can be developed automatically from sounds without any prior expert knowledge.
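For readers unfamiliar with FM synthesis, the signal model underlying this search space is compact: a modulator oscillator perturbs the phase of a carrier. A minimal sketch of two-operator FM (the NAS component, which searches over oscillator topologies, is not shown):

```python
import numpy as np

def fm_tone(f_carrier, f_mod, index, sr=16000, dur=1.0):
    """Two-operator FM: y(t) = sin(2*pi*fc*t + I * sin(2*pi*fm*t))."""
    t = np.arange(int(sr * dur)) / sr
    return np.sin(2 * np.pi * f_carrier * t
                  + index * np.sin(2 * np.pi * f_mod * t))

# An integer modulator-to-carrier ratio gives a harmonic spectrum; the
# modulation index controls brightness.
y = fm_tone(f_carrier=220.0, f_mod=440.0, index=3.0)
```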
arXiv Detail & Related papers (2023-05-22T09:46:10Z)
- Novel-View Acoustic Synthesis
We introduce the novel-view acoustic synthesis (NVAS) task: given the sight and sound observed at a source viewpoint, can we synthesize the sound of that scene from an unseen target viewpoint?
We propose a neural rendering approach: Visually-Guided Acoustic Synthesis (ViGAS) network that learns to synthesize the sound of an arbitrary point in space.
arXiv Detail & Related papers (2023-01-20T18:49:58Z)
- Synthesizer Preset Interpolation using Transformer Auto-Encoders
We introduce a bimodal auto-encoder neural network, which simultaneously processes presets using multi-head attention blocks, and audio using convolutions.
This model has been tested on a popular frequency modulation synthesizer with more than one hundred parameters.
After training, the proposed model can be integrated into commercial synthesizers for live or sound design tasks.
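The practical use suggested here — morphing between two presets — reduces to encoding both, blending their latent codes, and decoding. A generic sketch of that pattern; encode and decode are hypothetical placeholders, not the paper's bimodal transformer model.

```python
import numpy as np

def encode(preset: np.ndarray) -> np.ndarray:
    """Hypothetical encoder from ~100 synth parameters to a latent code."""
    return preset[:16] * 2.0 - 1.0            # placeholder mapping

def decode(z: np.ndarray) -> np.ndarray:
    """Hypothetical decoder back to parameter space."""
    return np.clip(np.tile((z + 1.0) / 2.0, 7)[:100], 0.0, 1.0)

def interpolate_presets(a: np.ndarray, b: np.ndarray, steps: int = 5):
    """Morph preset a into preset b through latent space."""
    za, zb = encode(a), encode(b)
    return [decode((1 - t) * za + t * zb) for t in np.linspace(0.0, 1.0, steps)]

rng = np.random.default_rng(0)
path = interpolate_presets(rng.random(100), rng.random(100))
```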
arXiv Detail & Related papers (2022-10-27T15:20:18Z)
- Sound2Synth: Interpreting Sound via FM Synthesizer Parameters Estimation
Estimating the parameter configuration that best restores a given timbre is an important yet complicated problem.
We propose Sound2Synth, a multi-modal deep-learning pipeline, together with a network structure, Prime-Dilated Convolution (PDC), specially designed for this problem.
Our method achieves not only state-of-the-art but also the first real-world-applicable results on Dexed, a popular FM synthesizer.
arXiv Detail & Related papers (2022-05-06T06:55:29Z)
- VaPar Synth -- A Variational Parametric Model for Audio Synthesis
We present VaPar Synth - a Variational Parametric Synthesizer which utilizes a conditional variational autoencoder (CVAE) trained on a suitable parametric representation.
We demonstrate our proposed model's capabilities via the reconstruction and generation of instrumental tones with flexible control over their pitch.
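The CVAE backbone can be sketched generically: both encoder and decoder are conditioned on pitch, which is what yields the flexible pitch control mentioned above. A minimal PyTorch illustration; dimensions and the parametric input representation are placeholders.

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    """Minimal conditional VAE: encoder and decoder both see the pitch
    condition, leaving the latent free to model timbre."""
    def __init__(self, x_dim=128, c_dim=1, z_dim=16, h=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, h), nn.ReLU())
        self.mu, self.logvar = nn.Linear(h, z_dim), nn.Linear(h, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim + c_dim, h), nn.ReLU(),
                                 nn.Linear(h, x_dim))

    def forward(self, x, c):
        h = self.enc(torch.cat([x, c], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(torch.cat([z, c], dim=-1)), mu, logvar

model = CVAE()
x = torch.randn(8, 128)      # parametric frame representation (placeholder)
pitch = torch.rand(8, 1)     # normalized pitch condition
recon, mu, logvar = model(x, pitch)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = nn.functional.mse_loss(recon, x) + kl
```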
arXiv Detail & Related papers (2020-03-30T16:05:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.