DrumGAN VST: A Plugin for Drum Sound Analysis/Synthesis With
Autoencoding Generative Adversarial Networks
- URL: http://arxiv.org/abs/2206.14723v1
- Date: Wed, 29 Jun 2022 15:44:19 GMT
- Title: DrumGAN VST: A Plugin for Drum Sound Analysis/Synthesis With
Autoencoding Generative Adversarial Networks
- Authors: Javier Nistal, Cyran Aouameur, Ithan Velarde, and Stefan Lattner
- Abstract summary: We present DrumGAN VST, a plugin for synthesizing drum sounds using a Generative Adversarial Network.
DrumGAN VST operates on 44.1 kHz sample-rate audio, offers independent and continuous instrument class controls, and features an encoding neural network that maps sounds into the GAN's latent space.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In contemporary popular music production, drum sound design is commonly
performed by cumbersome browsing and processing of pre-recorded samples in
sound libraries. One can also use specialized synthesis hardware, typically
controlled through low-level, musically meaningless parameters. Today, the
field of Deep Learning offers methods to control the synthesis process via
learned high-level features and allows generating a wide variety of sounds. In
this paper, we present DrumGAN VST, a plugin for synthesizing drum sounds using
a Generative Adversarial Network. DrumGAN VST operates on 44.1 kHz sample-rate
audio, offers independent and continuous instrument class controls, and
features an encoding neural network that maps sounds into the GAN's latent
space, enabling resynthesis and manipulation of pre-existing drum sounds. We
provide numerous sound examples and a demo of the proposed VST plugin.
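
As a rough illustration of the pipeline the abstract describes (an encoder that maps audio into the GAN's latent space, plus a generator with continuous instrument-class controls), here is a minimal PyTorch sketch. The layer sizes, class set, and all names are assumptions for illustration, not the published DrumGAN architecture:

```python
import torch
import torch.nn as nn

LATENT_DIM = 128   # assumed latent size, not from the paper
N_CLASSES = 3      # e.g. kick / snare / cymbal class controls (assumed)
N_SAMPLES = 44100  # one second of 44.1 kHz audio

class Encoder(nn.Module):
    """Maps a waveform into the generator's latent space."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=25, stride=4), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=25, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(32, LATENT_DIM),
        )
    def forward(self, audio):            # audio: (batch, 1, N_SAMPLES)
        return self.net(audio)           # -> (batch, LATENT_DIM)

class Generator(nn.Module):
    """Decodes a latent plus continuous class controls to audio."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + N_CLASSES, 256), nn.ReLU(),
            nn.Linear(256, N_SAMPLES), nn.Tanh(),
        )
    def forward(self, z, class_controls):
        return self.net(torch.cat([z, class_controls], dim=-1))

# Resynthesis: encode an existing drum sound, nudge the latent, decode.
enc, gen = Encoder(), Generator()
drum = torch.randn(1, 1, N_SAMPLES)          # stand-in for a real sample
z = enc(drum)
controls = torch.tensor([[0.9, 0.1, 0.0]])   # mostly-kick setting
manipulated = gen(z + 0.1 * torch.randn_like(z), controls)
```

Conditioning the generator on a continuous class vector, rather than a discrete label, is what allows blending between instrument classes at synthesis time.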
Related papers
- Annotation-Free MIDI-to-Audio Synthesis via Concatenative Synthesis and Generative Refinement [0.0]
CoSaRef is a MIDI-to-audio synthesis method that can be developed without MIDI-audio paired datasets.
It first performs concatenative synthesis based on MIDI inputs and then refines the resulting audio into realistic tracks using a diffusion-based deep generative model trained on audio-only datasets.
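
A toy sketch of the two-stage idea, under stated assumptions: the sample library, note format, and helper names are hypothetical, and the diffusion refiner is stubbed out since its internals are not described in the summary.

```python
import numpy as np

SR = 44100

def concatenative_synthesis(notes, library):
    """Stage 1: place a pre-recorded one-shot at each MIDI note onset."""
    end = max(onset for onset, _ in notes) + 2.0
    mix = np.zeros(int(end * SR))
    for onset, pitch in notes:
        sample = library[pitch]                  # lookup by MIDI pitch
        start = int(onset * SR)
        mix[start:start + len(sample)] += sample[:len(mix) - start]
    return mix

def refine(audio):
    """Stage 2 stub: a diffusion model trained on audio-only data would
    map the rough concatenation to realistic audio. Identity here."""
    return audio

# Hypothetical data: two MIDI notes and one one-shot per pitch.
library = {60: np.hanning(SR // 10), 64: np.hanning(SR // 10)}
notes = [(0.0, 60), (0.5, 64)]
track = refine(concatenative_synthesis(notes, library))
```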
arXiv Detail & Related papers (2024-10-22T08:01:40Z)
- Synthesizer Sound Matching Using Audio Spectrogram Transformers [2.5944208050492183]
We introduce a synthesizer sound matching model based on the Audio Spectrogram Transformer.
We show that this model can reconstruct parameters of samples generated from a set of 16 parameters.
We also provide audio examples demonstrating the out-of-domain model performance in emulating vocal imitations.
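
A minimal sketch of the sound-matching setup, assuming a generic spectrogram transformer rather than the paper's actual AST weights; only the 16-parameter output size comes from the summary above.

```python
import torch
import torch.nn as nn

N_PARAMS = 16   # the paper matches a 16-parameter synth patch

class SpectrogramParamMatcher(nn.Module):
    """Toy stand-in for an Audio Spectrogram Transformer: embed
    spectrogram frames, run a transformer encoder, regress parameters."""
    def __init__(self, n_mels=64, d_model=128):
        super().__init__()
        self.embed = nn.Linear(n_mels, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, N_PARAMS)

    def forward(self, spec):                 # spec: (batch, frames, n_mels)
        h = self.encoder(self.embed(spec))
        return torch.sigmoid(self.head(h.mean(dim=1)))  # params in [0, 1]

model = SpectrogramParamMatcher()
spec = torch.randn(2, 100, 64)               # fake mel spectrograms
params = model(spec)                          # (2, 16) predicted settings
```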
arXiv Detail & Related papers (2024-07-23T16:58:14Z)
- Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt [50.25271407721519]
We propose Prompt-Singer, the first SVS method that enables attribute controlling on singer gender, vocal range and volume with natural language.
We adopt a model architecture based on a decoder-only transformer with a multi-scale hierarchy, and design a range-melody decoupled pitch representation.
Experiments show that our model achieves favorable controlling ability and audio quality.
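
One plausible reading of a "range-melody decoupled" pitch representation: factor absolute MIDI pitch into a coarse range value plus a range-relative contour, so singer range can change without rewriting the melody. This is an interpretation sketched for illustration, not the paper's exact scheme.

```python
def decouple(midi_pitches):
    """Split absolute pitch into a coarse range token plus contour.
    One plausible reading of 'range-melody decoupled', not the
    paper's exact representation."""
    center = round(sum(midi_pitches) / len(midi_pitches))
    contour = [p - center for p in midi_pitches]
    return center, contour

def recouple(center, contour):
    return [center + c for c in contour]

center, contour = decouple([60, 62, 64, 62])   # C4 D4 E4 D4
higher = recouple(center + 12, contour)         # same melody, octave up
```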
arXiv Detail & Related papers (2024-03-18T13:39:05Z)
- DiffMoog: a Differentiable Modular Synthesizer for Sound Matching [48.33168531500444]
DiffMoog is a differentiable modular synthesizer with a comprehensive set of modules typically found in commercial instruments.
Being differentiable, it allows integration into neural networks, enabling automated sound matching.
We introduce an open-source platform that comprises DiffMoog and an end-to-end sound matching framework.
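
The core trick (gradients flowing through the synthesizer into its own parameters) in toy form: this fits a single sine oscillator to a target by gradient descent on a spectral loss. It is not DiffMoog's module set; all sizes and the loss choice are assumptions.

```python
import torch

SR, DUR = 16000, 0.1                      # toy sample rate and duration
t = torch.arange(int(SR * DUR)) / SR      # time axis in seconds

def osc(freq, amp):
    """Differentiable sine oscillator: a stand-in for one synth module."""
    return amp * torch.sin(2 * torch.pi * freq * t)

target = osc(torch.tensor(440.0), torch.tensor(0.8))   # sound to match
goal_spec = torch.fft.rfft(target).abs()

# Learnable synth parameters, deliberately mis-initialised.
freq = torch.tensor(430.0, requires_grad=True)
amp = torch.tensor(0.5, requires_grad=True)
optim = torch.optim.Adam([freq, amp], lr=0.1)

for _ in range(500):
    optim.zero_grad()
    # Spectral-magnitude loss is smoother w.r.t. frequency than raw MSE.
    loss = torch.mean((torch.fft.rfft(osc(freq, amp)).abs() - goal_spec) ** 2)
    loss.backward()          # gradients flow through the synth itself
    optim.step()
```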
arXiv Detail & Related papers (2024-01-23T08:59:21Z)
- Toward Deep Drum Source Separation [52.01259769265708]
We introduce StemGMD, a large-scale audio dataset of isolated single-instrument drum stems.
Totaling 1224 hours, StemGMD is the largest audio dataset of drums to date.
We leverage StemGMD to develop LarsNet, a novel deep drum source separation model.
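
The summary does not describe LarsNet's internals; the standard spectrogram-masking formulation that such separators build on can be sketched as follows. The five-stem list and the tiny mask network are assumptions.

```python
import torch
import torch.nn as nn

STEMS = ["kick", "snare", "toms", "hihat", "cymbals"]  # assumed stem set
N_FFT, HOP = 1024, 256

class MaskNet(nn.Module):
    """Predicts one soft spectrogram mask per drum stem."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(1, len(STEMS), kernel_size=3, padding=1)
    def forward(self, mag):                    # (batch, 1, freq, time)
        return torch.sigmoid(self.net(mag))    # masks in [0, 1]

mix = torch.randn(1, 44100)
window = torch.hann_window(N_FFT)
spec = torch.stft(mix, N_FFT, HOP, window=window, return_complex=True)
masks = MaskNet()(spec.abs().unsqueeze(1))     # (1, 5, freq, time)
stems = [
    torch.istft(spec * masks[:, i], N_FFT, HOP, window=window, length=44100)
    for i in range(len(STEMS))
]
```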
arXiv Detail & Related papers (2023-12-15T10:23:07Z)
- SynthScribe: Deep Multimodal Tools for Synthesizer Sound Retrieval and Exploration [21.473019531697062]
We implement a fullstack system that uses multimodal deep learning to let users express their intentions at a much higher level.
We implement features which address a number of difficulties, namely 1) searching through existing sounds, 2) creating completely new sounds, 3) making meaningful modifications to a given sound.
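
A minimal sketch of difficulty 1), searching existing sounds, assuming a joint text-audio embedding model: embed the prompt and the library into a shared space and rank by cosine similarity. The encoder functions here are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_text(prompt):
    """Stand-in for a text encoder from a joint text-audio model."""
    return rng.normal(size=64)

def embed_sound(wave):
    """Stand-in for the matching audio encoder."""
    return rng.normal(size=64)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank a preset library against a natural-language prompt.
library = {f"preset_{i}": embed_sound(None) for i in range(100)}
query = embed_text("a warm, punchy analog bass")
best = max(library, key=lambda name: cosine(query, library[name]))
```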
arXiv Detail & Related papers (2023-12-07T20:40:36Z)
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns.
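
A sketch of one such interleaving scheme, the delay pattern described for MusicGen: codebook k is offset by k steps so a single-stage LM can predict all token streams in one sequence. Token values and the PAD sentinel are illustrative.

```python
PAD = -1  # placeholder token where a stream has no codeword yet

def delay_interleave(streams):
    """Offset codebook k by k steps so all streams share one sequence.

    streams: list of K equal-length token lists (one per codebook).
    Returns a list of timesteps, each holding one token per codebook.
    """
    K, T = len(streams), len(streams[0])
    out = []
    for t in range(T + K - 1):
        out.append([streams[k][t - k] if 0 <= t - k < T else PAD
                    for k in range(K)])
    return out

# Two codebook streams of four tokens each.
pattern = delay_interleave([[10, 11, 12, 13], [20, 21, 22, 23]])
# -> [[10, PAD], [11, 20], [12, 21], [13, 22], [PAD, 23]]
```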
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
- Synthesizer Preset Interpolation using Transformer Auto-Encoders [4.213427823201119]
We introduce a bimodal auto-encoder neural network, which simultaneously processes presets using multi-head attention blocks, and audio using convolutions.
This model has been tested on a popular frequency modulation synthesizer with more than one hundred parameters.
After training, the proposed model can be integrated into commercial synthesizers for live or sound design tasks.
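
The interpolation use case in miniature: encode two presets, blend the latents, decode the intermediates. Single linear layers stand in for the attention-based encoder and decoder, and the parameter count is an assumption for a synthesizer with 100+ parameters.

```python
import torch
import torch.nn as nn

N_PARAMS, LATENT = 144, 32   # assumed sizes

encoder = nn.Linear(N_PARAMS, LATENT)   # stand-in for the attention encoder
decoder = nn.Linear(LATENT, N_PARAMS)   # stand-in for the decoder

preset_a = torch.rand(1, N_PARAMS)      # two existing patches
preset_b = torch.rand(1, N_PARAMS)

with torch.no_grad():
    za, zb = encoder(preset_a), encoder(preset_b)
    # Nine intermediate patches morphing smoothly from A to B.
    morph = [decoder((1 - t) * za + t * zb)
             for t in torch.linspace(0, 1, steps=9)]
```

Interpolating in the learned latent space, rather than on raw parameter values, is what keeps the intermediate patches musically plausible.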
arXiv Detail & Related papers (2022-10-27T15:20:18Z)
- SerumRNN: Step by Step Audio VST Effect Programming [18.35125491671331]
SerumRNN is a system that provides step-by-step instructions for applying audio effects to change a user's input audio towards a desired sound.
Our results indicate that SerumRNN is consistently able to provide useful feedback for a variety of different audio effects and synthesizer presets.
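
A greedy loop conveys the step-by-step idea: at each step, apply whichever effect moves the current audio closest to the target, and report the chosen sequence. The effects here are toy functions, not Serum's, and the search strategy is a simplification of the paper's system.

```python
import numpy as np

def distance(a, b):
    return float(np.mean((a - b) ** 2))

# Toy effects standing in for real VST audio effects.
effects = {
    "gain_up":   lambda x: x * 1.2,
    "gain_down": lambda x: x * 0.8,
    "soft_clip": lambda x: np.tanh(x),
}

def step_by_step(current, target, max_steps=5):
    """Greedily pick, at each step, the effect that best approaches target."""
    steps = []
    for _ in range(max_steps):
        name = min(effects, key=lambda n: distance(effects[n](current), target))
        candidate = effects[name](current)
        if distance(candidate, target) >= distance(current, target):
            break                       # no single effect helps any more
        steps.append(name)
        current = candidate
    return steps

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
print(step_by_step(x, np.tanh(x * 1.2)))   # prints the suggested effect chain
```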
arXiv Detail & Related papers (2021-04-08T16:32:14Z)
- Vector-Quantized Timbre Representation [53.828476137089325]
This paper targets a more flexible synthesis of an individual timbre by learning an approximate decomposition of its spectral properties with a set of generative features.
We introduce an auto-encoder with a discrete latent space that is disentangled from loudness in order to learn a quantized representation of a given timbre distribution.
We detail results for translating audio between orchestral instruments and singing voice, as well as transfers from vocal imitations to instruments.
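
The discrete-latent bottleneck at the heart of such models: snap each encoder vector to its nearest codebook entry, with a straight-through estimator so gradients can still reach the encoder. Codebook size and dimension are assumptions.

```python
import torch

CODEBOOK_SIZE, DIM = 256, 64
codebook = torch.randn(CODEBOOK_SIZE, DIM, requires_grad=True)

def vector_quantize(z):
    """Snap each latent vector to its nearest codebook entry."""
    dists = torch.cdist(z, codebook)     # (batch, CODEBOOK_SIZE)
    idx = dists.argmin(dim=-1)           # discrete timbre codes
    zq = codebook[idx]
    # Straight-through estimator: gradients bypass the argmin.
    return z + (zq - z).detach(), idx

z = torch.randn(8, DIM)                  # encoder outputs
zq, codes = vector_quantize(z)
```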
arXiv Detail & Related papers (2020-07-13T12:35:45Z)
- VaPar Synth -- A Variational Parametric Model for Audio Synthesis [78.3405844354125]
We present VaPar Synth - a Variational Parametric Synthesizer which utilizes a conditional variational autoencoder (CVAE) trained on a suitable parametric representation.
We demonstrate our proposed model's capabilities via the reconstruction and generation of instrumental tones with flexible control over their pitch.
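
A minimal CVAE sketch with pitch as the conditioning variable: the encoder outputs a mean and log-variance, and the pitch one-hot is appended both to the input and to the sampled latent. The frame size and latent width are assumptions, and the paper's parametric representation is reduced to a generic spectral frame.

```python
import torch
import torch.nn as nn

N_FRAME, LATENT, N_PITCHES = 257, 16, 12   # assumed sizes

class CVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(N_FRAME + N_PITCHES, 2 * LATENT)
        self.dec = nn.Linear(LATENT + N_PITCHES, N_FRAME)

    def forward(self, x, pitch):               # pitch: one-hot condition
        mu, logvar = self.enc(torch.cat([x, pitch], -1)).chunk(2, -1)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparam.
        return self.dec(torch.cat([z, pitch], -1)), mu, logvar

model = CVAE()
frame = torch.randn(4, N_FRAME)                 # stand-in spectral frames
pitch = nn.functional.one_hot(torch.tensor([0, 3, 7, 7]), N_PITCHES).float()
recon, mu, logvar = model(frame, pitch)
# Generation with pitch control: sample z ~ N(0, I), pick a pitch, decode.
sample = model.dec(torch.cat([torch.randn(1, LATENT), pitch[:1]], -1))
```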
arXiv Detail & Related papers (2020-03-30T16:05:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information above and is not responsible for any consequences of its use.