CycleDRUMS: Automatic Drum Arrangement For Bass Lines Using CycleGAN
- URL: http://arxiv.org/abs/2104.00353v1
- Date: Thu, 1 Apr 2021 09:17:48 GMT
- Title: CycleDRUMS: Automatic Drum Arrangement For Bass Lines Using CycleGAN
- Authors: Giorgio Barnabò, Giovanni Trappolini, Lorenzo Lastilla, Cesare Campagnano, Angela Fan, Fabio Petroni and Fabrizio Silvestri
- Abstract summary: CycleDRUMS is a novel method for generating drums given a bass line.
After converting the waveform of the bass into a mel-spectrogram, we are able to automatically generate original drums that follow the beat.
- Score: 12.93891163150604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The two main research threads in computer-based music generation are: the
construction of autonomous music-making systems, and the design of
computer-based environments to assist musicians. In the symbolic domain, the
key problem of automatically arranging a piece of music has been extensively studied,
while relatively few systems have tackled this challenge in the audio domain. In
this contribution, we propose CycleDRUMS, a novel method for generating drums
given a bass line. After converting the waveform of the bass into a
mel-spectrogram, we are able to automatically generate original drums that
follow the beat, sound credible and can be directly mixed with the input bass.
We formulated this task as an unpaired image-to-image translation problem, and
we addressed it with CycleGAN, a well-established unsupervised style transfer
framework, originally designed for images. The choice to work with raw
audio and mel-spectrograms enabled us to better represent how humans perceive
music, and to potentially draw sounds for new arrangements from the vast
collection of music recordings accumulated in the last century. In the absence of
an objective way of evaluating the output of both generative adversarial
networks and music generative systems, we further defined a possible metric for
the proposed task, partially based on human (and expert) judgement. Finally, as
a comparison, we replicated our results with Pix2Pix, a paired image-to-image
translation network, and we showed that our approach outperforms it.
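The preprocessing step described in the abstract, turning the bass waveform into a mel-spectrogram that can then be treated as an image, can be sketched roughly as follows. This is an illustrative Python/librosa sketch only; the sample rate, number of mel bands, and file name are assumptions, not the settings reported in the paper.

```python
# Minimal sketch of the bass-to-mel-spectrogram preprocessing described in the
# abstract, using librosa. Sample rate, mel-band count, and the input file name
# are illustrative assumptions, not the authors' actual configuration.
import librosa
import numpy as np

def bass_to_mel(path: str, sr: int = 22050, n_mels: int = 128) -> np.ndarray:
    """Load a bass-line recording and return its log-scaled mel-spectrogram."""
    y, sr = librosa.load(path, sr=sr, mono=True)                 # raw waveform
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)                  # dB-scaled 2-D "image"

# mel_bass = bass_to_mel("bass_line.wav")   # hypothetical input file
```

The resulting 2-D array is what gets treated as an image by the translation network.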
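The unpaired translation itself follows the standard CycleGAN recipe, combining least-squares adversarial terms with an L1 cycle-consistency term. The PyTorch sketch below illustrates that generator-side objective under assumed module names (G_bd, G_db, D_d, D_b) and an assumed cycle weight of 10; it reproduces the generic CycleGAN loss, not the exact CycleDRUMS implementation.

```python
# Generic CycleGAN generator objective applied to mel-spectrogram "images".
# G_bd / G_db are the bass->drums and drums->bass generators, D_d / D_b the
# domain discriminators; all are placeholder modules, and lambda_cyc = 10.0
# is an assumption taken from the original CycleGAN paper.
import torch
import torch.nn.functional as F

def cyclegan_generator_loss(G_bd, G_db, D_d, D_b, bass, drums, lambda_cyc=10.0):
    """Adversarial + cycle-consistency loss for one unpaired (bass, drums) batch."""
    fake_drums = G_bd(bass)     # bass-domain spectrogram -> drums domain
    fake_bass = G_db(drums)     # drums-domain spectrogram -> bass domain

    # Least-squares adversarial terms: each translated spectrogram should be
    # scored as "real" by the corresponding domain discriminator.
    pred_d, pred_b = D_d(fake_drums), D_b(fake_bass)
    adv = F.mse_loss(pred_d, torch.ones_like(pred_d)) + \
          F.mse_loss(pred_b, torch.ones_like(pred_b))

    # Cycle consistency: translating to the other domain and back should
    # reconstruct the original input spectrogram.
    cyc = F.l1_loss(G_db(fake_drums), bass) + F.l1_loss(G_bd(fake_bass), drums)

    return adv + lambda_cyc * cyc
```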
Related papers
- MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss [51.85076222868963]
We introduce a pre-training task designed to link control signals directly with corresponding musical tokens.
We then implement a novel counterfactual loss that promotes better alignment between the generated music and the control prompts.
arXiv Detail & Related papers (2024-07-05T08:08:22Z)
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
- Quantized GAN for Complex Music Generation from Dance Videos [48.196705493763986]
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany the corresponding input.
arXiv Detail & Related papers (2022-04-01T17:53:39Z)
- Deep Performer: Score-to-Audio Music Performance Synthesis [30.95307878579825]
Deep Performer is a novel system for score-to-audio music performance synthesis.
Unlike speech, music often contains polyphony and long notes.
We show that our proposed model can synthesize music with clear polyphony and harmonic structures.
arXiv Detail & Related papers (2022-02-12T10:36:52Z)
- Conditional Drums Generation using Compound Word Representations [4.435094091999926]
We tackle the task of conditional drums generation using a novel data encoding scheme inspired by Compound Word representation.
We present a sequence-to-sequence architecture in which a bidirectional long short-term memory (BiLSTM) network receives information about the conditioning parameters.
A Transformer-based Decoder with relative global attention produces the generated drum sequences.
arXiv Detail & Related papers (2022-02-09T13:49:27Z)
- LoopNet: Musical Loop Synthesis Conditioned On Intuitive Musical Parameters [12.72202888016628]
LoopNet is a feed-forward generative model for creating loops conditioned on intuitive parameters.
We leverage Music Information Retrieval (MIR) models as well as a large collection of public loop samples in our study.
arXiv Detail & Related papers (2021-05-21T14:24:34Z)
- Multi-Instrumentalist Net: Unsupervised Generation of Music from Body Movements [20.627164135805852]
We propose a novel system that takes as an input body movements of a musician playing a musical instrument and generates music in an unsupervised setting.
We build a pipeline named 'Multi-instrumentalistNet' that learns a discrete latent representation of various instruments' music from log-spectrograms.
We show that MIDI can further condition the latent space so that the pipeline generates the exact content of the music being played by the instrument in the video.
arXiv Detail & Related papers (2020-12-07T06:54:10Z)
- Music SketchNet: Controllable Music Generation via Factorized Representations of Pitch and Rhythm [42.694266687511906]
Music SketchNet is a neural network framework that allows users to specify partial musical ideas guiding automatic music generation.
We focus on generating the missing measures in incomplete monophonic musical pieces, conditioned on surrounding context.
We demonstrate that our model can successfully incorporate user-specified snippets during the generation process.
arXiv Detail & Related papers (2020-08-04T02:49:57Z)
- Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation [96.18178553315472]
We propose to leverage the vastly available mono data to facilitate the generation of stereophonic audio.
We integrate both stereo generation and source separation into a unified framework, Sep-Stereo.
arXiv Detail & Related papers (2020-07-20T06:20:26Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)