Music SketchNet: Controllable Music Generation via Factorized
Representations of Pitch and Rhythm
- URL: http://arxiv.org/abs/2008.01291v1
- Date: Tue, 4 Aug 2020 02:49:57 GMT
- Title: Music SketchNet: Controllable Music Generation via Factorized
Representations of Pitch and Rhythm
- Authors: Ke Chen, Cheng-i Wang, Taylor Berg-Kirkpatrick, Shlomo Dubnov
- Abstract summary: Music SketchNet is a neural network framework that allows users to specify partial musical ideas guiding automatic music generation.
We focus on generating the missing measures in incomplete monophonic musical pieces, conditioned on surrounding context.
We demonstrate that our model can successfully incorporate user-specified snippets during the generation process.
- Score: 42.694266687511906
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Drawing an analogy with automatic image completion systems, we propose Music
SketchNet, a neural network framework that allows users to specify partial
musical ideas guiding automatic music generation. We focus on generating the
missing measures in incomplete monophonic musical pieces, conditioned on
surrounding context, and optionally guided by user-specified pitch and rhythm
snippets. First, we introduce SketchVAE, a novel variational autoencoder that
explicitly factorizes rhythm and pitch contour to form the basis of our
proposed model. Then we introduce two discriminative architectures,
SketchInpainter and SketchConnector, that in conjunction perform the guided
music completion, filling in representations for the missing measures
conditioned on surrounding context and user-specified snippets. We evaluate
SketchNet on a standard dataset of Irish folk music and compare with models
from recent works. When used for music completion, our approach outperforms the
state of the art both in terms of objective metrics and subjective listening
tests. Finally, we demonstrate that our model can successfully incorporate
user-specified snippets during the generation process.
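To make the factorization described in the abstract concrete, below is a minimal, hedged sketch (not the authors' implementation) of a SketchVAE-style variational autoencoder in which each monophonic measure is encoded into separate pitch and rhythm latent vectors and decoded from their concatenation. All module names, vocabulary sizes, sequence lengths, and dimensions are illustrative assumptions, and the SketchInpainter/SketchConnector stages are omitted here.
```python
# Illustrative sketch of a factorized measure VAE (separate pitch and rhythm latents).
# NOT the SketchNet authors' code; vocabulary sizes, dimensions, and the 24-step
# measure length are assumptions made only for this example.
import torch
import torch.nn as nn

class FactorizedMeasureVAE(nn.Module):
    def __init__(self, pitch_vocab=130, rhythm_vocab=3, emb=64, hidden=256, z_dim=128):
        super().__init__()
        # Separate encoders so pitch contour and rhythm get independent latents.
        self.pitch_emb = nn.Embedding(pitch_vocab, emb)
        self.rhythm_emb = nn.Embedding(rhythm_vocab, emb)
        self.pitch_enc = nn.GRU(emb, hidden, batch_first=True)
        self.rhythm_enc = nn.GRU(emb, hidden, batch_first=True)
        self.pitch_mu = nn.Linear(hidden, z_dim)
        self.pitch_logvar = nn.Linear(hidden, z_dim)
        self.rhythm_mu = nn.Linear(hidden, z_dim)
        self.rhythm_logvar = nn.Linear(hidden, z_dim)
        # Decoder reconstructs the measure from the concatenated factors.
        self.dec = nn.GRU(2 * z_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, pitch_vocab)

    @staticmethod
    def reparameterize(mu, logvar):
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def encode(self, pitch_tokens, rhythm_tokens):
        _, h_p = self.pitch_enc(self.pitch_emb(pitch_tokens))
        _, h_r = self.rhythm_enc(self.rhythm_emb(rhythm_tokens))
        z_pitch = self.reparameterize(self.pitch_mu(h_p[-1]), self.pitch_logvar(h_p[-1]))
        z_rhythm = self.reparameterize(self.rhythm_mu(h_r[-1]), self.rhythm_logvar(h_r[-1]))
        return z_pitch, z_rhythm

    def decode(self, z_pitch, z_rhythm, seq_len):
        z = torch.cat([z_pitch, z_rhythm], dim=-1)      # (batch, 2 * z_dim)
        z_seq = z.unsqueeze(1).expand(-1, seq_len, -1)  # repeat the latent per time step
        h, _ = self.dec(z_seq)
        return self.out(h)                              # per-step token logits

    def forward(self, pitch_tokens, rhythm_tokens):
        z_pitch, z_rhythm = self.encode(pitch_tokens, rhythm_tokens)
        return self.decode(z_pitch, z_rhythm, seq_len=pitch_tokens.size(1))

# Toy usage: swapping z_rhythm between measures while keeping z_pitch fixed is the
# kind of control that the explicit factorization is intended to enable.
vae = FactorizedMeasureVAE()
pitch = torch.randint(0, 130, (2, 24))   # batch of 2 measures, 24 steps each
rhythm = torch.randint(0, 3, (2, 24))
logits = vae(pitch, rhythm)              # shape (2, 24, 130)
```
In the full system, the latents for the missing measures are not sampled freely: SketchInpainter and SketchConnector together fill in latent representations for those measures conditioned on the surrounding context and any user-specified pitch or rhythm snippets, which are then decoded by the SketchVAE decoder.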
Related papers
- MusicFlow: Cascaded Flow Matching for Text Guided Music Generation [53.63948108922333]
MusicFlow is a cascaded text-to-music generation model based on flow matching.
We leverage masked prediction as the training objective, enabling the model to generalize to other tasks such as music infilling and continuation.
arXiv Detail & Related papers (2024-10-27T15:35:41Z)
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
- Combinatorial music generation model with song structure graph analysis [18.71152526968065]
We construct a graph that uses information such as note sequence and instrument as node features, while the correlation between note sequences acts as the edge feature.
We train a Graph Neural Network to obtain node representations of the graph, then use these representations as input to a U-Net that generates the CONLON pianoroll image latent.
arXiv Detail & Related papers (2023-12-24T04:09:30Z)
- Graph-based Polyphonic Multitrack Music Generation [9.701208207491879]
This paper introduces a novel graph representation for music and a deep Variational Autoencoder that generates the structure and the content of musical graphs separately.
By separating the structure and content of musical graphs, it is possible to condition generation by specifying which instruments are played at certain times.
arXiv Detail & Related papers (2023-07-27T15:18:50Z)
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
- GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework [58.64512825534638]
Symbolic music generation aims to create musical notes, which can help users compose music.
We introduce a framework known as GETMusic, with "GET" standing for "GEnerate music Tracks".
GETScore represents musical notes as tokens and organizes them in a 2D structure, with tracks stacked vertically and time progressing horizontally (a toy sketch of this layout appears after this list).
Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with arbitrary source-target track combinations.
arXiv Detail & Related papers (2023-05-18T09:53:23Z)
- The Power of Reuse: A Multi-Scale Transformer Model for Structural Dynamic Segmentation in Symbolic Music Generation [6.0949335132843965]
Symbolic Music Generation relies on the contextual representation capabilities of the generative model.
We propose a multi-scale Transformer, which uses a coarse decoder and fine decoders to model context at the global and section levels.
Our model is evaluated on two open MIDI datasets, and experiments show that our model outperforms the best contemporary symbolic music generative models.
arXiv Detail & Related papers (2022-05-17T18:48:14Z)
- TräumerAI: Dreaming Music with StyleGAN [2.578242050187029]
We propose a neural music visualizer directly mapping deep music embeddings to style embeddings of StyleGAN.
An annotator listened to 100 ten-second music clips and, for each clip, selected the StyleGAN-generated image that best suits the music.
The generated examples show that the mapping between audio and video achieves a certain level of intra-segment similarity and inter-segment dissimilarity.
arXiv Detail & Related papers (2021-02-09T07:04:22Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
- Continuous Melody Generation via Disentangled Short-Term Representations and Structural Conditions [14.786601824794369]
We present a model for composing melodies given a user-specified symbolic scenario combined with a previous music context.
Our model generates long melodies by treating 8-beat note sequences as basic units, and can share a consistent rhythm pattern structure with another specified song.
Results show that the music generated by our model tends to have salient repetition structures, rich motives, and stable rhythm patterns.
arXiv Detail & Related papers (2020-02-05T06:23:44Z)
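As a side note on the GETMusic entry above, the following is a toy illustration of a GETScore-like 2D token grid (tracks stacked vertically, time progressing horizontally). The track names, token IDs, and masking convention are invented for this sketch and are not taken from the GETMusic paper.
```python
# Toy GETScore-style grid: rows are tracks, columns are time steps.
# All token IDs, track names, and PAD/mask conventions are assumptions for
# illustration only, not the GETMusic paper's actual vocabulary.
import numpy as np

PAD = 0                                  # hypothetical "empty position" token
tracks = ["melody", "bass", "drums"]     # stacked vertically
n_steps = 16                             # time progresses horizontally

score = np.full((len(tracks), n_steps), PAD, dtype=np.int64)

# Place a few made-up note tokens: score[track, step] = token_id
score[0, 0] = 61    # a melody token at step 0
score[1, 0] = 37    # a bass token at the same step
score[2, 4] = 5     # a drum token at step 4

# Arbitrary source-target track combinations can then be expressed as a row mask:
# source rows are given as conditioning, target rows are generated.
target_rows = np.array([False, True, True])   # condition on melody, generate the rest
print(score.shape, target_rows)
```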
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.