Composer Style-specific Symbolic Music Generation Using Vector Quantized
Discrete Diffusion Models
- URL: http://arxiv.org/abs/2310.14044v1
- Date: Sat, 21 Oct 2023 15:41:50 GMT
- Title: Composer Style-specific Symbolic Music Generation Using Vector Quantized
Discrete Diffusion Models
- Authors: Jincheng Zhang, Jingjing Tang, Charalampos Saitis, György Fazekas
- Abstract summary: We propose to combine a vector quantized variational autoencoder (VQ-VAE) and discrete diffusion models for the generation of symbolic music.
The trained VQ-VAE can represent symbolic music as a sequence of indexes that correspond to specific entries in a learned codebook.
The results demonstrate that our model can generate symbolic music in target composer styles that meet the given conditions with an accuracy of 72.36%.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Emerging Denoising Diffusion Probabilistic Models (DDPMs) have become
increasingly utilised because of the promising results they have achieved in
diverse generative tasks with continuous data, such as image and sound
synthesis. Nonetheless, the success of diffusion models has not been fully
extended to discrete symbolic music. We propose to combine a vector quantized
variational autoencoder (VQ-VAE) and discrete diffusion models for the
generation of symbolic music with desired composer styles. The trained VQ-VAE
can represent symbolic music as a sequence of indexes that correspond to
specific entries in a learned codebook. Subsequently, a discrete diffusion
model is used to model the VQ-VAE's discrete latent space. The diffusion model
is trained to generate intermediate music sequences consisting of codebook
indexes, which are then decoded to symbolic music using the VQ-VAE's decoder.
The results demonstrate that our model can generate symbolic music in target
composer styles that meet the given conditions with an accuracy of 72.36%.
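The two-stage pipeline the abstract describes can be sketched in a few lines. This is a minimal illustrative example, not the authors' implementation: the codebook and encoder latents are random stand-ins, the sizes (K, D, T) are arbitrary, and the uniform-transition forward corruption is one common discrete-diffusion choice that the abstract does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a codebook of K entries of dimension D,
# and one encoded piece represented by T latent vectors.
K, D, T = 16, 8, 32
codebook = rng.normal(size=(K, D))   # stand-in for a learned VQ-VAE codebook
latents = rng.normal(size=(T, D))    # stand-in for VQ-VAE encoder outputs

def quantize(z, codebook):
    """Map each latent vector to the index of its nearest codebook entry."""
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (T, K) distances
    return d2.argmin(axis=1)                                     # (T,) indices

# Stage 1: the VQ-VAE represents the piece as a sequence of codebook indices.
indices = quantize(latents, codebook)

def forward_corrupt(x0, t, n_steps, K, rng):
    """Forward process of a discrete diffusion model with uniform
    transitions: each token is resampled uniformly at random with
    probability t / n_steps. (A common choice; the paper's exact
    transition matrix may differ.)"""
    mask = rng.random(x0.shape) < t / n_steps
    noise = rng.integers(0, K, size=x0.shape)
    return np.where(mask, noise, x0)

# Stage 2: the diffusion model is trained to invert this corruption;
# generated index sequences are then decoded by the VQ-VAE decoder.
x_noisy = forward_corrupt(indices, t=50, n_steps=100, K=K, rng=rng)
print(indices.shape, x_noisy.shape)
```

In the actual model, a denoising network learns the reverse transitions (optionally conditioned on a composer-style label), and the sampled index sequence is passed through the VQ-VAE decoder to obtain symbolic music.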
Related papers
- An Independence-promoting Loss for Music Generation with Language Models [64.95095558672996]
Music generation schemes rely on a vocabulary of audio tokens, generally provided as codes in a discrete latent space learnt by an auto-encoder.
We introduce an independence-promoting loss to regularize the auto-encoder used as the tokenizer in language models for music generation.
arXiv Detail & Related papers (2024-06-04T13:44:39Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [73.47607237309258]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
- Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation [50.365392018302416]
We propose Multi-view MidiVAE, one of the first VAE-based methods to effectively model and generate long multi-track symbolic music.
We focus on instrumental characteristics and harmony as well as global and local information about the musical composition by employing a hybrid variational encoding-decoding strategy.
arXiv Detail & Related papers (2024-01-15T08:41:01Z)
- Fast Diffusion GAN Model for Symbolic Music Generation Controlled by Emotions [1.6004393678882072]
We propose a diffusion model combined with a Generative Adversarial Network to generate discrete symbolic music.
We first used a trained Variational Autoencoder to obtain embeddings of a symbolic music dataset with emotion labels.
Our results demonstrate the successful control of our diffusion model to generate symbolic music with a desired emotion.
arXiv Detail & Related papers (2023-10-21T15:35:43Z)
- From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion [84.138804145918]
Deep generative models can generate high-fidelity audio conditioned on various types of representations.
These models are prone to generate audible artifacts when the conditioning is flawed or imperfect.
We propose a high-fidelity multi-band diffusion-based framework that generates any type of audio modality from low-bitrate discrete representations.
arXiv Detail & Related papers (2023-08-02T22:14:29Z)
- Generating symbolic music using diffusion models [0.0]
A diffusion model that uses a binomial prior distribution to generate piano rolls is proposed.
The generated music has coherence at time scales up to the length of the training piano roll segments.
The code is publicly shared to encourage the use and development of the method by the community.
arXiv Detail & Related papers (2023-03-15T06:01:02Z)
- Symphony Generation with Permutation Invariant Language Model [57.75739773758614]
We present a symbolic symphony music generation solution, SymphonyNet, based on a permutation invariant language model.
A novel transformer decoder architecture is introduced as backbone for modeling extra-long sequences of symphony tokens.
Our empirical results show that our proposed approach can generate coherent, novel, complex, and harmonious symphonies compared with human compositions.
arXiv Detail & Related papers (2022-05-10T13:08:49Z)
- Denoising Diffusion Probabilistic Models [91.94962645056896]
We present high quality image synthesis results using diffusion probabilistic models.
Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics.
arXiv Detail & Related papers (2020-06-19T17:24:44Z)
- Generative Modelling for Controllable Audio Synthesis of Expressive Piano Performance [6.531546527140474]
We present a controllable neural audio synthesizer based on Gaussian Mixture Variational Autoencoders (GM-VAEs).
We demonstrate how the model is able to apply fine-grained style morphing over the course of the audio.
arXiv Detail & Related papers (2020-06-16T12:54:41Z)
- Continuous Melody Generation via Disentangled Short-Term Representations and Structural Conditions [14.786601824794369]
We present a model for composing melodies given a user specified symbolic scenario combined with a previous music context.
Our model is capable of generating long melodies by regarding 8-beat note sequences as basic units, and shares consistent rhythm pattern structure with another specific song.
Results show that the music generated by our model tends to have salient repetition structures, rich motives, and stable rhythm patterns.
arXiv Detail & Related papers (2020-02-05T06:23:44Z)
- Learning Style-Aware Symbolic Music Representations by Adversarial Autoencoders [9.923470453197657]
We focus on leveraging adversarial regularization as a flexible and natural means to imbue variational autoencoders with context information.
We introduce the first Music Adversarial Autoencoder (MusAE)
Our model has a higher reconstruction accuracy than state-of-the-art models based on standard variational autoencoders.
arXiv Detail & Related papers (2020-01-15T18:07:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.