Fast Diffusion GAN Model for Symbolic Music Generation Controlled by
Emotions
- URL: http://arxiv.org/abs/2310.14040v1
- Date: Sat, 21 Oct 2023 15:35:43 GMT
- Title: Fast Diffusion GAN Model for Symbolic Music Generation Controlled by
Emotions
- Authors: Jincheng Zhang, György Fazekas, Charalampos Saitis
- Abstract summary: We propose a diffusion model combined with a Generative Adversarial Network to generate discrete symbolic music.
We first used a trained Variational Autoencoder to obtain embeddings of a symbolic music dataset with emotion labels.
Our results demonstrate that our diffusion model can be successfully controlled to generate symbolic music with a desired emotion.
- Score: 1.6004393678882072
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have shown promising results for a wide range of generative
tasks with continuous data, such as image and audio synthesis. However, little
progress has been made on using diffusion models to generate discrete symbolic
music because this new class of generative models is not well suited for
discrete data, and its iterative sampling process is computationally
expensive. In this work, we propose a diffusion model combined with a
Generative Adversarial Network, aiming to (i) alleviate one of the remaining
challenges in algorithmic music generation which is the control of generation
towards a target emotion, and (ii) mitigate the slow sampling drawback of
diffusion models applied to symbolic music generation. We first used a trained
Variational Autoencoder to obtain embeddings of a symbolic music dataset with
emotion labels and then used those to train a diffusion model. Our results
demonstrate that our diffusion model can be successfully controlled to generate
symbolic music with a desired emotion. Our model achieves several orders of
magnitude improvement in computational cost, requiring merely four time steps
to denoise, whereas current state-of-the-art diffusion models for symbolic
music generation require on the order of thousands of steps.
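To make the described pipeline concrete, here is a minimal, hypothetical sketch of the sampling side in PyTorch. It is not the authors' released code: `EmotionDenoiser`, `sample_latent`, the four-class emotion labelling, the latent size, and the simplified posterior update are all illustrative assumptions; the adversarial (GAN) training of the denoiser and the pre-trained Variational Autoencoder are omitted and only referenced in comments.

```python
# Illustrative sketch only, assuming the pipeline described in the abstract:
# (1) a pre-trained VAE maps labelled symbolic music to continuous latents,
# (2) an emotion-conditioned diffusion denoiser is trained on those latents
#     (with a GAN-style objective, omitted here) so few steps suffice,
# (3) a sampled latent is decoded back to symbolic music by the VAE decoder.
import torch
import torch.nn as nn

NUM_EMOTIONS = 4   # e.g. four valence/arousal quadrants (assumption, not stated in the abstract)
LATENT_DIM = 256   # size of the VAE embedding (assumption)
NUM_STEPS = 4      # the abstract reports only four denoising time steps

class EmotionDenoiser(nn.Module):
    """Hypothetical denoiser: predicts a clean latent from a noisy one,
    conditioned on the time step and the target emotion."""
    def __init__(self):
        super().__init__()
        self.emotion_emb = nn.Embedding(NUM_EMOTIONS, LATENT_DIM)
        self.step_emb = nn.Embedding(NUM_STEPS, LATENT_DIM)
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM * 3, 512), nn.SiLU(),
            nn.Linear(512, 512), nn.SiLU(),
            nn.Linear(512, LATENT_DIM),
        )

    def forward(self, z_noisy, t, emotion):
        cond = torch.cat([z_noisy, self.step_emb(t), self.emotion_emb(emotion)], dim=-1)
        return self.net(cond)   # estimate of the clean latent z_0

@torch.no_grad()
def sample_latent(denoiser, emotion, betas):
    """Few-step sampling: start from Gaussian noise and denoise in NUM_STEPS iterations."""
    z = torch.randn(emotion.shape[0], LATENT_DIM)
    for t in reversed(range(NUM_STEPS)):
        t_batch = torch.full((emotion.shape[0],), t, dtype=torch.long)
        z0_hat = denoiser(z, t_batch, emotion)
        # Simplified update: re-noise the prediction except at the final step.
        # A faithful implementation would use the diffusion posterior coefficients
        # and a denoiser trained against an adversarial discriminator, as in the paper.
        noise = torch.randn_like(z) if t > 0 else torch.zeros_like(z)
        z = z0_hat + betas[t].sqrt() * noise
    return z

# Usage: sample one latent for a target emotion class, then decode with the trained VAE.
denoiser = EmotionDenoiser()
betas = torch.linspace(1e-4, 2e-2, NUM_STEPS)
emotion = torch.tensor([1])            # hypothetical emotion label id
z = sample_latent(denoiser, emotion, betas)
# symbolic_music = vae.decode(z)       # `vae` is the pre-trained Variational Autoencoder (not shown)
```

The point of the sketch is the loop length: because the denoiser predicts a clean latent directly, only four iterations are needed before the VAE decoder turns the latent back into symbolic music.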
Related papers
- Music Consistency Models [31.415900049111023]
We present Music Consistency Models (MusicCM), which leverages the concept of consistency models to efficiently synthesize mel-spectrograms for music clips.
Building upon existing text-to-music diffusion models, the MusicCM model incorporates consistency distillation and adversarial discriminator training.
Experimental results reveal the effectiveness of our model in terms of computational efficiency, fidelity, and naturalness.
arXiv Detail & Related papers (2024-04-20T11:52:30Z) - Neural Network Parameter Diffusion [50.85251415173792]
Diffusion models have achieved remarkable success in image and video generation.
In this work, we demonstrate that diffusion models can also generate high-performing neural network parameters.
arXiv Detail & Related papers (2024-02-20T16:59:03Z) - Composer Style-specific Symbolic Music Generation Using Vector Quantized Discrete Diffusion Models [5.083504224028769]
We propose to combine a vector quantized variational autoencoder (VQ-VAE) and discrete diffusion models for the generation of symbolic music.
The trained VQ-VAE can represent symbolic music as a sequence of indexes that correspond to specific entries in a learned codebook.
The diffusion model is trained to generate intermediate music sequences consisting of codebook indexes, which are then decoded to symbolic music using the VQ-VAE's decoder (see the index-lookup sketch after this related-papers list).
arXiv Detail & Related papers (2023-10-21T15:41:50Z) - Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional
Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z) - DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation [89.50310360658791]
We present a novel cascaded motion diffusion model, DiffDance, designed for high-resolution, long-form dance generation.
This model comprises a music-to-dance diffusion model and a sequence super-resolution diffusion model.
We demonstrate that DiffDance is capable of generating realistic dance sequences that align effectively with the input music.
arXiv Detail & Related papers (2023-08-05T16:18:57Z) - Progressive distillation diffusion for raw music generation [0.0]
This paper aims to apply a new deep learning approach to the task of generating raw audio files.
It is based on diffusion models, a recent type of deep generative model.
arXiv Detail & Related papers (2023-07-20T16:25:00Z) - Diffusion Models as Masked Autoencoders [52.442717717898056]
We revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models.
While directly pre-training with diffusion models does not produce strong representations, we condition diffusion models on masked input and formulate diffusion models as masked autoencoders (DiffMAE).
We perform a comprehensive study on the pros and cons of design choices and build connections between diffusion models and masked autoencoders.
arXiv Detail & Related papers (2023-04-06T17:59:56Z) - Generating symbolic music using diffusion models [0.0]
A diffusion model that uses a binomial prior distribution to generate piano rolls is proposed.
The generated music has coherence at time scales up to the length of the training piano roll segments.
The code is publicly shared to encourage the use and development of the method by the community.
arXiv Detail & Related papers (2023-03-15T06:01:02Z) - Diffusion Models in Vision: A Survey [80.82832715884597]
A diffusion model is a deep generative model that is based on two stages, a forward diffusion stage and a reverse diffusion stage.
Diffusion models are widely appreciated for the quality and diversity of the generated samples, despite their known computational burdens.
arXiv Detail & Related papers (2022-09-10T22:00:30Z) - Dynamic Dual-Output Diffusion Models [100.32273175423146]
Iterative denoising-based generation has been shown to be comparable in quality to other classes of generative models.
A major drawback of this method is that it requires hundreds of iterations to produce a competitive result.
Recent works have proposed solutions that allow for faster generation with fewer iterations, but the image quality gradually deteriorates.
arXiv Detail & Related papers (2022-03-08T11:20:40Z) - Symbolic Music Generation with Diffusion Models [4.817429789586127]
We present a technique for training diffusion models on sequential data by parameterizing the discrete domain in the continuous latent space of a pre-trained variational autoencoder.
We show strong unconditional generation and post-hoc conditional infilling results compared to autoregressive language models operating over the same continuous embeddings.
arXiv Detail & Related papers (2021-03-30T05:48:05Z)
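As a side note on the VQ-VAE entry above (Composer Style-specific Symbolic Music Generation), the index-sequence representation it relies on can be sketched in a few lines. Everything here (codebook size, shapes, function names) is an illustrative assumption rather than the authors' implementation; it only shows how continuous encoder outputs are snapped to their nearest codebook entries so that a discrete diffusion model can operate on integer indexes.

```python
# Minimal sketch of a VQ-VAE codebook-index representation (assumed shapes and names).
import torch

CODEBOOK_SIZE, CODE_DIM, SEQ_LEN = 512, 64, 128
codebook = torch.randn(CODEBOOK_SIZE, CODE_DIM)    # learned during VQ-VAE training

def quantize(encoder_out):
    """Map each encoder frame to the index of its nearest codebook vector."""
    # encoder_out: (SEQ_LEN, CODE_DIM); distances: (SEQ_LEN, CODEBOOK_SIZE)
    distances = torch.cdist(encoder_out, codebook)
    return distances.argmin(dim=-1)                # integer index sequence

def dequantize(indexes):
    """Look the indexes back up so the VQ-VAE decoder can reconstruct symbolic music."""
    return codebook[indexes]                       # (SEQ_LEN, CODE_DIM)

encoder_out = torch.randn(SEQ_LEN, CODE_DIM)       # stand-in for VQ-VAE encoder output
indexes = quantize(encoder_out)                    # what the discrete diffusion model generates
recovered = dequantize(indexes)                    # input to the VQ-VAE decoder
```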
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.