Symbolic Music Generation with Diffusion Models
- URL: http://arxiv.org/abs/2103.16091v1
- Date: Tue, 30 Mar 2021 05:48:05 GMT
- Title: Symbolic Music Generation with Diffusion Models
- Authors: Gautam Mittal, Jesse Engel, Curtis Hawthorne, Ian Simon
- Abstract summary: We present a technique for training diffusion models on sequential data by parameterizing the discrete domain in the continuous latent space of a pre-trained variational autoencoder.
We show strong unconditional generation and post-hoc conditional infilling results compared to autoregressive language models operating over the same continuous embeddings.
- Score: 4.817429789586127
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Score-based generative models and diffusion probabilistic models have been
successful at generating high-quality samples in continuous domains such as
images and audio. However, due to their Langevin-inspired sampling mechanisms,
their application to discrete and sequential data has been limited. In this
work, we present a technique for training diffusion models on sequential data
by parameterizing the discrete domain in the continuous latent space of a
pre-trained variational autoencoder. Our method is non-autoregressive and
learns to generate sequences of latent embeddings through the reverse process
and offers parallel generation with a constant number of iterative refinement
steps. We apply this technique to modeling symbolic music and show strong
unconditional generation and post-hoc conditional infilling results compared to
autoregressive language models operating over the same continuous embeddings.
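The recipe in the abstract (encode discrete tokens into a pre-trained VAE's continuous latent space, run diffusion there, decode back) can be sketched in a few lines of NumPy. Everything below is an illustrative toy, not the authors' implementation: the encoder/decoder are hypothetical linear stand-ins for the pre-trained VAE, and the learned noise predictor is replaced by an oracle that knows the clean latents, purely so the constant-step reverse loop can run end to end.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for a pre-trained VAE over discrete music tokens:
# a random embedding table as "encoder", nearest-embedding as "decoder".
VOCAB, LATENT = 16, 4
E = rng.normal(size=(VOCAB, LATENT))

def encode(tokens):
    """Map a discrete token sequence to continuous latent embeddings."""
    return E[tokens]                                  # (seq_len, LATENT)

def decode(z):
    """Map latents back to tokens via nearest embedding."""
    d = ((z[:, None, :] - E[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

# DDPM-style linear noise schedule: a fixed, constant number of steps T.
T = 50
betas = np.linspace(1e-4, 0.05, T)
alphas = 1.0 - betas
abar = np.cumprod(alphas)

def q_sample(z0, t):
    """Forward process: corrupt clean latents z0 to noise level t."""
    eps = rng.normal(size=z0.shape)
    return np.sqrt(abar[t]) * z0 + np.sqrt(1 - abar[t]) * eps, eps

def reverse_sample(z0):
    """Reverse process: T iterative refinement steps, applied to the whole
    latent sequence in parallel (non-autoregressive). A trained model would
    predict eps from (z_t, t); here an oracle computes it from z0."""
    z, _ = q_sample(z0, T - 1)
    for t in range(T - 1, -1, -1):
        eps_hat = (z - np.sqrt(abar[t]) * z0) / np.sqrt(1 - abar[t])  # oracle
        z = (z - betas[t] / np.sqrt(1 - abar[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            z += np.sqrt(betas[t]) * rng.normal(size=z.shape)
    return z

tokens = rng.integers(0, VOCAB, size=8)
z_hat = reverse_sample(encode(tokens))
print(decode(z_hat))
```

With the oracle noise predictor the final refinement step removes the residual noise exactly, so decoding recovers the original token sequence; in the real method a neural network replaces the oracle and the same fixed-length loop generates all positions of the sequence at once.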
Related papers
- Discrete Modeling via Boundary Conditional Diffusion Processes [29.95155303262501]
Previous approaches have suffered from the discrepancy between discrete data and continuous modeling.
We propose a two-step forward process that first estimates the boundary as a prior distribution.
We then rescale the forward trajectory to construct a boundary conditional diffusion model.
arXiv Detail & Related papers (2024-10-29T09:42:42Z)
- Energy-Based Diffusion Language Models for Text Generation [126.23425882687195]
Energy-based Diffusion Language Model (EDLM) is an energy-based model operating at the full sequence level for each diffusion step.
Our framework offers a 1.3x sampling speedup over existing diffusion models.
arXiv Detail & Related papers (2024-10-28T17:25:56Z)
- Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion [61.03681839276652]
Diffusion Forcing is a new training paradigm where a diffusion model is trained to denoise a set of tokens with independent per-token noise levels.
We apply Diffusion Forcing to sequence generative modeling by training a causal next-token prediction model to generate one or several future tokens.
arXiv Detail & Related papers (2024-07-01T15:43:25Z)
- Discrete Diffusion Language Model for Long Text Summarization [19.267738861590487]
We introduce a novel semantic-aware noising process that enables Transformer backbones to handle long sequences effectively.
Our approaches achieve state-of-the-art performance on three benchmark summarization datasets: Gigaword, CNN/DailyMail, and Arxiv.
arXiv Detail & Related papers (2024-06-25T09:55:22Z)
- Fast Sampling via Discrete Non-Markov Diffusion Models [49.598085130313514]
We propose a discrete non-Markov diffusion model, which admits an accelerated reverse sampling for discrete data generation.
Our method significantly reduces the number of function evaluations (i.e., calls to the neural network), making the sampling process much faster.
arXiv Detail & Related papers (2023-12-14T18:14:11Z)
- Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z)
- Latent Dynamical Implicit Diffusion Processes [0.0]
We propose a novel latent variable model named latent dynamical implicit diffusion processes (LDIDPs).
LDIDPs utilize implicit diffusion processes to sample from dynamical latent processes and generate sequential observation samples accordingly.
We demonstrate that LDIDPs can accurately learn the dynamics over latent dimensions.
arXiv Detail & Related papers (2023-06-12T12:43:27Z)
- Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning [52.72369034247396]
We propose the diffusion glancing transformer, which employs a modality diffusion process and residual glancing sampling.
DIFFGLAT achieves better generation accuracy while maintaining fast decoding speed compared with both autoregressive and non-autoregressive models.
arXiv Detail & Related papers (2022-12-20T13:36:25Z)
- Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.