Related papers: Conditional Diffusion as Latent Constraints for Controllable Symbolic Music Generation

URL: http://arxiv.org/abs/2511.07156v1
Date: Mon, 10 Nov 2025 14:46:10 GMT
Title: Conditional Diffusion as Latent Constraints for Controllable Symbolic Music Generation
Authors: Matteo Pettenó, Alessandro Ilic Mezza, Alberto Bernardini
Abstract summary: We explore the application of denoising diffusion processes as plug-and-play latent constraints for symbolic music generation models. We show that diffusion-driven constraints outperform traditional attribute regularization and other latent constraints architectures.
Score: 47.38557855930304
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in latent diffusion models have demonstrated state-of-the-art performance in high-dimensional time-series data synthesis while providing flexible control through conditioning and guidance. However, existing methodologies primarily rely on musical context or natural language as the main modality of interacting with the generative process, which may not be ideal for expert users who seek precise fader-like control over specific musical attributes. In this work, we explore the application of denoising diffusion processes as plug-and-play latent constraints for unconditional symbolic music generation models. We focus on a framework that leverages a library of small conditional diffusion models operating as implicit probabilistic priors on the latents of a frozen unconditional backbone. While previous studies have explored domain-specific use cases, this work, to the best of our knowledge, is the first to demonstrate the versatility of such an approach across a diverse array of musical attributes, such as note density, pitch range, contour, and rhythm complexity. Our experiments show that diffusion-driven constraints outperform traditional attribute regularization and other latent constraints architectures, achieving significantly stronger correlations between target and generated attributes while maintaining high perceptual quality and diversity.
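
The framework described in the abstract (a small conditional diffusion model acting as a prior on the latents of a frozen backbone, evaluated by the correlation between target and generated attributes) can be illustrated with a toy sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the 8-dimensional latent space, the Gaussian conditional prior, the linear attribute read-out standing in for the frozen decoder, the noise schedule, and all function names. A closed-form noise predictor takes the place of a trained denoiser network so that the example runs without training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: every name and size below is an illustrative assumption.
LATENT_DIM = 8
PRIOR_STD = 0.1    # spread of latents that share one attribute value
NUM_STEPS = 500    # number of diffusion steps T

# Hypothetical unit direction along which the attribute lives in latent space.
direction = np.zeros(LATENT_DIM)
direction[0] = 1.0

# Linear noise schedule and its cumulative products.
betas = np.linspace(1e-4, 0.04, NUM_STEPS)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)


def frozen_attribute_readout(z):
    """Stand-in for decoding with the frozen backbone and measuring the attribute."""
    return z @ direction


def conditional_eps(z_t, t, target):
    """Exact noise prediction when p(z0 | a) = N(a * direction, PRIOR_STD^2 I).

    A trained conditional denoiser network would take this function's place.
    """
    mean_t = np.sqrt(alpha_bars[t]) * target[:, None] * direction
    var_t = alpha_bars[t] * PRIOR_STD**2 + 1.0 - alpha_bars[t]
    return np.sqrt(1.0 - alpha_bars[t]) * (z_t - mean_t) / var_t


def sample_latents(target):
    """DDPM ancestral sampling in latent space, conditioned on target attributes."""
    z = rng.standard_normal((target.shape[0], LATENT_DIM))
    for t in range(NUM_STEPS - 1, -1, -1):
        eps = conditional_eps(z, t, target)
        z = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:  # no noise is added on the final step
            z = z + np.sqrt(betas[t]) * rng.standard_normal(z.shape)
    return z


targets = rng.uniform(-1.0, 1.0, size=256)      # fader-like attribute targets
latents = sample_latents(targets)               # constrained latents
generated = frozen_attribute_readout(latents)   # attribute of the "decoded" output
correlation = np.corrcoef(targets, generated)[0, 1]
print(f"target/generated correlation: {correlation:.3f}")
```

In this sketch the backbone is never modified: control comes only from where the conditional sampler places the latent, which is the plug-and-play property the abstract emphasizes. Swapping in a different attribute would mean swapping in a different small conditional model, not retraining the decoder.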

Related papers

Consistent World Models via Foresight Diffusion [56.45012929930605]
We argue that a key bottleneck in learning consistent diffusion-based world models lies in the suboptimal predictive ability. We propose Foresight Diffusion (ForeDiff), a diffusion-based world modeling framework that enhances consistency by decoupling condition understanding from target denoising.
arXiv Detail & Related papers (2025-05-22T10:01:59Z)
Constrained Discrete Diffusion [61.81569616239755]
This paper introduces Constrained Discrete Diffusion (CDD), a novel integration of differentiable constraint optimization within the diffusion process. CDD directly imposes constraints into the discrete diffusion sampling process, resulting in a training-free and effective approach.
arXiv Detail & Related papers (2025-03-12T19:48:12Z)
Understanding the Quality-Diversity Trade-off in Diffusion Language Models [0.0]
Diffusion models can be used to model continuous data across a range of domains such as vision and audio. Recent work explores their application to text generation by working in the continuous embedding space. Models lack a natural means to control the inherent trade-off between quality and diversity.
arXiv Detail & Related papers (2025-03-11T17:18:01Z)
Exploring Representation-Aligned Latent Space for Better Generation [86.45670422239317]
We introduce ReaLS, which integrates semantic priors to improve generation performance. We show that fundamental DiT and SiT trained on ReaLS can achieve a 15% improvement in FID metric. The enhanced semantic latent space enables more perceptual downstream tasks, such as segmentation and depth estimation.
arXiv Detail & Related papers (2025-02-01T07:42:12Z)
An AI-powered Bayesian generative modeling approach for causal inference in observational studies [4.4876925770439415]
CausalBGM is an AI-powered Bayesian generative modeling approach. It estimates the individual treatment effect (ITE) by learning individual-specific distributions of a low-dimensional latent feature set.
arXiv Detail & Related papers (2025-01-01T06:52:45Z)
Generalized Diffusion Model with Adjusted Offset Noise [1.7767466724342067]
We propose a generalized diffusion model that naturally incorporates additional noise within a rigorous probabilistic framework. We derive a loss function based on the evidence lower bound, establishing its theoretical equivalence to offset noise with certain adjustments. Experiments on synthetic datasets demonstrate that our model effectively addresses brightness-related challenges and outperforms conventional methods in high-dimensional scenarios.
arXiv Detail & Related papers (2024-12-04T08:57:03Z)
Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation. We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z)
Instructed Diffuser with Temporal Condition Guidance for Offline Reinforcement Learning [71.24316734338501]
We propose an effective temporally-conditional diffusion model coined Temporally-Composable Diffuser (TCD). TCD extracts temporal information from interaction sequences and explicitly guides generation with temporal conditions. Our method reaches or matches the best performance compared with prior SOTA baselines.
arXiv Detail & Related papers (2023-06-08T02:12:26Z)
Symbolic Music Generation with Diffusion Models [4.817429789586127]
We present a technique for training diffusion models on sequential data by parameterizing the discrete domain in the continuous latent space of a pre-trained variational autoencoder. We show strong unconditional generation and post-hoc conditional infilling results compared to autoregressive language models operating over the same continuous embeddings.
arXiv Detail & Related papers (2021-03-30T05:48:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
