Related papers: Generalized Multi-Source Inference for Text Conditioned Music Diffusion Models

URL: http://arxiv.org/abs/2403.11706v1
Date: Mon, 18 Mar 2024 12:08:01 GMT
Title: Generalized Multi-Source Inference for Text Conditioned Music Diffusion Models
Authors: Emilian Postolache, Giorgio Mariani, Luca Cosmo, Emmanouil Benetos, Emanuele Rodolà
Abstract summary: Multi-Source Diffusion Models (MSDM) allow for compositional musical generation tasks. This paper generalizes MSDM to arbitrary time-domain diffusion models conditioned on text embeddings. We propose an inference procedure enabling the coherent generation of sources and accompaniments.
Score: 26.373204974010086
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-Source Diffusion Models (MSDM) allow for compositional musical generation tasks: generating a set of coherent sources, creating accompaniments, and performing source separation. Despite their versatility, they require estimating the joint distribution over the sources, necessitating pre-separated musical data, which is rarely available, and fixing the number and type of sources at training time. This paper generalizes MSDM to arbitrary time-domain diffusion models conditioned on text embeddings. These models do not require separated data as they are trained on mixtures, can parameterize an arbitrary number of sources, and allow for rich semantic control. We propose an inference procedure enabling the coherent generation of sources and accompaniments. Additionally, we adapt the Dirac separator of MSDM to perform source separation. We experiment with diffusion models trained on Slakh2100 and MTG-Jamendo, showcasing competitive generation and separation results in a relaxed data setting.
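As formulated in MSDM, the Dirac separator pins the last source to the mixture minus all other sources. Only N-1 sources remain free, and by the chain rule the gradient for free source m is the score of source m minus the score of the pinned source N. The sketch below illustrates only this constraint mechanism, under loudly stated assumptions. Independent zero-mean Gaussian priors (the hypothetical `gaussian_score`) stand in for the learned text-conditioned diffusion score, and plain gradient ascent replaces a diffusion sampler. It is not the authors' implementation, and the function names and parameters are illustrative.

```python
import numpy as np


def gaussian_score(x, var):
    """Score (gradient of the log-density) of a zero-mean isotropic Gaussian.

    Hypothetical stand-in for a learned, text-conditioned diffusion score.
    """
    return -x / var


def dirac_separate(y, variances, step=0.5, n_steps=500):
    """Toy Dirac-constrained separation of a 1-D mixture y = sum_n x_n.

    Sources x_1..x_{N-1} are free; x_N is pinned to y - sum_{n<N} x_n,
    so every iterate reproduces the mixture exactly. The gradient of the
    constrained log-prior w.r.t. free source m is S_m(x_m) - S_N(x_N).
    """
    var = np.asarray(variances, dtype=float)
    free = np.zeros((var.size - 1, y.size))  # x_1 .. x_{N-1}, start at zero
    for _ in range(n_steps):
        last = y - free.sum(axis=0)  # x_N from the Dirac constraint
        grad = gaussian_score(free, var[:-1, None]) - gaussian_score(last, var[-1])
        free = free + step * grad  # plain gradient ascent, not a diffusion sampler
    return np.vstack([free, y - free.sum(axis=0)])


if __name__ == "__main__":
    mixture = np.array([7.0, -14.0, 3.5])
    sources = dirac_separate(mixture, [1.0, 2.0, 4.0])
    print(sources)
    print(np.allclose(sources.sum(axis=0), mixture))  # the constraint holds by construction
```

With Gaussian priors the fixed point is the Wiener-style split x_n = σ_n² y / Σ_k σ_k², so variances 1, 2 and 4 yield y/7, 2y/7 and 4y/7. The step size must stay small relative to the prior curvature for the iteration to converge.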

Related papers

Unified Multimodal Discrete Diffusion [78.48930545306654]
Multimodal generative models that can understand and generate across multiple modalities are dominated by autoregressive (AR) approaches. We explore discrete diffusion models as a unified generative formulation in the joint text and image domain. We present the first Unified Multimodal Discrete Diffusion (UniDisc) model which is capable of jointly understanding and generating text and images.
arXiv (2025-03-26T17:59:51Z)
A Theory for Conditional Generative Modeling on Multiple Data Sources [20.539424639329564]
This paper takes the first step toward a rigorous analysis of multi-source training in conditional generative modeling. Our result shows that when source distributions share certain similarities and the model is expressive enough, multi-source training guarantees a sharper bound than single-source training.
arXiv (2025-02-20T14:13:24Z)
Multi-Source Music Generation with Latent Diffusion [7.832209959041259]
The Multi-Source Diffusion Model (MSDM) was proposed to model music as a mixture of multiple instrumental sources. The Multi-Source Latent Diffusion Model (MSLDM) employs Variational Autoencoders (VAEs) to encode each instrumental source into a distinct latent representation. This approach significantly enhances the total and partial generation of music.
arXiv (2024-09-10T03:41:10Z)
Constrained Diffusion Models via Dual Training [80.03953599062365]
Diffusion processes are prone to generating samples that reflect biases in a training dataset. We develop constrained diffusion models by imposing diffusion constraints based on desired distributions. We show that our constrained diffusion models generate new data from a mixture data distribution that achieves the optimal trade-off between the objective and the constraints.
arXiv (2024-08-27T14:25:42Z)
Training Data Protection with Compositional Diffusion Models [99.46239561159953]
Compartmentalized Diffusion Models (CDM) are a method to train different diffusion models (or prompts) on distinct data sources. Individual models can be trained in isolation, at different times, and on different distributions and domains. Each model only contains information about a subset of the data it was exposed to during training, enabling several forms of training data protection.
arXiv (2023-08-02T23:27:49Z)
Improving Out-of-Distribution Robustness of Classifiers via Generative Interpolation [56.620403243640396]
Deep neural networks achieve superior performance for learning from independent and identically distributed (i.i.d.) data. However, their performance deteriorates significantly when handling out-of-distribution (OoD) data. We develop a simple yet effective method called Generative Interpolation to fuse generative models trained from multiple domains for synthesizing diverse OoD samples.
arXiv (2023-07-23T03:53:53Z)
Diff-Instruct: A Universal Approach for Transferring Knowledge From Pre-trained Diffusion Models [77.83923746319498]
We propose a framework called Diff-Instruct to instruct the training of arbitrary generative models. We show that Diff-Instruct results in state-of-the-art single-step diffusion-based models. Experiments on refining GAN models show that Diff-Instruct can consistently improve the pre-trained generators of GAN models.
arXiv (2023-05-29T04:22:57Z)
Class-Balancing Diffusion Models [57.38599989220613]
Class-Balancing Diffusion Models (CBDM) are trained with a distribution adjustment regularizer as a solution to the loss of diversity and fidelity that diffusion models suffer when trained on class-imbalanced data. The method is benchmarked on the CIFAR100/CIFAR100LT datasets and shows outstanding performance on the downstream recognition task.
arXiv (2023-04-30T20:00:14Z)
Multi-Source Diffusion Models for Simultaneous Music Generation and Separation [17.124189082882395]
We train our model on Slakh2100, a standard dataset for musical source separation. Our method is the first example of a single model that can handle both generation and separation tasks.
arXiv (2023-02-04T23:18:36Z)
Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models [54.1843419649895]
We propose a solution based on denoising diffusion probabilistic models (DDPMs). Our motivation for choosing diffusion models over other generative models comes from the flexible internal structure of diffusion models. Our method can unite multiple diffusion models trained on multiple sub-tasks and conquer the combined task.
arXiv (2022-12-01T18:59:55Z)
Diffusion-based Generative Speech Source Separation [27.928990101986862]
We propose DiffSep, a new single-channel source separation method based on score matching of a stochastic differential equation (SDE). Experiments on the WSJ0-2mix dataset demonstrate the potential of the method. The method is also suitable for speech enhancement and shows performance competitive with prior work on the VoiceBank-DEMAND dataset.
arXiv (2022-10-31T13:46:55Z)
Source Separation with Deep Generative Priors [17.665938343060112]
We use generative models as priors over the components of a mixture of sources, and noise-annealed Langevin dynamics to sample from the posterior distribution of sources given a mixture. This decouples the source separation problem from generative modeling, enabling us to directly use cutting-edge generative models as priors. The method achieves state-of-the-art performance for MNIST digit separation.
arXiv (2020-02-19T00:48:19Z)
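The posterior-sampling recipe summarized above combines a generative prior per source, a likelihood tying the sources to the mixture, and Langevin steps at decreasing noise levels. It can be illustrated on a toy problem. This is a minimal sketch in that spirit, not the authors' code, and it rests on stated assumptions. Independent zero-mean Gaussian priors stand in for deep generative priors. Both the prior smoothing and the Gaussian mixture likelihood use the same width gamma, which is annealed from large to small. The step size is taken proportional to gamma squared. The function name and all parameters are hypothetical.

```python
import numpy as np


def annealed_langevin_separate(y, variances, noise_levels, steps_per_level=100,
                               step_scale=0.1, seed=0):
    """Toy noise-annealed Langevin sampling of sources given a 1-D mixture y.

    Assumed model (illustrative only): x_n ~ N(0, var_n I) independently, and
    at noise level gamma the prior is smoothed to N(0, (var_n + gamma^2) I)
    while the likelihood is N(y; sum_n x_n, gamma^2 I).
    """
    rng = np.random.default_rng(seed)
    var = np.asarray(variances, dtype=float)[:, None]  # (N, 1) prior variances
    x = rng.standard_normal((var.shape[0], y.size)) * np.sqrt(var)  # start from the prior
    for gamma in noise_levels:
        alpha = step_scale * gamma**2  # step size shrinks with the noise level
        for _ in range(steps_per_level):
            prior_score = -x / (var + gamma**2)  # score of the smoothed Gaussian prior
            residual = y - x.sum(axis=0)  # mixture mismatch, shape (D,)
            lik_score = residual / gamma**2  # same likelihood gradient for every source
            noise = rng.standard_normal(x.shape)
            x = x + alpha * (prior_score + lik_score) + np.sqrt(2 * alpha) * noise
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    true = rng.standard_normal((3, 2000)) * np.sqrt([[1.0], [2.0], [4.0]])
    y = true.sum(axis=0)
    est = annealed_langevin_separate(y, [1.0, 2.0, 4.0], np.geomspace(1.0, 0.01, 10))
    # Relative mixture residual; it shrinks toward zero as the final gamma decreases.
    print(np.linalg.norm(y - est.sum(axis=0)) / np.linalg.norm(y))
```

Tying the step size to gamma squared keeps the update stable as the likelihood sharpens. The samples are random draws from the (toy) posterior, so the individual sources differ from the true ones, while their sum matches the mixture up to roughly the final noise level.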

This list is automatically generated from the titles and abstracts of the papers on this site.