Generalized Multi-Source Inference for Text Conditioned Music Diffusion Models
- URL: http://arxiv.org/abs/2403.11706v1
- Date: Mon, 18 Mar 2024 12:08:01 GMT
- Title: Generalized Multi-Source Inference for Text Conditioned Music Diffusion Models
- Authors: Emilian Postolache, Giorgio Mariani, Luca Cosmo, Emmanouil Benetos, Emanuele RodolĂ ,
- Abstract summary: Multi-Source Diffusion Models (MSDM) allow for compositional musical generation tasks.
This paper generalizes MSDM to arbitrary time-domain diffusion models conditioned on text embeddings.
We propose an inference procedure enabling the coherent generation of sources and accompaniments.
- Score: 26.373204974010086
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-Source Diffusion Models (MSDM) allow for compositional musical generation tasks: generating a set of coherent sources, creating accompaniments, and performing source separation. Despite their versatility, they require estimating the joint distribution over the sources, necessitating pre-separated musical data, which is rarely available, and fixing the number and type of sources at training time. This paper generalizes MSDM to arbitrary time-domain diffusion models conditioned on text embeddings. These models do not require separated data as they are trained on mixtures, can parameterize an arbitrary number of sources, and allow for rich semantic control. We propose an inference procedure enabling the coherent generation of sources and accompaniments. Additionally, we adapt the Dirac separator of MSDM to perform source separation. We experiment with diffusion models trained on Slakh2100 and MTG-Jamendo, showcasing competitive generation and separation results in a relaxed data setting.
Related papers
- Multi-Source Music Generation with Latent Diffusion [7.832209959041259]
Multi-Source Diffusion Model (MSDM) proposed to model music as a mixture of multiple instrumental sources.
MSLDM employs Variational Autoencoders (VAEs) to encode each instrumental source into a distinct latent representation.
This approach significantly enhances the total and partial generation of music.
arXiv Detail & Related papers (2024-09-10T03:41:10Z) - Constrained Diffusion Models via Dual Training [80.03953599062365]
Diffusion processes are prone to generating samples that reflect biases in a training dataset.
We develop constrained diffusion models by imposing diffusion constraints based on desired distributions.
We show that our constrained diffusion models generate new data from a mixture data distribution that achieves the optimal trade-off among objective and constraints.
arXiv Detail & Related papers (2024-08-27T14:25:42Z) - Training Data Protection with Compositional Diffusion Models [99.46239561159953]
Compartmentalized Diffusion Models (CDM) are a method to train different diffusion models (or prompts) on distinct data sources.
Individual models can be trained in isolation, at different times, and on different distributions and domains.
Each model only contains information about a subset of the data it was exposed to during training, enabling several forms of training data protection.
arXiv Detail & Related papers (2023-08-02T23:27:49Z) - Improving Out-of-Distribution Robustness of Classifiers via Generative
Interpolation [56.620403243640396]
Deep neural networks achieve superior performance for learning from independent and identically distributed (i.i.d.) data.
However, their performance deteriorates significantly when handling out-of-distribution (OoD) data.
We develop a simple yet effective method called Generative Interpolation to fuse generative models trained from multiple domains for synthesizing diverse OoD samples.
arXiv Detail & Related papers (2023-07-23T03:53:53Z) - Diff-Instruct: A Universal Approach for Transferring Knowledge From
Pre-trained Diffusion Models [77.83923746319498]
We propose a framework called Diff-Instruct to instruct the training of arbitrary generative models.
We show that Diff-Instruct results in state-of-the-art single-step diffusion-based models.
Experiments on refining GAN models show that the Diff-Instruct can consistently improve the pre-trained generators of GAN models.
arXiv Detail & Related papers (2023-05-29T04:22:57Z) - Class-Balancing Diffusion Models [57.38599989220613]
Class-Balancing Diffusion Models (CBDM) are trained with a distribution adjustment regularizer as a solution.
Our method benchmarked the generation results on CIFAR100/CIFAR100LT dataset and shows outstanding performance on the downstream recognition task.
arXiv Detail & Related papers (2023-04-30T20:00:14Z) - Multi-Source Diffusion Models for Simultaneous Music Generation and Separation [17.124189082882395]
We train our model on Slakh2100, a standard dataset for musical source separation.
Our method is the first example of a single model that can handle both generation and separation tasks.
arXiv Detail & Related papers (2023-02-04T23:18:36Z) - Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion
Models [54.1843419649895]
We propose a solution based on denoising diffusion probabilistic models (DDPMs)
Our motivation for choosing diffusion models over other generative models comes from the flexible internal structure of diffusion models.
Our method can unite multiple diffusion models trained on multiple sub-tasks and conquer the combined task.
arXiv Detail & Related papers (2022-12-01T18:59:55Z) - Diffusion-based Generative Speech Source Separation [27.928990101986862]
We propose DiffSep, a new single channel source separation method based on score-matching of a differential equation (SDE)
Experiments on the WSJ0 2mix dataset demonstrate the potential of the method.
The method is also suitable for speech enhancement and shows performance competitive with prior work on the VoiceBank-DEMAND dataset.
arXiv Detail & Related papers (2022-10-31T13:46:55Z) - Source Separation with Deep Generative Priors [17.665938343060112]
We use generative models as priors over the components of a mixture of sources, and noise-annealed Langevin dynamics to sample from the posterior distribution of sources given a mixture.
This decouples the source separation problem from generative modeling, enabling us to directly use cutting-edge generative models as priors.
The method achieves state-of-the-art performance for MNIST digit separation.
arXiv Detail & Related papers (2020-02-19T00:48:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.