Generalized Multi-Source Inference for Text Conditioned Music Diffusion Models
- URL: http://arxiv.org/abs/2403.11706v1
- Date: Mon, 18 Mar 2024 12:08:01 GMT
- Title: Generalized Multi-Source Inference for Text Conditioned Music Diffusion Models
- Authors: Emilian Postolache, Giorgio Mariani, Luca Cosmo, Emmanouil Benetos, Emanuele Rodolà
- Abstract summary: Multi-Source Diffusion Models (MSDM) allow for compositional musical generation tasks.
This paper generalizes MSDM to arbitrary time-domain diffusion models conditioned on text embeddings.
We propose an inference procedure enabling the coherent generation of sources and accompaniments.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-Source Diffusion Models (MSDM) allow for compositional musical generation tasks: generating a set of coherent sources, creating accompaniments, and performing source separation. Despite their versatility, they require estimating the joint distribution over the sources, necessitating pre-separated musical data, which is rarely available, and fixing the number and type of sources at training time. This paper generalizes MSDM to arbitrary time-domain diffusion models conditioned on text embeddings. These models do not require separated data as they are trained on mixtures, can parameterize an arbitrary number of sources, and allow for rich semantic control. We propose an inference procedure enabling the coherent generation of sources and accompaniments. Additionally, we adapt the Dirac separator of MSDM to perform source separation. We experiment with diffusion models trained on Slakh2100 and MTG-Jamendo, showcasing competitive generation and separation results in a relaxed data setting.
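The abstract mentions adapting the Dirac separator of MSDM. As we understand it, that separator treats the mixture likelihood as a Dirac delta δ(y − Σ x_n). The constraint can then be enforced exactly: sample N−1 sources and define the last one as the mixture minus their sum. The gradient with respect to each free source becomes the difference between its own score and the last source's score.

The following is a minimal scalar sketch of that constraint under stated assumptions, not the paper's procedure:

- The analytic Gaussian priors stand in for the time-domain, text-conditioned diffusion model.
- The Langevin sampler, the noise schedule, and the names `dirac_separate` and `noisy_gaussian_score` are all hypothetical.

```python
import math
import random


def noisy_gaussian_score(x, mu, var, sigma):
    # d/dx log N(x; mu, var + sigma^2): score of a Gaussian prior N(mu, var)
    # after adding N(0, sigma^2) noise. Hypothetical stand-in for a
    # text-conditioned diffusion model queried with one prompt per source.
    return -(x - mu) / (var + sigma ** 2)


def dirac_separate(y, priors, sigmas, steps=30, seed=0):
    """Toy Dirac-constraint separation of a scalar mixture y.

    priors: one (mu, var) pair per source (hypothetical toy priors).
    The first N-1 sources follow annealed Langevin dynamics; the last
    source is always y minus their sum, so the mixture holds exactly.
    """
    rng = random.Random(seed)
    n = len(priors)
    free = [rng.gauss(0.0, sigmas[0]) for _ in range(n - 1)]
    mu_n, var_n = priors[-1]
    for sigma in sigmas:  # anneal from high to low noise
        eps = 0.05 * sigma ** 2  # Langevin step size shrinks with sigma
        for _ in range(steps):
            s_last = noisy_gaussian_score(y - sum(free), mu_n, var_n, sigma)
            new_free = []
            # zip stops after the N-1 free sources, skipping the last prior.
            for x, (mu, var) in zip(free, priors):
                # d/dx_i log p(x_1..x_{N-1}, y - sum_j x_j) = S_i(x_i) - S_N(x_N)
                grad = noisy_gaussian_score(x, mu, var, sigma) - s_last
                new_free.append(x + eps * grad + math.sqrt(2 * eps) * rng.gauss(0.0, 1.0))
            free = new_free
    return free + [y - sum(free)]


# Geometric noise schedule from 1.0 down to about 0.007 (arbitrary choice).
sigmas = [0.7 ** k for k in range(15)]
sources = dirac_separate(1.5, [(1.0, 0.1), (0.5, 0.1)], sigmas)
print(sources, sum(sources))  # sums to the mixture 1.5 up to float rounding
```

Putting the constraint into the parameterization means no step size or likelihood weight has to be tuned to keep the estimates consistent with the mixture.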
Related papers
- Mitigating Biases with Diverse Ensembles and Diffusion Models
We propose an ensemble diversification framework exploiting Diffusion Probabilistic Models (DPMs).
We show that DPMs can generate images with novel feature combinations, even when trained on samples displaying correlated input features.
We show that DPM-guided diversification is sufficient to remove dependence on primary shortcut cues, without a need for additional supervised signals.
arXiv: 2023-11-23
- Training Data Protection with Compositional Diffusion Models
Compartmentalized Diffusion Models (CDM) are a method to train different diffusion models (or prompts) on distinct data sources.
Individual models can be trained in isolation, at different times, and on different distributions and domains.
Each model contains information only about the subset of the data it was exposed to during training, enabling several forms of training data protection.
arXiv: 2023-08-02
- Improving Out-of-Distribution Robustness of Classifiers via Generative Interpolation
Deep neural networks achieve superior performance for learning from independent and identically distributed (i.i.d.) data.
However, their performance deteriorates significantly when handling out-of-distribution (OoD) data.
We develop a simple yet effective method called Generative Interpolation to fuse generative models trained from multiple domains for synthesizing diverse OoD samples.
arXiv: 2023-07-23
- Diff-Instruct: A Universal Approach for Transferring Knowledge From Pre-trained Diffusion Models
We propose a framework called Diff-Instruct to instruct the training of arbitrary generative models.
We show that Diff-Instruct results in state-of-the-art single-step diffusion-based models.
Experiments on refining GAN models show that Diff-Instruct can consistently improve pre-trained GAN generators.
arXiv: 2023-05-29
- Class-Balancing Diffusion Models
Class-Balancing Diffusion Models (CBDM) are trained with a distribution adjustment regularizer to counter class-imbalanced (long-tailed) training data.
The method is benchmarked on the CIFAR100 and CIFAR100LT datasets and shows outstanding performance on the downstream recognition task.
arXiv: 2023-04-30
- Multi-Source Diffusion Models for Simultaneous Music Generation and Separation
We train our model on Slakh2100, a standard dataset for musical source separation.
Our method is the first example of a single model that can handle both generation and separation tasks.
arXiv: 2023-02-04
- Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models
We propose a solution for multi-modal synthesis based on denoising diffusion probabilistic models (DDPMs).
Our motivation for choosing diffusion models over other generative models comes from the flexible internal structure of diffusion models.
Our method can unite multiple diffusion models trained on multiple sub-tasks and conquer the combined task.
arXiv: 2022-12-01
- Diffusion-based Generative Speech Source Separation
We propose DiffSep, a new single-channel source separation method based on score matching of a stochastic differential equation (SDE).
Experiments on the WSJ0 2mix dataset demonstrate the potential of the method.
The method is also suitable for speech enhancement and shows performance competitive with prior work on the VoiceBank-DEMAND dataset.
arXiv: 2022-10-31
- Unsupervised Audio Source Separation Using Differentiable Parametric Source Models
We propose an unsupervised model-based deep learning approach to musical source separation.
A neural network is trained to reconstruct the observed mixture as a sum of the sources.
The experimental evaluation on a vocal ensemble separation task shows that the proposed method outperforms learning-free methods.
arXiv: 2022-01-24
- Source Separation with Deep Generative Priors
We use generative models as priors over the components of a mixture of sources, and noise-annealed Langevin dynamics to sample from the posterior distribution of sources given a mixture.
This decouples the source separation problem from generative modeling, enabling us to directly use cutting-edge generative models as priors.
The method achieves state-of-the-art performance for MNIST digit separation.
arXiv: 2020-02-19
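The last entry samples sources from their posterior given a mixture using noise-annealed Langevin dynamics. The following is a minimal sketch of that idea under stated assumptions:

- Toy scalar Gaussian priors replace the learned generative models.
- The mixture likelihood is Gaussian, N(m; Σ x_n, γ²), with γ set equal to the current noise level σ.
- `langevin_separate` and `prior_score` are hypothetical names.

Unlike the Dirac separator mentioned in the abstract, this soft likelihood matches the mixture only approximately. The mismatch shrinks as σ is annealed.

```python
import math
import random


def prior_score(x, mu, var, sigma):
    # Score of a toy Gaussian prior N(mu, var) smoothed by N(0, sigma^2);
    # a hypothetical stand-in for a learned per-source generative model.
    return -(x - mu) / (var + sigma ** 2)


def langevin_separate(m, priors, sigmas, steps=30, seed=0):
    """Noise-annealed Langevin sampling of sources given a scalar mixture m.

    Assumes a Gaussian likelihood N(m; sum(x), gamma^2) with gamma equal to
    the current noise level sigma, so the fit to m tightens as sigma shrinks.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, sigmas[0]) for _ in priors]
    for sigma in sigmas:  # anneal from high to low noise
        eta = 0.05 * sigma ** 2  # step size proportional to sigma^2
        gamma2 = sigma ** 2      # likelihood variance tied to the noise level
        for _ in range(steps):
            residual = m - sum(xs)  # mixture mismatch shared by all sources
            # Per source: prior score + likelihood gradient residual / gamma^2.
            xs = [
                x + eta * (prior_score(x, mu, var, sigma) + residual / gamma2)
                + math.sqrt(2 * eta) * rng.gauss(0.0, 1.0)
                for x, (mu, var) in zip(xs, priors)
            ]
    return xs


# Geometric noise schedule from 1.0 down to about 0.007 (arbitrary choice).
sigmas = [0.7 ** k for k in range(15)]
xs = langevin_separate(1.5, [(1.0, 0.1), (0.5, 0.1)], sigmas)
print(xs, sum(xs))  # the sum is only approximately 1.5
```

The priors and the likelihood enter only through their gradients. This is what lets the separation step stay decoupled from how the priors were trained.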