Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion
Models
- URL: http://arxiv.org/abs/2212.00793v2
- Date: Thu, 20 Apr 2023 15:03:54 GMT
- Title: Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion
Models
- Authors: Nithin Gopalakrishnan Nair, Wele Gedara Chaminda Bandara and Vishal M.
Patel
- Abstract summary: We propose a solution based on denoising diffusion probabilistic models (DDPMs).
Our motivation for choosing diffusion models over other generative models comes from the flexible internal structure of diffusion models.
Our method can unite multiple diffusion models trained on multiple sub-tasks and conquer the combined task.
- Score: 54.1843419649895
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generating photos satisfying multiple constraints finds broad utility in the
content creation industry. A key hurdle to accomplishing this task is the need
for paired data consisting of all modalities (i.e., constraints) and their
corresponding output. Moreover, existing methods need retraining using paired
data across all modalities to introduce a new condition. This paper proposes a
solution to this problem based on denoising diffusion probabilistic models
(DDPMs). Our motivation for choosing diffusion models over other generative
models comes from the flexible internal structure of diffusion models. Since
each sampling step in the DDPM follows a Gaussian distribution, we show that
there exists a closed-form solution for generating an image given various
constraints. Our method can unite multiple diffusion models trained on multiple
sub-tasks and conquer the combined task through our proposed sampling strategy.
We also introduce a novel reliability parameter that allows using different
off-the-shelf diffusion models trained across various datasets during sampling
time alone to guide it to the desired outcome satisfying multiple constraints.
We perform experiments on various standard multimodal tasks to demonstrate the
effectiveness of our approach. More details can be found in
https://nithin-gk.github.io/projectpages/Multidiff/index.html
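The abstract's remark that Gaussian sampling steps admit a closed-form combination can be illustrated with a standard identity: a product of Gaussian densities with a shared variance, each raised to a non-negative weight, is again Gaussian up to normalization. The sketch below is only an illustration of that identity, not the authors' sampler. The function name `fuse_gaussian_steps` and the `reliabilities` argument are hypothetical names chosen here, and treating the weights as the "reliability parameter" is an assumption.

```python
from typing import List, Sequence, Tuple


def fuse_gaussian_steps(
    means: Sequence[Sequence[float]],
    variance: float,
    reliabilities: Sequence[float],
) -> Tuple[List[float], float]:
    """Fuse K isotropic Gaussians N(mean_k, variance * I) raised to weights r_k.

    The product prod_k N(x; mean_k, variance * I)^{r_k} is proportional to a
    Gaussian with mean sum_k(r_k * mean_k) / sum_k(r_k) and variance
    variance / sum_k(r_k). (Hypothetical helper, for illustration only.)
    """
    if not means or len(means) != len(reliabilities):
        raise ValueError("need one reliability weight per mean")
    total = sum(reliabilities)
    if total <= 0:
        raise ValueError("reliability weights must sum to a positive value")
    dim = len(means[0])
    # Weighted average of the per-model means, coordinate by coordinate.
    fused_mean = [
        sum(r * m[i] for r, m in zip(reliabilities, means)) / total
        for i in range(dim)
    ]
    return fused_mean, variance / total


# Two hypothetical per-model step means; the second model is trusted 3x more.
fused_mean, fused_variance = fuse_gaussian_steps(
    means=[[0.0, 2.0], [4.0, 2.0]],
    variance=1.0,
    reliabilities=[1.0, 3.0],
)
print(fused_mean, fused_variance)
```

A larger weight pulls the fused mean toward that model's prediction, which is one plausible reading of how a reliability weight could steer sampling between models.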
Related papers
- TabDiff: a Multi-Modal Diffusion Model for Tabular Data Generation [91.50296404732902]
We introduce TabDiff, a joint diffusion framework that models all multi-modal distributions of tabular data in one model.
Our key innovation is the development of a joint continuous-time diffusion process for numerical and categorical data.
TabDiff achieves superior average performance over existing competitive baselines, with up to 22.5% improvement over the state-of-the-art model on pair-wise column correlation estimations.
arXiv Detail & Related papers (2024-10-27T22:58:47Z)
- Constrained Diffusion Models via Dual Training [80.03953599062365]
Diffusion processes are prone to generating samples that reflect biases in a training dataset.
We develop constrained diffusion models by imposing diffusion constraints based on desired distributions.
We show that our constrained diffusion models generate new data from a mixture data distribution that achieves the optimal trade-off between the objective and the constraints.
arXiv Detail & Related papers (2024-08-27T14:25:42Z)
- Denoising Diffusion Bridge Models [54.87947768074036]
Diffusion models are powerful generative models that map noise to data using stochastic processes.
For many applications such as image editing, the model input comes from a distribution that is not random noise.
In our work, we propose Denoising Diffusion Bridge Models (DDBMs).
arXiv Detail & Related papers (2023-09-29T03:24:24Z)
- One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale [36.590918776922905]
This paper proposes a unified diffusion framework (dubbed UniDiffuser) to fit all distributions relevant to a set of multi-modal data in one model.
Inspired by the unified view, UniDiffuser learns all distributions simultaneously with a minimal modification to the original diffusion model.
arXiv Detail & Related papers (2023-03-12T03:38:39Z)
- Where to Diffuse, How to Diffuse, and How to Get Back: Automated
Learning for Multivariate Diffusions [22.04182099405728]
Diffusion-based generative models (DBGMs) perturb data to a target noise distribution and reverse this inference diffusion process to generate samples.
We show how to maximize a lower-bound on the likelihood for any number of auxiliary variables.
We then demonstrate how to parameterize the diffusion for a specified target noise distribution.
arXiv Detail & Related papers (2023-02-14T18:57:04Z)
- From Points to Functions: Infinite-dimensional Representations in
Diffusion Models [23.916417852496608]
Diffusion-based generative models learn to iteratively transfer unstructured noise to a complex target distribution.
We show that a combination of information content from different time steps gives a strictly better representation for the downstream task.
arXiv Detail & Related papers (2022-10-25T05:30:53Z)
- Diffusion models as plug-and-play priors [98.16404662526101]
We consider the problem of inferring high-dimensional data $\mathbf{x}$ in a model that consists of a prior $p(\mathbf{x})$ and an auxiliary constraint $c(\mathbf{x},\mathbf{y})$.
The structure of diffusion models allows us to perform approximate inference by iterating differentiation through the fixed denoising network enriched with different amounts of noise.
arXiv Detail & Related papers (2022-06-17T21:11:36Z)
- Image Generation with Multimodal Priors using Denoising Diffusion
Probabilistic Models [54.1843419649895]
A major challenge in using generative models to accomplish this task is the lack of paired data containing all modalities and corresponding outputs.
We propose a solution based on denoising diffusion probabilistic models to generate images under multimodal priors.
arXiv Detail & Related papers (2022-06-10T12:23:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.