Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion
Models
- URL: http://arxiv.org/abs/2212.00793v2
- Date: Thu, 20 Apr 2023 15:03:54 GMT
- Title: Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion
Models
- Authors: Nithin Gopalakrishnan Nair, Wele Gedara Chaminda Bandara and Vishal M.
Patel
- Abstract summary: We propose a solution based on denoising diffusion probabilistic models (DDPMs).
Our motivation for choosing diffusion models over other generative models comes from the flexible internal structure of diffusion models.
Our method can unite multiple diffusion models trained on multiple sub-tasks and conquer the combined task.
- Score: 54.1843419649895
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generating photos satisfying multiple constraints finds broad utility in the
content creation industry. A key hurdle to accomplishing this task is the need
for paired data consisting of all modalities (i.e., constraints) and their
corresponding output. Moreover, existing methods need retraining using paired
data across all modalities to introduce a new condition. This paper proposes a
solution to this problem based on denoising diffusion probabilistic models
(DDPMs). Our motivation for choosing diffusion models over other generative
models comes from the flexible internal structure of diffusion models. Since
each sampling step in the DDPM follows a Gaussian distribution, we show that
there exists a closed-form solution for generating an image given various
constraints. Our method can unite multiple diffusion models trained on multiple
sub-tasks and conquer the combined task through our proposed sampling strategy.
We also introduce a novel reliability parameter that allows different
off-the-shelf diffusion models, trained on various datasets, to be used at
sampling time alone to guide generation toward the desired outcome satisfying
multiple constraints.
We perform experiments on various standard multimodal tasks to demonstrate the
effectiveness of our approach. More details can be found at
https://nithin-gk.github.io/projectpages/Multidiff/index.html
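
To make the proposed sampling strategy concrete, the sketch below shows one way the closed-form combination could look in noise-prediction space: each off-the-shelf conditional model shifts an unconditional prediction, and per-model reliability weights control how strongly each constraint is trusted at the current step. This is a minimal illustration under assumed interfaces (the `cond_models`, `uncond_model`, and weighted-offset form are placeholders), not the paper's exact closed-form expression.

```python
import torch

def combined_epsilon(x_t, t, cond_models, conditions, reliabilities, uncond_model):
    """Merge noise predictions from several conditional diffusion models.

    Because each reverse DDPM step is Gaussian, per-model posterior means can
    be merged in closed form; in epsilon space this reduces to a weighted sum
    of conditional offsets around an unconditional prediction, with the
    weights playing the role of reliability parameters.
    """
    eps_u = uncond_model(x_t, t)                       # unconditional prediction
    eps = eps_u.clone()
    for model, cond, w in zip(cond_models, conditions, reliabilities):
        eps = eps + w * (model(x_t, t, cond) - eps_u)  # shift toward each constraint
    return eps

@torch.no_grad()
def multi_modal_sample(cond_models, conditions, reliabilities, uncond_model,
                       betas, shape, device="cpu"):
    """Ancestral DDPM sampling that plugs the combined prediction into the
    standard Gaussian reverse step (illustrative, not the authors' code)."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x_t = torch.randn(shape, device=device)
    for t in reversed(range(len(betas))):
        eps = combined_epsilon(x_t, t, cond_models, conditions,
                               reliabilities, uncond_model)
        mean = (x_t - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) \
               / torch.sqrt(alphas[t])
        noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
        x_t = mean + torch.sqrt(betas[t]) * noise      # Gaussian reverse step
    return x_t
```

Because the conditional models never see each other during training, this is what allows, for example, a text-conditioned model and a semantic-map-conditioned model trained on disjoint datasets to be mixed at sampling time without joint retraining; tuning the reliability weights trades off how strictly each constraint is enforced.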
Related papers
- TabDiff: a Multi-Modal Diffusion Model for Tabular Data Generation [91.50296404732902]
We introduce TabDiff, a joint diffusion framework that models all multi-modal distributions of tabular data in one model.
Our key innovation is the development of a joint continuous-time diffusion process for numerical and categorical data.
TabDiff achieves superior average performance over existing competitive baselines, with up to 22.5% improvement over the state-of-the-art model on pair-wise column correlation estimations.
arXiv Detail & Related papers (2024-10-27T22:58:47Z)
- Constrained Diffusion Models via Dual Training [80.03953599062365]
We develop constrained diffusion models based on desired distributions informed by requirements.
We show that our constrained diffusion models generate new data from a mixture data distribution that achieves the optimal trade-off between the objective and the constraints.
arXiv Detail & Related papers (2024-08-27T14:25:42Z)
- Denoising Diffusion Bridge Models [54.87947768074036]
Diffusion models are powerful generative models that map noise to data using stochastic processes.
For many applications such as image editing, the model input comes from a distribution that is not random noise.
In our work, we propose Denoising Diffusion Bridge Models (DDBMs).
arXiv Detail & Related papers (2023-09-29T03:24:24Z)
- One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale [36.590918776922905]
This paper proposes a unified diffusion framework (dubbed UniDiffuser) to fit all distributions relevant to a set of multi-modal data in one model.
Inspired by the unified view, UniDiffuser learns all distributions simultaneously with a minimal modification to the original diffusion model.
arXiv Detail & Related papers (2023-03-12T03:38:39Z)
- Where to Diffuse, How to Diffuse, and How to Get Back: Automated Learning for Multivariate Diffusions [22.04182099405728]
Diffusion-based generative models (DBGMs) perturb data to a target noise distribution and reverse this inference diffusion process to generate samples.
We show how to maximize a lower-bound on the likelihood for any number of auxiliary variables.
We then demonstrate how to parameterize the diffusion for a specified target noise distribution.
arXiv Detail & Related papers (2023-02-14T18:57:04Z)
- From Points to Functions: Infinite-dimensional Representations in Diffusion Models [23.916417852496608]
Diffusion-based generative models learn to iteratively transfer unstructured noise to a complex target distribution.
We show that a combination of information content from different time steps gives a strictly better representation for the downstream task.
arXiv Detail & Related papers (2022-10-25T05:30:53Z)
- Diffusion models as plug-and-play priors [98.16404662526101]
We consider the problem of inferring high-dimensional data $\mathbf{x}$ in a model that consists of a prior $p(\mathbf{x})$ and an auxiliary constraint $c(\mathbf{x},\mathbf{y})$.
The structure of diffusion models allows us to perform approximate inference by iterating differentiation through the fixed denoising network enriched with different amounts of noise (a minimal sketch of this pattern follows the list below).
arXiv Detail & Related papers (2022-06-17T21:11:36Z)
- Image Generation with Multimodal Priors using Denoising Diffusion Probabilistic Models [54.1843419649895]
A major challenge in using generative models to accomplish this task is the lack of paired data containing all modalities and corresponding outputs.
We propose a solution based on denoising diffusion probabilistic synthesis models to generate images under multimodal priors.
arXiv Detail & Related papers (2022-06-10T12:23:05Z)
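
For the "Diffusion models as plug-and-play priors" entry above, the summary mentions approximate inference by differentiating through the fixed denoising network. The sketch below illustrates that general pattern with a generic differentiable penalty standing in for the constraint $c(\mathbf{x},\mathbf{y})$; the interface and the way the gradient nudges the Gaussian mean are illustrative assumptions, not that paper's exact algorithm.

```python
import torch

def constraint_guided_sample(denoiser, constraint, betas, shape, scale=1.0, device="cpu"):
    """Steer a frozen unconditional denoiser with an external differentiable
    constraint by backpropagating through the denoiser at every reverse step
    (a sketch of the guidance pattern, with assumed interfaces)."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x_t = torch.randn(shape, device=device)
    for t in reversed(range(len(betas))):
        x_t = x_t.detach().requires_grad_(True)
        eps = denoiser(x_t, t)
        # clean-image estimate implied by the current noisy sample (DDPM identity)
        x0_hat = (x_t - torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alpha_bars[t])
        penalty = constraint(x0_hat)                 # scalar penalty encoding c(x, y)
        grad = torch.autograd.grad(penalty, x_t)[0]  # differentiate through the denoiser
        with torch.no_grad():
            mean = (x_t - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) \
                   / torch.sqrt(alphas[t])
            mean = mean - scale * grad               # nudge the Gaussian mean toward the constraint
            noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
            x_t = mean + torch.sqrt(betas[t]) * noise
    return x_t.detach()
```

This constraint-gradient guidance is complementary to the model-combination sketch earlier in the page: one mixes several trained conditional models in closed form, the other steers a single prior with an external objective at sampling time.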