Cognitively Inspired Cross-Modal Data Generation Using Diffusion Models
- URL: http://arxiv.org/abs/2305.18433v1
- Date: Sun, 28 May 2023 23:54:52 GMT
- Title: Cognitively Inspired Cross-Modal Data Generation Using Diffusion Models
- Authors: Zizhao Hu, Mohammad Rostami
- Abstract summary: Cross-modal generative methods based on diffusion models use guidance to provide control over the latent space to enable conditional generation across different modalities.
We explore a multi-modal diffusion model training and sampling scheme that uses channel-wise image conditioning to learn cross-modal correlations during training, better mimicking the learning process in the brain.
- Score: 12.013345715187285
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most existing cross-modal generative methods based on diffusion models use guidance to provide control over the latent space to enable conditional generation across different modalities. Such methods focus on providing guidance through separately-trained models, each for one modality. As a result, these methods suffer from cross-modal information loss and are limited to unidirectional conditional generation. Inspired by how humans synchronously acquire multi-modal information and learn the correlation between modalities, we explore a multi-modal diffusion model training and sampling scheme that uses channel-wise image conditioning to learn cross-modality correlation during the training phase to better mimic the learning process in the brain. Our empirical results demonstrate that our approach can achieve data generation conditioned on all correlated modalities.
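The abstract describes stacking modalities along the channel dimension so that a single diffusion model learns their joint distribution, rather than guiding one model with another. The sketch below is a minimal illustration of that general idea under my own assumptions (two image-like modalities, a DDPM-style noise-prediction loss, and a toy stand-in denoiser); it is not the authors' implementation.

```python
# Hedged sketch: channel-wise stacking of two modalities for joint diffusion
# training. All names, shapes, and the noise schedule are illustrative assumptions.
import torch
import torch.nn as nn


class TinyDenoiser(nn.Module):
    """Stand-in for a UNet: predicts the noise added to the stacked channels."""

    def __init__(self, channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels + 1, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, channels, 3, padding=1),
        )

    def forward(self, x, t):
        # Broadcast the normalized timestep as one extra input channel.
        t_map = t.float().view(-1, 1, 1, 1).expand(-1, 1, *x.shape[2:])
        return self.net(torch.cat([x, t_map], dim=1))


def training_step(model, x_a, x_b, alphas_cumprod):
    """One DDPM-style noise-prediction step on channel-wise stacked modalities."""
    x = torch.cat([x_a, x_b], dim=1)                      # joint sample: C_a + C_b channels
    t = torch.randint(0, len(alphas_cumprod), (x.shape[0],))
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(x)
    x_t = a_bar.sqrt() * x + (1 - a_bar).sqrt() * noise   # forward diffusion of both modalities together
    pred = model(x_t, t / len(alphas_cumprod))
    return ((pred - noise) ** 2).mean()                   # standard noise-prediction loss over all stacked channels


# Example: two single-channel 32x32 modalities and a linear beta schedule.
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
model = TinyDenoiser(channels=2)
loss = training_step(model, torch.randn(8, 1, 32, 32), torch.randn(8, 1, 32, 32), alphas_cumprod)
```

At sampling time, conditioning on a known modality would then amount to clamping that modality's channels to (noised versions of) the observation at each reverse step, so generation can be conditioned in either direction; again, this is a sketch of the general mechanism rather than the paper's exact sampler.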
Related papers
- Constrained Diffusion Models via Dual Training [80.03953599062365]
Diffusion processes are prone to generating samples that reflect biases in a training dataset.
We develop constrained diffusion models by imposing diffusion constraints based on desired distributions.
We show that our constrained diffusion models generate new data from a mixture data distribution that achieves the optimal trade-off between the objective and the constraints.
arXiv Detail & Related papers (2024-08-27T14:25:42Z)
- Leveraging Weak Cross-Modal Guidance for Coherence Modelling via Iterative Learning [66.28872204574648]
Cross-modal coherence modeling is essential for intelligent systems to help them organize and structure information.
Previous work on cross-modal coherence modeling attempted to leverage order information from another modality to assist coherence recovery in the target modality.
This paper explores a new way to take advantage of cross-modal guidance without gold labels for coherence.
arXiv Detail & Related papers (2024-08-01T06:04:44Z)
- Dealing with All-stage Missing Modality: Towards A Universal Model with Robust Reconstruction and Personalization [14.606035444283984]
Current approaches focus on developing models that handle modality-incomplete inputs during inference.
We propose a robust universal model with modality reconstruction and model personalization.
Our method has been extensively validated on two brain tumor segmentation benchmarks.
arXiv Detail & Related papers (2024-06-04T06:07:24Z)
- Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
- Multi-modal Latent Diffusion [8.316365279740188]
Multi-modal Variational Autoencoders are a popular family of models that aim to learn a joint representation of the different modalities.
Existing approaches suffer from a coherence-quality tradeoff, where models with good generation quality lack generative coherence across modalities.
We propose a novel method that uses a set of independently trained, uni-modal, deterministic autoencoders.
arXiv Detail & Related papers (2023-06-07T14:16:44Z)
- Diff-Instruct: A Universal Approach for Transferring Knowledge From Pre-trained Diffusion Models [77.83923746319498]
We propose a framework called Diff-Instruct to instruct the training of arbitrary generative models.
We show that Diff-Instruct results in state-of-the-art single-step diffusion-based models.
Experiments on refining GAN models show that Diff-Instruct consistently improves pre-trained GAN generators.
arXiv Detail & Related papers (2023-05-29T04:22:57Z)
- Learning Data Representations with Joint Diffusion Models [20.25147743706431]
Joint machine learning models that allow synthesizing and classifying data often offer uneven performance between those tasks or are unstable to train.
We extend the vanilla diffusion model with a classifier that allows for stable joint end-to-end training with shared parameterization between those objectives.
The resulting joint diffusion model outperforms recent state-of-the-art hybrid methods in terms of both classification and generation quality on all evaluated benchmarks.
arXiv Detail & Related papers (2023-01-31T13:29:19Z)
- Multi-Modal Mutual Information Maximization: A Novel Approach for Unsupervised Deep Cross-Modal Hashing [73.29587731448345]
We propose a novel method, dubbed Cross-Modal Info-Max Hashing (CMIMH).
We learn informative representations that can preserve both intra- and inter-modal similarities.
The proposed method consistently outperforms other state-of-the-art cross-modal retrieval methods.
arXiv Detail & Related papers (2021-12-13T08:58:03Z)
- Discriminative Multimodal Learning via Conditional Priors in Generative Models [21.166519800652047]
This research studies the realistic scenario in which all modalities and class labels are available for model training.
We show, in this scenario, that the variational lower bound limits mutual information between joint representations and missing modalities.
arXiv Detail & Related papers (2021-10-09T17:22:24Z)
- Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models [86.9292779620645]
We develop a contrastive framework for generative model learning, allowing us to train the model not just by the commonality between modalities, but by the distinction between "related" and "unrelated" multimodal data.
Under our proposed framework, the generative model can accurately identify related samples from unrelated ones, making it possible to make use of the plentiful unlabeled, unpaired multimodal data.
arXiv Detail & Related papers (2020-07-02T15:08:11Z)
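The contrastive idea in the entry above, training on the distinction between "related" and "unrelated" multimodal pairs, is commonly realized with an InfoNCE-style objective. The snippet below is a generic sketch of such an objective, with random placeholder embeddings standing in for the outputs of two modality encoders; it is not the paper's actual loss.

```python
# Hedged sketch of a contrastive "related vs. unrelated" objective; names are placeholders.
import torch
import torch.nn.functional as F


def relatedness_contrastive_loss(z_a, z_b, temperature: float = 0.1):
    """InfoNCE-style loss: matched (related) pairs on the diagonal are positives;
    every mismatched (unrelated) pairing in the batch serves as a negative."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature      # pairwise cross-modal similarities
    targets = torch.arange(z_a.shape[0])      # i-th sample of A is related to i-th of B
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


# Example with random embeddings from two hypothetical modality encoders.
loss = relatedness_contrastive_loss(torch.randn(16, 128), torch.randn(16, 128))
```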