Diffusion Models with Double Guidance: Generate with aggregated datasets
- URL: http://arxiv.org/abs/2505.13213v1
- Date: Mon, 19 May 2025 14:59:35 GMT
- Title: Diffusion Models with Double Guidance: Generate with aggregated datasets
- Authors: Yanfeng Yang, Kenji Fukumizu
- Abstract summary: Large-scale datasets for training high-performance generative models are often prohibitively expensive, especially when associated attributes or annotations must be provided. This presents a significant challenge for conditional generative modeling when multiple attributes are used jointly as conditions. We propose a novel generative approach, Diffusion Model with Double Guidance, which enables precise conditional generation even when no training samples contain all conditions simultaneously.
- Score: 18.0878149546412
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Creating large-scale datasets for training high-performance generative models is often prohibitively expensive, especially when associated attributes or annotations must be provided. As a result, merging existing datasets has become a common strategy. However, the sets of attributes across datasets are often inconsistent, and their naive concatenation typically leads to block-wise missing conditions. This presents a significant challenge for conditional generative modeling when multiple attributes are used jointly as conditions, thereby limiting the model's controllability and applicability. To address this issue, we propose a novel generative approach, Diffusion Model with Double Guidance, which enables precise conditional generation even when no training samples contain all conditions simultaneously. Our method maintains rigorous control over multiple conditions without requiring joint annotations. We demonstrate its effectiveness in molecular and image generation tasks, where it outperforms existing baselines both in alignment with target conditional distributions and in controllability under missing-condition settings.
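How the two guidance signals are combined is not spelled out in the abstract. A common way to realize guidance from two separately annotated attribute sets is to add one classifier-free-guidance-style correction per attribute set to the unconditional denoiser output; the sketch below illustrates that idea only. The function name, the `eps_model` interface, and the weights `w1`/`w2` are hypothetical placeholders, not the paper's actual double-guidance rule.

```python
def doubly_guided_eps(eps_model, x_t, t, cond_a, cond_b, w1=2.0, w2=2.0):
    """Illustrative combination of guidance from two attribute sets.

    Assumes `eps_model(x_t, t, cond_a, cond_b)` predicts diffusion noise and was
    trained so that either condition can be dropped (e.g. because each training
    dataset annotates only one of the two attribute sets). This is a sketch under
    that assumption, not the paper's exact formulation.
    """
    eps_uncond = eps_model(x_t, t, cond_a=None, cond_b=None)  # no conditioning
    eps_a = eps_model(x_t, t, cond_a=cond_a, cond_b=None)     # attribute set A only
    eps_b = eps_model(x_t, t, cond_a=None, cond_b=cond_b)     # attribute set B only
    # One guidance correction per attribute set, added to the unconditional prediction.
    return eps_uncond + w1 * (eps_a - eps_uncond) + w2 * (eps_b - eps_uncond)
```

The returned noise estimate would replace the usual conditional prediction inside a standard DDPM/DDIM sampling loop.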
Related papers
- LayoutRAG: Retrieval-Augmented Model for Content-agnostic Conditional Layout Generation [34.39449499558055]
Controllable layout generation aims to create plausible visual arrangements of element bounding boxes within a graphic design. We propose to carry out layout generation by retrieving suitable references according to the given conditions and then performing reference-guided generation. Our method successfully produces high-quality layouts that meet the given conditions and outperforms existing state-of-the-art models.
arXiv Detail & Related papers (2025-06-03T09:47:03Z) - RelDiff: Relational Data Generative Modeling with Graph-Based Diffusion Models [83.6013616017646]
RelDiff is a novel diffusion generative model that synthesizes complete relational databases by explicitly modeling their foreign key graph structure. RelDiff consistently outperforms prior methods in producing realistic and coherent synthetic relational databases.
arXiv Detail & Related papers (2025-05-31T21:01:02Z) - Masked Conditioning for Deep Generative Models [0.0]
We introduce a novel masked-conditioning approach that enables generative models to work with sparse, mixed-type data. We show that small models trained on limited data can be coupled with large pretrained foundation models to improve generation quality.
arXiv Detail & Related papers (2025-05-22T14:33:03Z) - Bridging the inference gap in Multimodal Variational Autoencoders [6.246098300155483]
Multimodal Variational Autoencoders offer versatile and scalable methods for generating unobserved modalities from observed ones. Recent models using mixture-of-experts aggregation suffer from theoretically grounded limitations that restrict their generation quality on complex datasets. We propose a novel interpretable model able to learn both joint and conditional distributions without introducing mixture aggregation.
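For reference, the mixture-of-experts aggregation criticized above (as used in MMVAE-style models) approximates the joint posterior by averaging the unimodal encoders, e.g.

    q_\phi(z \mid x_{1:M}) = \frac{1}{M} \sum_{m=1}^{M} q_\phi(z \mid x_m),

which is the aggregation step the proposed model avoids by learning joint and conditional distributions directly.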
arXiv Detail & Related papers (2025-02-06T10:43:55Z) - Diverse capability and scaling of diffusion and auto-regressive models when learning abstract rules [4.710921988115686]
We investigate whether modern generative models can learn underlying rules from finite samples and perform reasoning through conditional sampling.
Inspired by Raven's Progressive Matrices task, we designed the GenRAVEN dataset, where each sample consists of three rows.
We trained generative models to learn the data distribution, where samples are encoded as integer arrays to focus on rule learning.
arXiv Detail & Related papers (2024-11-12T15:29:50Z) - Generating the Traces You Need: A Conditional Generative Model for Process Mining Data [10.914597458295248]
We introduce a conditional model for process data generation based on a conditional variational autoencoder (CVAE).
A CVAE for process mining faces specific challenges due to the multi-perspective nature of the data and the need to adhere to control-flow rules.
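For context, the CVAE backbone is trained by maximizing the standard conditional evidence lower bound, with c denoting the conditioning attributes of a trace:

    \log p_\theta(x \mid c) \ge \mathbb{E}_{q_\phi(z \mid x, c)}\big[\log p_\theta(x \mid z, c)\big] - \mathrm{KL}\big(q_\phi(z \mid x, c) \,\|\, p_\theta(z \mid c)\big).

How the control-flow constraints mentioned above enter this objective is specific to the paper and not detailed in this summary.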
arXiv Detail & Related papers (2024-11-04T14:44:20Z) - Zero-Shot Conditioning of Score-Based Diffusion Models by Neuro-Symbolic Constraints [1.1826485120701153]
We propose a method that, given a pre-trained unconditional score-based generative model, samples from the conditional distribution under arbitrary logical constraints. We show how to manipulate the learned score in order to sample from an un-normalized distribution conditional on a user-defined constraint. We define a flexible and numerically stable neuro-symbolic framework for encoding soft logical constraints.
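The score manipulation referred to above rests on the Bayes decomposition of the conditional score, with c denoting the constraint:

    \nabla_x \log p(x \mid c) = \nabla_x \log p(x) + \nabla_x \log p(c \mid x),

so the pre-trained unconditional model supplies the first term while a constraint-satisfaction term plays the role of the second; the soft, neuro-symbolic form of that term is the paper's contribution and is not given in this summary.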
arXiv Detail & Related papers (2023-08-31T08:25:47Z) - Conditional Generation from Unconditional Diffusion Models using Denoiser Representations [94.04631421741986]
We propose adapting pre-trained unconditional diffusion models to new conditions using the learned internal representations of the denoiser network.
We show that augmenting the Tiny ImageNet training set with synthetic images generated by our approach improves the classification accuracy of ResNet baselines by up to 8%.
arXiv Detail & Related papers (2023-06-02T20:09:57Z) - Breaking the Spurious Causality of Conditional Generation via Fairness Intervention with Corrective Sampling [77.15766509677348]
Conditional generative models often inherit spurious correlations from the training dataset.
This can result in label-conditional distributions that are imbalanced with respect to another latent attribute.
We propose a general two-step strategy to mitigate this issue.
arXiv Detail & Related papers (2022-12-05T08:09:33Z) - Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z) - Maximum Likelihood on the Joint (Data, Condition) Distribution for Solving Ill-Posed Problems with Conditional Flow Models [0.0]
I describe a trick for training flow models using a prescribed rule as a surrogate for maximum likelihood.
I demonstrate these properties on easily visualized toy problems, then use the method to successfully generate class-conditional images.
arXiv Detail & Related papers (2022-08-24T21:50:25Z) - Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM), where we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores).
For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
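Concretely, the parameterization described above factorizes the joint density autoregressively and models the score of each one-dimensional conditional:

    p(x) = \prod_{d=1}^{D} p(x_d \mid x_{<d}), \qquad s_d(x_{\le d}) = \frac{\partial}{\partial x_d} \log p(x_d \mid x_{<d}).

Because each s_d is a univariate score, the divergence between data and model can be estimated dimension by dimension, which underlies the efficient, sampling-free training claimed above.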
arXiv Detail & Related papers (2020-10-24T07:01:24Z) - Conditional Hybrid GAN for Sequence Generation [56.67961004064029]
We propose a novel conditional hybrid GAN (C-Hybrid-GAN) to address context-conditioned generation of discrete-valued sequences.
We exploit the Gumbel-Softmax technique to approximate the distribution of discrete-valued sequences.
We demonstrate that the proposed C-Hybrid-GAN outperforms the existing methods in context-conditioned discrete-valued sequence generation.
arXiv Detail & Related papers (2020-09-18T03:52:55Z)
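The Gumbel-Softmax technique named in the C-Hybrid-GAN entry above gives a differentiable surrogate for sampling from a categorical distribution, which is what lets gradients flow through discrete sequence tokens during GAN training. A minimal, generic sketch (not the paper's code):

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Differentiable relaxation of a categorical sample.

    `logits` are unnormalized log-probabilities over the vocabulary (last dim).
    Smaller `tau` pushes the output toward a one-hot vector at the cost of
    noisier gradients.
    """
    # Gumbel(0, 1) noise: g = -log(-log(u)), u ~ Uniform(0, 1).
    u = torch.rand_like(logits)
    g = -torch.log(-torch.log(u + 1e-20) + 1e-20)
    # Perturb the logits and relax the argmax into a temperature-scaled softmax.
    return F.softmax((logits + g) / tau, dim=-1)
```

PyTorch also provides this relaxation directly as torch.nn.functional.gumbel_softmax.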
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.