Enhancing Diffusion-Based Quantitatively Controllable Image Generation via Matrix-Form EDM and Adaptive Vicinal Training
- URL: http://arxiv.org/abs/2602.02114v1
- Date: Mon, 02 Feb 2026 13:55:49 GMT
- Title: Enhancing Diffusion-Based Quantitatively Controllable Image Generation via Matrix-Form EDM and Adaptive Vicinal Training
- Authors: Xin Ding, Yun Chen, Sen Zhang, Kao Zhang, Nenglun Chen, Peibei Cao, Yongwei Wang, Fei Wu,
- Abstract summary: Continuous Conditional Diffusion Model (CCDM) is a diffusion-based framework designed to generate high-quality images conditioned on continuous regression labels. We propose iCCDM, which incorporates the more advanced Elucidated Diffusion Model (EDM) framework with substantial modifications to improve both generation quality and sampling efficiency. Experiments on four benchmark datasets, spanning image resolutions from $64\times64$ to $256\times256$, demonstrate that iCCDM consistently outperforms existing methods.
- Score: 22.721395122355187
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Continuous Conditional Diffusion Model (CCDM) is a diffusion-based framework designed to generate high-quality images conditioned on continuous regression labels. Although CCDM has demonstrated clear advantages over prior approaches across a range of datasets, it still exhibits notable limitations and has recently been surpassed by a GAN-based method, namely CcGAN-AVAR. These limitations mainly arise from its reliance on an outdated diffusion framework and its low sampling efficiency due to long sampling trajectories. To address these issues, we propose an improved CCDM framework, termed iCCDM, which incorporates the more advanced \textit{Elucidated Diffusion Model} (EDM) framework with substantial modifications to improve both generation quality and sampling efficiency. Specifically, iCCDM introduces a novel matrix-form EDM formulation together with an adaptive vicinal training strategy. Extensive experiments on four benchmark datasets, spanning image resolutions from $64\times64$ to $256\times256$, demonstrate that iCCDM consistently outperforms existing methods, including state-of-the-art large-scale text-to-image diffusion models (e.g., Stable Diffusion 3, FLUX.1, and Qwen-Image), achieving higher generation quality while significantly reducing sampling cost.
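The "vicinal" training idea at the core of CCDM (a hard vicinal image denoising loss, which iCCDM makes adaptive) can be illustrated with a minimal sketch. This is not the authors' implementation: the vicinity radius `kappa`, the normalization of labels to [0, 1], and the weighted-MSE form below are all assumptions for illustration only.

```python
import numpy as np

def hard_vicinal_weights(target_label, train_labels, kappa=0.02):
    """Hard vicinal weights (illustrative): a training sample contributes
    to the loss at target label y only if its own label y_i satisfies
    |y_i - y| <= kappa. Labels are assumed normalized to [0, 1]."""
    return (np.abs(train_labels - target_label) <= kappa).astype(float)

def vicinal_denoising_loss(pred_noise, true_noise, weights):
    """Weighted MSE denoising loss (illustrative): per-sample squared
    errors are reweighted by the vicinal weights before averaging."""
    per_sample = np.mean((pred_noise - true_noise) ** 2, axis=1)
    w = weights / (weights.sum() + 1e-12)
    return float(np.sum(w * per_sample))

# Only the first two samples lie within kappa of the target label 0.10,
# so the third sample is excluded from the loss entirely.
labels = np.array([0.10, 0.11, 0.50])
w = hard_vicinal_weights(0.10, labels, kappa=0.02)
loss = vicinal_denoising_loss(np.zeros((3, 2)), np.ones((3, 2)), w)
```

An adaptive variant, as the abstract suggests, would vary `kappa` with the local label density rather than fixing it globally.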
Related papers
- Imbalance-Robust and Sampling-Efficient Continuous Conditional GANs via Adaptive Vicinity and Auxiliary Regularization [15.273709585153009]
We propose an enhanced CcGAN framework featuring two novel components for handling data imbalance. The CcGAN framework's native one-step generator enables 30x-2000x faster inference than CCDM.
arXiv Detail & Related papers (2025-08-03T11:36:00Z) - Variational Autoencoding Discrete Diffusion with Enhanced Dimensional Correlations Modeling [48.96034602889216]
Variational Autoencoding Discrete Diffusion (VADD) is a novel framework that enhances discrete diffusion with latent variable modeling. By introducing an auxiliary recognition model, VADD enables stable training via variational lower bounds and amortized inference over the training set. Empirical results on 2D toy data, pixel-level image generation, and text generation demonstrate that VADD consistently outperforms MDM baselines.
arXiv Detail & Related papers (2025-05-23T01:45:47Z) - Improving Vector-Quantized Image Modeling with Latent Consistency-Matching Diffusion [55.185588994883226]
We introduce VQ-LCMD, a continuous-space latent diffusion framework within the embedding space that stabilizes training. VQ-LCMD uses a novel training objective combining the joint embedding-diffusion variational lower bound with a consistency-matching (CM) loss. Experiments show that the proposed VQ-LCMD yields superior results on FFHQ, LSUN Churches, and LSUN Bedrooms compared to discrete-state latent diffusion models.
arXiv Detail & Related papers (2024-10-18T09:12:33Z) - Effective Diffusion Transformer Architecture for Image Super-Resolution [63.254644431016345]
We design an effective diffusion transformer for image super-resolution (DiT-SR).
In practice, DiT-SR leverages an overall U-shaped architecture, and adopts a uniform isotropic design for all the transformer blocks.
We analyze the limitation of the widely used AdaLN, and present a frequency-adaptive time-step conditioning module.
arXiv Detail & Related papers (2024-09-29T07:14:16Z) - CriDiff: Criss-cross Injection Diffusion Framework via Generative Pre-train for Prostate Segmentation [60.61972883059688]
CriDiff is a two-stage feature injecting framework with a Crisscross Injection Strategy (CIS) and a Generative Pre-train (GP) approach for prostate segmentation.
To effectively learn multiple levels of edge and non-edge features, we propose two parallel conditioners in the CIS.
The GP approach eases the inconsistency between the image features and the diffusion model without adding additional parameters.
arXiv Detail & Related papers (2024-06-20T10:46:50Z) - CCDM: Continuous Conditional Diffusion Models for Image Generation [22.70942688582302]
Conditional Diffusion Models (CDMs) offer a promising alternative for Continuous Conditional Generative Modeling (CCGM). CDMs address existing limitations with specially designed conditional diffusion processes, a novel hard vicinal image denoising loss, and efficient conditional sampling procedures. We demonstrate that CCDMs outperform state-of-the-art CCGM models, establishing a new benchmark.
arXiv Detail & Related papers (2024-05-06T15:10:19Z) - Mitigating Data Consistency Induced Discrepancy in Cascaded Diffusion Models for Sparse-view CT Reconstruction [4.227116189483428]
This study introduces a novel Cascaded Diffusion with Discrepancy Mitigation framework.
It generates low-quality images in latent space and high-quality images in pixel space.
It minimizes computational costs by moving some inference steps from pixel space to latent space.
arXiv Detail & Related papers (2024-03-14T12:58:28Z) - TC-DiffRecon: Texture coordination MRI reconstruction method based on diffusion model and modified MF-UNet method [2.626378252978696]
We propose a novel diffusion model-based MRI reconstruction method, named TC-DiffRecon, which does not rely on a specific acceleration factor for training.
We also suggest the incorporation of the MF-UNet module, designed to enhance the quality of MRI images generated by the model.
arXiv Detail & Related papers (2024-02-17T13:09:00Z) - LDM-ISP: Enhancing Neural ISP for Low Light with Latent Diffusion Models [54.93010869546011]
We propose to leverage the pre-trained latent diffusion model to perform the neural ISP for enhancing extremely low-light images. Specifically, to tailor the pre-trained latent diffusion model to operate on the RAW domain, we train a set of lightweight taming modules. We observe different roles of UNet denoising and decoder reconstruction in the latent diffusion model, which inspires us to decompose the low-light image enhancement task into latent-space low-frequency content generation and decoding-phase high-frequency detail maintenance.
arXiv Detail & Related papers (2023-12-02T04:31:51Z) - Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference [60.32804641276217]
We propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs.
A high-quality 768 x 768 24-step LCM takes only 32 A100 GPU hours for training.
We also introduce Latent Consistency Fine-tuning (LCF), a novel method that is tailored for fine-tuning LCMs on customized image datasets.
arXiv Detail & Related papers (2023-10-06T17:11:58Z) - Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff), for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z) - Accelerating Score-based Generative Models with Preconditioned Diffusion Sampling [36.02321871608158]
We propose a model-agnostic preconditioned diffusion sampling (PDS) method that leverages matrix preconditioning to alleviate the problem.
PDS consistently accelerates off-the-shelf SGMs whilst maintaining the synthesis quality.
In particular, PDS delivers up to a 29x acceleration on the more challenging high-resolution (1024x1024) image generation task.
arXiv Detail & Related papers (2022-07-05T17:55:42Z)
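The matrix-preconditioning idea described in the PDS entry above can be sketched as a preconditioned Langevin-style sampling step. This is a generic illustration, not the paper's actual operator: the diagonal preconditioner `M_diag`, the step size, and the score function are all placeholder assumptions.

```python
import numpy as np

def preconditioned_langevin_step(x, score_fn, M_diag, step=1e-2, rng=None):
    """One preconditioned Langevin update (illustrative). M_diag is a
    positive diagonal preconditioner applied to the score term and, via
    its square root, to the injected noise. With M_diag = 1 this reduces
    to the standard update x <- x + step * score(x) + sqrt(2*step) * z."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(x.shape)
    return x + step * M_diag * score_fn(x) + np.sqrt(2.0 * step) * np.sqrt(M_diag) * z

# Example: one step toward a standard Gaussian target, whose score is -x.
x = np.array([1.0, 2.0])
x_next = preconditioned_langevin_step(
    x, lambda v: -v, M_diag=np.ones(2), step=0.1,
    rng=np.random.default_rng(0),
)
```

A well-chosen preconditioner rescales poorly conditioned directions so that larger steps remain stable, which is the intuition behind the reported speed-ups.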
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.