E2ED^2:Direct Mapping from Noise to Data for Enhanced Diffusion Models
- URL: http://arxiv.org/abs/2412.21044v2
- Date: Mon, 10 Mar 2025 02:46:17 GMT
- Title: E2ED^2:Direct Mapping from Noise to Data for Enhanced Diffusion Models
- Authors: Zhiyu Tan, WenXu Qian, Hesen Chen, Mengping Yang, Lei Chen, Hao Li,
- Abstract summary: Diffusion models have established themselves as the de facto primary paradigm in visual generative modeling.<n>We present a novel end-to-end learning paradigm that establishes direct optimization from the final generated samples to initial noises.<n>Our method achieves substantial performance gains in terms of Fr'eche't Inception Distance (FID) and CLIP score, even with fewer sampling steps.
- Score: 15.270657838960114
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have established themselves as the de facto primary paradigm in visual generative modeling, revolutionizing the field through remarkable success across various diverse applications ranging from high-quality image synthesis to temporal aware video generation. Despite these advancements, three fundamental limitations persist, including 1) discrepancy between training and inference processes, 2) progressive information leakage throughout the noise corruption procedures, and 3) inherent constraints preventing effective integration of modern optimization criteria like perceptual and adversarial loss. To mitigate these critical challenges, we in this paper present a novel end-to-end learning paradigm that establishes direct optimization from the final generated samples to initial noises. Our proposed End-to-End Differentiable Diffusion, dubbed E2ED^2, introduces several key improvements: it eliminates the sequential training-sampling mismatch and intermediate information leakage via conceptualizing training as a direct transformation from isotropic Gaussian noise to the target data distribution. Additionally, such training framework enables seamless incorporation of adversarial and perceptual losses into the core optimization objective. Comprehensive evaluation across standard benchmarks including COCO30K and HW30K reveals that our method achieves substantial performance gains in terms of Fr\'echet Inception Distance (FID) and CLIP score, even with fewer sampling steps (less than 4). Our findings highlight that the end-to-end mechanism might pave the way for more robust and efficient solutions, \emph{i.e.,} combining diffusion stability with GAN-like discriminative optimization in an end-to-end manner.
Related papers
- One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step.
To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration.
Our method achieves strong performance on both full and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z) - Aligning Few-Step Diffusion Models with Dense Reward Difference Learning [81.85515625591884]
Stepwise Diffusion Policy Optimization (SDPO) is an alignment method tailored for few-step diffusion models.
SDPO incorporates dense reward feedback at every intermediate step to ensure consistent alignment across all denoising steps.
SDPO consistently outperforms prior methods in reward-based alignment across diverse step configurations.
arXiv Detail & Related papers (2024-11-18T16:57:41Z) - FIND: Fine-tuning Initial Noise Distribution with Policy Optimization for Diffusion Models [10.969811500333755]
We introduce a Fine-tuning Initial Noise Distribution (FIND) framework with policy optimization.
Our method achieves 10 times faster than the SOTA approach.
arXiv Detail & Related papers (2024-07-28T10:07:55Z) - Iterative Ensemble Training with Anti-Gradient Control for Mitigating Memorization in Diffusion Models [20.550324116099357]
Diffusion models are known for their tremendous ability to generate novel and high-quality samples.
Recent approaches for memory mitigation either only focused on the text modality problem in cross-modal generation tasks or utilized data augmentation strategies.
We propose a novel training framework for diffusion models from the perspective of visual modality, which is more generic and fundamental for mitigating memorization.
arXiv Detail & Related papers (2024-07-22T02:19:30Z) - Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z) - Adv-KD: Adversarial Knowledge Distillation for Faster Diffusion Sampling [2.91204440475204]
Diffusion Probabilistic Models (DPMs) have emerged as a powerful class of deep generative models.
They rely on sequential denoising steps during sample generation.
We propose a novel method that integrates denoising phases directly into the model's architecture.
arXiv Detail & Related papers (2024-05-31T08:19:44Z) - Imagine Flash: Accelerating Emu Diffusion Models with Backward Distillation [18.371344440413353]
We propose a novel distillation framework tailored to enable high-fidelity, diverse sample generation using just one to three steps.
Our approach comprises three key components: (i) Backward Distillation, which mitigates training-inference discrepancies by calibrating the student on its own backward trajectory; (ii) Shifted Reconstruction Loss that dynamically adapts knowledge transfer based on the current time step; and (iii) Noise Correction, an inference-time technique that enhances sample quality.
arXiv Detail & Related papers (2024-05-08T17:15:18Z) - Model Will Tell: Training Membership Inference for Diffusion Models [15.16244745642374]
Training Membership Inference (TMI) task aims to determine whether a specific sample has been used in the training process of a target model.
In this paper, we explore a novel perspective for the TMI task by leveraging the intrinsic generative priors within the diffusion model.
arXiv Detail & Related papers (2024-03-13T12:52:37Z) - Take the Bull by the Horns: Hard Sample-Reweighted Continual Training
Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
arXiv Detail & Related papers (2024-02-22T04:10:57Z) - Learn to Optimize Denoising Scores for 3D Generation: A Unified and
Improved Diffusion Prior on NeRF and 3D Gaussian Splatting [60.393072253444934]
We propose a unified framework aimed at enhancing the diffusion priors for 3D generation tasks.
We identify a divergence between the diffusion priors and the training procedures of diffusion models that substantially impairs the quality of 3D generation.
arXiv Detail & Related papers (2023-12-08T03:55:34Z) - Two-Stage Triplet Loss Training with Curriculum Augmentation for
Audio-Visual Retrieval [3.164991885881342]
Cross- retrieval models learn robust embedding spaces.
We introduce a novel approach rooted in curriculum learning to address this problem.
We propose a two-stage training paradigm that guides the model's learning process from semi-hard to hard triplets.
arXiv Detail & Related papers (2023-10-20T12:35:54Z) - Diffusion Model for Dense Matching [34.13580888014]
The objective for establishing dense correspondence between paired images consists of two terms: a data term and a prior term.
We propose DiffMatch, a novel conditional diffusion-based framework designed to explicitly model both the data and prior terms.
Our experimental results demonstrate significant performance improvements of our method over existing approaches.
arXiv Detail & Related papers (2023-05-30T14:58:24Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z) - Exploiting Diffusion Prior for Real-World Image Super-Resolution [75.5898357277047]
We present a novel approach to leverage prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution.
By employing our time-aware encoder, we can achieve promising restoration results without altering the pre-trained synthesis model.
arXiv Detail & Related papers (2023-05-11T17:55:25Z) - Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs)
GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.