Self-Guidance: Boosting Flow and Diffusion Generation on Their Own
- URL: http://arxiv.org/abs/2412.05827v2
- Date: Sat, 08 Mar 2025 13:10:47 GMT
- Title: Self-Guidance: Boosting Flow and Diffusion Generation on Their Own
- Authors: Tiancheng Li, Weijian Luo, Zhiyang Chen, Liyuan Ma, Guo-Jun Qi
- Abstract summary: We propose Self-Guidance, which improves image quality by suppressing the generation of low-quality samples. With open-source diffusion models such as Stable Diffusion 3.5 and FLUX, Self-Guidance surpasses existing algorithms on multiple metrics. We find that SG has a surprisingly positive effect on the generation of physiologically correct human body structures.
- Score: 32.91402070439289
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Proper guidance strategies are essential for achieving high-quality generation results without retraining diffusion and flow-based text-to-image models. Existing guidance methods either require specific training or rely on strong inductive biases of diffusion model networks, potentially limiting their applications. Motivated by the observation that artifact outliers can be detected by a significant decline in density from a noisier to a cleaner noise level, we propose Self-Guidance (SG), which improves image quality by suppressing the generation of low-quality samples. SG relies only on the sampling probabilities of its own diffusion model at different noise levels, with no need for any guidance-specific training. This makes it flexible enough to be used in a plug-and-play manner alongside other sampling algorithms, maximizing its potential to achieve competitive performance across many generative tasks. We conduct experiments on text-to-image and text-to-video generation with different architectures, including UNet and transformer models. With open-source diffusion models such as Stable Diffusion 3.5 and FLUX, Self-Guidance surpasses existing algorithms on multiple metrics, including both FID and Human Preference Score. Moreover, we find that SG has a surprisingly positive effect on the generation of physiologically correct human body structures such as hands, faces, and arms, showing its ability to eliminate human-body artifacts with minimal effort. We will release our code along with this paper.
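As a rough illustration of the mechanism the abstract describes, the sketch below forms a guidance direction from the difference between the model's own noise predictions at the current level and at a slightly noisier one, steering sampling away from regions whose density drops from noisier to cleaner levels. The `model(x, t, cond)` noise predictor, the choice of noisier level, and the weight `w_sg` are assumptions for illustration; the paper's exact formulation may differ.

```python
import torch

@torch.no_grad()
def self_guided_eps(model, x_t, t, t_noisier, cond, w_sg=0.5):
    """Illustrative self-guidance step (not the paper's exact formula).

    `model(x, t, cond) -> eps` is a hypothetical noise predictor;
    `t_noisier > t` is a nearby, noisier level of the same sampler.
    """
    eps_clean = model(x_t, t, cond)          # prediction at the current level
    eps_noisy = model(x_t, t_noisier, cond)  # same latent, evaluated at a noisier level
    # Amplify the direction along which density declines from the noisier
    # to the cleaner level, suppressing likely artifact outliers.
    return eps_clean + w_sg * (eps_clean - eps_noisy)
```

In a DDIM-style loop, such a guided prediction would simply replace the usual noise estimate at each step, which is what would make the method plug-and-play with other samplers.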
Related papers
- DIDiffGes: Decoupled Semi-Implicit Diffusion Models for Real-time Gesture Generation from Speech [42.663766380488205]
DIDiffGes can synthesize high-quality, expressive gestures from speech using only a few sampling steps.
Our method outperforms state-of-the-art approaches in human likeness, appropriateness, and style correctness.
arXiv Detail & Related papers (2025-03-21T11:23:39Z)
- Denoising Score Distillation: From Noisy Diffusion Pretraining to One-Step High-Quality Generation [82.39763984380625]
We introduce denoising score distillation (DSD), a surprisingly effective and novel approach for training high-quality generative models from low-quality data.
DSD pretrains a diffusion model exclusively on noisy, corrupted samples and then distills it into a one-step generator capable of producing refined, clean outputs.
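A score-distillation-style update is one plausible reading of this recipe: re-noise the one-step student's output and use the frozen teacher's denoising direction as the learning signal. The callables and the surrogate loss below are assumptions for illustration, not DSD's published objective.

```python
import torch

def distill_step(generator, teacher, z, t, alpha_t, sigma_t):
    """Generic one-step distillation sketch in the spirit of DSD.

    `generator(z) -> x0` is the one-step student; `teacher(x_t, t) -> eps`
    is a frozen diffusion model (here, one pretrained on noisy data).
    """
    x0 = generator(z)                   # student's one-step sample
    eps = torch.randn_like(x0)
    x_t = alpha_t * x0 + sigma_t * eps  # re-noise the student output
    with torch.no_grad():
        eps_teacher = teacher(x_t, t)
    # Surrogate loss whose gradient w.r.t. the generator follows the
    # teacher's denoising direction (the classic score-distillation trick).
    return ((eps_teacher - eps).detach() * x0).sum()
```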
arXiv Detail & Related papers (2025-03-10T17:44:46Z)
- PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity [9.092404060771306]
Diffusion models have shown impressive results in generating high-quality conditional samples.
However, existing methods often require additional training or extra neural function evaluations (NFEs).
We propose a novel and efficient method, termed PLADIS, which boosts pre-trained models by leveraging sparse attention.
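The key idea here, swapping dense attention for a sparse alternative at inference time with no retraining, can be sketched with sparsemax (Martins & Astudillo, 2016) in place of softmax. This substitution is an assumption for illustration; PLADIS's exact sparse formulation, and how it mixes with dense attention, may differ.

```python
import torch

def sparsemax(scores, dim=-1):
    """Sparsemax: Euclidean projection of the score vector onto the
    probability simplex, which yields exactly-zero attention weights."""
    z, _ = torch.sort(scores, dim=dim, descending=True)
    cssv = z.cumsum(dim)
    k = torch.arange(1, scores.size(dim) + 1,
                     device=scores.device, dtype=scores.dtype)
    shape = [1] * scores.dim()
    shape[dim] = -1
    k = k.view(shape)
    support = 1 + k * z > cssv             # indices inside the support
    k_z = support.sum(dim=dim, keepdim=True)
    tau = (cssv.gather(dim, k_z - 1) - 1) / k_z.to(scores.dtype)
    return torch.clamp(scores - tau, min=0.0)

def sparse_attention(q, k, v):
    """Drop-in sparse replacement for softmax attention at inference."""
    scale = q.size(-1) ** -0.5
    weights = sparsemax((q @ k.transpose(-2, -1)) * scale)
    return weights @ v
```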
arXiv Detail & Related papers (2025-03-10T07:23:19Z)
- TFG-Flow: Training-free Guidance in Multimodal Generative Flow [73.93071065307782]
We introduce TFG-Flow, a training-free guidance method for multimodal generative flow.
TFG-Flow addresses the curse-of-dimensionality while maintaining the property of unbiased sampling in guiding discrete variables.
We show that TFG-Flow has great potential in drug design by generating molecules with desired properties.
arXiv Detail & Related papers (2025-01-24T03:44:16Z)
- Fast LiDAR Data Generation with Rectified Flows [3.297182592932918]
We present R2Flow, a fast and high-fidelity generative model for LiDAR data. Our method is based on rectified flows that learn straight trajectories. We also propose an efficient Transformer-based model architecture for processing the image representation of LiDAR range and reflectance measurements.
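The rectified-flow objective behind such straight trajectories is simple enough to sketch: the velocity network is regressed onto the constant velocity of the straight line joining a noise sample and a data sample. This is the generic rectified-flow loss, not R2Flow's full training recipe.

```python
import torch
import torch.nn.functional as F

def rectified_flow_loss(velocity_net, x1, t):
    """Generic rectified-flow training loss.

    `velocity_net(x_t, t) -> v` is the model; `x1` is a batch of data;
    `t` is a batch of times in [0, 1], one per sample.
    """
    x0 = torch.randn_like(x1)                # noise endpoint of the path
    t = t.view(-1, *([1] * (x1.dim() - 1)))  # broadcast over data dims
    x_t = (1 - t) * x0 + t * x1              # straight-line interpolant
    v_target = x1 - x0                       # constant velocity of the line
    v_pred = velocity_net(x_t, t.flatten())
    return F.mse_loss(v_pred, v_target)
```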
arXiv Detail & Related papers (2024-12-03T08:10:53Z)
- Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think [72.48325960659822]
One main bottleneck in training large-scale diffusion models for generation lies in effectively learning useful internal representations. We study this by introducing a straightforward regularization called REPresentation Alignment (REPA), which aligns the projections of noisy input hidden states in denoising networks with clean image representations obtained from external, pretrained visual encoders. The results are striking: our simple strategy yields significant improvements in both training efficiency and generation quality when applied to popular diffusion and flow-based transformers, such as DiTs and SiTs.
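Since the summary states the mechanism explicitly, the alignment term is straightforward to sketch: project intermediate denoiser tokens and pull them toward frozen-encoder features of the clean image. The tensor shapes, projector, and cosine objective below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def repa_loss(hidden_states, clean_feats, projector):
    """REPA-style alignment term (sketch; the paper's exact similarity
    and projector design may differ).

    `hidden_states`: (B, N, D) tokens from an intermediate denoiser layer
    on the *noisy* input; `clean_feats`: (B, N, D') patch features from a
    frozen pretrained encoder (e.g. DINOv2) on the *clean* image;
    `projector`: small trainable MLP mapping D -> D'.
    """
    proj = projector(hidden_states)
    # Maximize patch-wise cosine similarity between projections and targets.
    sim = F.cosine_similarity(proj, clean_feats, dim=-1)
    return -sim.mean()

# Training would combine this with the usual denoising loss, e.g.:
# total = denoise_loss + lambda_repa * repa_loss(h, feats, projector)
```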
arXiv Detail & Related papers (2024-10-09T14:34:53Z)
- Neural Residual Diffusion Models for Deep Scalable Vision Generation [17.931568104324985]
We propose a unified and massively scalable Neural Residual Diffusion Models framework (Neural-RDM).
The proposed neural residual models obtain state-of-the-art scores on image and video generation benchmarks.
arXiv Detail & Related papers (2024-06-19T04:57:18Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning diffusion models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z)
- Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
- Manifold Preserving Guided Diffusion [121.97907811212123]
Conditional image generation still faces challenges of cost, generalizability, and the need for task-specific training.
We propose Manifold Preserving Guided Diffusion (MPGD), a training-free conditional generation framework.
arXiv Detail & Related papers (2023-11-28T02:08:06Z)
- DuDGAN: Improving Class-Conditional GANs via Dual-Diffusion [2.458437232470188]
Class-conditional image generation using generative adversarial networks (GANs) has been investigated through various techniques.
We propose a novel approach for class-conditional image generation using GANs called DuDGAN, which incorporates a dual diffusion-based noise injection process.
Our method outperforms state-of-the-art conditional GAN models for image generation.
arXiv Detail & Related papers (2023-05-24T07:59:44Z)
- Unlocking the Power of GANs in Non-Autoregressive Text Generation [12.168952901520461]
We conduct a pioneering study of building language GANs based on NAR structures.
We propose a GAN-based NAR model, the Adversarial Non-autoregressive Transformer (ANT).
The experimental results demonstrate that ANT can achieve comparable performance with mainstream models in a single forward pass.
arXiv Detail & Related papers (2023-05-06T08:43:33Z)
- Your Diffusion Model is Secretly a Zero-Shot Classifier [90.40799216880342]
We show that density estimates from large-scale text-to-image diffusion models can be leveraged to perform zero-shot classification.
Our generative approach to classification attains strong results on a variety of benchmarks.
Our results are a step toward using generative over discriminative models for downstream tasks.
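The underlying procedure is easy to sketch: for each candidate condition, estimate the expected denoising error (a proxy for the conditional likelihood) and pick the condition with the lowest error. Function names and the Monte Carlo setup below are placeholders.

```python
import torch

@torch.no_grad()
def diffusion_classify(model, x0, class_embs, alphas_bar, n_trials=32):
    """Zero-shot classification with a diffusion model (sketch of the
    diffusion-classifier idea; `model(x_t, t, c) -> eps` is a placeholder).

    The class whose conditioning yields the lowest average denoising
    error corresponds to the highest conditional likelihood.
    """
    errors = []
    for c in class_embs:
        err = 0.0
        for _ in range(n_trials):
            t = torch.randint(0, len(alphas_bar), (1,))
            a = alphas_bar[t].view(-1, *([1] * (x0.dim() - 1)))
            eps = torch.randn_like(x0)
            x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps  # forward-noise x0
            err += (model(x_t, t, c) - eps).pow(2).mean().item()
        errors.append(err / n_trials)
    return int(torch.tensor(errors).argmin())  # index of the best class
```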
arXiv Detail & Related papers (2023-03-28T17:59:56Z)
- MagicFusion: Boosting Text-to-Image Generation Performance by Fusing Diffusion Models [20.62953292593076]
We propose a simple yet effective method called Saliency-aware Noise Blending (SNB) that can empower the fused text-guided diffusion models to achieve more controllable generation.
SNB is training-free and can be completed within a DDIM sampling process. Additionally, it can automatically align the semantics of two noise spaces without requiring additional annotations such as masks.
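A minimal sketch of saliency-aware blending of two models' noise predictions during a DDIM step is given below, taking channel-mean magnitude as a stand-in saliency; MagicFusion's actual saliency definition and blending rule may differ.

```python
import torch

@torch.no_grad()
def snb_fuse(eps_a, eps_b):
    """Illustrative saliency-aware noise blending for two diffusion models.

    `eps_a`, `eps_b`: (B, C, H, W) noise predictions from the two models
    at the same DDIM step. Per pixel, keep the prediction from whichever
    model is more 'salient' there.
    """
    sal_a = eps_a.abs().mean(dim=1, keepdim=True)  # (B,1,H,W) saliency proxy
    sal_b = eps_b.abs().mean(dim=1, keepdim=True)
    # Normalize each map so the two models are comparable.
    sal_a = sal_a / (sal_a.amax(dim=(-2, -1), keepdim=True) + 1e-8)
    sal_b = sal_b / (sal_b.amax(dim=(-2, -1), keepdim=True) + 1e-8)
    mask = (sal_a >= sal_b).float()            # hard per-pixel selection
    return mask * eps_a + (1 - mask) * eps_b   # fused noise for the DDIM step
```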
arXiv Detail & Related papers (2023-03-23T09:30:39Z)
- Diffusion Guided Domain Adaptation of Image Generators [22.444668833151677]
We show that classifier-free guidance can be leveraged as a critic, enabling generators to distill knowledge from large-scale text-to-image diffusion models.
Generators can be efficiently shifted into new domains indicated by text prompts without access to ground-truth samples from the target domains.
Although not trained to minimize CLIP loss, our model achieves equally high CLIP scores and significantly lower FID than prior work on short prompts.
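The sketch below illustrates the critic idea: noise the generator's output, query the frozen diffusion model with and without the text condition, and use the classifier-free-guided direction as the training signal. All names and the surrogate loss are placeholders for illustration.

```python
import torch

def cfg_critic_loss(teacher, x_fake, t, cond, alpha_t, sigma_t, w=7.5):
    """Classifier-free guidance as a critic for shifting a generator's
    domain (sketch; not the paper's exact objective).

    `teacher(x_t, t, cond) -> eps` is a frozen text-to-image diffusion
    model; passing `cond=None` gives the unconditional prediction.
    """
    eps = torch.randn_like(x_fake)
    x_t = alpha_t * x_fake + sigma_t * eps  # noise the generator output
    with torch.no_grad():
        e_cond = teacher(x_t, t, cond)
        e_uncond = teacher(x_t, t, None)
        e_cfg = e_uncond + w * (e_cond - e_uncond)  # guided direction
    # Gradient pushes the generator toward the text-described domain.
    return ((e_cfg - eps).detach() * x_fake).sum()
```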
arXiv Detail & Related papers (2022-12-08T18:46:19Z)
- A Survey on Generative Diffusion Model [75.93774014861978]
Diffusion models are an emerging class of deep generative models.
They have certain limitations, including a time-consuming iterative generation process and confinement to high-dimensional Euclidean space.
This survey presents a plethora of advanced techniques aimed at enhancing diffusion models.
arXiv Detail & Related papers (2022-09-06T16:56:21Z)
- Unsupervised Controllable Generation with Self-Training [90.04287577605723]
Controllable generation with GANs remains a challenging research problem.
We propose an unsupervised framework to learn a distribution of latent codes that control the generator through self-training.
Our framework exhibits better disentanglement compared to other variants such as the variational autoencoder.
arXiv Detail & Related papers (2020-07-17T21:50:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.