Related papers: S$^2$-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models

URL: http://arxiv.org/abs/2508.12880v2
Date: Thu, 11 Sep 2025 10:04:07 GMT
Title: S$^2$-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models
Authors: Chubin Chen, Jiashu Zhu, Xiaokun Feng, Nisha Huang, Meiqi Wu, Fangyuan Mao, Jiahong Wu, Xiangxiang Chu, Xiu Li
Abstract summary: S$^2$-Guidance is a novel method that leverages block-dropping during the forward process to construct sub-networks. Experiments on text-to-image and text-to-video generation tasks demonstrate that S$^2$-Guidance delivers superior performance.
Score: 26.255679321570014
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Classifier-free Guidance (CFG) is a widely used technique in modern diffusion models for enhancing sample quality and prompt adherence. However, through an empirical analysis on Gaussian mixture modeling with a closed-form solution, we observe a discrepancy between the suboptimal results produced by CFG and the ground truth. The model's excessive reliance on these suboptimal predictions often leads to semantic incoherence and low-quality outputs. To address this issue, we first empirically demonstrate that the model's suboptimal predictions can be effectively refined using sub-networks of the model itself. Building on this insight, we propose S^2-Guidance, a novel method that leverages stochastic block-dropping during the forward process to construct stochastic sub-networks, effectively guiding the model away from potential low-quality predictions and toward high-quality outputs. Extensive qualitative and quantitative experiments on text-to-image and text-to-video generation tasks demonstrate that S^2-Guidance delivers superior performance, consistently surpassing CFG and other advanced guidance strategies. Our code will be released.
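The abstract describes two ingredients: standard CFG, and a correction computed from a sub-network obtained by randomly dropping blocks in the forward pass. The sketch below illustrates both on a toy residual network in plain Python. The CFG formula is the standard one; everything else is an assumption made for illustration and is not taken from the paper: the helper names (`run_blocks`, `sample_keep_mask`, `self_guided`), the scale `omega`, and in particular the form of the update that subtracts the sub-network prediction.

```python
import random

def cfg(eps_uncond, eps_cond, w):
    """Standard classifier-free guidance: move from the unconditional
    prediction toward the conditional one with guidance scale w."""
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]

def run_blocks(x, blocks, keep_mask):
    """Apply residual blocks in order; a block whose mask entry is False
    is skipped, which yields a sub-network of the full model."""
    for block, keep in zip(blocks, keep_mask):
        if keep:
            x = [xi + ri for xi, ri in zip(x, block(x))]
    return x

def sample_keep_mask(n_blocks, drop_prob, rng):
    """Draw a random keep/drop decision for each block."""
    return [rng.random() >= drop_prob for _ in range(n_blocks)]

def self_guided(eps_cfg, eps_sub, omega):
    """ASSUMED update (not confirmed by the abstract): push the CFG
    result away from the sub-network prediction, scaled by omega."""
    return [g - omega * s for g, s in zip(eps_cfg, eps_sub)]

# Toy residual blocks: each returns a scaled copy of its input.
blocks = [lambda x, s=s: [s * xi for xi in x] for s in (0.1, 0.2)]

rng = random.Random(0)
mask = sample_keep_mask(len(blocks), drop_prob=0.5, rng=rng)
eps_sub = run_blocks([1.0, 1.0], blocks, mask)
eps_guided = self_guided(cfg([0.0, 1.0], [1.0, 3.0], w=2.0), eps_sub, omega=0.5)
```

In a real diffusion model the blocks would be transformer or U-Net blocks and a new mask would typically be drawn at each denoising step; how the paper chooses the drop ratio and the correction scale is not stated in the abstract.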

Related papers

Understanding Sampler Stochasticity in Training Diffusion Models for RLHF [11.537564997052606]
This paper theoretically characterizes the reward gap and provides non-vacuous bounds for general diffusion models. Empirically, our findings through large-scale experiments on text-to-image models validate that reward gaps consistently narrow over training.
arXiv Detail & Related papers (2025-10-12T19:08:38Z)
G$^2$RPO: Granular GRPO for Precise Reward in Flow Models [74.21206048155669]
We propose a novel Granular-GRPO (G$^2$RPO) framework that achieves precise and comprehensive reward assessments of sampling directions. We introduce a Multi-Granularity Advantage Integration module that aggregates advantages computed at multiple diffusion scales. Our G$^2$RPO significantly outperforms existing flow-based GRPO baselines.
arXiv Detail & Related papers (2025-10-02T12:57:12Z)
Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling [70.8832906871441]
We study how to steer generation toward desired rewards without retraining the models. Prior methods typically resample or filter within a single denoising trajectory, optimizing rewards step-by-step without trajectory-level refinement. We introduce particle Gibbs sampling for diffusion language models (PG-DLM), a novel inference-time algorithm enabling trajectory-level refinement while preserving generation perplexity.
arXiv Detail & Related papers (2025-07-11T08:00:47Z)
Divergence Minimization Preference Optimization for Diffusion Model Alignment [58.651951388346525]
Divergence Minimization Preference Optimization (DMPO) is a principled method for aligning diffusion models by minimizing reverse KL divergence. Our results show that diffusion models fine-tuned with DMPO can consistently outperform or match existing techniques. DMPO unlocks a robust and elegant pathway for preference alignment, bridging principled theory with practical performance in diffusion models.
arXiv Detail & Related papers (2025-07-10T07:57:30Z)
Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving model performance. This paper addresses the question of how to optimally combine the model's predictions and the provided labels. Our main contribution is the derivation of the Bayes optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z)
Diffusion Models without Classifier-free Guidance [41.59396565229466]
Model-guidance (MG) is a novel objective for training diffusion models that addresses and removes the commonly used classifier-free guidance (CFG). Our innovative approach transcends the standard modeling and incorporates the posterior probability of conditions. Our method significantly accelerates the training process, doubles inference speed, and achieves exceptional quality that parallels and even surpasses concurrent diffusion models with CFG.
arXiv Detail & Related papers (2025-02-17T18:59:50Z)
Energy-Based Diffusion Language Models for Text Generation [126.23425882687195]
Energy-based Diffusion Language Model (EDLM) is an energy-based model operating at the full sequence level for each diffusion step. Our framework offers a 1.3$\times$ sampling speedup over existing diffusion models.
arXiv Detail & Related papers (2024-10-28T17:25:56Z)
Lipsum-FT: Robust Fine-Tuning of Zero-Shot Models Using Random Text Guidance [27.91782770050068]
Large-scale contrastive vision-language pre-trained models provide zero-shot models that achieve competitive performance across a range of image classification tasks without requiring training on downstream data. Recent works have confirmed that additional fine-tuning of the zero-shot model on the reference data results in enhanced downstream performance, but compromises the model's robustness against distribution shifts. We propose a novel robust fine-tuning algorithm, Lipsum-FT, that effectively utilizes the language modeling aspect of the vision-language pre-trained models.
arXiv Detail & Related papers (2024-04-01T02:01:33Z)
Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution [67.9215891673174]
We propose score entropy as a novel loss that naturally extends score matching to discrete spaces. We test our Score Entropy Discrete Diffusion models on standard language modeling tasks.
arXiv Detail & Related papers (2023-10-25T17:59:12Z)
Unmasking Bias in Diffusion Model Training [40.90066994983719]
Denoising diffusion models have emerged as a dominant approach for image generation. They still suffer from slow convergence in training and color shift issues in sampling. In this paper, we identify that these obstacles can be largely attributed to bias and suboptimality inherent in the default training paradigm.
arXiv Detail & Related papers (2023-10-12T16:04:41Z)
Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Of two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations. We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z)
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment [32.752633250862694]
Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data. We introduce a new framework, Reward rAnked FineTuning, designed to align generative models effectively.
arXiv Detail & Related papers (2023-04-13T18:22:40Z)
Enhancing Certified Robustness via Smoothed Weighted Ensembling [7.217295098686032]
We employ a Smoothed WEighted ENsembling scheme to improve the performance of randomized smoothed classifiers. We show the ensembling generality that SWEEN can help achieve optimal certified robustness. We also develop an adaptive prediction algorithm to reduce the prediction and certification cost of SWEEN models.
arXiv Detail & Related papers (2020-05-19T11:13:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.