In-situ Autoguidance: Eliciting Self-Correction in Diffusion Models
- URL: http://arxiv.org/abs/2510.17136v1
- Date: Mon, 20 Oct 2025 04:06:50 GMT
- Title: In-situ Autoguidance: Eliciting Self-Correction in Diffusion Models
- Authors: Enhao Gu, Haolin Hou,
- Abstract summary: In-situ Autoguidance elicits guidance from the model itself without any auxiliary components. We demonstrate that this zero-cost approach is not only viable but also establishes a powerful new baseline for cost-efficient guidance.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The generation of high-quality, diverse, and prompt-aligned images is a central goal in image-generating diffusion models. The popular classifier-free guidance (CFG) approach improves quality and alignment at the cost of reduced variation, creating an inherent entanglement of these effects. Recent work has successfully disentangled these properties by guiding a model with a separately trained, inferior counterpart; however, this solution introduces the considerable overhead of requiring an auxiliary model. We challenge this prerequisite by introducing In-situ Autoguidance, a method that elicits guidance from the model itself without any auxiliary components. Our approach dynamically generates an inferior prediction on the fly using a stochastic forward pass, reframing guidance as a form of inference-time self-correction. We demonstrate that this zero-cost approach is not only viable but also establishes a powerful new baseline for cost-efficient guidance, proving that the benefits of self-guidance can be achieved without external models.
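The guidance rule described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the toy `model`, the dropout-style mask standing in for the "stochastic forward pass," and the guidance weight `w` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x, t, cond, stochastic=False):
    # Toy stand-in for a diffusion denoiser; the real method reuses the
    # actual pretrained network. All names here are illustrative.
    eps = 0.1 * x + 0.01 * t + cond
    if stochastic:
        # A stochastic forward pass (e.g. dropout left active at inference)
        # plays the role of the dynamically generated "inferior" predictor.
        mask = rng.binomial(1, 0.8, size=np.shape(x))
        eps = eps * mask / 0.8
    return eps

def in_situ_autoguidance(x_t, t, cond, w=2.0):
    # Strong prediction: the ordinary deterministic forward pass.
    eps_strong = model(x_t, t, cond, stochastic=False)
    # Weak prediction: the same model, perturbed on the fly.
    eps_weak = model(x_t, t, cond, stochastic=True)
    # Autoguidance-style extrapolation away from the weak prediction:
    # with w > 1 this pushes the output beyond the strong prediction.
    return eps_weak + w * (eps_strong - eps_weak)
```

With `w = 1` the update reduces to the strong prediction alone; larger `w` amplifies the correction away from the degraded prediction, mirroring how classifier-free guidance extrapolates away from the unconditional branch.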
Related papers
- SAIL: Self-Amplified Iterative Learning for Diffusion Model Alignment with Minimal Human Feedback [19.637094881784634]
We propose SAIL (Self-Amplified Iterative Learning), a novel framework that enables diffusion models to act as their own teachers through iterative self-improvement.
arXiv Detail & Related papers (2026-02-05T06:58:38Z) - Alignment of Diffusion Model and Flow Matching for Text-to-Image Generation [39.484148941369234]
Diffusion models and flow matching have demonstrated remarkable success in text-to-image generation. We propose a novel alignment framework by leveraging the underlying nature of the alignment problem. We achieve performance comparable to finetuning-based models with one-step generation, with at least a 60% reduction in computational cost.
arXiv Detail & Related papers (2026-01-31T00:06:55Z) - Steering Guidance for Personalized Text-to-Image Diffusion Models [19.550718192994353]
Existing sampling guidance methods fail to guide the output toward a well-balanced space. We propose personalization guidance, a simple yet effective method leveraging an unlearned weak model conditioned on a null text prompt. Our method explicitly steers the outputs toward a balanced latent space without additional computational overhead.
arXiv Detail & Related papers (2025-08-01T05:02:26Z) - How Much To Guide: Revisiting Adaptive Guidance in Classifier-Free Guidance Text-to-Vision Diffusion Models [57.42800112251644]
We propose Step AG, a simple, universally applicable adaptive guidance strategy. Our evaluations focus on both image quality and image-text alignment.
arXiv Detail & Related papers (2025-06-10T02:09:48Z) - Self-NPO: Negative Preference Optimization of Diffusion Models by Simply Learning from Itself without Explicit Preference Annotations [60.143658714894336]
Diffusion models have demonstrated remarkable success in various visual generation tasks, including image, video, and 3D content generation. Preference optimization (PO) is a prominent and growing area of research that aims to align these models with human preferences. We introduce Self-NPO, a Negative Preference Optimization approach that learns exclusively from the model itself.
arXiv Detail & Related papers (2025-05-17T01:03:46Z) - Domain Guidance: A Simple Transfer Approach for a Pre-trained Diffusion Model [62.11981915549919]
Domain Guidance is a transfer approach that leverages pre-trained knowledge to guide the sampling process toward the target domain. We demonstrate its substantial effectiveness across various transfer benchmarks, achieving over a 19.6% improvement in FID and a 23.4% improvement in FD$_{\text{DINOv2}}$ compared to standard fine-tuning.
arXiv Detail & Related papers (2025-04-02T09:07:55Z) - Self-Improvement in Language Models: The Sharpening Mechanism [70.9248553790022]
We offer a new perspective on the capabilities of self-improvement through a lens we refer to as sharpening. Motivated by the observation that language models are often better at verifying response quality than they are at generating correct responses, we formalize self-improvement as using the model itself as a verifier during post-training. We analyze two natural families of self-improvement algorithms based on SFT and RLHF.
arXiv Detail & Related papers (2024-12-02T20:24:17Z) - Guiding a diffusion model using sliding windows [0.9402985123717579]
We introduce masked sliding window guidance (M-SWG), a novel, training-free method. M-SWG upweights long-range spatial dependencies by guiding the primary model with itself while selectively restricting its receptive field. M-SWG achieves a superior Inception score (IS) compared to previous state-of-the-art training-free approaches.
arXiv Detail & Related papers (2024-11-15T15:04:04Z) - Fast-ELECTRA for Efficient Pre-training [83.29484808667532]
ELECTRA pre-trains language models by detecting tokens in a sequence that have been replaced by an auxiliary model.
We propose Fast-ELECTRA, which leverages an existing language model as the auxiliary model.
Our approach rivals the performance of state-of-the-art ELECTRA-style pre-training methods while eliminating the computation and memory cost incurred by jointly training the auxiliary model.
arXiv Detail & Related papers (2023-10-11T09:55:46Z) - Studying How to Efficiently and Effectively Guide Models with Explanations [52.498055901649025]
"Model guidance" is the idea of regularizing a model's explanations to ensure that it is "right for the right reasons".
We conduct an in-depth evaluation across various loss functions, attribution methods, models, and 'guidance depths' on the PASCAL VOC 2007 and MS COCO 2014 datasets.
Specifically, we guide the models via bounding box annotations, which are much cheaper to obtain than the commonly used segmentation masks.
arXiv Detail & Related papers (2023-03-21T15:34:50Z) - Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One [83.5162421521224]
We propose a unique method termed E-ARM for training autoregressive generative models.
E-ARM takes advantage of a well-designed energy-based learning objective.
We show that E-ARM can be trained efficiently and is capable of alleviating the exposure bias problem.
arXiv Detail & Related papers (2022-06-26T10:58:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.