Toward the Frontiers of Reliable Diffusion Sampling via Adversarial Sinkhorn Attention Guidance
- URL: http://arxiv.org/abs/2511.07499v1
- Date: Wed, 12 Nov 2025 01:01:49 GMT
- Title: Toward the Frontiers of Reliable Diffusion Sampling via Adversarial Sinkhorn Attention Guidance
- Authors: Kwanyoung Kim
- Abstract summary: Adversarial Sinkhorn Attention Guidance (ASAG) is a novel method that reinterprets attention scores in diffusion models through the lens of optimal transport. Instead of naively corrupting the attention mechanism, ASAG injects an adversarial cost within self-attention layers to reduce pixel-wise similarity between queries and keys. ASAG shows consistent improvements in text-to-image diffusion, and enhances controllability and fidelity in downstream applications such as IP-Adapter and ControlNet.
- Score: 8.46069844016289
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have demonstrated strong generative performance when using guidance methods such as classifier-free guidance (CFG), which enhance output quality by modifying the sampling trajectory. These methods typically improve a target output by intentionally degrading another, often the unconditional output, using heuristic perturbation functions such as identity mixing or blurred conditions. However, these approaches lack a principled foundation and rely on manually designed distortions. In this work, we propose Adversarial Sinkhorn Attention Guidance (ASAG), a novel method that reinterprets attention scores in diffusion models through the lens of optimal transport and intentionally disrupts the transport cost via the Sinkhorn algorithm. Instead of naively corrupting the attention mechanism, ASAG injects an adversarial cost within self-attention layers to reduce pixel-wise similarity between queries and keys. This deliberate degradation weakens misleading attention alignments and leads to improved conditional and unconditional sample quality. ASAG shows consistent improvements in text-to-image diffusion, and enhances controllability and fidelity in downstream applications such as IP-Adapter and ControlNet. The method is lightweight, plug-and-play, and improves reliability without requiring any model retraining.
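The abstract describes two ingredients: entropic optimal transport via Sinkhorn iterations, and an adversarial term added to the attention cost so that well-aligned query-key pairs become expensive. The sketch below is illustrative only, not the paper's exact formulation; the function names, the `alpha` weighting, and the uniform marginals are assumptions made for the example.

```python
import numpy as np

def sinkhorn(cost, eps=0.3, n_iters=200):
    """Entropic OT: iteratively rescale rows and columns of exp(-cost/eps)
    to match uniform marginals, yielding a transport plan."""
    n, m = cost.shape
    K = np.exp(-cost / eps)
    v = np.ones(m)
    r = np.full(n, 1.0 / n)   # uniform row marginal
    c = np.full(m, 1.0 / m)   # uniform column marginal
    for _ in range(n_iters):
        u = r / (K @ v)
        v = c / (K.T @ u)
    return (u[:, None] * K) * v[None, :]

def adversarial_plan(Q, Km, alpha=0.5, eps=0.3):
    """Hypothetical ASAG-style cost: start from the usual attention cost
    (negative query-key similarity) and add an adversarial term that raises
    the cost of well-aligned pairs, weakening their coupling."""
    sim = (Q @ Km.T) / np.sqrt(Q.shape[1])  # scaled similarity, as in attention
    cost = -sim + alpha * sim               # alpha in (0, 1]: penalize alignment
    return sinkhorn(cost, eps)

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
Km = rng.standard_normal((4, 8))
P = adversarial_plan(Q, Km)
```

Larger `alpha` flattens the plan toward uniform coupling, which is one way to read "reducing pixel-wise similarity between queries and keys" as a transport-cost perturbation.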
Related papers
- Error as Signal: Stiffness-Aware Diffusion Sampling via Embedded Runge-Kutta Guidance [35.84939959436188]
In stiff regions, the ODE trajectory changes sharply, where local truncation error (LTE) becomes a critical factor that deteriorates sample quality. We propose Embedded Runge-Kutta Guidance (ERKGuid), which exploits detected stiffness to reduce LTE and stabilize sampling. Our experiments on both synthetic datasets and the popular benchmark dataset, ImageNet, demonstrate that ERKGuid consistently outperforms state-of-the-art methods.
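An embedded Runge-Kutta pair yields a local-truncation-error estimate almost for free: the difference between a higher-order and a lower-order solution computed from the same stage evaluations. The Heun/Euler pair below is a generic sketch of this idea (not ERKGuid itself); the stiff test equation is chosen only for illustration.

```python
import numpy as np

def heun_step_with_error(f, t, y, h):
    """One embedded Heun/Euler step: return the second-order (Heun) update
    plus an LTE estimate from the embedded first-order (Euler) solution."""
    k1 = f(t, y)
    k2 = f(t + h, y + h * k1)
    y_euler = y + h * k1                  # 1st-order embedded solution
    y_heun = y + 0.5 * h * (k1 + k2)      # 2nd-order solution
    lte = np.abs(y_heun - y_euler)        # error signal; spikes in stiff regions
    return y_heun, lte

# stiff-ish test problem: y' = -10 y, exact solution exp(-10 t)
f = lambda t, y: -10.0 * y
y_next, err = heun_step_with_error(f, 0.0, np.array([1.0]), 0.1)
```

A stiffness-aware sampler could use `lte` as the trigger: large values mean the step size is too coarse for the local dynamics.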
arXiv Detail & Related papers (2026-03-04T03:37:57Z)
- Causal Autoregressive Diffusion Language Model [70.7353007255797]
CARD reformulates the diffusion process within a strictly causal attention mask, enabling dense, per-token supervision in a single forward pass. Our results demonstrate that CARD achieves ARM-level data efficiency while unlocking the latency benefits of parallel generation.
arXiv Detail & Related papers (2026-01-29T17:38:29Z)
- EMAG: Self-Rectifying Diffusion Sampling with Exponential Moving Average Guidance [31.550239698285058]
In diffusion and flow-matching generative models, guidance techniques are widely used to improve sample quality and consistency. Recent work explores contrasting negative samples at inference using a weaker model. We propose Exponential Moving Average Guidance (EMAG), a training-free mechanism that modifies attention at inference time in diffusion transformers.
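One way to obtain a "weaker model" without training is to smooth a quantity over sampling steps with an exponential moving average and extrapolate the current value away from it. The sketch below illustrates that generic pattern; the function name, `beta`, and `scale` are assumptions, not EMAG's actual formulation.

```python
import numpy as np

def ema_guidance(attn, ema, beta=0.9, scale=1.5):
    """Hypothetical EMA guidance: treat the running average of attention maps
    as a weaker reference and extrapolate the current map away from it."""
    ema = attn.copy() if ema is None else beta * ema + (1.0 - beta) * attn
    guided = ema + scale * (attn - ema)   # push past the smoothed reference
    return guided, ema

# first step: no history yet, so the guided map equals the input
a = np.full((2, 2), 0.25)
guided, state = ema_guidance(a, None)
```

On the first call the EMA is initialized to the input, so guidance is a no-op; it only kicks in once the current map diverges from its running average.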
arXiv Detail & Related papers (2025-12-19T07:36:07Z)
- Enhancing Diffusion Model Guidance through Calibration and Regularization [9.22066257345387]
This paper introduces two complementary contributions to address this issue. First, we propose a differentiable calibration objective based on the smoothed Expected Calibration Error (Smooth ECE). Second, we develop enhanced sampling guidance methods that operate on off-the-shelf classifiers without requiring retraining.
arXiv Detail & Related papers (2025-11-08T04:23:42Z)
- Boosting Fidelity for Pre-Trained-Diffusion-Based Low-Light Image Enhancement via Condition Refinement [63.54516423266521]
Pre-Trained Diffusion-Based (PTDB) methods often sacrifice content fidelity to attain higher perceptual realism. We propose a novel optimization strategy for conditioning in pre-trained diffusion models, enhancing fidelity while preserving realism and aesthetics. Our approach is plug-and-play, seamlessly integrating into existing diffusion networks to provide more effective control.
arXiv Detail & Related papers (2025-10-20T02:40:06Z)
- TAG: Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling [53.61290359948953]
Tangential Amplifying Guidance (TAG) operates solely on trajectory signals without modifying the underlying diffusion model. We formalize this guidance process by leveraging a first-order Taylor expansion. TAG is a plug-and-play, architecture-agnostic module that improves diffusion sampling fidelity with minimal computational addition.
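The core geometric operation suggested by the name is a projection: split a guidance update into the component parallel to the current sample direction and the tangential remainder, then amplify only the latter. The sketch below shows that decomposition under assumed names and parameters; it is not the paper's exact procedure.

```python
import numpy as np

def tangential_amplify(x, update, gamma=2.0):
    """Sketch: decompose a guidance update into components parallel and
    tangential to the current sample x, then amplify only the tangential part."""
    xf, uf = x.ravel(), update.ravel()
    x_hat = xf / (np.linalg.norm(xf) + 1e-12)   # unit vector along x
    parallel = (uf @ x_hat) * x_hat             # projection onto x
    tangential = uf - parallel                  # orthogonal remainder
    return (parallel + gamma * tangential).reshape(x.shape)

x = np.array([1.0, 0.0])
u = np.array([0.5, 1.0])
amped = tangential_amplify(x, u)
```

An update already parallel to `x` passes through unchanged; only the orthogonal component is scaled by `gamma`.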
arXiv Detail & Related papers (2025-10-06T06:53:29Z)
- Diffusion Sampling Path Tells More: An Efficient Plug-and-Play Strategy for Sample Filtering [18.543769006014383]
Diffusion models often exhibit inconsistent sample quality due to variations inherent in their sampling trajectories. We introduce CFG-Rejection, an efficient, plug-and-play strategy that filters low-quality samples at an early stage of the denoising process. We validate the effectiveness of CFG-Rejection in image generation through extensive experiments.
arXiv Detail & Related papers (2025-05-29T11:08:24Z)
- Normalized Attention Guidance: Universal Negative Guidance for Diffusion Models [57.20761595019967]
We present Normalized Attention Guidance (NAG), an efficient, training-free mechanism that applies extrapolation in attention space with L1-based normalization and refinement. NAG restores effective negative guidance where CFG collapses while maintaining fidelity. NAG generalizes across architectures (UNet, DiT), sampling regimes (few-step, multi-step), and modalities (image, video).
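"Extrapolation in attention space with L1-based normalization" suggests a three-step pattern: extrapolate away from the negative branch, clamp the drift with an L1-norm ratio, and blend back toward the positive branch. The sketch below follows that reading; the function name, `scale`, `tau`, and `alpha` are illustrative assumptions, not NAG's published hyperparameters.

```python
import numpy as np

def nag_extrapolate(z_pos, z_neg, scale=3.0, tau=2.5, alpha=0.5):
    """Hedged sketch of NAG-style guidance: extrapolate attention features
    away from the negative branch, clamp drift via an L1-norm ratio,
    then blend back toward the positive branch."""
    z_ext = z_pos + scale * (z_pos - z_neg)
    ratio = np.abs(z_ext).sum() / (np.abs(z_pos).sum() + 1e-8)
    if ratio > tau:
        z_ext = z_ext * (tau / ratio)   # L1-based normalization
    return alpha * z_ext + (1.0 - alpha) * z_pos

z_pos = np.array([1.0, -2.0, 0.5])
z_neg = np.zeros(3)
z = nag_extrapolate(z_pos, z_neg)
```

When the negative branch matches the positive one there is nothing to push away from, and the output reduces to the positive features unchanged.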
arXiv Detail & Related papers (2025-05-27T13:30:46Z)
- One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Our method achieves strong performance on both full and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z)
- Contrastive CFG: Improving CFG in Diffusion Models by Contrasting Positive and Negative Concepts [55.298031232672734]
Classifier-Free Guidance (CFG) has proven effective in conditional diffusion model sampling for improved condition alignment.
We present a novel method to enhance negative CFG guidance using contrastive loss.
arXiv Detail & Related papers (2024-11-26T03:29:27Z)
- Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance [28.354284737867136]
Perturbed-Attention Guidance (PAG) improves diffusion sample quality across both unconditional and conditional settings. In both ADM and Stable Diffusion, PAG surprisingly improves sample quality in conditional and even unconditional scenarios.
arXiv Detail & Related papers (2024-03-26T04:49:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.