Related papers: Fine-Grained Perturbation Guidance via Attention Head Selection

Fine-Grained Perturbation Guidance via Attention Head Selection

URL: http://arxiv.org/abs/2506.10978v3
Date: Tue, 29 Jul 2025 07:23:54 GMT
Title: Fine-Grained Perturbation Guidance via Attention Head Selection
Authors: Donghoon Ahn, Jiwon Kang, Sanghyun Lee, Minjae Kim, Jaewon Min, Wooseok Jang, Sangwu Lee, Sayak Paul, Susung Hong, Seungryong Kim,
Abstract summary: "HeadHunter" is a systematic framework for iteratively selecting attention heads that align with user-centric objectives.<n>"SoftPAG" is a continuous knob to tune perturbation strength and suppress artifacts.<n>We validate our method on modern large-scale DiT-based text-to-image models.
Score: 33.77035944924774
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent guidance methods in diffusion models steer reverse sampling by perturbing the model to construct an implicit weak model and guide generation away from it. Among these approaches, attention perturbation has demonstrated strong empirical performance in unconditional scenarios where classifier-free guidance is not applicable. However, existing attention perturbation methods lack principled approaches for determining where perturbations should be applied, particularly in Diffusion Transformer (DiT) architectures where quality-relevant computations are distributed across layers. In this paper, we investigate the granularity of attention perturbations, ranging from the layer level down to individual attention heads, and discover that specific heads govern distinct visual concepts such as structure, style, and texture quality. Building on this insight, we propose "HeadHunter", a systematic framework for iteratively selecting attention heads that align with user-centric objectives, enabling fine-grained control over generation quality and visual attributes. In addition, we introduce SoftPAG, which linearly interpolates each selected head's attention map toward an identity matrix, providing a continuous knob to tune perturbation strength and suppress artifacts. Our approach not only mitigates the oversmoothing issues of existing layer-level perturbation but also enables targeted manipulation of specific visual styles through compositional head selection. We validate our method on modern large-scale DiT-based text-to-image models including Stable Diffusion 3 and FLUX.1, demonstrating superior performance in both general quality enhancement and style-specific guidance. Our work provides the first head-level analysis of attention perturbation in diffusion models, uncovering interpretable specialization within attention layers and enabling practical design of effective perturbation strategies.

Related papers

Beyond Fully Supervised Pixel Annotations: Scribble-Driven Weakly-Supervised Framework for Image Manipulation Localization [11.10178274806454]
We propose a form of weak supervision that improves the annotation efficiency and detection performance.<n>We re-annotated mainstream IML datasets with scribble labels and propose the first scribble-based IML dataset.<n>We employ self-supervised training with a structural consistency loss to encourage the model to produce consistent predictions.
arXiv Detail & Related papers (2025-07-17T11:45:27Z)
Semantic-guided Fine-tuning of Foundation Model for Long-tailed Visual Recognition [38.74388860692423]
We propose a novel approach, Semantic-guided fine-tuning of foundation model for long-tailed visual recognition (Sage)<n>We introduce an SG-Adapter that integrates class descriptions as semantic guidance to guide the fine-tuning of the visual encoder.<n>Experiments on benchmark datasets demonstrate the effectiveness of the proposed Sage in enhancing performance in long-tailed learning.
arXiv Detail & Related papers (2025-07-17T05:47:19Z)
Interactive Video Generation via Domain Adaptation [7.397099215417549]
Text-conditioned diffusion models have emerged as powerful tools for high-quality video generation.<n>Recent training-free approaches introduce attention masking to guide trajectory, but this often degrades quality.<n>We identify two key modes failure in these methods, both of which we interpret as domain problems.
arXiv Detail & Related papers (2025-05-30T06:19:47Z)
Rethinking Contrastive Learning in Graph Anomaly Detection: A Clean-View Perspective [54.605073936695575]
Graph anomaly detection aims to identify unusual patterns in graph-based data, with wide applications in fields such as web security and financial fraud detection.<n>Existing methods rely on contrastive learning, assuming that a lower similarity between a node and its local subgraph indicates abnormality.<n>The presence of interfering edges invalidates this assumption, since it introduces disruptive noise that compromises the contrastive learning process.<n>We propose a Clean-View Enhanced Graph Anomaly Detection framework (CVGAD), which includes a multi-scale anomaly awareness module to identify key sources of interference in the contrastive learning process.
arXiv Detail & Related papers (2025-05-23T15:05:56Z)
Semi-Supervised 360 Layout Estimation with Panoramic Collaborative Perturbations [56.84921040837699]
We propose a novel semi-supervised method named Semi360, which incorporates the priors of the panoramic layout and distortion through collaborative perturbations.<n>Our experimental results on three mainstream benchmarks demonstrate that the proposed method offers significant advantages over existing state-of-the-art (SoTA) solutions.
arXiv Detail & Related papers (2025-03-03T02:49:20Z)
Learning Flow Fields in Attention for Controllable Person Image Generation [59.10843756343987]
Controllable person image generation aims to generate a person image conditioned on reference images.<n>We propose learning flow fields in attention (Leffa), which explicitly guides the target query to attend to the correct reference key.<n>Leffa achieves state-of-the-art performance in controlling appearance (virtual try-on) and pose (pose transfer), significantly reducing fine-grained detail distortion.
arXiv Detail & Related papers (2024-12-11T15:51:14Z)
Breaking the Bias: Recalibrating the Attention of Industrial Anomaly Detection [20.651257973799527]
Recalibrating Attention of Industrial Anomaly Detection (RAAD) is a framework that systematically decomposes and recalibrates attention maps.<n> HQS dynamically adjusts bit-widths based on the hierarchical nature of attention maps.<n>We validate the effectiveness of RAAD on 32 datasets using a single 3090ti.
arXiv Detail & Related papers (2024-12-11T08:31:47Z)
Perturb, Attend, Detect and Localize (PADL): Robust Proactive Image Defense [5.150608040339816]
We introduce PADL, a new solution able to generate image-specific perturbations using a symmetric scheme of encoding and decoding based on cross-attention. Our method generalizes to a range of unseen models with diverse architectural designs, such as StarGANv2, BlendGAN, DiffAE, StableDiffusion and StableDiffusionXL.
arXiv Detail & Related papers (2024-09-26T15:16:32Z)
Noise-Free Explanation for Driving Action Prediction [11.330363757618379]
We propose an easy-to-implement but effective way to remedy this flaw: Smooth Noise Norm Attention (SNNA) We weigh the attention by the norm of the transformed value vector and guide the label-specific signal with the attention gradient, then randomly sample the input perturbations and average the corresponding gradients to produce noise-free attribution. Both qualitative and quantitative evaluation results show the superiority of SNNA compared to other SOTA attention-based explainable methods in generating a clearer visual explanation map and ranking the input pixel importance.
arXiv Detail & Related papers (2024-07-08T19:21:24Z)
Visual Prompt Tuning in Null Space for Continual Learning [51.96411454304625]
Existing prompt-tuning methods have demonstrated impressive performances in continual learning (CL) This paper aims to learn each task by tuning the prompts in the direction orthogonal to the subspace spanned by previous tasks' features. In practice, an effective null-space-based approximation solution has been proposed to implement the prompt gradient projection.
arXiv Detail & Related papers (2024-06-09T05:57:40Z)
What Matters When Repurposing Diffusion Models for General Dense Perception Tasks? [49.84679952948808]
Recent works show promising results by simply fine-tuning T2I diffusion models for dense perception tasks.<n>We conduct a thorough investigation into critical factors that affect transfer efficiency and performance when using diffusion priors.<n>Our work culminates in the development of GenPercept, an effective deterministic one-step fine-tuning paradigm tailed for dense visual perception tasks.
arXiv Detail & Related papers (2024-03-10T04:23:24Z)
Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement [58.9768112704998]
Disentangled representation learning strives to extract the intrinsic factors within observed data. We introduce a new perspective and framework, demonstrating that diffusion models with cross-attention can serve as a powerful inductive bias. This is the first work to reveal the potent disentanglement capability of diffusion models with cross-attention, requiring no complex designs.
arXiv Detail & Related papers (2024-02-15T05:07:54Z)
A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers. Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module. Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z)
Bayesian Attention Belief Networks [59.183311769616466]
Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks. This paper introduces Bayesian attention belief networks, which construct a decoder network by modeling unnormalized attention weights. We show that our method outperforms deterministic attention and state-of-the-art attention in accuracy, uncertainty estimation, generalization across domains, and adversarial attacks.
arXiv Detail & Related papers (2021-06-09T17:46:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.