Related papers: Understanding Sensitivity of Differential Attention through the Lens of Adversarial Robustness

URL: http://arxiv.org/abs/2510.00517v1
Date: Wed, 01 Oct 2025 05:01:39 GMT
Title: Understanding Sensitivity of Differential Attention through the Lens of Adversarial Robustness
Authors: Tsubasa Takahashi, Shojiro Yamabe, Futa Waseda, Kento Sasaki,
Abstract summary: Differential Attention (DA) has been proposed as a refinement to standard attention. We show that DA introduces a structural fragility under adversarial perturbations. We empirically validate this Fragile Principle through systematic experiments on ViT/DiffViT and evaluations of pretrained CLIP/DiffCLIP.
Score: 5.716454975957338
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Differential Attention (DA) has been proposed as a refinement to standard attention, suppressing redundant or noisy context through a subtractive structure and thereby reducing contextual hallucination. While this design sharpens task-relevant focus, we show that it also introduces a structural fragility under adversarial perturbations. Our theoretical analysis identifies negative gradient alignment, a configuration encouraged by DA's subtraction, as the key driver of sensitivity amplification, leading to increased gradient norms and elevated local Lipschitz constants. We empirically validate this Fragile Principle through systematic experiments on ViT/DiffViT and evaluations of pretrained CLIP/DiffCLIP, spanning five datasets in total. These results demonstrate higher attack success rates, frequent gradient opposition, and stronger local sensitivity compared to standard attention. Furthermore, depth-dependent experiments reveal a robustness crossover: stacking DA layers attenuates small perturbations via depth-dependent noise cancellation, though this protection fades under larger attack budgets. Overall, our findings uncover a fundamental trade-off: DA improves discriminative focus on clean inputs but increases adversarial vulnerability, underscoring the need to jointly design for selectivity and robustness in future attention mechanisms.
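
Illustrative sketch (not from the paper): the abstract's point about negative gradient alignment can be seen in a toy calculation. For a subtractive combination f1 - lam * f2, the gradient is g1 - lam * g2, and its squared norm is |g1|^2 + lam^2 |g2|^2 - 2 * lam * <g1, g2>, so a negative inner product makes the norm larger. The vectors, the weight `lam`, and the helper names below are hypothetical and chosen only to make the arithmetic easy to follow; this is not the authors' analysis or code.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm."""
    return math.sqrt(dot(u, u))


def diff_grad_norm(g1, g2, lam):
    """Norm of g1 - lam * g2, the gradient of the subtractive form f1 - lam * f2."""
    return norm([a - lam * b for a, b in zip(g1, g2)])


lam = 0.5                  # hypothetical subtraction weight
g1 = [1.0, 0.0]            # toy gradient of the first branch
g2_aligned = [1.0, 0.0]    # second branch points the same way (cosine +1)
g2_opposed = [-1.0, 0.0]   # second branch points the opposite way (cosine -1)

aligned = diff_grad_norm(g1, g2_aligned, lam)  # |1 - 0.5| = 0.5
opposed = diff_grad_norm(g1, g2_opposed, lam)  # |1 + 0.5| = 1.5

print(f"aligned: {aligned:.2f}, opposed: {opposed:.2f}, single branch: {norm(g1):.2f}")
```

In this toy setting, positively aligned gradients partly cancel under subtraction, while opposed gradients add up and exceed the norm of either branch alone. That is the direction of the sensitivity amplification the abstract describes, under the stated assumptions.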

Related papers

When Backdoors Go Beyond Triggers: Semantic Drift in Diffusion Models Under Encoder Attacks [2.4923006485141284]
We demonstrate that encoder-side poisoning induces persistent, trigger-free semantic corruption. Backdoors act as low-rank, target-centered deformations that amplify local sensitivity, causing distortion to propagate coherently across semantic neighborhoods. Our findings, validated across diffusion and contrastive paradigms, expose the deep structural risks of encoder poisoning and highlight the necessity of geometric audits beyond simple attack success rates.
arXiv Detail & Related papers (2026-02-21T23:48:04Z)
Revealing and Enhancing Core Visual Regions: Harnessing Internal Attention Dynamics for Hallucination Mitigation in LVLMs [67.69730908817321]
Internal Positive Attention Dynamics (PAD) in LVLMs naturally reveal semantically core visual regions under the distortions of attention sinks. We propose Positive Attention Dynamics Enhancement (PADE), a training-free attention intervention that constructs a PAD map to identify semantically core visual regions.
arXiv Detail & Related papers (2026-02-17T13:08:06Z)
SDCD: Structure-Disrupted Contrastive Decoding for Mitigating Hallucinations in Large Vision-Language Models [4.677212795400693]
Bag-of-Patches behavior of Visions under weak structural supervision acts as a contributing factor of object hallucinations. We introduce a training-free algorithm called Structure-Disrupted Contrastive Decoding (SDCD). By penalizing tokens that maintain high confidence under this structure-less view, SDCD effectively suppresses the texture-driven bias.
arXiv Detail & Related papers (2026-01-07T01:27:58Z)
Curriculum-Guided Antifragile Reinforcement Learning for Secure UAV Deconfliction under Observation-Space Attacks [6.367978467906828]
Reinforcement learning policies are vulnerable to adversarial attacks in the observation space. We propose an antifragile RL framework designed to adapt against a curriculum of incremental adversarial perturbations. Results show that the antifragile policy consistently outperforms standard and robust RL baselines.
arXiv Detail & Related papers (2025-06-26T10:10:41Z)
Dynamic Attention-Guided Context Decoding for Mitigating Context Faithfulness Hallucinations in Large Language Models [26.51079570548107]
Large language models (LLMs) often exhibit Context Faithfulness Hallucinations. We propose Dynamic Attention-Guided Context Decoding (DAGCD), a lightweight framework that leverages attention distributions and uncertainty signals in a single-pass decoding.
arXiv Detail & Related papers (2025-01-02T05:07:06Z)
Breaking the Bias: Recalibrating the Attention of Industrial Anomaly Detection [20.651257973799527]
Recalibrating Attention of Industrial Anomaly Detection (RAAD) is a framework that systematically decomposes and recalibrates attention maps. HQS dynamically adjusts bit-widths based on the hierarchical nature of attention maps. We validate the effectiveness of RAAD on 32 datasets using a single 3090ti.
arXiv Detail & Related papers (2024-12-11T08:31:47Z)
Source-Free Domain Adaptive Object Detection with Semantics Compensation [54.00183496587841]
We introduce Weak-to-strong Semantics Compensation (WSCo) for strong data augmentation. WSCo compensates for the class-relevant semantics that may be lost during strong augmentation on the fly. WSCo can be implemented as a generic plug-in, easily integrable with any existing SFOD pipelines.
arXiv Detail & Related papers (2024-10-07T23:32:06Z)
Layer-Aware Analysis of Catastrophic Overfitting: Revealing the Pseudo-Robust Shortcut Dependency [61.394997313144394]
Catastrophic overfitting (CO) presents a significant challenge in single-step adversarial training (AT). We show that during CO, the former layers are more susceptible, experiencing earlier and greater distortion, while the latter layers show relative insensitivity. Our proposed method, Layer-Aware Adversarial Weight Perturbation (LAP), can effectively prevent CO and further enhance robustness.
arXiv Detail & Related papers (2024-05-25T14:56:30Z)
Towards Understanding the Robustness of Diffusion-Based Purification: A Stochastic Perspective [65.10019978876863]
Diffusion-Based Purification (DBP) has emerged as an effective defense mechanism against adversarial attacks. In this paper, we propose that the intrinsic stochasticity in the DBP process is the primary factor driving robustness.
arXiv Detail & Related papers (2024-04-22T16:10:38Z)
Preventing Collapse in Contrastive Learning with Orthonormal Prototypes (CLOP) [0.0]
CLOP is a novel semi-supervised loss function designed to prevent neural collapse by promoting the formation of linear subspaces among class embeddings. We show that CLOP enhances performance, providing greater stability across different learning rates and batch sizes.
arXiv Detail & Related papers (2024-03-27T15:48:16Z)
Soft ascent-descent as a stable and flexible alternative to flooding [6.527016551650139]
We propose a softened, pointwise mechanism called SoftAD that downweights points on the borderline, limits the effects of outliers, and retains the ascent-descent effect of flooding. We demonstrate how SoftAD can realize classification accuracy competitive with flooding while enjoying a much smaller loss generalization gap and model norm.
arXiv Detail & Related papers (2023-10-16T02:02:56Z)
A Spectral Perspective towards Understanding and Improving Adversarial Robustness [8.912245110734334]
Adversarial training (AT) has proven to be an effective defense approach, but the mechanism for robustness improvement is not fully understood. We show that AT induces the deep model to focus more on the low-frequency region, which retains the shape-biased representations, to gain robustness. We propose a spectral alignment regularization (SAR) such that the spectral output inferred by an attacked adversarial input stays as close as possible to its natural input counterpart.
arXiv Detail & Related papers (2023-06-25T14:47:03Z)
Calibrating Undisciplined Over-Smoothing in Transformer for Weakly Supervised Semantic Segmentation [51.14107156747967]
Weakly supervised semantic segmentation (WSSS) has attracted considerable attention because it requires fewer annotations than fully supervised approaches. We propose an Adaptive Re-Activation Mechanism (AReAM) to control deep-level attention to undisciplined over-smoothing. AReAM substantially improves segmentation performance compared with existing WSSS methods, reducing noise while sharpening focus on relevant semantic regions.
arXiv Detail & Related papers (2023-05-04T19:11:33Z)
Exploring Robustness of Unsupervised Domain Adaptation in Semantic Segmentation [74.05906222376608]
We propose adversarial self-supervision UDA (or ASSUDA) that maximizes the agreement between clean images and their adversarial examples by a contrastive loss in the output space. This paper is rooted in two observations: (i) the robustness of UDA methods in semantic segmentation remains unexplored, which poses a security concern in this field; and (ii) although commonly used self-supervision tasks (e.g., rotation and jigsaw) benefit image tasks such as classification and recognition, they fail to provide the critical supervision signals needed to learn discriminative representations for segmentation tasks.
arXiv Detail & Related papers (2021-05-23T01:50:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.