Indirect Attention: Turning Context Misalignment into a Feature
- URL: http://arxiv.org/abs/2509.26015v1
- Date: Tue, 30 Sep 2025 09:44:00 GMT
- Title: Indirect Attention: Turning Context Misalignment into a Feature
- Authors: Bissmella Bahaduri, Hicham Talaoubrid, Fangchen Feng, Zuheng Ming, Anissa Mokraoui
- Abstract summary: This work explores a less conventional scenario, in which keys and values originate from different sequences or modalities. We first analyze the attention mechanism's behavior under noisy value features, establishing a critical noise threshold. We then model context (key, value) misalignment as an effective form of structured noise within the value features, demonstrating that the noise induced by such misalignment can substantially exceed this critical threshold. Motivated by this, we introduce Indirect Attention, a modified attention mechanism that infers relevance indirectly in scenarios with misaligned context.
- Score: 2.3425919199730694
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The attention mechanism has become a cornerstone of modern deep learning architectures, where keys and values are typically derived from the same underlying sequence or representation. This work explores a less conventional scenario, in which keys and values originate from different sequences or modalities. Specifically, we first analyze the attention mechanism's behavior under noisy value features, establishing a critical noise threshold beyond which signal degradation becomes significant. Furthermore, we model context (key, value) misalignment as an effective form of structured noise within the value features, demonstrating that the noise induced by such misalignment can substantially exceed this critical threshold, thereby compromising standard attention's efficacy. Motivated by this, we introduce Indirect Attention, a modified attention mechanism that infers relevance indirectly in scenarios with misaligned context. We evaluate the performance of Indirect Attention across a range of synthetic tasks and real-world applications, showcasing its superior ability to handle misalignment.
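The abstract does not spell out how Indirect Attention infers relevance, so the sketch below only illustrates the setting under stated assumptions: direct_attention shows the misaligned case (keys from sequence A gating values from sequence B), while indirect_attention shows one plausible reading of an indirect relevance path that composes a query-to-A map with an A-to-B map. The function names, the two-hop composition, and the equal sequence lengths are assumptions, not the paper's definition.

```python
# Hypothetical sketch, not the authors' code: attention when keys and values
# come from different sequences, plus one possible "indirect" relevance path.
import torch
import torch.nn.functional as F

def direct_attention(q, k, v):
    """Standard scaled dot-product attention: softmax(q k^T / sqrt(d)) v.
    In the misaligned setting, k comes from sequence A while v comes from
    sequence B, so the weights gate values they were never scored against."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5          # (Lq, L)
    return F.softmax(scores, dim=-1) @ v                 # (Lq, d_v)

def indirect_attention(q, k_a, k_b, v_b):
    """One plausible reading of 'indirect' relevance (an assumption, not the
    paper's definition): score queries against keys of sequence A, score A
    against keys of sequence B, and compose the two maps so B's values are
    gated by a relevance path q -> A -> B instead of a misaligned direct score."""
    d = q.size(-1)
    w_qa = F.softmax(q @ k_a.transpose(-2, -1) / d ** 0.5, dim=-1)    # (Lq, La)
    w_ab = F.softmax(k_a @ k_b.transpose(-2, -1) / d ** 0.5, dim=-1)  # (La, Lb)
    return (w_qa @ w_ab) @ v_b                                        # (Lq, d_v)

if __name__ == "__main__":
    torch.manual_seed(0)
    Lq, L, d = 4, 6, 8                        # both context sequences share length L
    q, k_a = torch.randn(Lq, d), torch.randn(L, d)
    k_b, v_b = torch.randn(L, d), torch.randn(L, d)
    print(direct_attention(q, k_a, v_b).shape)          # misaligned: keys from A, values from B
    print(indirect_attention(q, k_a, k_b, v_b).shape)   # relevance routed via A
```

Under this sketch, the weights applied to B's values in the direct variant were never computed against those values, which corresponds to the structured-noise view of misalignment described in the abstract.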
Related papers
- FAIR: Focused Attention Is All You Need for Generative Recommendation [43.65370600297507]
We propose the first generative recommendation framework with focused attention, which enhances attention scores to relevant context while suppressing scores to irrelevant context. Specifically, we propose (1) a focused attention mechanism integrated into the standard Transformer, which learns two separate sets of Q and K attention weights and computes their difference as the final attention scores. We validate the effectiveness of FAIR on four public benchmarks, demonstrating its superior performance compared to existing methods.
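A rough sketch of the differential scoring described above, under assumptions of my own: single head, softmax applied to the difference of the two score maps, and made-up module names. The summary only states that two separate Q/K projections are learned and their difference forms the final attention scores.

```python
# Hypothetical sketch of differential (focused) attention scoring; layer names,
# single-head layout, and softmax-after-subtraction are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocusedAttentionSketch(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.q1, self.k1 = nn.Linear(d_model, d_model), nn.Linear(d_model, d_model)
        self.q2, self.k2 = nn.Linear(d_model, d_model), nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, x):                                   # x: (batch, seq, d_model)
        s1 = self.q1(x) @ self.k1(x).transpose(-2, -1) * self.scale
        s2 = self.q2(x) @ self.k2(x).transpose(-2, -1) * self.scale
        weights = F.softmax(s1 - s2, dim=-1)                 # difference of the two score maps
        return weights @ self.v(x)
```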
arXiv Detail & Related papers (2025-12-12T03:25:12Z)
- CTR-Sink: Attention Sink for Language Models in Click-Through Rate Prediction [42.92011330807996]
CTR-Sink is a novel framework introducing behavior-level attention sinks tailored for recommendation scenarios. Inspired by attention sink theory, it constructs attention focus sinks and dynamically regulates attention aggregation via external information.
arXiv Detail & Related papers (2025-08-05T17:30:34Z)
- Mitigating Attention Hacking in Preference-Based Reward Modeling via Interaction Distillation [62.14692332209628]
"Interaction Distillation" is a novel training framework for more adequate preference modeling through attention-level optimization.<n>It provides more stable and generalizable reward signals compared to state-of-the-art RM optimization methods.
arXiv Detail & Related papers (2025-08-04T17:06:23Z)
- Transformers Learn Faster with Semantic Focus [57.97235825738412]
We study sparse transformers in terms of learnability and generalization. We find that input-dependent sparse attention models appear to converge faster and generalize better than standard attention models.
arXiv Detail & Related papers (2025-06-17T01:19:28Z)
- Static Key Attention in Vision [19.014373531742297]
We study the impact of substituting the dynamic parameterized key with a static key within the standard attention mechanism in Vision Transformers. Our findings reveal that static key attention mechanisms can match or even exceed the performance of standard self-attention.
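As a hedged illustration of that idea, the block below replaces the dynamic key projection with a learned parameter; the fixed token count (as in a ViT patch grid), single head, and module names are assumptions rather than details from the paper.

```python
# Illustrative sketch: the dynamic key projection K = W_k x is replaced by a
# learned static key matrix shared across all inputs, while queries and values
# are still projected from the tokens. Assumes a fixed number of tokens.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StaticKeyAttentionSketch(nn.Module):
    def __init__(self, num_tokens: int, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        # One static key per position, learned during training, identical for every input.
        self.static_k = nn.Parameter(torch.randn(num_tokens, d_model) / d_model ** 0.5)

    def forward(self, x):                                            # x: (batch, num_tokens, d_model)
        scores = self.q(x) @ self.static_k.t() / x.size(-1) ** 0.5   # (batch, T, T)
        return F.softmax(scores, dim=-1) @ self.v(x)
```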
arXiv Detail & Related papers (2024-12-09T23:18:09Z)
- Towards Robust Semantic Segmentation against Patch-based Attack via Attention Refinement [68.31147013783387]
We observe that the attention mechanism is vulnerable to patch-based adversarial attacks.
In this paper, we propose a Robust Attention Mechanism (RAM) to improve the robustness of the semantic segmentation model.
arXiv Detail & Related papers (2024-01-03T13:58:35Z)
- Fortify the Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use [74.72150542395487]
An inherent waveform pattern in the attention allocation of large language models (LLMs) significantly affects their performance in tasks demanding a high degree of context awareness.
To address this issue, we propose a novel inference method named Attention Buckets.
arXiv Detail & Related papers (2023-12-07T17:24:51Z)
- Generic Attention-model Explainability by Weighted Relevance Accumulation [9.816810016935541]
We propose a weighted relevance strategy that takes the importance of token values into consideration, reducing the distortion introduced when relevance is accumulated uniformly.
To evaluate our method, we propose a unified CLIP-based two-stage model, named CLIPmapper, to process Vision-and-Language tasks.
arXiv Detail & Related papers (2023-08-20T12:02:30Z)
- Calibrating Undisciplined Over-Smoothing in Transformer for Weakly Supervised Semantic Segmentation [51.14107156747967]
Weakly supervised semantic segmentation (WSSS) has attracted considerable attention because it requires fewer annotations than fully supervised approaches. We propose an Adaptive Re-Activation Mechanism (AReAM) that alleviates undisciplined over-smoothing in deep-level attention. AReAM substantially improves segmentation performance compared with existing WSSS methods, reducing noise while sharpening focus on relevant semantic regions.
arXiv Detail & Related papers (2023-05-04T19:11:33Z)
- Alignment Attention by Matching Key and Query Distributions [48.93793773929006]
This paper introduces alignment attention that explicitly encourages self-attention to match the distributions of the key and query within each head.
Any model with self-attention, including pre-trained ones, can easily be converted to the proposed alignment attention.
On a variety of language understanding tasks, we show the effectiveness of our method in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks.
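The summary does not say which divergence is used to match the two distributions, so the following is only a stand-in: per-head moment matching between query and key statistics, added as an auxiliary penalty. The function name and the choice of first and second moments are assumptions.

```python
# Stand-in for the distribution-matching idea (assumed: moment matching, not the
# paper's actual divergence). It could be added to a self-attention model's
# training loss alongside its task objective.
import torch

def qk_alignment_penalty(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """q, k: (batch, heads, seq, d_head). Penalize the gap between the empirical
    mean and variance of queries and keys within each head."""
    dims = (0, 2)  # pool statistics over batch and sequence positions
    mean_gap = (q.mean(dim=dims) - k.mean(dim=dims)).pow(2).mean()
    var_gap = (q.var(dim=dims) - k.var(dim=dims)).pow(2).mean()
    return mean_gap + var_gap
```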
arXiv Detail & Related papers (2021-10-25T00:54:57Z)