Focus Directions Make Your Language Models Pay More Attention to Relevant Contexts
- URL: http://arxiv.org/abs/2503.23306v1
- Date: Sun, 30 Mar 2025 04:18:28 GMT
- Title: Focus Directions Make Your Language Models Pay More Attention to Relevant Contexts
- Authors: Youxiang Zhu, Ruochen Li, Danqing Wang, Daniel Haehn, Xiaohui Liang
- Abstract summary: Long-context large language models (LLMs) are prone to being distracted by irrelevant contexts. This paper shows that distraction arises when contextual heads fail to allocate sufficient attention to relevant contexts. We identify focus directions, located at the key and query activations of these heads, which enable them to allocate more attention to relevant contexts.
- Score: 13.459944861140261
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Long-context large language models (LLMs) are prone to being distracted by irrelevant contexts. The reason for distraction remains poorly understood. In this paper, we first identify the contextual heads, a special group of attention heads that control the overall attention of the LLM. Then, we demonstrate that distraction arises when contextual heads fail to allocate sufficient attention to relevant contexts and can be mitigated by increasing attention to these contexts. We further identify focus directions, located at the key and query activations of these heads, which enable them to allocate more attention to relevant contexts without explicitly specifying which context is relevant. We comprehensively evaluate the effect of focus directions on various long-context tasks and find that focus directions help mitigate the poor task alignment of long-context LLMs. We believe our findings could promote further research on long-context LLM alignment.
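As a rough illustration of the steering mechanism the abstract describes, adding a direction to the query/key activations of an attention head so that attention shifts toward keys aligned with that direction, here is a minimal toy sketch in PyTorch. The stand-in `focus_dir`, the steering strength `alpha`, and the explicit list of "relevant" tokens are all assumptions made only for this demo; the paper derives its focus directions from contextual heads without specifying which tokens are relevant, which this sketch does not reproduce.

```python
import torch

torch.manual_seed(0)

d_head = 64      # per-head key/query dimension (assumed for the toy example)
n_tokens = 12    # toy context length

# Random key activations for one attention head and a single query position.
K = torch.randn(n_tokens, d_head)
q = torch.randn(d_head)

# Hypothetical stand-in for a focus direction: the normalized mean key of tokens
# we *declare* relevant here. The paper's focus directions are obtained
# differently and do not require labeling the relevant tokens.
relevant = torch.tensor([3, 4, 5])
focus_dir = K[relevant].mean(dim=0)
focus_dir = focus_dir / focus_dir.norm()

def attn_weights(query: torch.Tensor, keys: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product attention weights for one query over all keys."""
    scores = keys @ query / d_head ** 0.5
    return torch.softmax(scores, dim=-1)

alpha = 4.0  # steering strength (hypothetical hyperparameter)

base = attn_weights(q, K)
steered = attn_weights(q + alpha * focus_dir, K)  # add the direction to the query activation

print(f"attention mass on 'relevant' tokens before steering: {base[relevant].sum().item():.3f}")
print(f"attention mass on 'relevant' tokens after steering:  {steered[relevant].sum().item():.3f}")
```

Because the added direction raises the dot products with keys aligned to it, the softmax reallocates attention mass toward those tokens, which is the qualitative effect the paper attributes to focus directions.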
Related papers
- Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas [52.478956204238315]
We study the spatial reasoning challenge through the lens of mechanistic interpretability. We observe that successful spatial reasoning correlates strongly with the model's ability to align its attention with actual object locations. Motivated by these findings, we propose ADAPTVIS to sharpen the attention on highly relevant regions when confident.
arXiv Detail & Related papers (2025-03-03T17:57:03Z) - MuDAF: Long-Context Multi-Document Attention Focusing through Contrastive Learning on Attention Heads [38.03745877569759]
Large Language Models (LLMs) frequently show distracted attention due to irrelevant information in the input. We propose Multi-Document Attention Focusing (MuDAF), a novel method that explicitly optimizes the attention distribution at the head level through contrastive learning.
arXiv Detail & Related papers (2025-02-19T18:59:15Z) - Reducing Distraction in Long-Context Language Models by Focused Learning [6.803882766744194]
We propose a novel training method that enhances Large Language Models' ability to discern relevant information.
During fine-tuning with long contexts, we employ a retriever to extract the most relevant segments.
We then introduce an auxiliary contrastive learning objective to explicitly ensure that outputs from the original context and the retrieved sub-context are closely aligned (a toy sketch of such an objective appears after this list).
arXiv Detail & Related papers (2024-11-08T19:27:42Z) - Rethinking Visual Dependency in Long-Context Reasoning for Large Vision-Language Models [62.698520962933195]
Large Vision-Language Models (LVLMs) excel in cross-modal tasks but experience performance declines in long-context reasoning.
We propose a novel training-free context pruning method that selectively removes less critical textual information.
arXiv Detail & Related papers (2024-10-25T17:59:09Z) - Attention Instruction: Amplifying Attention in the Middle via Prompting [35.07098912195063]
Language models still suffer from position bias and have difficulty in accessing and using the middle part of the context.
We examine the relative position awareness of LLMs and the feasibility of mitigating disproportional attention through prompting.
arXiv Detail & Related papers (2024-06-24T19:35:11Z) - Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization [97.84156490765457]
Large language models (LLMs) struggle to capture relevant information located in the middle of their input.
This phenomenon has been known as the lost-in-the-middle problem.
We show found-in-the-middle achieves better performance in locating relevant information within a long context.
arXiv Detail & Related papers (2024-06-23T04:35:42Z) - Identifying Semantic Induction Heads to Understand In-Context Learning [103.00463655766066]
We investigate whether attention heads encode two types of relationships between tokens present in natural languages.
We find that certain attention heads exhibit a pattern where, when attending to head tokens, they recall tail tokens and increase the output logits of those tail tokens.
arXiv Detail & Related papers (2024-02-20T14:43:39Z) - Fortify the Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use [74.72150542395487]
An inherent waveform pattern in the attention allocation of large language models (LLMs) significantly affects their performance in tasks demanding a high degree of context awareness.
To address this issue, we propose a novel inference method named Attention Buckets.
arXiv Detail & Related papers (2023-12-07T17:24:51Z) - Cross-modal Attention Congruence Regularization for Vision-Language Relation Alignment [105.70884254216973]
We show that relation alignment can be enforced by encouraging the directed language attention from 'mug' to 'grass'.
We prove that this notion of soft relation alignment is equivalent to enforcing congruence between vision and language attention.
We apply our Cross-modal Attention Congruence Regularization (CACR) loss to UNITER and improve on the state-of-the-art approach to Winoground.
arXiv Detail & Related papers (2022-12-20T18:53:14Z)
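For the contrastive alignment objective mentioned in the "Reducing Distraction in Long-Context Language Models by Focused Learning" entry above, a minimal InfoNCE-style toy version might look as follows. The pooled representations, in-batch negatives, and temperature are assumptions made only for illustration; that paper's actual objective may be defined differently.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(full_ctx: torch.Tensor,
                               sub_ctx: torch.Tensor,
                               temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss: for each example, the representation computed from the
    full long context should be closest to the one computed from its own retrieved
    sub-context, and far from those of the other examples in the batch."""
    full_ctx = F.normalize(full_ctx, dim=-1)     # (batch, dim)
    sub_ctx = F.normalize(sub_ctx, dim=-1)       # (batch, dim)
    logits = full_ctx @ sub_ctx.T / temperature  # pairwise cosine similarities
    targets = torch.arange(logits.size(0))       # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage: random vectors stand in for pooled model outputs.
batch, dim = 8, 768
loss = contrastive_alignment_loss(torch.randn(batch, dim), torch.randn(batch, dim))
print(loss.item())
```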
This list is automatically generated from the titles and abstracts of the papers on this site.