MuDAF: Long-Context Multi-Document Attention Focusing through Contrastive Learning on Attention Heads
- URL: http://arxiv.org/abs/2502.13963v1
- Date: Wed, 19 Feb 2025 18:59:15 GMT
- Title: MuDAF: Long-Context Multi-Document Attention Focusing through Contrastive Learning on Attention Heads
- Authors: Weihao Liu, Ning Wu, Shiping Yang, Wenbiao Ding, Shining Liang, Ming Gong, Dongmei Zhang
- Abstract summary: Large Language Models (LLMs) frequently show distracted attention due to irrelevant information in the input. We propose Multi-Document Attention Focusing (MuDAF), a novel method that explicitly optimizes the attention distribution at the head level through contrastive learning.
- Score: 38.03745877569759
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) frequently show distracted attention due to irrelevant information in the input, which severely impairs their long-context capabilities. Inspired by recent studies on the effectiveness of retrieval heads in long-context factuality, we aim to address this distraction issue by directly improving such retrieval heads. We propose Multi-Document Attention Focusing (MuDAF), a novel method that explicitly optimizes the attention distribution at the head level through contrastive learning. According to the experimental results, MuDAF can significantly improve the long-context question answering performance of LLMs, especially in multi-document question answering. Extensive evaluations on retrieval scores and attention visualizations show that MuDAF possesses great potential in making attention heads more focused on relevant information and reducing attention distractions.
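The abstract describes the method only at a high level; as a rough illustration, the following is a minimal PyTorch sketch of the kind of head-level contrastive objective it alludes to, in which a chosen attention head is pushed to place more attention mass on the relevant document than on distractor documents in the same context. All names and shapes here (`head_contrastive_loss`, the span arguments, the temperature `tau`) are hypothetical and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def head_contrastive_loss(attn, query_idx, gold_span, distractor_spans, tau=0.1):
    """InfoNCE-style loss over one head's attention distribution (illustrative sketch).

    attn:             [seq_len, seq_len] attention weights of a single head
    query_idx:        position of the question token used as the anchor
    gold_span:        (start, end) token span of the relevant document
    distractor_spans: list of (start, end) spans of irrelevant documents
    """
    row = attn[query_idx]                                    # attention row of the anchor token
    pos = row[gold_span[0]:gold_span[1]].sum()               # mass on the relevant document
    negs = torch.stack([row[s:e].sum() for s, e in distractor_spans])  # mass on distractors

    # Treat the relevant document as the positive class in a softmax over passages.
    logits = torch.cat([pos.unsqueeze(0), negs]) / tau
    target = torch.zeros(1, dtype=torch.long)                # positive sits at index 0
    return F.cross_entropy(logits.unsqueeze(0), target)
```

In a full training setup such a term would presumably be applied only to selected retrieval heads and combined with the standard language-modeling loss; the paper's actual objective, head-selection procedure, and negative-sampling scheme may differ.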
Related papers
- Focus Directions Make Your Language Models Pay More Attention to Relevant Contexts [13.459944861140261]
Long-context large language models (LLMs) are prone to be distracted by irrelevant contexts.
This paper shows that distraction arises when contextual heads fail to allocate sufficient attention to relevant contexts.
We identify focus directions, located at the key and query activations of these heads, which enable them to allocate more attention to relevant contexts.
arXiv Detail & Related papers (2025-03-30T04:18:28Z)
- Unveiling Simplicities of Attention: Adaptive Long-Context Head Identification [20.49185921960757]
We show that attention heads swing between attending to local and long-context information depending on the query.
We demonstrate that it's possible to predict which heads are crucial for long-context processing using only local keys.
arXiv Detail & Related papers (2025-02-11T00:04:32Z)
- Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence [69.86946427928511]
We investigate the internal mechanisms driving hallucination in large vision-language models (LVLMs).
We introduce Vision-aware Head Divergence (VHD), a metric that quantifies the sensitivity of attention head outputs to visual context.
We propose Vision-aware Head Reinforcement (VHR), a training-free approach to mitigate hallucination by enhancing the role of vision-aware attention heads.
arXiv Detail & Related papers (2024-12-18T15:29:30Z)
- Reducing Distraction in Long-Context Language Models by Focused Learning [6.803882766744194]
We propose a novel training method that enhances Large Language Models' ability to discern relevant information.
During fine-tuning with long contexts, we employ a retriever to extract the most relevant segments.
We then introduce an auxiliary contrastive learning objective to explicitly ensure that outputs from the original context and the retrieved sub-context are closely aligned.
arXiv Detail & Related papers (2024-11-08T19:27:42Z)
- Rethinking Visual Dependency in Long-Context Reasoning for Large Vision-Language Models [62.698520962933195]
Large Vision-Language Models (LVLMs) excel in cross-modal tasks but experience performance declines in long-context reasoning.
We propose a novel training-free context pruning method that selectively removes less critical textual information.
arXiv Detail & Related papers (2024-10-25T17:59:09Z)
- Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs [50.40165119718928]
LongPiBench is a benchmark designed to assess positional bias involving multiple pieces of relevant information.
Experiments reveal that while most current models are robust against the "lost in the middle" issue, there exist significant biases related to the spacing of relevant information pieces.
arXiv Detail & Related papers (2024-10-18T17:41:19Z)
- On the token distance modeling ability of higher RoPE attention dimension [76.55792402912027]
We investigate the correlation between a hidden dimension of an attention head and its contribution to capturing long-distance dependencies.
We identify a particular type of attention heads, which we named Positional Heads, from various length-extrapolated models.
These heads exhibit a strong focus on long-range information interaction and play a pivotal role in long input processing.
arXiv Detail & Related papers (2024-10-11T10:47:02Z)
- Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization [97.84156490765457]
Large language models (LLMs) struggle to capture relevant information located in the middle of their input.
This phenomenon has been known as the lost-in-the-middle problem.
We show found-in-the-middle achieves better performance in locating relevant information within a long context.
arXiv Detail & Related papers (2024-06-23T04:35:42Z)
- Never Lost in the Middle: Mastering Long-Context Question Answering with Position-Agnostic Decompositional Training [9.128501882000315]
Large language models (LLMs) struggle to locate correct information in long contexts.
This paper proposes to enhance the information searching and reflection ability of LLMs in long contexts via specially designed tasks.
Experimental results show substantial improvement in Multi-doc QA and other benchmarks, outperforming state-of-the-art models by a 13.7% absolute gain in shuffled settings.
arXiv Detail & Related papers (2023-11-15T18:42:44Z)