MuDAF: Long-Context Multi-Document Attention Focusing through Contrastive Learning on Attention Heads
- URL: http://arxiv.org/abs/2502.13963v1
- Date: Wed, 19 Feb 2025 18:59:15 GMT
- Title: MuDAF: Long-Context Multi-Document Attention Focusing through Contrastive Learning on Attention Heads
- Authors: Weihao Liu, Ning Wu, Shiping Yang, Wenbiao Ding, Shining Liang, Ming Gong, Dongmei Zhang
- Abstract summary: Large Language Models (LLMs) frequently show distracted attention due to irrelevant information in the input. We propose Multi-Document Attention Focusing (MuDAF), a novel method that explicitly optimizes the attention distribution at the head level through contrastive learning.
- Score: 38.03745877569759
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) frequently show distracted attention due to irrelevant information in the input, which severely impairs their long-context capabilities. Inspired by recent studies on the effectiveness of retrieval heads in long-context factuality, we aim to address this distraction issue by improving such retrieval heads directly. We propose Multi-Document Attention Focusing (MuDAF), a novel method that explicitly optimizes the attention distribution at the head level through contrastive learning. According to the experimental results, MuDAF can significantly improve the long-context question answering performance of LLMs, especially in multi-document question answering. Extensive evaluations on retrieval scores and attention visualizations show that MuDAF possesses great potential in making attention heads more focused on relevant information and reducing attention distractions.
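To make the head-level contrastive idea concrete, below is a minimal, hypothetical sketch of how such an objective could look: the attention mass of one selected head is pooled over each document's token span, and an InfoNCE-style loss pushes the head to favor the gold passage over distractor passages. The function and variable names (`head_contrastive_loss`, `passage_spans`, `gold_idx`, `temperature`) and the exact pooling and loss form are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch (not the official MuDAF code): a contrastive objective on
# one attention head that rewards attention mass on the gold passage and
# penalizes attention mass on distractor passages.
import torch
import torch.nn.functional as F

def head_contrastive_loss(attn_weights, passage_spans, gold_idx, temperature=0.1):
    """attn_weights: (num_question_tokens, seq_len) attention of a single head,
    taken from the question tokens over the full multi-document context.
    passage_spans: list of (start, end) token offsets, one per document.
    gold_idx: index of the passage containing the answer."""
    # Pool the head's attention mass over each passage span
    # (sum over the span, mean over the question tokens).
    per_passage = torch.stack([
        attn_weights[:, s:e].sum(dim=-1).mean() for s, e in passage_spans
    ])
    # InfoNCE-style loss: the gold passage is the positive, the rest are
    # negatives, so gradients sharpen the head's focus on relevant content.
    logits = (per_passage / temperature).unsqueeze(0)  # (1, num_passages)
    target = torch.tensor([gold_idx])                  # (1,)
    return F.cross_entropy(logits, target)

# Toy usage: 8 question tokens attending over a 300-token, 3-document context.
attn = torch.softmax(torch.randn(8, 300), dim=-1)
loss = head_contrastive_loss(attn, [(0, 100), (100, 200), (200, 300)], gold_idx=1)
print(loss.item())
```

In a full training run such a term would presumably be added alongside the language-modeling loss for the heads selected as retrieval heads; whether MuDAF contrasts softmaxed attention, raw query-key scores, or head output representations is a detail this sketch does not pin down.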
Related papers
- Beyond Isolated Capabilities: Bridging Long CoT Reasoning and Long-Context Understanding [16.50502775216771]
Reasoning distillation has emerged as an effective approach to enhance the reasoning capabilities of smaller language models. The impact of large-scale reasoning distillation on other critical abilities, particularly in-context retrieval and reasoning, remains unexplored.
arXiv Detail & Related papers (2025-07-20T07:43:16Z)
- CAI: Caption-Sensitive Attention Intervention for Mitigating Object Hallucination in Large Vision-Language Models [60.0300765815417]
Large Vision-Language Models (LVLMs) frequently produce content that deviates from visual information, leading to object hallucination. We propose Caption-Sensitive Attention Intervention (CAI), a training-free, plug-and-play hallucination mitigation method.
arXiv Detail & Related papers (2025-06-30T07:52:36Z) - Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning [47.764552063499046]
Large language models (LLMs) have demonstrated significant improvements in contextual understanding.<n>However, their ability to attend to truly critical information during long-context reasoning and generation still falls behind the pace.<n>We introduce a two-stage framework called Learning to Focus (LeaF) to mitigate confounding factors.
arXiv Detail & Related papers (2025-06-09T15:16:39Z) - CAFE: Retrieval Head-based Coarse-to-Fine Information Seeking to Enhance Multi-Document QA Capability [55.46506909726119]
We introduce $textbfCAFE$, a two-stage coarse-to-fine method to enhance multi-document question-answering capacities.<n>CAFE achieves up to 22.1% and 13.7% SubEM improvement over SFT and RAG methods on the Mistral model, respectively.
arXiv Detail & Related papers (2025-05-15T08:05:12Z)
- Focus Directions Make Your Language Models Pay More Attention to Relevant Contexts [13.459944861140261]
Long-context large language models (LLMs) are prone to be distracted by irrelevant contexts.
This paper shows that distraction arises when contextual heads fail to allocate sufficient attention to relevant contexts.
We identify focus directions, located at the key and query activations of these heads, which enable them to allocate more attention to relevant contexts.
arXiv Detail & Related papers (2025-03-30T04:18:28Z)
- Unveiling Simplicities of Attention: Adaptive Long-Context Head Identification [20.49185921960757]
We show that attention heads swing between attending to local and long-context information depending on the query.
We demonstrate that it's possible to predict which heads are crucial for long-context processing using only local keys.
arXiv Detail & Related papers (2025-02-11T00:04:32Z)
- Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence [69.86946427928511]
We investigate the internal mechanisms driving hallucination in large vision-language models (LVLMs).
We introduce Vision-aware Head Divergence (VHD), a metric that quantifies the sensitivity of attention head outputs to visual context.
We propose Vision-aware Head Reinforcement (VHR), a training-free approach to mitigate hallucination by enhancing the role of vision-aware attention heads.
arXiv Detail & Related papers (2024-12-18T15:29:30Z)
- Reducing Distraction in Long-Context Language Models by Focused Learning [6.803882766744194]
We propose a novel training method that enhances Large Language Models' ability to discern relevant information.
During fine-tuning with long contexts, we employ a retriever to extract the most relevant segments.
We then introduce an auxiliary contrastive learning objective to explicitly ensure that outputs from the original context and the retrieved sub-context are closely aligned.
arXiv Detail & Related papers (2024-11-08T19:27:42Z)
- Rethinking Visual Dependency in Long-Context Reasoning for Large Vision-Language Models [62.698520962933195]
Large Vision-Language Models (LVLMs) excel in cross-modal tasks but experience performance declines in long-context reasoning.
We propose a novel training-free context pruning method that selectively removes less critical textual information.
arXiv Detail & Related papers (2024-10-25T17:59:09Z)
- Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs [50.40165119718928]
LongPiBench is a benchmark designed to assess positional bias involving multiple pieces of relevant information.
Experiments on this benchmark reveal that while most current models are robust against the "lost in the middle" issue, there exist significant biases related to the spacing of relevant information pieces.
arXiv Detail & Related papers (2024-10-18T17:41:19Z)
- On the token distance modeling ability of higher RoPE attention dimension [76.55792402912027]
We investigate the correlation between a hidden dimension of an attention head and its contribution to capturing long-distance dependencies.
We identify a particular type of attention head, which we name Positional Heads, across various length-extrapolated models.
These heads exhibit a strong focus on long-range information interaction and play a pivotal role in long input processing.
arXiv Detail & Related papers (2024-10-11T10:47:02Z)
- Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization [97.84156490765457]
Large language models (LLMs) struggle to capture relevant information located in the middle of their input.
This phenomenon has been known as the lost-in-the-middle problem.
We show found-in-the-middle achieves better performance in locating relevant information within a long context.
arXiv Detail & Related papers (2024-06-23T04:35:42Z)
- Never Lost in the Middle: Mastering Long-Context Question Answering with Position-Agnostic Decompositional Training [9.128501882000315]
Large language models (LLMs) struggle to locate correct information in long contexts.
This paper proposes to enhance the information searching and reflection ability of LLMs in long contexts via specially designed tasks.
Experimental results show substantial improvement in Multi-doc QA and other benchmarks, superior to state-of-the-art models by 13.7% absolute gain in shuffled settings.
arXiv Detail & Related papers (2023-11-15T18:42:44Z)