Related papers: Attention Instruction: Amplifying Attention in the Middle via Prompting

Attention Instruction: Amplifying Attention in the Middle via Prompting

URL: http://arxiv.org/abs/2406.17095v1
Date: Mon, 24 Jun 2024 19:35:11 GMT
Title: Attention Instruction: Amplifying Attention in the Middle via Prompting
Authors: Meiru Zhang, Zaiqiao Meng, Nigel Collier,
Abstract summary: Language models still suffer from position bias and have difficulty in accessing and using the middle part of the context. We examine the relative position awareness of LLMs and the feasibility of mitigating disproportional attention through prompting.
Score: 35.07098912195063
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The context window of large language models has been extended to 128k tokens or more. However, language models still suffer from position bias and have difficulty in accessing and using the middle part of the context due to the lack of attention. We examine the relative position awareness of LLMs and the feasibility of mitigating disproportional attention through prompting. We augment the original task instruction with $\texttt{attention instructions}$ that direct language models to allocate more attention towards a selected segment of the context. We conduct a comprehensive investigation on multi-document question answering task with both position-based and index-based instructions. We find that language models do not have relative position awareness of the context. Nevertheless, they demonstrate the capacity to adapt attention to a specific segment using matching indexes. Our analysis contributes to a deeper understanding of position bias in LLMs and provides a pathway to mitigate this bias by instruction, thus benefiting LLMs in locating and utilizing relevant information from retrieved documents in RAG applications.

Related papers

Speech-IFEval: Evaluating Instruction-Following and Quantifying Catastrophic Forgetting in Speech-Aware Language Models [49.1574468325115]
We introduce Speech-IFeval, an evaluation framework designed to assess instruction-following capabilities.<n>Recent SLMs integrate speech perception with large language models (LLMs), often degrading textual capabilities due to speech-centric training.<n>Our findings show that most SLMs struggle with even basic instructions, performing far worse than text-based LLMs.
arXiv Detail & Related papers (2025-05-25T08:37:55Z)
How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective [64.00022624183781]
Large language models (LLMs) can assess relevance and support information retrieval (IR) tasks. We investigate how different LLM modules contribute to relevance judgment through the lens of mechanistic interpretability.
arXiv Detail & Related papers (2025-04-10T16:14:55Z)
On the Consistency of Multilingual Context Utilization in Retrieval-Augmented Generation [7.478369203246005]
Retrieval-augmented generation (RAG) with large language models (LLMs) has demonstrated strong performance in multilingual question-answering tasks. In multilingual RAG, retrieved passages can be written in languages other than that of the query entered by the user.
arXiv Detail & Related papers (2025-04-01T09:55:23Z)
Focus Directions Make Your Language Models Pay More Attention to Relevant Contexts [13.459944861140261]
Long-context large language models (LLMs) are prone to be distracted by irrelevant contexts. This paper shows that distraction arises when contextual heads fail to allocate sufficient attention to relevant contexts. We identify focus directions, located at the key and query activations of these heads, which enable them to allocate more attention to relevant contexts.
arXiv Detail & Related papers (2025-03-30T04:18:28Z)
SelfElicit: Your Language Model Secretly Knows Where is the Relevant Evidence [43.136091203995925]
SelfElicit is an inference-time approach that helps LMs focus on key contextual evidence through self-guided explicit highlighting.<n>We demonstrate that SelfElicit brings consistent and significant improvement on multiple evidence-based QA tasks.
arXiv Detail & Related papers (2025-02-12T20:13:56Z)
Enhancing Entertainment Translation for Indian Languages using Adaptive Context, Style and LLMs [3.55026004901472]
We introduce an algorithm to estimate the context and style of the current session and use these estimations to generate a prompt that guides a Large Language Model (LLM) to generate high-quality translations. Our method is both language and LLM-agnostic, making it a general-purpose tool.
arXiv Detail & Related papers (2024-12-29T11:33:51Z)
Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation [81.18701211912779]
We introduce an Adaptive Multi-Aspect Retrieval-augmented over KGs (Amar) framework. This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings. Our method has achieved state-of-the-art performance on two common datasets.
arXiv Detail & Related papers (2024-12-24T16:38:04Z)
Reducing Distraction in Long-Context Language Models by Focused Learning [6.803882766744194]
We propose a novel training method that enhances Large Language Models' ability to discern relevant information. During fine-tuning with long contexts, we employ a retriever to extract the most relevant segments. We then introduce an auxiliary contrastive learning objective to explicitly ensure that outputs from the original context and the retrieved sub-context are closely aligned.
arXiv Detail & Related papers (2024-11-08T19:27:42Z)
On the Loss of Context-awareness in General Instruction Fine-tuning [101.03941308894191]
We investigate the loss of context awareness after supervised fine-tuning. We find that the performance decline is associated with a bias toward different roles learned during conversational instruction fine-tuning. We propose a metric to identify context-dependent examples from general instruction fine-tuning datasets.
arXiv Detail & Related papers (2024-11-05T00:16:01Z)
Traffic Light or Light Traffic? Investigating Phrasal Semantics in Large Language Models [41.233879429714925]
This study critically examines the capacity of API-based large language models to comprehend phrase semantics. We assess the performance of LLMs in executing phrase semantic reasoning tasks guided by natural language instructions. We conduct detailed error analyses to interpret the limitations faced by LLMs in comprehending phrase semantics.
arXiv Detail & Related papers (2024-10-03T08:44:17Z)
Analyzing the Role of Semantic Representations in the Era of Large Language Models [104.18157036880287]
We investigate the role of semantic representations in the era of large language models (LLMs) We propose an AMR-driven chain-of-thought prompting method, which we call AMRCoT. We find that it is difficult to predict which input examples AMR may help or hurt on, but errors tend to arise with multi-word expressions.
arXiv Detail & Related papers (2024-05-02T17:32:59Z)
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding [78.36702055076456]
This paper introduces Multi-scale Positional. (Ms-PoE) which is a simple yet effective plug-and-play approach to enhance the capacity of. LLMs to handle relevant information located in the middle of the context.
arXiv Detail & Related papers (2024-03-05T04:58:37Z)
Blinded by Generated Contexts: How Language Models Merge Generated and Retrieved Contexts When Knowledge Conflicts? [45.233517779029334]
We identify whether responses are attributed to generated or retrieved contexts. Experiments reveal a significant bias in several LLMs to favor generated contexts, even when they provide incorrect information.
arXiv Detail & Related papers (2024-01-22T12:54:04Z)
Generative Context-aware Fine-tuning of Self-supervised Speech Models [54.389711404209415]
We study the use of generative large language models (LLM) generated context information. We propose an approach to distill the generated information during fine-tuning of self-supervised speech models. We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks: automatic speech recognition, named entity recognition, and sentiment analysis.
arXiv Detail & Related papers (2023-12-15T15:46:02Z)
Lenna: Language Enhanced Reasoning Detection Assistant [22.105472753701076]
Reasoning power and world knowledge embedded in large language models have been much less investigated and exploited for image perception tasks. We propose Lenna, a language-enhanced reasoning detection assistant, which utilizes the robust multimodal feature representation of MLLMs. Lenna demonstrates outstanding performance on ReasonDet and comes with significantly low training costs.
arXiv Detail & Related papers (2023-12-05T02:19:35Z)
Attention Sorting Combats Recency Bias In Long Context Language Models [69.06809365227504]
Current language models often fail to incorporate long contexts efficiently during generation. We show that a major contributor to this issue are attention priors that are likely learned during pre-training. We leverage this fact to introduce attention sorting'': perform one step of decoding, sort documents by the attention they receive, repeat the process, generate the answer with the newly sorted context.
arXiv Detail & Related papers (2023-09-28T05:19:06Z)
IERL: Interpretable Ensemble Representation Learning -- Combining CrowdSourced Knowledge and Distributed Semantic Representations [11.008412414253662]
Large Language Models (LLMs) encode meanings of words in the form of distributed semantics. Recent studies have shown that LLMs tend to generate unintended, inconsistent, or wrong texts as outputs. We propose a novel ensemble learning method, Interpretable Ensemble Representation Learning (IERL), that systematically combines LLM and crowdsourced knowledge representations.
arXiv Detail & Related papers (2023-06-24T05:02:34Z)
Contextual information integration for stance detection via cross-attention [59.662413798388485]
Stance detection deals with identifying an author's stance towards a target. Most existing stance detection models are limited because they do not consider relevant contextual information. We propose an approach to integrate contextual information as text.
arXiv Detail & Related papers (2022-11-03T15:04:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.