Attention Instruction: Amplifying Attention in the Middle via Prompting
- URL: http://arxiv.org/abs/2406.17095v1
- Date: Mon, 24 Jun 2024 19:35:11 GMT
- Title: Attention Instruction: Amplifying Attention in the Middle via Prompting
- Authors: Meiru Zhang, Zaiqiao Meng, Nigel Collier
- Abstract summary: Language models still suffer from position bias and have difficulty in accessing and using the middle part of the context.
We examine the relative position awareness of LLMs and the feasibility of mitigating disproportional attention through prompting.
- Score: 35.07098912195063
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The context window of large language models has been extended to 128k tokens or more. However, language models still suffer from position bias and have difficulty in accessing and using the middle part of the context due to the lack of attention. We examine the relative position awareness of LLMs and the feasibility of mitigating disproportional attention through prompting. We augment the original task instruction with attention instructions that direct language models to allocate more attention towards a selected segment of the context. We conduct a comprehensive investigation of the multi-document question answering task with both position-based and index-based instructions. We find that language models do not have relative position awareness of the context. Nevertheless, they demonstrate the capacity to adapt attention to a specific segment using matching indexes. Our analysis contributes to a deeper understanding of position bias in LLMs and provides a pathway to mitigate this bias by instruction, thus benefiting LLMs in locating and utilizing relevant information from retrieved documents in RAG applications.
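The instruction-augmentation step described in the abstract can be sketched as a small prompt builder. This is a minimal sketch under stated assumptions: the base task instruction, the `Document [i]` labels, the attention-instruction wording, and the function name `build_prompt` are all hypothetical and are not the templates used in the paper. It shows the two instruction types the abstract contrasts: a position-based instruction that names a region of the context (beginning, middle, end) and an index-based instruction that names one document by the same index that appears in the context.

```python
from typing import List, Optional

# Hypothetical position labels for position-based instructions (not the paper's wording).
POSITIONS = ("beginning", "middle", "end")


def build_prompt(question: str, documents: List[str],
                 target_index: Optional[int] = None,
                 target_position: Optional[str] = None) -> str:
    """Assemble a multi-document QA prompt with an optional attention instruction.

    target_index: 1-based document index for an index-based instruction.
    target_position: one of POSITIONS for a position-based instruction.
    """
    if target_index is not None and target_position is not None:
        raise ValueError("use either an index-based or a position-based instruction")
    # Assumed base task instruction; any QA instruction could be used here.
    lines = ["Write a high-quality answer for the given question "
             "using only the provided search results."]
    if target_index is not None:
        if not 1 <= target_index <= len(documents):
            raise ValueError("target_index out of range")
        # Index-based: the label matches one that literally appears in the context.
        lines.append(f"Pay more attention to Document [{target_index}].")
    elif target_position is not None:
        if target_position not in POSITIONS:
            raise ValueError(f"target_position must be one of {POSITIONS}")
        # Position-based: refers to a relative region of the context.
        lines.append("Pay more attention to the documents in the "
                     f"{target_position} of the search results.")
    lines.append("")
    for i, doc in enumerate(documents, start=1):
        lines.append(f"Document [{i}] {doc}")
    lines.append("")
    lines.append(f"Question: {question}")
    lines.append("Answer:")
    return "\n".join(lines)


if __name__ == "__main__":
    docs = ["Berlin is in Germany.", "Paris is the capital of France.", "Rome is in Italy."]
    print(build_prompt("What is the capital of France?", docs, target_index=2))
```

The index label in the instruction is kept identical to the label in the context, mirroring the abstract's point that models can redirect attention when given matching indexes, whereas relative-position wording gives the model no literal anchor to match.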
Related papers
- When LLMs Struggle: Reference-less Translation Evaluation for Low-resource Languages [9.138590152838754]
Segment-level quality estimation (QE) is a challenging cross-lingual language understanding task.
We comprehensively evaluate large language models (LLMs) in zero/few-shot scenarios.
Our results indicate that prompt-based approaches are outperformed by the encoder-based fine-tuned QE models.
Date: 2025-01-08T12:54:05Z
- Enhancing Entertainment Translation for Indian Languages using Adaptive Context, Style and LLMs [3.55026004901472]
We introduce an algorithm to estimate the context and style of the current session and use these estimations to generate a prompt that guides a Large Language Model (LLM) to generate high-quality translations.
Our method is both language and LLM-agnostic, making it a general-purpose tool.
Date: 2024-12-29T11:33:51Z
- Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation [81.18701211912779]
We introduce an Adaptive Multi-Aspect Retrieval-augmented over KGs (Amar) framework.
This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings.
Our method has achieved state-of-the-art performance on two common datasets.
Date: 2024-12-24T16:38:04Z
- Reducing Distraction in Long-Context Language Models by Focused Learning [6.803882766744194]
We propose a novel training method that enhances Large Language Models' ability to discern relevant information.
During fine-tuning with long contexts, we employ a retriever to extract the most relevant segments.
We then introduce an auxiliary contrastive learning objective to explicitly ensure that outputs from the original context and the retrieved sub-context are closely aligned.
Date: 2024-11-08T19:27:42Z
- On the Loss of Context-awareness in General Instruction Fine-tuning [101.03941308894191]
We investigate the loss of context awareness after supervised fine-tuning.
We find that the performance decline is associated with a bias toward different roles learned during conversational instruction fine-tuning.
We propose a metric to identify context-dependent examples from general instruction fine-tuning datasets.
Date: 2024-11-05T00:16:01Z
- Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding [78.36702055076456]
This paper introduces Multi-scale Positional Encoding (Ms-PoE), a simple yet effective plug-and-play approach to enhance the capacity of LLMs to handle relevant information located in the middle of the context.
Date: 2024-03-05T04:58:37Z
- Blinded by Generated Contexts: How Language Models Merge Generated and Retrieved Contexts When Knowledge Conflicts? [45.233517779029334]
We identify whether responses are attributed to generated or retrieved contexts.
Experiments reveal a significant bias in several LLMs to favor generated contexts, even when they provide incorrect information.
Date: 2024-01-22T12:54:04Z
- Generative Context-aware Fine-tuning of Self-supervised Speech Models [54.389711404209415]
We study the use of context information generated by generative large language models (LLMs).
We propose an approach to distill the generated information during fine-tuning of self-supervised speech models.
We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks: automatic speech recognition, named entity recognition, and sentiment analysis.
Date: 2023-12-15T15:46:02Z
- Lenna: Language Enhanced Reasoning Detection Assistant [22.105472753701076]
Reasoning power and world knowledge embedded in large language models have been much less investigated and exploited for image perception tasks.
We propose Lenna, a language-enhanced reasoning detection assistant, which utilizes the robust multimodal feature representation of MLLMs.
Lenna demonstrates outstanding performance on ReasonDet and comes with significantly low training costs.
Date: 2023-12-05T02:19:35Z
- Attention Sorting Combats Recency Bias In Long Context Language Models [69.06809365227504]
Current language models often fail to incorporate long contexts efficiently during generation.
We show that a major contributor to this issue is attention priors that are likely learned during pre-training.
We leverage this fact to introduce "attention sorting": perform one step of decoding, sort documents by the attention they receive, repeat the process, and generate the answer with the newly sorted context.
Date: 2023-09-28T05:19:06Z
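The attention-sorting loop summarized in the last entry can be sketched as a reordering routine. This is a hedged illustration, not the authors' code: `attention_fn` is a hypothetical callback standing in for a model that runs one decoding step and reports the attention mass each document receives, `rounds` is an illustrative parameter, and placing the most-attended document last (nearest the question) is an assumption based on the paper's recency-bias framing.

```python
from typing import Callable, List, Sequence

# Hypothetical callback: given the question and the current document order, run one
# decoding step and return the attention mass each document receives (one per doc).
AttentionFn = Callable[[str, Sequence[str]], List[float]]


def attention_sort(question: str, documents: Sequence[str],
                   attention_fn: AttentionFn, rounds: int = 2) -> List[str]:
    """Repeatedly reorder documents by the attention they receive."""
    docs = list(documents)
    for _ in range(rounds):
        scores = attention_fn(question, docs)
        if len(scores) != len(docs):
            raise ValueError("attention_fn must return one score per document")
        # Ascending sort (assumption): the most-attended document ends up last,
        # closest to the question, where a recency-biased model attends most.
        order = sorted(range(len(docs)), key=lambda i: scores[i])
        docs = [docs[i] for i in order]
    # The caller then generates the final answer with this reordered context.
    return docs


if __name__ == "__main__":
    # Toy stand-in for real attention: score 1.0 for documents mentioning "Paris".
    def toy_attention(q: str, ds: Sequence[str]) -> List[float]:
        return [1.0 if "Paris" in d else 0.0 for d in ds]

    docs = ["Berlin is in Germany.", "Paris is the capital of France.", "Rome is in Italy."]
    print(attention_sort("What is the capital of France?", docs, toy_attention))
```

Python's `sorted` is stable, so documents that receive equal attention keep their relative order across rounds; only the scores supplied by the model move a document.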
This list is automatically generated from the titles and abstracts of the papers on this site.