Related papers: Attention Sorting Combats Recency Bias In Long Context Language Models

Attention Sorting Combats Recency Bias In Long Context Language Models

URL: http://arxiv.org/abs/2310.01427v1
Date: Thu, 28 Sep 2023 05:19:06 GMT
Title: Attention Sorting Combats Recency Bias In Long Context Language Models
Authors: Alexander Peysakhovich, Adam Lerer
Abstract summary: Current language models often fail to incorporate long contexts efficiently during generation. We show that a major contributor to this issue are attention priors that are likely learned during pre-training. We leverage this fact to introduce attention sorting'': perform one step of decoding, sort documents by the attention they receive, repeat the process, generate the answer with the newly sorted context.
Score: 69.06809365227504
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Current language models often fail to incorporate long contexts efficiently during generation. We show that a major contributor to this issue are attention priors that are likely learned during pre-training: relevant information located earlier in context is attended to less on average. Yet even when models fail to use the information from a relevant document in their response, they still pay preferential attention to that document compared to an irrelevant document at the same position. We leverage this fact to introduce ``attention sorting'': perform one step of decoding, sort documents by the attention they receive (highest attention going last), repeat the process, generate the answer with the newly sorted context. We find that attention sorting improves performance of long context models. Our findings highlight some challenges in using off-the-shelf language models for retrieval augmented generation.

Related papers

Pointwise Mutual Information as a Performance Gauge for Retrieval-Augmented Generation [78.28197013467157]
We show that the pointwise mutual information between a context and a question is an effective gauge for language model performance. We propose two methods that use the pointwise mutual information between a document and a question as a gauge for selecting and constructing prompts that lead to better performance.
arXiv Detail & Related papers (2024-11-12T13:14:09Z)
In-context Pretraining: Language Modeling Beyond Document Boundaries [137.53145699439898]
In-Context Pretraining is a new approach where language models are pretrained on a sequence of related documents. We introduce approximate algorithms for finding related documents with efficient nearest neighbor search. We see notable improvements in tasks that require more complex contextual reasoning.
arXiv Detail & Related papers (2023-10-16T17:57:12Z)
Making Retrieval-Augmented Language Models Robust to Irrelevant Context [55.564789967211844]
An important desideratum of RALMs, is that retrieved information helps model performance when it is relevant. Recent work has shown that retrieval augmentation can sometimes have a negative effect on performance.
arXiv Detail & Related papers (2023-10-02T18:52:35Z)
Lost in the Middle: How Language Models Use Long Contexts [88.78803442320246]
We analyze the performance of language models on two tasks that require identifying relevant information in their input contexts. We find that performance can degrade significantly when changing the position of relevant information. Our analysis provides a better understanding of how language models use their input context and provides new evaluation protocols for future long-context language models.
arXiv Detail & Related papers (2023-07-06T17:54:11Z)
Extending an Event-type Ontology: Adding Verbs and Classes Using Fine-tuned LLMs Suggestions [0.0]
We have investigated the use of advanced machine learning methods for pre-annotating data for a lexical extension task. We have examined the correlation of the automatic scores with the human annotation. While the correlation turned out to be strong, its influence on the annotation proper is modest due to its near linearity.
arXiv Detail & Related papers (2023-06-03T14:57:47Z)
Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy [164.83371924650294]
We show that strong performance can be achieved by a method we call Iter-RetGen, which synergizes retrieval and generation in an iterative manner. A model output shows what might be needed to finish a task, and thus provides an informative context for retrieving more relevant knowledge. Iter-RetGen processes all retrieved knowledge as a whole and largely preserves the flexibility in generation without structural constraints.
arXiv Detail & Related papers (2023-05-24T16:17:36Z)
Large Language Models Can Be Easily Distracted by Irrelevant Context [29.315230178997002]
We investigate how the model problem-solving accuracy can be influenced by irrelevant context. We use benchmark to measure the distractibility of cutting-edge prompting techniques for large language models.
arXiv Detail & Related papers (2023-01-31T20:48:57Z)
Word Order Does Matter (And Shuffled Language Models Know It) [9.990431777927421]
Recent studies have shown that language models pretrained and/or fine-tuned on randomly permuted sentences exhibit competitive performance on GLUE. We investigate what position embeddings learned from shuffled text encode, showing that these models retain information pertaining to the original, naturalistic word order.
arXiv Detail & Related papers (2022-03-21T14:10:15Z)
On Guiding Visual Attention with Language Specification [76.08326100891571]
We use high-level language specification as advice for constraining the classification evidence to task-relevant features, instead of distractors. We show that supervising spatial attention in this way improves performance on classification tasks with biased and noisy data.
arXiv Detail & Related papers (2022-02-17T22:40:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.