Related papers: Lost in the Middle: How Language Models Use Long Contexts

URL: http://arxiv.org/abs/2307.03172v3
Date: Mon, 20 Nov 2023 23:09:34 GMT
Title: Lost in the Middle: How Language Models Use Long Contexts
Authors: Nelson F. Liu and Kevin Lin and John Hewitt and Ashwin Paranjape and Michele Bevilacqua and Fabio Petroni and Percy Liang
Abstract summary: We analyze the performance of language models on two tasks that require identifying relevant information in their input contexts. We find that performance can degrade significantly when changing the position of relevant information. Our analysis provides a better understanding of how language models use their input context and provides new evaluation protocols for future long-context language models.
Score: 88.78803442320246
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While recent language models have the ability to take long contexts as input, relatively little is known about how well they use longer context. We analyze the performance of language models on two tasks that require identifying relevant information in their input contexts: multi-document question answering and key-value retrieval. We find that performance can degrade significantly when changing the position of relevant information, indicating that current language models do not robustly make use of information in long input contexts. In particular, we observe that performance is often highest when relevant information occurs at the beginning or end of the input context, and significantly degrades when models must access relevant information in the middle of long contexts, even for explicitly long-context models. Our analysis provides a better understanding of how language models use their input context and provides new evaluation protocols for future long-context language models.
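The position-sensitivity protocol described in the abstract can be illustrated with a small sketch of the key-value retrieval task. This is not the authors' released code: the function names (make_kv_example, accuracy_by_position), the prompt wording, the use of random UUID strings as keys and values, and the substring-match scoring are all assumptions made for illustration. The model is passed in as a plain callable (model_fn) that maps a prompt string to an answer string.

```python
import json
import random
import uuid


def make_kv_example(num_pairs, gold_index, seed=0):
    """Build one key-value retrieval prompt (hypothetical format).

    The queried ("gold") pair is placed at position gold_index among
    num_pairs random pairs, so its position in the context is controlled.
    """
    rng = random.Random(seed)
    # Random UUID-style strings stand in for keys and values (an assumption).
    pairs = [
        (str(uuid.UUID(int=rng.getrandbits(128))),
         str(uuid.UUID(int=rng.getrandbits(128))))
        for _ in range(num_pairs)
    ]
    gold_key, gold_value = pairs[gold_index]
    # dict preserves insertion order, so the serialized order matches pairs.
    context = json.dumps(dict(pairs), indent=1)
    prompt = (
        "Extract the value corresponding to the specified key "
        "in the JSON object below.\n\n"
        f"{context}\n\n"
        f'Key: "{gold_key}"\nCorresponding value:'
    )
    return prompt, gold_key, gold_value


def accuracy_by_position(model_fn, num_pairs, positions, trials=5):
    """Measure accuracy as a function of where the gold pair sits.

    model_fn is any callable prompt -> answer string (hypothetical
    interface). An answer counts as correct if it contains the gold value.
    """
    results = {}
    for pos in positions:
        correct = 0
        for trial in range(trials):
            prompt, _, gold_value = make_kv_example(num_pairs, pos, seed=trial)
            if gold_value in model_fn(prompt):
                correct += 1
        results[pos] = correct / trials
    return results
```

Sweeping positions from 0 to num_pairs - 1 and plotting the returned accuracies gives the kind of position-versus-accuracy curve the abstract refers to; a model that uses its context robustly would produce a flat curve.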

Related papers

Generalizing From Short to Long: Effective Data Synthesis for Long-Context Instruction Tuning [103.65680870130839]
We investigate how to design instruction data for the post-training phase of a long context pre-trained model. Our controlled study reveals that models instruction-tuned on short contexts can effectively generalize to longer ones. Based on these findings, we propose context synthesis, a novel data synthesis framework.
arXiv Detail & Related papers (2025-02-21T17:02:40Z)
Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries [54.325172923155414]
We introduce Michelangelo: a minimal, synthetic, and unleaked long-context reasoning evaluation for large language models. This evaluation is derived via a novel, unifying framework for evaluations over arbitrarily long contexts.
arXiv Detail & Related papers (2024-09-19T10:38:01Z)
A Controlled Study on Long Context Extension and Generalization in LLMs [85.4758128256142]
Broad textual understanding and in-context learning require language models that utilize full document contexts. Due to the implementation challenges associated with directly training long-context models, many methods have been proposed for extending models to handle long contexts. We implement a controlled protocol for extension methods with a standardized evaluation, utilizing consistent base models and extension data.
arXiv Detail & Related papers (2024-09-18T17:53:17Z)
Context versus Prior Knowledge in Language Models [49.17879668110546]
Language models often need to integrate prior knowledge learned during pretraining and new information presented in context. We propose two mutual information-based metrics to measure a model's dependency on a context and on its prior about an entity.
arXiv Detail & Related papers (2024-04-06T13:46:53Z)
Evaluating Large Language Models in Semantic Parsing for Conversational Question Answering over Knowledge Graphs [6.869834883252353]
This paper evaluates the performance of large language models that have not been explicitly pre-trained on this task. Our results demonstrate that large language models are capable of generating graph queries from dialogues.
arXiv Detail & Related papers (2024-01-03T12:28:33Z)
Attention Sorting Combats Recency Bias In Long Context Language Models [69.06809365227504]
Current language models often fail to incorporate long contexts efficiently during generation. We show that a major contributor to this issue is attention priors that are likely learned during pre-training. We leverage this fact to introduce "attention sorting": perform one step of decoding, sort documents by the attention they receive, repeat the process, and generate the answer with the newly sorted context.
arXiv Detail & Related papers (2023-09-28T05:19:06Z)
Prompting Large Language Models to Reformulate Queries for Moment Localization [79.57593838400618]
The task of moment localization is to localize a temporal moment in an untrimmed video for a given natural language query. We make early attempts at reformulating the moment queries into a set of instructions using large language models and making them more friendly to the localization models.
arXiv Detail & Related papers (2023-06-06T05:48:09Z)
Are Large Language Models Robust Coreference Resolvers? [17.60248310475889]
We show that prompting for coreference can outperform current unsupervised coreference systems. Further investigations reveal that instruction-tuned LMs generalize surprisingly well across domains, languages, and time periods.
arXiv Detail & Related papers (2023-05-23T19:38:28Z)
Large Language Models Can Be Easily Distracted by Irrelevant Context [29.315230178997002]
We investigate how a model's problem-solving accuracy can be influenced by irrelevant context. We use a benchmark to measure the distractibility of cutting-edge prompting techniques for large language models.
arXiv Detail & Related papers (2023-01-31T20:48:57Z)
Black-box language model explanation by context length probing [7.526153863886609]
We present context length probing, a novel explanation technique for causal language models. The technique is model-agnostic and does not rely on access to model internals beyond computing token-level probabilities. We apply context length probing to large pre-trained language models and offer some initial analyses and insights.
arXiv Detail & Related papers (2022-12-30T16:24:10Z)
Large Language Models Struggle to Learn Long-Tail Knowledge [39.01608375863687]
We study the relationship between the knowledge memorized by large language models and the information in pre-training datasets scraped from the web. In particular, we show that a language model's ability to answer a fact-based question relates to how many documents associated with that question were seen during pre-training.
arXiv Detail & Related papers (2022-11-15T18:49:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.