A Survey on Transformer Context Extension: Approaches and Evaluation
- URL: http://arxiv.org/abs/2503.13299v1
- Date: Mon, 17 Mar 2025 15:44:09 GMT
- Title: A Survey on Transformer Context Extension: Approaches and Evaluation
- Authors: Yijun Liu, Jinzheng Yu, Yang Xu, Zhongyang Li, Qingfu Zhu,
- Abstract summary: Large language models (LLMs) based on the Transformer architecture have been widely applied in the field of natural language processing (NLP). However, in long context scenarios, the performance of LLMs degrades due to several challenges.
- Score: 12.727256455697011
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Large language models (LLMs) based on the Transformer architecture have been widely applied in the field of natural language processing (NLP), demonstrating strong performance, particularly on short text tasks. However, in long context scenarios, the performance of LLMs degrades due to several challenges. To alleviate this phenomenon, a number of approaches have been proposed recently. In this survey, we first list the challenges of applying pre-trained LLMs to process long contexts. We then systematically review the approaches related to long context and propose a taxonomy categorizing them into four main types: positional encoding, context compression, retrieval augmented, and attention pattern. In addition to the approaches, we focus on the evaluation of long context, organizing relevant data, tasks, and metrics based on existing long context benchmarks. Finally, we summarize unresolved issues in the long context domain and put forward our views on future developments.
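As a concrete illustration of the positional encoding category, the minimal sketch below shows RoPE-style rotary embeddings combined with linear position interpolation, one widely used extension technique. The function names and the 2048-to-8192 scaling example are our own illustrative choices, not code from the survey.

```python
# Minimal sketch of one approach from the "positional encoding" category:
# RoPE with linear position interpolation (positions of a longer sequence are
# rescaled back into the position range seen during pre-training).
import torch


def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0) -> torch.Tensor:
    """Rotary angles for each (position, frequency) pair."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    return torch.outer(positions.float(), inv_freq)          # (seq_len, dim/2)


def apply_rope(x: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
    """Rotate query/key vectors x of shape (seq_len, dim) by their positions."""
    angles = rope_angles(positions, x.shape[-1])
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out


# Position interpolation: an 8192-token input is squeezed into the 2048-position
# range the model was pre-trained on by scaling the position ids.
train_len, target_len, dim = 2048, 8192, 64
positions = torch.arange(target_len) * (train_len / target_len)
q = torch.randn(target_len, dim)
q_rotated = apply_rope(q, positions)
```

Scaling the position ids by train_len / target_len keeps every rotary angle within the range observed during pre-training, which is the intuition behind interpolation-style extension methods.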
Related papers
- LiveLongBench: Tackling Long-Context Understanding for Spoken Texts from Live Streams [4.917265821383127]
We construct the first spoken long-text dataset, derived from live streams, to reflect the redundancy-rich and conversational nature of real-world scenarios.
We evaluate both popular LLMs and specialized methods to assess their ability to understand long contexts in these tasks.
Our findings highlight key limitations of current methods and suggest future directions for improving long-context understanding.
arXiv Detail & Related papers (2025-04-24T08:27:48Z) - Generalizing From Short to Long: Effective Data Synthesis for Long-Context Instruction Tuning [103.65680870130839]
We investigate how to design instruction data for the post-training phase of a long context pre-trained model.
Our controlled study reveals that models instruction-tuned on short contexts can effectively generalize to longer ones.
Based on these findings, we propose context synthesis, a novel data synthesis framework.
arXiv Detail & Related papers (2025-02-21T17:02:40Z) - Emulating Retrieval Augmented Generation via Prompt Engineering for Enhanced Long Context Comprehension in LLMs [23.960451986662996]
This paper proposes a method that emulates Retrieval Augmented Generation (RAG) through specialized prompt engineering and chain-of-thought reasoning. We evaluate our approach on selected tasks from BABILong, which interleaves standard bAbI QA problems with large amounts of distractor text.
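A minimal sketch of what such prompt-level RAG emulation might look like is given below; the template wording and the function name are illustrative assumptions, not the paper's actual prompts.

```python
# Hedged sketch: the model is asked to first "retrieve" (quote) relevant
# passages from the supplied context, then reason over them step by step.
def build_rag_emulation_prompt(context: str, question: str) -> str:
    return (
        "You are given a long document that may contain distractor text.\n\n"
        f"Document:\n{context}\n\n"
        f"Question: {question}\n\n"
        "Step 1: Quote the sentences from the document that are relevant to "
        "the question, as if they had been retrieved by a search system.\n"
        "Step 2: Reason over the quoted sentences step by step.\n"
        "Step 3: Give the final answer on its own line, prefixed with 'Answer:'.\n"
    )


prompt = build_rag_emulation_prompt(
    context="Mary moved to the bathroom. John went to the hallway.",
    question="Where is Mary?",
)
```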
arXiv Detail & Related papers (2025-02-18T02:49:40Z) - Reducing Distraction in Long-Context Language Models by Focused Learning [6.803882766744194]
We propose a novel training method that enhances Large Language Models' ability to discern relevant information.
During fine-tuning with long contexts, we employ a retriever to extract the most relevant segments.
We then introduce an auxiliary contrastive learning objective to explicitly ensure that outputs from the original context and the retrieved sub-context are closely aligned.
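A hedged sketch of an auxiliary alignment objective of this general shape, written in PyTorch, is given below; the InfoNCE-style formulation, the pooled representations, and the loss weighting are our assumptions rather than the paper's exact loss.

```python
# Representations of the answer produced from the full long context and from
# the retrieved sub-context form a positive pair; other examples in the batch
# serve as negatives (InfoNCE).
import torch
import torch.nn.functional as F


def contrastive_alignment_loss(full_ctx_repr: torch.Tensor,
                               sub_ctx_repr: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """full_ctx_repr, sub_ctx_repr: (batch, hidden) pooled output representations."""
    full_n = F.normalize(full_ctx_repr, dim=-1)
    sub_n = F.normalize(sub_ctx_repr, dim=-1)
    logits = full_n @ sub_n.T / temperature          # (batch, batch) similarity matrix
    labels = torch.arange(logits.size(0))            # diagonal entries are the positives
    return F.cross_entropy(logits, labels)


# During long-context fine-tuning this would be added to the usual LM loss,
# e.g. loss = lm_loss + lam * contrastive_alignment_loss(h_full, h_sub),
# where lam is a weighting hyperparameter (our notation).
```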
arXiv Detail & Related papers (2024-11-08T19:27:42Z) - What is Wrong with Perplexity for Long-context Language Modeling? [71.34933096461124]
Long-context inputs are crucial for large language models (LLMs) in tasks such as extended conversations, document summarization, and many-shot in-context learning. Perplexity (PPL) has proven unreliable for assessing long-context capabilities. We propose LongPPL, a novel metric that focuses on key tokens by employing a long-short context contrastive method to identify them.
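The sketch below illustrates the long-short contrastive idea in a hedged way: tokens whose log-probability improves markedly when the long context is available are treated as key tokens, and perplexity is computed over them alone. The threshold and the function signature are illustrative assumptions, not the paper's exact recipe.

```python
import math


def long_ppl(logp_long: list[float], logp_short: list[float],
             gain_threshold: float = 2.0) -> float:
    """logp_long[i] / logp_short[i]: log-prob of token i under the long / short context."""
    key_logps = [lp_long
                 for lp_long, lp_short in zip(logp_long, logp_short)
                 if lp_long - lp_short > gain_threshold]   # tokens that benefit from long context
    if not key_logps:
        return float("nan")
    return math.exp(-sum(key_logps) / len(key_logps))      # perplexity over key tokens only
```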
arXiv Detail & Related papers (2024-10-31T09:39:28Z) - FltLM: An Intergrated Long-Context Large Language Model for Effective Context Filtering and Understanding [32.197113821638936]
We propose a novel integrated Long-Context Large Language Model (FltLM).
FltLM incorporates a context filter with a soft mask mechanism, identifying and dynamically excluding irrelevant content to concentrate on pertinent information.
Experimental results demonstrate that FltLM significantly outperforms supervised fine-tuning and retrieval-based methods in complex QA scenarios.
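A minimal sketch of a soft-mask context filter in this spirit is shown below; the segment-level relevance scores, the sigmoid gating, and the way the mask is folded into the attention logits are our illustrative assumptions, not FltLM's actual architecture.

```python
# Each context segment gets a relevance score from a filtering module; the
# score is turned into a soft (0..1) mask and added (in log space) to the
# attention logits, so low-relevance segments are attenuated rather than
# hard-dropped.
import torch
import torch.nn.functional as F


def soft_masked_attention(q, k, v, segment_relevance_logits, segment_ids):
    """
    q, k, v: (seq_len, dim) single-head tensors.
    segment_relevance_logits: (num_segments,) scores from a filtering module.
    segment_ids: (seq_len,) segment index of every key token.
    """
    soft_mask = torch.sigmoid(segment_relevance_logits)[segment_ids]   # (seq_len,)
    scores = (q @ k.T) / k.shape[-1] ** 0.5                            # (seq_len, seq_len)
    scores = scores + torch.log(soft_mask + 1e-9)                      # attenuate irrelevant keys
    return F.softmax(scores, dim=-1) @ v


# Example: two segments, the second judged largely irrelevant.
q = k = v = torch.randn(6, 8)
out = soft_masked_attention(q, k, v,
                            segment_relevance_logits=torch.tensor([3.0, -3.0]),
                            segment_ids=torch.tensor([0, 0, 0, 1, 1, 1]))
```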
arXiv Detail & Related papers (2024-10-09T13:47:50Z) - Bridging Context Gaps: Leveraging Coreference Resolution for Long Contextual Understanding [28.191029786204624]
We introduce the Long Question Coreference Adaptation (LQCA) method to enhance the performance of large language models (LLMs). This framework focuses on coreference resolution tailored to long contexts, allowing the model to identify and manage references effectively. Our code is publicly available at https://github.com/OceannTwT/LQCA.
arXiv Detail & Related papers (2024-10-02T15:39:55Z) - A Controlled Study on Long Context Extension and Generalization in LLMs [85.4758128256142]
Broad textual understanding and in-context learning require language models that utilize full document contexts.
Due to the implementation challenges associated with directly training long-context models, many methods have been proposed for extending models to handle long contexts.
We implement a controlled protocol for extension methods with a standardized evaluation, utilizing consistent base models and extension data.
arXiv Detail & Related papers (2024-09-18T17:53:17Z) - Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA [71.04146366608904]
Long-context modeling capabilities have garnered widespread attention, leading to the emergence of Large Language Models (LLMs) with ultra-long context windows.
We propose a novel long-context benchmark, Loong, aligning with realistic scenarios through extended multi-document question answering (QA).
Loong introduces four types of tasks with a range of context lengths: Spotlight Locating, Comparison, Clustering, and Chain of Reasoning.
arXiv Detail & Related papers (2024-06-25T09:42:56Z) - NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens [63.7488938083696]
We introduce NovelQA, a benchmark tailored for evaluating Large Language Models (LLMs) with complex, extended narratives.
NovelQA offers a unique blend of complexity, length, and narrative coherence, making it an ideal tool for assessing deep textual understanding.
Our evaluation of long-context LLMs on NovelQA reveals significant insights into their strengths and weaknesses.
arXiv Detail & Related papers (2024-03-18T17:32:32Z) - Thread of Thought Unraveling Chaotic Contexts [133.24935874034782]
"Thread of Thought" (ThoT) strategy draws inspiration from human cognitive processes.
In experiments, ThoT significantly improves reasoning performance compared to other prompting techniques.
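A hedged sketch of a ThoT-style prompt is shown below; the exact instruction wording used in the paper may differ.

```python
# The model is asked to walk through a long, chaotic context in manageable
# parts, summarizing and analyzing each part before answering.
def thread_of_thought_prompt(context: str, question: str) -> str:
    return (
        f"{context}\n\n"
        f"Q: {question}\n\n"
        "Walk me through this context in manageable parts step by step, "
        "summarizing and analyzing as we go. Then answer the question.\n"
    )
```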
arXiv Detail & Related papers (2023-11-15T06:54:44Z) - On Context Utilization in Summarization with Large Language Models [83.84459732796302]
Large language models (LLMs) excel in abstractive summarization tasks, delivering fluent and pertinent summaries.
Recent advancements have extended their capabilities to handle long-input contexts, exceeding 100k tokens.
We conduct the first comprehensive study on context utilization and position bias in summarization.
arXiv Detail & Related papers (2023-10-16T16:45:12Z)