A Comprehensive Survey on Long Context Language Modeling
- URL: http://arxiv.org/abs/2503.17407v1
- Date: Thu, 20 Mar 2025 17:06:28 GMT
- Title: A Comprehensive Survey on Long Context Language Modeling
- Authors: Jiaheng Liu, Dawei Zhu, Zhiqi Bai, Yancheng He, Huanxuan Liao, Haoran Que, Zekun Wang, Chenchen Zhang, Ge Zhang, Jiebin Zhang, Yuanxing Zhang, Zhuo Chen, Hangyu Guo, Shilong Li, Ziqiang Liu, Yong Shan, Yifan Song, Jiayi Tian, Wenhao Wu, Zhejian Zhou, Ruijie Zhu, Junlan Feng, Yang Gao, Shizhu He, Zhoujun Li, Tianyu Liu, Fanyu Meng, Wenbo Su, Yingshui Tan, Zili Wang, Jian Yang, Wei Ye, Bo Zheng, Wangchunshu Zhou, Wenhao Huang, Sujian Li, Zhaoxiang Zhang,
- Abstract summary: Long Context Language Models (LCLMs) process and analyze extensive inputs in an effective and efficient way. Our survey is structured around three key aspects: how to obtain effective and efficient LCLMs, how to train and deploy LCLMs efficiently, and how to evaluate and analyze LCLMs comprehensively.
- Score: 118.5540791080351
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Efficient processing of long contexts has been a persistent pursuit in Natural Language Processing. With the growing number of long documents, dialogues, and other textual data, it is important to develop Long Context Language Models (LCLMs) that can process and analyze extensive inputs in an effective and efficient way. In this paper, we present a comprehensive survey on recent advances in long-context modeling for large language models. Our survey is structured around three key aspects: how to obtain effective and efficient LCLMs, how to train and deploy LCLMs efficiently, and how to evaluate and analyze LCLMs comprehensively. For the first aspect, we discuss data strategies, architectural designs, and workflow approaches oriented toward long-context processing. For the second aspect, we provide a detailed examination of the infrastructure required for LCLM training and inference. For the third aspect, we present evaluation paradigms for long-context comprehension and long-form generation, as well as behavioral analysis and mechanistic interpretability of LCLMs. Beyond these three key aspects, we thoroughly explore the diverse application scenarios where existing LCLMs have been deployed and outline promising future development directions. This survey provides an up-to-date review of the literature on long-context LLMs, which we hope will serve as a valuable resource for both researchers and engineers. An associated GitHub repository collecting the latest papers and repos is available at: https://github.com/LCLM-Horizon/A-Comprehensive-Survey-For-Long-Context-Language-Modeling (LCLM-Horizon).
Related papers
- Generalizing From Short to Long: Effective Data Synthesis for Long-Context Instruction Tuning [103.65680870130839]
We investigate how to design instruction data for the post-training phase of a long context pre-trained model. Our controlled study reveals that models instruction-tuned on short contexts can effectively generalize to longer ones. Based on these findings, we propose context synthesis, a novel data synthesis framework.
arXiv Detail & Related papers (2025-02-21T17:02:40Z) - Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data [6.195658947075431]
We introduce HoloBench, a framework that brings database reasoning operations into text-based contexts.
We show that the amount of information in the context has a greater influence on LCLM performance than the context length itself.
We find that tasks requiring the aggregation of multiple pieces of information show a noticeable drop in accuracy as context length increases.
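To make this concrete, here is a minimal sketch of the kind of text-based aggregation probe described above: synthetic records are textualized into a long context, the model is asked to aggregate over them, and the ground truth is computed directly for comparison. The record format, question, and the generic `llm` callable are illustrative assumptions, not HoloBench's actual data or prompts.

import random
from typing import Callable, List, Tuple

def build_context(n_rows: int = 50, seed: int = 0) -> Tuple[str, float]:
    # Textualize synthetic "database rows" and return the context together
    # with the ground-truth aggregate (here, the average price).
    rng = random.Random(seed)
    rows: List[str] = []
    prices: List[float] = []
    for i in range(n_rows):
        price = round(rng.uniform(5, 500), 2)
        prices.append(price)
        rows.append(f"Product {i} was sold for {price} dollars.")
    rng.shuffle(rows)
    return "\n".join(rows), sum(prices) / len(prices)

def probe_aggregation(llm: Callable[[str], str]) -> Tuple[str, float]:
    # Ask the model to aggregate over the textualized rows; the caller can
    # compare its answer against the directly computed ground truth.
    context, truth = build_context()
    prompt = (
        f"{context}\n\n"
        "What is the average sale price across all products listed above? "
        "Answer with a single number."
    )
    return llm(prompt), truth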
arXiv Detail & Related papers (2024-10-15T19:04:13Z) - LLM$\times$MapReduce: Simplified Long-Sequence Processing using Large Language Models [73.13933847198395]
We propose a training-free framework for processing long texts, utilizing a divide-and-conquer strategy to achieve comprehensive document understanding.
The proposed LLM$\times$MapReduce framework splits the entire document into several chunks for LLMs to read and then aggregates the intermediate answers to produce the final output, as sketched below.
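A minimal sketch of that divide-and-conquer loop, assuming a generic `llm` callable and simple character-based chunking; the prompts and chunk size are illustrative, not the paper's implementation.

from typing import Callable, List

def split_into_chunks(document: str, chunk_chars: int = 8000) -> List[str]:
    # Fixed-size character chunks stand in for token-based chunking.
    return [document[i:i + chunk_chars] for i in range(0, len(document), chunk_chars)]

def map_reduce_answer(document: str, question: str, llm: Callable[[str], str]) -> str:
    # Map: ask the LLM about each chunk independently.
    partial_answers = []
    for chunk in split_into_chunks(document):
        map_prompt = (
            f"Context:\n{chunk}\n\n"
            f"Question: {question}\n"
            "Answer using only this context; say 'not found' if it does not "
            "contain the answer."
        )
        partial_answers.append(llm(map_prompt))
    # Reduce: aggregate the intermediate answers into one final output.
    reduce_prompt = (
        "Combine the following partial answers into one final answer.\n\n"
        + "\n".join(f"- {a}" for a in partial_answers)
        + f"\n\nQuestion: {question}\nFinal answer:"
    )
    return llm(reduce_prompt)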
arXiv Detail & Related papers (2024-10-12T03:13:44Z) - NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? [37.64593022203498]
NeedleBench is a framework consisting of progressively more challenging tasks for assessing bilingual long-context capabilities.
We use the framework to assess how well the leading open-source models can identify key information relevant to the question.
We propose the Ancestral Trace Challenge to mimic the complexity of logical reasoning challenges that are likely to be present in real-world long-context tasks.
arXiv Detail & Related papers (2024-07-16T17:59:06Z) - Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context of up to millions of tokens, designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z) - Large Language Models: A Survey [66.39828929831017]
Large Language Models (LLMs) have attracted considerable attention due to their strong performance on a wide range of natural language tasks.
LLMs acquire their general-purpose language understanding and generation abilities by training billions of model parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z) - Large Language Models for Time Series: A Survey [34.24258745427964]
Large Language Models (LLMs) have seen significant use in domains such as natural language processing and computer vision.
LLMs present a significant potential for analysis of time series data, benefiting domains such as climate, IoT, healthcare, traffic, audio and finance.
arXiv Detail & Related papers (2024-02-02T07:24:35Z) - Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey [18.930417261395906]
Transformer-based Large Language Models (LLMs) have been applied in diverse areas such as knowledge bases, human interfaces, and dynamic agents.
This article offers a survey of recent advances in Transformer-based LLM architectures aimed at enhancing the long-context capabilities of LLMs.
arXiv Detail & Related papers (2023-11-21T04:59:17Z) - LooGLE: Can Long-Context Language Models Understand Long Contexts? [46.143956498529796]
LooGLE is a benchmark for large language models' long context understanding.
It features relatively new documents post-2022, with over 24,000 tokens per document and 6,000 newly generated questions spanning diverse domains.
The evaluation of eight state-of-the-art LLMs on LooGLE revealed key findings about their long-context understanding.
arXiv Detail & Related papers (2023-11-08T01:45:37Z) - PEARL: Prompting Large Language Models to Plan and Execute Actions Over Long Documents [78.27865456183397]
We propose PEARL, a prompting framework to improve reasoning over long documents.
Each stage of PEARL is implemented via zero-shot or few-shot prompting with minimal human input.
We evaluate PEARL on a challenging subset of the QuALITY dataset, which contains questions that require complex reasoning over long narrative texts.
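A hedged sketch of a plan-then-execute prompting loop in the spirit of PEARL follows; the two-stage decomposition, stage prompts, and generic `llm` callable are illustrative assumptions rather than the paper's exact stages or prompts.

from typing import Callable, List

def plan_actions(question: str, llm: Callable[[str], str]) -> List[str]:
    # Planning stage: ask the model to decompose the question into a
    # sequence of actions over the document.
    prompt = (
        f"Question: {question}\n"
        "List, one per line, the actions needed to answer this question "
        "from a long document."
    )
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]

def execute_plan(document: str, question: str, llm: Callable[[str], str]) -> str:
    # Execution stage: run each planned action over the document, carrying
    # intermediate results forward, then answer the question.
    scratchpad = ""
    for action in plan_actions(question, llm):
        step_prompt = (
            f"Document:\n{document}\n\n"
            f"Previous results:\n{scratchpad}\n"
            f"Action: {action}\nResult:"
        )
        scratchpad += f"{action}: {llm(step_prompt)}\n"
    final_prompt = (
        f"Using these intermediate results:\n{scratchpad}\n"
        f"Answer the question: {question}"
    )
    return llm(final_prompt)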
arXiv Detail & Related papers (2023-05-23T23:06:04Z)