Long-Context Language Modeling with Parallel Context Encoding
- URL: http://arxiv.org/abs/2402.16617v2
- Date: Tue, 11 Jun 2024 18:54:13 GMT
- Title: Long-Context Language Modeling with Parallel Context Encoding
- Authors: Howard Yen, Tianyu Gao, Danqi Chen,
- Abstract summary: We introduce a framework that can be applied to any existing decoder-only LLMs to extend their context window.
CEPE employs a small encoder to process long inputs chunk by chunk, enabling the frozen decoder to utilize additional contexts via cross-attention.
CEPE yields strong performance on language modeling and in-context learning.
- Score: 37.64884969997378
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extending large language models (LLMs) to process longer inputs is crucial for a wide range of applications. However, the substantial computational cost of transformers and limited generalization of positional encoding restrict the size of their context window. We introduce Context Expansion with Parallel Encoding (CEPE), a framework that can be applied to any existing decoder-only LLMs to extend their context window. CEPE employs a small encoder to process long inputs chunk by chunk, enabling the frozen decoder to utilize additional contexts via cross-attention. CEPE is efficient, generalizable, and versatile: trained with 8K-token documents, it extends the context window of LLAMA-2 to 128K tokens, offering 10x the throughput with only 1/6 of the memory. CEPE yields strong performance on language modeling and in-context learning. CEPE also excels in retrieval-augmented applications, while existing long-context models degenerate with retrieved contexts. We further introduce a CEPE variant that can extend the context window of instruction-tuned models using only unlabeled data, and showcase its effectiveness on LLAMA-2-CHAT, leading to a strong instruction-following model that can leverage very long contexts on downstream tasks.
Related papers
- KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches [49.43759617227999]
Long context capability is a crucial competency for large language models (LLMs)
This work provides a taxonomy of current methods and evaluating 10+ state-of-the-art approaches across seven categories of long context tasks.
arXiv Detail & Related papers (2024-07-01T17:59:47Z) - Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning [68.43706033424378]
This study introduces an innovative method designed to increase in-context text length in large language models (MLLMs) efficiently.
We present Visualized In-Context Text Processing (VisInContext), which processes long in-context text using visual tokens.
This technique significantly reduces GPU memory usage and floating point operations (FLOPs) for both training and inferenceing stage.
arXiv Detail & Related papers (2024-06-04T17:59:25Z) - LongEmbed: Extending Embedding Models for Long Context Retrieval [87.60404151086715]
This paper explores context window extension of embedding models, pushing the limit to 32k without requiring additional training.
First, we examine the performance of current embedding models for long context retrieval on our newly constructed LongEmbed benchmark.
Experiments show that training-free context window extension strategies like positionRo can effectively extend the context window of existing embedding models by several folds.
arXiv Detail & Related papers (2024-04-18T11:29:23Z) - LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens [7.833740464264734]
Current extended context windows are limited to around 128k tokens.
LongRoPE extends the context window of pre-trained LLMs to an impressive 2048k tokens.
arXiv Detail & Related papers (2024-02-21T12:30:33Z) - Flexibly Scaling Large Language Models Contexts Through Extensible
Tokenization [6.9004592877749005]
Large language models (LLMs) are in need of sufficient contexts to handle many critical applications.
Although the size of context window can be extended by fine-tuning, it will result in a substantial cost in both training and inference stage.
We present Extensible Tokenization as an alternative method which realizes the flexible scaling of LLMs' context.
arXiv Detail & Related papers (2024-01-15T16:00:50Z) - LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding [58.20031627237889]
LongBench is the first bilingual, multi-task benchmark for long context understanding.
It comprises 21 datasets across 6 task categories in both English and Chinese, with an average length of 6,711 words (English) and 13,386 characters (Chinese)
arXiv Detail & Related papers (2023-08-28T11:53:40Z) - In-context Autoencoder for Context Compression in a Large Language Model [70.7621953091318]
We propose the In-context Autoencoder (ICAE) to compress a long context into short compact memory slots.
ICAE is first pretrained using both autoencoding and language modeling objectives on massive text data.
arXiv Detail & Related papers (2023-07-13T17:59:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.