Parallel Context Windows for Large Language Models
- URL: http://arxiv.org/abs/2212.10947v3
- Date: Tue, 1 Aug 2023 16:48:47 GMT
- Title: Parallel Context Windows for Large Language Models
- Authors: Nir Ratner, Yoav Levine, Yonatan Belinkov, Ori Ram, Inbal Magar, Omri Abend, Ehud Karpas, Amnon Shashua, Kevin Leyton-Brown, Yoav Shoham
- Abstract summary: We present Parallel Context Windows (PCW), a method that alleviates the context window restriction for any off-the-shelf LLM without further training.
Our main results test the PCW approach on in-context learning with models that range in size between 750 million and 178 billion parameters.
We show additional benefits in other settings where long context windows may be beneficial: multi-hop questions and retrieval-augmented question answering with multiple retrieved documents.
- Score: 52.965170346907904
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When applied to processing long text, Large Language Models (LLMs) are
limited by their context window. Existing efforts to address this limitation
involve training specialized architectures, and cannot be easily applied to
off-the-shelf LLMs. We present Parallel Context Windows (PCW), a method that
alleviates the context window restriction for any off-the-shelf LLM without
further training. The key to the approach is to carve a long context into
chunks ("windows"), restrict the attention mechanism to apply only within
each window, and re-use the positional embeddings across the windows. Our main
results test the PCW approach on in-context learning with models that range in
size between 750 million and 178 billion parameters, and show substantial
improvements for tasks with diverse input and output spaces. We show additional
benefits in other settings where long context windows may be beneficial:
multi-hop questions and retrieval-augmented question answering with multiple
retrieved documents. Our results highlight Parallel Context Windows as a
promising method for applying off-the-shelf LLMs in a range of settings that
require long text sequences. We make our code publicly available at
https://github.com/ai21labs/parallel-context-windows.
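To make the mechanism described in the abstract concrete, below is a minimal sketch of how a window-restricted attention mask and re-used position ids could be constructed. It illustrates the idea as stated in the abstract, not the released implementation: all function and variable names are invented for this example, and the treatment of the task tokens (assumed here to attend to every window) is an assumption of the sketch.

```python
# Sketch only: builds inputs for the PCW idea as described in the abstract.
# Context tokens attend causally within their own window; task tokens attend
# to all windows; every window re-uses the low positional ids.
import torch

def build_pcw_inputs(window_token_ids, task_token_ids):
    """window_token_ids: list of 1-D LongTensors, one per context window.
    task_token_ids:   1-D LongTensor with the task/query tokens."""
    windows = list(window_token_ids)
    lengths = [len(w) for w in windows]
    n_ctx = sum(lengths)
    n_task = len(task_token_ids)
    total = n_ctx + n_task

    input_ids = torch.cat(windows + [task_token_ids])

    # Positions: each window re-uses the same range 0..len(window)-1; the task
    # tokens continue from the longest window (an assumption of this sketch).
    position_ids = torch.cat(
        [torch.arange(l) for l in lengths]
        + [torch.arange(max(lengths), max(lengths) + n_task)]
    )

    # Attention mask: block-diagonal causal attention per window, then full
    # access from task tokens to all context plus causal access among themselves.
    mask = torch.zeros(total, total, dtype=torch.bool)
    start = 0
    for l in lengths:
        mask[start:start + l, start:start + l] = torch.tril(
            torch.ones(l, l, dtype=torch.bool)
        )
        start += l
    mask[n_ctx:, :n_ctx] = True
    mask[n_ctx:, n_ctx:] = torch.tril(torch.ones(n_task, n_task, dtype=torch.bool))

    return input_ids, position_ids, mask
```

In such a setup, the mask would replace the usual full causal mask inside the attention layers; because every window re-uses the low position ids, no single window exceeds the positional range the model was trained on.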
Related papers
- Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks? [36.83397306207386]
We evaluate the capabilities of 17 leading Large Language Models (LLMs).
Strikingly, many models are remarkably thread-safe: capable of simultaneously following multiple threads without significant loss in performance.
We find the effective context limit is significantly shorter than the supported context length, with accuracy decreasing as the context window grows.
arXiv Detail & Related papers (2024-11-07T18:59:27Z)
- LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models [72.71150585370147]
LongRecipe is an efficient training strategy for extending the context window of large language models.
It simulates long-sequence inputs while maintaining training efficiency and significantly improves the model's understanding of long-range dependencies.
LongRecipe can utilize long sequences while requiring only 30% of the target context window size, and reduces computational training resources by over 85% compared to full-sequence training.
arXiv Detail & Related papers (2024-08-31T17:19:30Z)
- Exploring Context Window of Large Language Models via Decomposed Positional Vectors [107.19556541244654]
Transformer-based large language models (LLMs) typically have a limited context window.
In this study, we explore the positional information within and beyond the context window.
arXiv Detail & Related papers (2024-05-28T09:50:46Z)
- Extending LLMs' Context Window with 100 Samples [42.52554295241792]
Large Language Models (LLMs) are known to have limited extrapolation ability beyond their pre-trained context window.
Recent studies have sought to extend the context window by modifying rotary position embedding (RoPE)
We introduce a novel extension to RoPE that combines adjusting RoPE's base frequency and scaling the attention logits to help LLMs efficiently adapt to a larger context window (a rough sketch of these two adjustments appears after this list).
arXiv Detail & Related papers (2024-01-13T07:57:01Z)
- LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning [67.39585115936329]
We argue that LLMs have inherent capabilities to handle long contexts without fine-tuning.
We propose SelfExtend to extend the context window of LLMs by constructing bi-level attention information.
We conduct comprehensive experiments on multiple benchmarks and the results show that our SelfExtend can effectively extend existing LLMs' context window length.
arXiv Detail & Related papers (2024-01-02T18:30:51Z)
- Retrieval meets Long Context Large Language Models [59.431200671427064]
Extending the context window of large language models (LLMs) has recently become popular.
Retrieval-augmentation versus long context window, which one is better for downstream tasks?
Can both methods be combined to get the best of both worlds?
Our best model, retrieval-augmented Llama2-70B with 32K context window, outperforms GPT-3.5-turbo-16k and Davinci003 in terms of average score on nine long context tasks.
arXiv Detail & Related papers (2023-10-04T17:59:41Z)
- PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training [91.99700930388998]
We propose Positional Skip-wisE (PoSE) training, which simulates long inputs using a fixed context window.
PoSE greatly reduces memory and time overhead compared with full-length fine-tuning.
We have successfully extended the LLaMA model to 128k tokens using a 2k training context window.
arXiv Detail & Related papers (2023-09-19T08:03:38Z)
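As referenced in the "Extending LLMs' Context Window with 100 Samples" entry above, the two adjustments mentioned there can be illustrated with a short sketch: a rotary position embedding whose base frequency is configurable, and a multiplicative scale on the attention logits. The base value and the log-ratio scaling rule below are placeholders chosen for illustration, not the paper's actual settings.

```python
# Sketch only: configurable-base RoPE plus a length-dependent logit scale.
import math
import torch

def rope_angles(positions, dim, base=10_000.0):
    """Rotation angles for rotary embeddings; enlarging `base` stretches the
    wavelengths so longer sequences stay within familiar rotation ranges."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    return torch.outer(positions.float(), inv_freq)  # (seq_len, dim // 2)

def apply_rope(x, angles):
    """Rotate the two halves of the channel dimension of x (seq_len, dim)."""
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    cos, sin = angles.cos(), angles.sin()
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def scaled_attention_logits(q, k, train_len, target_len):
    """Standard dot-product logits with an extra length-dependent scale
    (the log-ratio here is only an illustrative choice)."""
    scale = math.log(target_len) / math.log(train_len)
    return scale * (q @ k.transpose(-2, -1)) / math.sqrt(q.shape[-1])
```

Intuitively, a larger base keeps the rotation angles at far-away positions within the range seen during training, while scaling the logits keeps attention from spreading too thin when many more tokens compete for it; these are the two knobs the entry above alludes to.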