Parallel Context Windows for Large Language Models
- URL: http://arxiv.org/abs/2212.10947v3
- Date: Tue, 1 Aug 2023 16:48:47 GMT
- Title: Parallel Context Windows for Large Language Models
- Authors: Nir Ratner, Yoav Levine, Yonatan Belinkov, Ori Ram, Inbal Magar, Omri
Abend, Ehud Karpas, Amnon Shashua, Kevin Leyton-Brown, Yoav Shoham
- Abstract summary: We present Parallel Context Windows (PCW), a method that alleviates the context window restriction for any off-the-shelf LLM without further training.
Our main results test the PCW approach on in-context learning with models that range in size between 750 million and 178 billion parameters.
We show additional benefits in other settings where long context windows may be beneficial: multi-hop questions and retrieval-augmented question answering with multiple retrieved documents.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When applied to processing long text, Large Language Models (LLMs) are
limited by their context window. Existing efforts to address this limitation
involve training specialized architectures, and cannot be easily applied to
off-the-shelf LLMs. We present Parallel Context Windows (PCW), a method that
alleviates the context window restriction for any off-the-shelf LLM without
further training. The key to the approach is to carve a long context into
chunks (``windows''), restrict the attention mechanism to apply only within
each window, and re-use the positional embeddings across the windows. Our main
results test the PCW approach on in-context learning with models that range in
size between 750 million and 178 billion parameters, and show substantial
improvements for tasks with diverse input and output spaces. We show additional
benefits in other settings where long context windows may be beneficial:
multi-hop questions and retrieval-augmented question answering with multiple
retrieved documents. Our results highlight Parallel Context Windows as a
promising method for applying off-the-shelf LLMs in a range of settings that
require long text sequences. We make our code publicly available at
https://github.com/ai21labs/parallel-context-windows.
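The carve-restrict-reuse recipe described in the abstract can be illustrated as an attention mask plus a position-id assignment. The sketch below is our own minimal NumPy rendering of that description, not the paper's released code: function and argument names are hypothetical, and the convention that task tokens attend to all windows and take positions continuing after the longest window follows the paper's high-level description.

```python
import numpy as np

def pcw_mask_and_positions(window_lengths, task_length):
    """Build a PCW-style attention mask and position ids (illustrative sketch).

    Context tokens attend causally only within their own window; the task
    tokens appended at the end attend to every window. Position ids restart
    at 0 in each window, so the same positional embeddings are reused.
    """
    total = sum(window_lengths) + task_length
    mask = np.zeros((total, total), dtype=bool)       # mask[i, j]: token i may attend to token j
    positions = np.zeros(total, dtype=int)

    start = 0
    for wlen in window_lengths:
        end = start + wlen
        # causal attention restricted to this window only
        mask[start:end, start:end] = np.tril(np.ones((wlen, wlen), dtype=bool))
        # positions restart in each window -> positional embeddings are shared
        positions[start:end] = np.arange(wlen)
        start = end

    # task tokens attend to all window tokens, and causally among themselves
    t0 = start
    mask[t0:, :t0] = True
    mask[t0:, t0:] = np.tril(np.ones((task_length, task_length), dtype=bool))
    # task-token positions continue after the longest window
    positions[t0:] = max(window_lengths) + np.arange(task_length)
    return mask, positions
```

For two windows of 3 tokens and a 2-token task suffix, `pcw_mask_and_positions([3, 3], 2)` yields a mask in which token 3 (start of the second window) cannot see token 0 in the first window, while task token 6 sees both windows; positions run `[0, 1, 2, 0, 1, 2, 3, 4]`.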
Related papers
- Exploring Context Window of Large Language Models via Decomposed Positional Vectors [107.19556541244654]
Transformer-based large language models (LLMs) typically have a limited context window.
In this study, we explore the positional information within and beyond the context window.
arXiv Detail & Related papers (2024-05-28T09:50:46Z)
- LLoCO: Learning Long Contexts Offline [63.3458260335454]
We introduce LLoCO, a technique that combines context compression, retrieval, and parameter-efficient finetuning using LoRA.
We evaluate our approach on several long-context question-answering datasets, demonstrating that LLoCO significantly outperforms in-context learning.
arXiv Detail & Related papers (2024-04-11T17:57:22Z)
- Extending LLMs' Context Window with 100 Samples [42.52554295241792]
Large Language Models (LLMs) are known to have limited extrapolation ability beyond their pre-trained context window.
Recent studies have sought to extend the context window by modifying rotary position embedding (RoPE).
We introduce a novel extension to RoPE which combines adjusting RoPE's base frequency and scaling the attention logits to help LLMs efficiently adapt to a larger context window.
arXiv Detail & Related papers (2024-01-13T07:57:01Z)
- LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning [67.39585115936329]
We argue that LLMs have inherent capabilities to handle long contexts without fine-tuning.
We propose SelfExtend to extend the context window of LLMs by constructing bi-level attention information.
We conduct comprehensive experiments on multiple benchmarks and the results show that our SelfExtend can effectively extend existing LLMs' context window length.
arXiv Detail & Related papers (2024-01-02T18:30:51Z)
- Retrieval meets Long Context Large Language Models [59.431200671427064]
Extending the context window of large language models (LLMs) has recently become popular.
Retrieval-augmentation versus long context window, which one is better for downstream tasks?
Can both methods be combined to get the best of both worlds?
Our best model, retrieval-augmented Llama2-70B with 32K context window, outperforms GPT-3.5-turbo-16k and Davinci003 in terms of average score on nine long context tasks.
arXiv Detail & Related papers (2023-10-04T17:59:41Z)
- PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training [91.99700930388998]
We propose Positional Skip-wisE training that simulates long inputs using a fixed context window.
PoSE greatly reduces memory and time overhead compared with Full-length fine-tuning.
We have successfully extended the LLaMA model to 128k tokens using a 2k training context window.
arXiv Detail & Related papers (2023-09-19T08:03:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.