Giraffe: Adventures in Expanding Context Lengths in LLMs
- URL: http://arxiv.org/abs/2308.10882v1
- Date: Mon, 21 Aug 2023 17:30:16 GMT
- Title: Giraffe: Adventures in Expanding Context Lengths in LLMs
- Authors: Arka Pal, Deep Karkhanis, Manley Roberts, Samuel Dooley, Arvind
Sundararajan, Siddartha Naidu
- Abstract summary: We show that linear scaling is the best method for extending context length.
We also discover promising extrapolation capabilities in the truncated basis.
To support further research in this area, we release three new 13B parameter long-context models.
- Score: 7.8327063299618
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern large language models (LLMs) that rely on attention mechanisms are
typically trained with fixed context lengths which enforce upper limits on the
length of input sequences that they can handle at evaluation time. To use these
models on sequences longer than the train-time context length, one might employ
techniques from the growing family of context length extrapolation methods --
most of which focus on modifying the system of positional encodings used in the
attention mechanism to indicate where tokens or activations are located in the
input sequence. We conduct a wide survey of existing methods of context length
extrapolation on a base LLaMA or LLaMA 2 model, and introduce some of our own
design as well -- in particular, a new truncation strategy for modifying the
basis for the position encoding.
We test these methods using three new evaluation tasks (FreeFormQA,
AlteredNumericQA, and LongChat-Lines) as well as perplexity, which we find to
be less fine-grained as a measure of long context performance of LLMs. We
release the three tasks publicly as datasets on HuggingFace. We discover that
linear scaling is the best method for extending context length, and show that
further gains can be achieved by using longer scales at evaluation time. We
also discover promising extrapolation capabilities in the truncated basis. To
support further research in this area, we release three new 13B parameter
long-context models which we call Giraffe: 4k and 16k context models trained
from base LLaMA-13B, and a 32k context model trained from base LLaMA2-13B. We
also release the code to replicate our results.
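As a rough illustration of the linear scaling idea mentioned in the abstract, the following minimal sketch assumes rotary position embeddings (the positional encoding used by LLaMA) and interprets linear scaling as dividing each position index by a constant factor before computing rotation angles. The function names (`rope_angles`, `rotate`) and the parameter values are hypothetical and chosen for illustration only; this is not the authors' released code.

```python
import math


def rope_angles(position, head_dim, base=10000.0, scale=1.0):
    """Rotation angles for one token position under rotary embeddings.

    Assumption: linear scaling divides the position index by `scale`,
    so scale > 1 squeezes long sequences into the trained position range.
    """
    scaled_position = position / scale
    return [
        scaled_position * base ** (-2.0 * i / head_dim)
        for i in range(head_dim // 2)
    ]


def rotate(vec, angles):
    """Rotate each consecutive pair (vec[2i], vec[2i+1]) by angles[i]."""
    out = []
    for i, theta in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


# With scale 4, position 8192 receives the same angles that position 2048
# receives without scaling, i.e. it stays inside a 4k-style trained range.
same_angles = rope_angles(8192, 8, scale=4.0) == rope_angles(2048, 8)

# The attention score still depends only on the relative offset between
# the query position and the key position (here the offset is 10 both times).
q = [0.3, -1.2, 0.7, 0.1, 0.5, 0.9, -0.4, 0.2]
k = [1.0, 0.4, -0.6, 0.8, 0.2, -0.3, 0.5, 0.7]
score_a = dot(rotate(q, rope_angles(110, 8, scale=4.0)),
              rotate(k, rope_angles(100, 8, scale=4.0)))
score_b = dot(rotate(q, rope_angles(5010, 8, scale=4.0)),
              rotate(k, rope_angles(5000, 8, scale=4.0)))
```

Under this reading, "using longer scales at evaluation time" would correspond to passing a larger `scale` at inference than the one used during fine-tuning; that correspondence is an interpretation of the abstract, not a detail confirmed by it.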
Related papers
- XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference [25.669630896777484]
We propose an efficient training-free framework, named XL3M, which enables LLMs trained on short sequences to reason over extremely long sequences without any further training or fine-tuning.
Evaluations on comprehensive benchmarks show the superiority of XL3M.
arXiv Detail & Related papers (2024-05-28T02:12:35Z)
- Long Context Alignment with Short Instructions and Synthesized Positions [56.1267385315404]
This paper introduces Step-Skipping Alignment (SkipAlign), a new technique designed to enhance the long-context capabilities of Large Language Models (LLMs).
With a careful selection of the base model and alignment datasets, SkipAlign with only 6B parameters achieves its best performance, comparable with strong baselines like GPT-3.5-Turbo-16K on LongBench.
arXiv Detail & Related papers (2024-05-07T01:56:22Z)
- RULER: What's the Real Context Size of Your Long-Context Language Models? [23.220973811374225]
We create a new benchmark, RULER, for evaluating long-context language models (LMs).
We evaluate ten long-context LMs with 13 representative tasks in RULER.
Despite achieving nearly perfect accuracy in the vanilla NIAH test, all models exhibit large performance drops as the context length increases.
arXiv Detail & Related papers (2024-04-09T23:41:27Z)
- Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks [76.43527940649939]
We introduce Ada-LEval, a benchmark for evaluating the long-context understanding of large language models (LLMs).
Ada-LEval includes two challenging subsets, TSort and BestAnswer, which enable a more reliable evaluation of LLMs' long context capabilities.
We evaluate 4 state-of-the-art closed-source API models and 6 open-source models with Ada-LEval.
arXiv Detail & Related papers (2024-04-09T17:30:48Z)
- LongAlign: A Recipe for Long Context Alignment of Large Language Models [61.85923382850057]
LongAlign is a recipe covering instruction data, training, and evaluation for long-context alignment.
We construct a long instruction-following dataset using Self-Instruct.
We adopt packing and sorted batching strategies to speed up supervised fine-tuning on data with varied length distributions.
arXiv Detail & Related papers (2024-01-31T18:29:39Z)
- LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning [67.39585115936329]
We argue that LLMs have inherent capabilities to handle long contexts without fine-tuning.
We propose SelfExtend to extend the context window of LLMs by constructing bi-level attention information.
We conduct comprehensive experiments on multiple benchmarks and the results show that our SelfExtend can effectively extend existing LLMs' context window length.
arXiv Detail & Related papers (2024-01-02T18:30:51Z)
- M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models [61.06694491246026]
M4LE is a benchmark for evaluating the long-sequence capability of large language models (LLMs).
M4LE is based on a diverse NLP task pool comprising 36 NLP task types and 12 domains.
We conducted a systematic evaluation on 11 well-established LLMs, especially those optimized for long-sequence inputs.
arXiv Detail & Related papers (2023-10-30T03:11:30Z)
- LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)