Giraffe: Adventures in Expanding Context Lengths in LLMs
- URL: http://arxiv.org/abs/2308.10882v1
- Date: Mon, 21 Aug 2023 17:30:16 GMT
- Title: Giraffe: Adventures in Expanding Context Lengths in LLMs
- Authors: Arka Pal, Deep Karkhanis, Manley Roberts, Samuel Dooley, Arvind
Sundararajan, Siddartha Naidu
- Abstract summary: We show that linear scaling is the best method for extending context length.
We also discover promising extrapolation capabilities in the truncated basis.
To support further research in this area, we release three new 13B parameter long-context models.
- Score: 7.8327063299618
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern large language models (LLMs) that rely on attention mechanisms are
typically trained with fixed context lengths which enforce upper limits on the
length of input sequences that they can handle at evaluation time. To use these
models on sequences longer than the train-time context length, one might employ
techniques from the growing family of context length extrapolation methods --
most of which focus on modifying the system of positional encodings used in the
attention mechanism to indicate where tokens or activations are located in the
input sequence. We conduct a wide survey of existing methods of context length
extrapolation on a base LLaMA or LLaMA 2 model, and introduce some of our own
design as well -- in particular, a new truncation strategy for modifying the
basis for the position encoding.
We test these methods using three new evaluation tasks (FreeFormQA,
AlteredNumericQA, and LongChat-Lines) as well as perplexity, which we find to
be less fine-grained as a measure of long context performance of LLMs. We
release the three tasks publicly as datasets on HuggingFace. We discover that
linear scaling is the best method for extending context length, and show that
further gains can be achieved by using longer scales at evaluation time. We
also discover promising extrapolation capabilities in the truncated basis. To
support further research in this area, we release three new 13B parameter
long-context models which we call Giraffe: 4k and 16k context models trained
from base LLaMA-13B, and a 32k context model trained from base LLaMA2-13B. We
also release the code to replicate our results.
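The abstract's headline finding, that linear scaling works best for extending context length, refers to scaling down position indices before they enter the rotary position embedding (RoPE) used by LLaMA-family models, so that a long evaluation-time sequence is mapped back into the positional range seen during training. Below is a minimal sketch of that idea; the function names, head dimension, and scale factor are illustrative assumptions, not code from the Giraffe release.

```python
import torch

def rope_frequencies(dim: int, base: float = 10000.0) -> torch.Tensor:
    """Standard RoPE inverse frequencies for a head dimension `dim`."""
    return 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))

def rope_angles(positions: torch.Tensor, dim: int, scale: float = 1.0) -> torch.Tensor:
    """Rotation angles for each (position, frequency) pair.

    `scale` > 1 implements linear scaling: evaluation-time positions are
    divided by `scale`, so a model trained with a 4k context sees a 16k
    input as if its positions spanned the familiar 0..4k range.
    """
    inv_freq = rope_frequencies(dim)
    scaled_pos = positions.float() / scale      # the linear-scaling step
    return torch.outer(scaled_pos, inv_freq)    # shape: (seq_len, dim // 2)

def apply_rope(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    """Rotate query/key vectors x of shape (seq_len, dim) by the given angles."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Example: a 16k-token input evaluated by a model trained at 4k context,
# using scale = 16k / 4k = 4 (illustrative numbers only).
positions = torch.arange(16384)
angles = rope_angles(positions, dim=128, scale=4.0)
q = torch.randn(16384, 128)
q_rotated = apply_rope(q, angles)
```

The abstract also notes that further gains come from using longer scales at evaluation time than at fine-tuning time; in this sketch that corresponds simply to passing a larger `scale` when computing the angles.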
Related papers
- Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks? [36.83397306207386]
We evaluate the capabilities of 17 leading Large Language Models (LLMs).
Strikingly, many models are remarkably thread-safe: capable of simultaneously following multiple threads without significant loss in performance.
We find the effective context limit is significantly shorter than the supported context length, with accuracy decreasing as the context window grows.
arXiv Detail & Related papers (2024-11-07T18:59:27Z)
- Language Models can Self-Lengthen to Generate Long Texts [74.96074422345806]
This paper introduces an innovative iterative training framework called Self-Lengthen.
It leverages only the intrinsic knowledge and skills of Large Language Models without the need for auxiliary data or proprietary models.
Experiments on benchmarks and human evaluations show that Self-Lengthen outperforms existing methods in long-text generation.
arXiv Detail & Related papers (2024-10-31T13:47:10Z)
- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models [89.28591263741973]
We introduce the Hierarchical Long Text Generation Benchmark (HelloBench) to evaluate Large Language Models' performance in generating long text.
Based on Bloom's taxonomy, HelloBench categorizes long text generation tasks into five subtasks: open-ended QA, summarization, chat, text completion, and text generation.
Besides, we propose Hierarchical Long Text Evaluation (HelloEval), a human evaluation method that significantly reduces the time and effort required for human evaluation.
arXiv Detail & Related papers (2024-09-24T15:38:11Z)
- ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities [53.97515452727115]
ChatQA 2 is a Llama 3.0-based model with a 128K context window.
We present a training recipe to extend the context window of Llama3-70B-base from 8K to 128K tokens.
Our results demonstrate that the Llama3-ChatQA-2-70B model outperforms most existing state-of-the-art models.
arXiv Detail & Related papers (2024-07-19T17:35:47Z)
- XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference [25.669630896777484]
We propose an efficient training-free framework, named XL3M, which enables LLMs trained on short sequences to reason over extremely long sequences without any further training or fine-tuning.
Evaluations on comprehensive benchmarks show the superiority of XL3M.
arXiv Detail & Related papers (2024-05-28T02:12:35Z)
- Long Context Alignment with Short Instructions and Synthesized Positions [56.1267385315404]
This paper introduces Step-Skipping Alignment (SkipAlign).
It is a new technique designed to enhance the long-context capabilities of Large Language Models (LLMs).
With a careful selection of the base model and alignment datasets, SkipAlign with only 6B parameters achieves its best performance, comparable with strong baselines like GPT-3.5-Turbo-16K on LongBench.
arXiv Detail & Related papers (2024-05-07T01:56:22Z)
- Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks [76.43527940649939]
We introduce Ada-LEval, a benchmark for evaluating the long-context understanding of large language models (LLMs).
Ada-LEval includes two challenging subsets, TSort and BestAnswer, which enable a more reliable evaluation of LLMs' long context capabilities.
We evaluate 4 state-of-the-art closed-source API models and 6 open-source models with Ada-LEval.
arXiv Detail & Related papers (2024-04-09T17:30:48Z)
- LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)