LongQLoRA: Efficient and Effective Method to Extend Context Length of
Large Language Models
- URL: http://arxiv.org/abs/2311.04879v2
- Date: Thu, 9 Nov 2023 05:21:28 GMT
- Title: LongQLoRA: Efficient and Effective Method to Extend Context Length of
Large Language Models
- Authors: Jianxin Yang
- Abstract summary: LongQLoRA is a method to extend context length of large language models with less training resources.
With a single 32GB V100 GPU, LongQLoRA can extend the context length of LLaMA2 7B and 13B from 4096 to 8192 and even to 12k within 1000 finetuning steps.
LongQLoRA achieves competitive perplexity performance on PG19 and Proof-pile datasets.
- Score: 2.4366811507669124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present LongQLoRA, an efficient and effective method to extend context
length of large language models with less training resources. LongQLoRA
combines the advantages of Position Interpolation, QLoRA and Shift Short
Attention of LongLoRA. With a single 32GB V100 GPU, LongQLoRA can extend the
context length of LLaMA2 7B and 13B from 4096 to 8192 and even to 12k within
1000 finetuning steps. LongQLoRA achieves competitive perplexity performance on
PG19 and Proof-pile datasets, our model outperforms LongLoRA and is very close
to MPT-7B-8K within the evaluation context length of 8192. We collect and build
39k long instruction data to extend context length of Vicuna-13B from 4096 to
8192 and achieve good performance both in long and short context generation
task. We also do some ablation experiments to study the effect of LoRA rank,
finetuning steps and attention patterns in inference.The model weights,
training data and code are avaliable at
https://github.com/yangjianxin1/LongQLoRA.
Related papers
- LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models [61.12177317970258]
LongSkywork is a long-context Large Language Model capable of processing up to 200,000 tokens.
We develop two novel methods for creating synthetic data.
LongSkywork achieves outstanding performance on a variety of long-context benchmarks.
arXiv Detail & Related papers (2024-06-02T03:34:41Z) - Long Context is Not Long at All: A Prospector of Long-Dependency Data for Large Language Models [13.091271774417867]
Long-context modeling capabilities are important for large language models (LLMs) in various applications.
We propose a data mining framework textbfProLong that can assign each training sample with a long dependency score.
Comprehensive experiments on multiple benchmarks indicate that ProLong effectively identifies documents that carry long dependencies.
arXiv Detail & Related papers (2024-05-28T07:36:56Z) - RULER: What's the Real Context Size of Your Long-Context Language Models? [23.220973811374225]
We create a new benchmark for evaluating long-context language models (LMs)
We evaluate ten long-context LMs with 13 representative tasks in RULER.
Despite achieving nearly perfect accuracy in the vanilla NIAH test, all models exhibit large performance drops as the context length increases.
arXiv Detail & Related papers (2024-04-09T23:41:27Z) - LongAlign: A Recipe for Long Context Alignment of Large Language Models [61.85923382850057]
LongAlign is a recipe of the instruction data, training, and evaluation for long context alignment.
We construct a long instruction-following dataset using Self-Instruct.
We adopt the packing and sorted strategies to speed up supervised fine-tuning on data with varied length distributions.
arXiv Detail & Related papers (2024-01-31T18:29:39Z) - LooGLE: Can Long-Context Language Models Understand Long Contexts? [50.408957515411096]
LooGLE is a benchmark for large language models' long context understanding.
It features relatively new documents post-2022, with over 24,000 tokens per document and 6,000 newly generated questions spanning diverse domains.
The evaluation of eight state-of-the-art LLMs on LooGLE revealed key findings.
arXiv Detail & Related papers (2023-11-08T01:45:37Z) - Effective Long-Context Scaling of Foundation Models [90.57254298730923]
We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens.
Our models achieve consistent improvements on most regular tasks and significant improvements on long-context tasks over Llama 2.
arXiv Detail & Related papers (2023-09-27T21:41:49Z) - LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models [67.58275666573496]
LongLoRA is an efficient fine-tuning approach that extends the context sizes of pre-trained large language models.
We demonstrate strong empirical results on various tasks on Llama2 models from 7B/13B to 70B.
arXiv Detail & Related papers (2023-09-21T17:59:11Z) - LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding [58.20031627237889]
LongBench is the first bilingual, multi-task benchmark for long context understanding.
It comprises 21 datasets across 6 task categories in both English and Chinese, with an average length of 6,711 words (English) and 13,386 characters (Chinese)
arXiv Detail & Related papers (2023-08-28T11:53:40Z) - Giraffe: Adventures in Expanding Context Lengths in LLMs [7.8327063299618]
We show that linear scaling is the best method for extending context length.
We also discover promising extrapolation capabilities in the truncated basis.
To support further research in this area, we release three new 13B parameter long-context models.
arXiv Detail & Related papers (2023-08-21T17:30:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.