LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models
- URL: http://arxiv.org/abs/2311.04879v2
- Date: Thu, 9 Nov 2023 05:21:28 GMT
- Title: LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models
- Authors: Jianxin Yang
- Abstract summary: LongQLoRA is a method to extend the context length of large language models with fewer training resources.
With a single 32GB V100 GPU, LongQLoRA can extend the context length of LLaMA2 7B and 13B from 4096 to 8192 and even to 12k within 1000 finetuning steps.
LongQLoRA achieves competitive perplexity performance on PG19 and Proof-pile datasets.
- Score: 2.4366811507669124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present LongQLoRA, an efficient and effective method to extend the
context length of large language models with fewer training resources. LongQLoRA
combines the advantages of Position Interpolation, QLoRA, and the Shift Short
Attention of LongLoRA. With a single 32GB V100 GPU, LongQLoRA can extend the
context length of LLaMA2 7B and 13B from 4096 to 8192, and even to 12k, within
1000 finetuning steps. LongQLoRA achieves competitive perplexity on the PG19 and
Proof-pile datasets; our model outperforms LongLoRA and is very close to
MPT-7B-8K within an evaluation context length of 8192. We collect and build 39k
long instruction examples to extend the context length of Vicuna-13B from 4096
to 8192, achieving good performance on both long- and short-context generation
tasks. We also run ablation experiments to study the effects of LoRA rank,
finetuning steps, and attention patterns at inference time. The model weights,
training data, and code are available at
https://github.com/yangjianxin1/LongQLoRA.
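
As a concrete illustration of how these pieces fit together, the sketch below combines a QLoRA-style 4-bit setup with linear RoPE scaling (Position Interpolation) using the Hugging Face transformers and peft libraries. It is a minimal sketch, not the authors' released training code: the base model name, the scaling factor (8192 / 4096 = 2.0), and the LoRA hyperparameters are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# QLoRA-style 4-bit NF4 quantization so the frozen base weights fit on a single GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# Position Interpolation: linearly scale RoPE positions by
# target_length / original_length = 8192 / 4096 = 2.0.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",          # assumed base model
    quantization_config=bnb_config,
    rope_scaling={"type": "linear", "factor": 2.0},
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Train only low-rank adapters on top of the frozen, quantized base model.
lora_config = LoraConfig(
    r=64,                                # illustrative rank; the paper ablates this
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Finetuning then proceeds with a standard causal-LM training loop on long-sequence data. The shifted short attention pattern borrowed from LongLoRA is sketched separately after the LongLoRA entry below.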
Related papers
- How to Train Long-Context Language Models (Effectively) [75.5418485597276]
We study continued training and supervised fine-tuning (SFT) of a language model (LM) to make effective use of long-context information.
ProLong-8B, initialized from Llama-3 and trained on 40B tokens, demonstrates state-of-the-art long-context performance among similarly sized models at a length of 128K.
arXiv Detail & Related papers (2024-10-03T16:46:52Z)
- LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models [72.71150585370147]
LongRecipe is an efficient training strategy for extending the context window of large language models.
It simulates long-sequence inputs while maintaining training efficiency and significantly improves the model's understanding of long-range dependencies.
LongRecipe can utilize long sequences while requiring only 30% of the target context window size, and reduces computational training resources by over 85% compared to full-sequence training.
arXiv Detail & Related papers (2024-08-31T17:19:30Z)
- MemLong: Memory-Augmented Retrieval for Long Text Modeling [37.49036666949963]
This work introduces MemLong: Memory-Augmented Retrieval for Long Text Generation.
MemLong combines a non-differentiable "ret-mem" module with a partially trainable decoder-only language model.
Comprehensive evaluations on multiple long-context language modeling benchmarks demonstrate that MemLong consistently outperforms other state-of-the-art LLMs.
arXiv Detail & Related papers (2024-08-30T02:01:56Z)
- LongAlign: A Recipe for Long Context Alignment of Large Language Models [61.85923382850057]
LongAlign is a recipe for the instruction data, training, and evaluation needed for long context alignment.
We construct a long instruction-following dataset using Self-Instruct.
We adopt packing and sorted batching strategies to speed up supervised fine-tuning on data with varied length distributions.
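
Packing and sorted batching are generic tricks for reducing padding waste when sequence lengths vary widely. The sketch below illustrates both ideas in plain Python, assuming tokenized examples with an `input_ids` field; it is an illustration of the general technique rather than LongAlign's exact implementation.

```python
from typing import Dict, Iterable, List

Example = Dict[str, List[int]]

def sorted_batches(examples: List[Example], batch_size: int) -> Iterable[List[Example]]:
    """Sorted batching: group examples of similar length so each batch needs little padding."""
    by_length = sorted(examples, key=lambda ex: len(ex["input_ids"]))
    for i in range(0, len(by_length), batch_size):
        yield by_length[i:i + batch_size]

def pack_sequences(examples: List[Example], max_len: int) -> List[List[int]]:
    """Packing: greedily concatenate short sequences into rows of at most max_len tokens."""
    packed, current = [], []
    for ex in examples:
        ids = ex["input_ids"][:max_len]   # truncate anything longer than one packed row
        if current and len(current) + len(ids) > max_len:
            packed.append(current)
            current = []
        current = current + ids
    if current:
        packed.append(current)
    return packed
```

A full packing implementation additionally needs attention masks or position-id resets so that sequences packed into the same row do not attend to one another.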
arXiv Detail & Related papers (2024-01-31T18:29:39Z)
- Effective Long-Context Scaling of Foundation Models [90.57254298730923]
We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens.
Our models achieve consistent improvements on most regular tasks and significant improvements on long-context tasks over Llama 2.
arXiv Detail & Related papers (2023-09-27T21:41:49Z)
- LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models [67.58275666573496]
LongLoRA is an efficient fine-tuning approach that extends the context sizes of pre-trained large language models.
We demonstrate strong empirical results on various tasks with Llama2 models ranging from 7B/13B to 70B.
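
LongLoRA's Shift Short Attention, also adopted by LongQLoRA above, approximates full attention during finetuning by computing attention inside fixed-size token groups, with half the heads shifted by half a group so information still crosses group boundaries. The PyTorch sketch below is a simplified, non-causal illustration of that pattern, not the authors' implementation; causal masking and FlashAttention are omitted for brevity.

```python
import torch

def shift_short_attention(q, k, v, group_size: int) -> torch.Tensor:
    """Group-wise attention with half the heads shifted by half a group.

    q, k, v have shape (batch, heads, seq_len, head_dim); seq_len must be
    divisible by group_size.
    """
    bsz, n_heads, seq_len, head_dim = q.shape
    half_heads = n_heads // 2

    def roll_half_heads(x: torch.Tensor, shift: int) -> torch.Tensor:
        # Shift only the second half of the heads along the token dimension.
        x = x.clone()
        x[:, half_heads:] = torch.roll(x[:, half_heads:], shifts=shift, dims=2)
        return x

    # Shift half the heads by half a group so attention spans group borders.
    q = roll_half_heads(q, -group_size // 2)
    k = roll_half_heads(k, -group_size // 2)
    v = roll_half_heads(v, -group_size // 2)

    # Reshape so attention is computed independently within each group.
    def to_groups(x: torch.Tensor) -> torch.Tensor:
        return x.reshape(bsz, n_heads, seq_len // group_size, group_size, head_dim)

    qg, kg, vg = to_groups(q), to_groups(k), to_groups(v)
    scores = torch.matmul(qg, kg.transpose(-1, -2)) / head_dim ** 0.5
    out = torch.matmul(torch.softmax(scores, dim=-1), vg)
    out = out.reshape(bsz, n_heads, seq_len, head_dim)

    # Undo the shift so outputs line up with the original token order.
    return roll_half_heads(out, group_size // 2)
```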
arXiv Detail & Related papers (2023-09-21T17:59:11Z)
- LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding [58.20031627237889]
LongBench is the first bilingual, multi-task benchmark for long context understanding.
It comprises 21 datasets across 6 task categories in both English and Chinese, with an average length of 6,711 words (English) and 13,386 characters (Chinese).
arXiv Detail & Related papers (2023-08-28T11:53:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.