LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
- URL: http://arxiv.org/abs/2309.12307v3
- Date: Fri, 8 Mar 2024 15:26:38 GMT
- Title: LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
- Authors: Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song
Han, Jiaya Jia
- Abstract summary: LongLoRA is an efficient fine-tuning approach that extends the context sizes of pre-trained large language models.
We demonstrate strong empirical results on various tasks on Llama2 models from 7B/13B to 70B.
- Score: 67.58275666573496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present LongLoRA, an efficient fine-tuning approach that extends the
context sizes of pre-trained large language models (LLMs), with limited
computation cost. Typically, training LLMs with long context sizes is
computationally expensive, requiring extensive training hours and GPU
resources. For example, training with a context length of 8192 requires 16x the computational cost in self-attention layers compared to a context length of 2048. In this paper, we
speed up the context extension of LLMs in two aspects. On the one hand,
although dense global attention is needed during inference, fine-tuning the
model can be done effectively and efficiently with sparse local attention. The proposed shifted sparse attention (S^2-Attn) enables context extension, yielding non-trivial computation savings with performance similar to
fine-tuning with vanilla attention. Particularly, it can be implemented with
only two lines of code in training, while being optional in inference. On the
other hand, we revisit the parameter-efficient fine-tuning regime for context
expansion. Notably, we find that LoRA for context extension works well provided that the embedding and normalization layers are also trainable. LongLoRA combines this
improved LoRA with S^2-Attn. LongLoRA demonstrates strong empirical results on
various tasks on Llama2 models from 7B/13B to 70B. LongLoRA extends Llama2 7B
from 4k context to 100k, or Llama2 70B to 32k on a single 8x A100 machine.
LongLoRA extends models' context while retaining their original architectures,
and is compatible with most existing techniques, like Flash-Attention2. In
addition, we conduct supervised fine-tuning with LongLoRA and our long instruction-following LongAlpaca dataset.
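The abstract's claim that S^2-Attn needs only two lines of code at training time can be made concrete with a small sketch. The snippet below is a minimal PyTorch rendering of the idea, assuming a (B, N, 3, H, D) qkv layout: half of the attention heads attend within fixed token groups, the other half attend within groups shifted by half a group size, and the shift is rolled back afterwards. The function name and the use of scaled_dot_product_attention are illustrative choices, not the paper's actual implementation, and causal masking at the shifted boundary is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def shifted_sparse_attention(qkv: torch.Tensor, group_size: int) -> torch.Tensor:
    """Illustrative S^2-Attn forward pass (training-time only).

    qkv: projected queries/keys/values of shape (B, N, 3, H, D), where the
    sequence length N is assumed to be divisible by group_size.
    Returns the attention output of shape (B, N, H, D).
    """
    B, N, _, H, D = qkv.shape
    shift = group_size // 2
    # Shift the second half of the heads by half a group so tokens near
    # group borders can still exchange information.
    h1, h2 = qkv.chunk(2, dim=3)
    qkv = torch.cat((h1, h2.roll(-shift, dims=1)), dim=3)
    # Partition the sequence into groups and run standard attention per group.
    qkv = qkv.view(B * N // group_size, group_size, 3, H, D)
    q, k, v = (t.transpose(1, 2) for t in qkv.unbind(dim=2))  # (B*N/G, H, G, D)
    out = F.scaled_dot_product_attention(q, k, v)
    out = out.transpose(1, 2).reshape(B, N, H, D)
    # Roll the shifted heads back so outputs align with their original tokens.
    o1, o2 = out.chunk(2, dim=2)
    return torch.cat((o1, o2.roll(shift, dims=1)), dim=2)
```

Because the shift only changes how training-time attention is grouped, the fine-tuned model can still use standard dense attention at inference, which is why the abstract describes S^2-Attn as optional in inference.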
Related papers
- Why Does the Effective Context Length of LLMs Fall Short? [68.34573617977013]
In this work, we introduce ShifTed Rotary position embeddING (STRING).
STRING shifts well-trained positions to overwrite the original, ineffective positions during inference, enhancing performance within the models' existing training lengths.
Experimental results show that STRING dramatically improves the performance of the latest large-scale models.
arXiv Detail & Related papers (2024-10-24T13:51:50Z)
- A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts [38.867323730365406]
LongGen finetunes a pretrained LLM into an efficient architecture during length extension.
LongGen achieves 1.55x training speedup and reduces wall-clock time by 36%, compared to a full-attention baseline.
During inference, LongGen reduces KV cache memory by 62%, achieving 1.67x prefilling speedup and 1.41x decoding speedup.
arXiv Detail & Related papers (2024-10-02T12:35:53Z)
- LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models [72.71150585370147]
LongRecipe is an efficient training strategy for extending the context window of large language models.
It simulates long-sequence inputs while maintaining training efficiency and significantly improves the model's understanding of long-range dependencies.
LongRecipe can utilize long sequences while requiring only 30% of the target context window size, and reduces computational training resources by over 85% compared to full-sequence training.
arXiv Detail & Related papers (2024-08-31T17:19:30Z)
- ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities [53.97515452727115]
ChatQA 2 is a Llama 3.0-based model with a 128K context window.
We present a training recipe to extend the context window of Llama3-70B-base from 8K to 128K tokens.
Our results demonstrate that the Llama3-ChatQA-2-70B model outperforms most existing state-of-the-art models.
arXiv Detail & Related papers (2024-07-19T17:35:47Z)
- Extending Llama-3's Context Ten-Fold Overnight [23.163055795834765]
We extend the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA fine-tuning.
The resulting model exhibits superior performance across a broad range of evaluation tasks.
arXiv Detail & Related papers (2024-04-30T13:25:20Z)
- LLoCO: Learning Long Contexts Offline [63.3458260335454]
We propose LLoCO, a novel approach to processing long contexts.
LLoCO learns contexts offline through context compression and in-domain parameter-efficient finetuning with LoRA.
Our approach extends the effective context window of a 4k token LLaMA2-7B model to handle up to 128k tokens.
arXiv Detail & Related papers (2024-04-11T17:57:22Z)
- Training-Free Long-Context Scaling of Large Language Models [114.53296002607993]
We propose Dual Chunk Attention (DCA), which enables Llama2 70B to support context windows of more than 100k tokens without continual training.
By decomposing the attention for long sequences into chunk-based modules, DCA manages to effectively capture the relative positional information of tokens.
arXiv Detail & Related papers (2024-02-27T12:39:23Z)
- LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models [2.4366811507669124]
LongQLoRA is a method for extending the context length of large language models with limited training resources (see the illustrative sketch after this list).
With a single 32GB V100 GPU, LongQLoRA can extend the context length of LLaMA2 7B and 13B from 4096 to 8192 and even to 12k within 1000 finetuning steps.
LongQLoRA achieves competitive perplexity performance on PG19 and Proof-pile datasets.
arXiv Detail & Related papers (2023-11-08T18:33:06Z)
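The abstract's "improved LoRA" premise (LoRA plus trainable embedding and normalization layers) and the QLoRA-style entries above (LongQLoRA, "Extending Llama-3's Context Ten-Fold Overnight") share a common recipe: a frozen or 4-bit-quantized base model, LoRA adapters on the attention projections, RoPE position interpolation for the longer window, and a handful of extra trainable layers. The sketch below assembles that recipe with Hugging Face transformers and peft; the model id, RoPE scaling factor, LoRA rank, and module names are illustrative assumptions, not settings taken from any of the papers listed.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 4-bit (QLoRA-style) to keep memory low; the model id,
# quantization settings, and RoPE scaling factor are placeholders.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    rope_scaling={"type": "linear", "factor": 2.0},  # e.g. 4k -> 8k via position interpolation
)
model = prepare_model_for_kbit_training(model)

# LoRA on the attention projections, plus fully trainable embedding and
# normalization layers, mirroring the "improved LoRA" premise in the abstract.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    modules_to_save=["embed_tokens", "norm"],  # keep embeddings and norms trainable
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Here, modules_to_save keeps full (non-LoRA) copies of the embedding and normalization layers trainable, which corresponds to the premise stated in the LongLoRA abstract; the remaining hyperparameters are placeholders meant only to show the shape of the setup.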