LongAlign: A Recipe for Long Context Alignment of Large Language Models
- URL: http://arxiv.org/abs/2401.18058v1
- Date: Wed, 31 Jan 2024 18:29:39 GMT
- Title: LongAlign: A Recipe for Long Context Alignment of Large Language Models
- Authors: Yushi Bai, Xin Lv, Jiajie Zhang, Yuze He, Ji Qi, Lei Hou, Jie Tang,
Yuxiao Dong, Juanzi Li
- Abstract summary: LongAlign is a recipe of the instruction data, training, and evaluation for long context alignment.
We construct a long instruction-following dataset using Self-Instruct.
We adopt the packing and sorted batching strategies to speed up supervised fine-tuning on data with varied length distributions.
- Score: 61.85923382850057
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extending large language models to effectively handle long contexts requires
instruction fine-tuning on input sequences of similar length. To address this,
we present LongAlign -- a recipe of the instruction data, training, and
evaluation for long context alignment. First, we construct a long
instruction-following dataset using Self-Instruct. To ensure the data
diversity, it covers a broad range of tasks from various long context sources.
Second, we adopt the packing and sorted batching strategies to speed up
supervised fine-tuning on data with varied length distributions. Additionally,
we develop a loss weighting method to balance the contribution to the loss
across different sequences during packing training. Third, we introduce the
LongBench-Chat benchmark for evaluating instruction-following capabilities on
queries of 10k-100k in length. Experiments show that LongAlign outperforms
existing recipes for LLMs in long context tasks by up to 30%, while also
maintaining their proficiency in handling short, generic tasks. The code, data,
and long-aligned models are open-sourced at https://github.com/THUDM/LongAlign.
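To make the packing and loss-weighting ideas from the abstract concrete, here is a minimal, hypothetical sketch (not the authors' released code): sequences of varied length are greedily grouped into fixed-size packs, and each sequence's loss is normalized by its own target-token count so that long sequences do not dominate a packed batch. The exact packing rule and weighting used by LongAlign may differ.

```python
import torch

def greedy_pack(lengths, max_tokens):
    """Greedily group sequence indices into packs holding at most max_tokens tokens.

    A simple next-fit-decreasing scheme; LongAlign's actual packing strategy may
    differ in how it orders and groups sequences.
    """
    order = sorted(range(len(lengths)), key=lambda i: -lengths[i])
    packs, current, used = [], [], 0
    for idx in order:
        if current and used + lengths[idx] > max_tokens:
            packs.append(current)
            current, used = [], 0
        current.append(idx)
        used += lengths[idx]
    if current:
        packs.append(current)
    return packs

def weighted_packed_loss(token_losses, seq_ids, num_seqs_in_batch):
    """Balance loss contributions of sequences inside a pack.

    token_losses: per-token cross-entropy values for one pack, shape (T,).
    seq_ids: which source sequence each token came from, shape (T,).
    Each sequence's token losses are averaged first, so a 60k-token document and
    a 2k-token one contribute equally; the sum is then scaled by the number of
    sequences in the global batch. This follows the spirit of the paper's loss
    weighting, though the published formulation may differ in detail.
    """
    total = 0.0
    for sid in seq_ids.unique():
        mask = seq_ids == sid
        total = total + token_losses[mask].mean()
    return total / num_seqs_in_batch
```

Sorted batching, the alternative mentioned in the abstract, instead groups sequences of similar length into the same batch so little padding is needed; it avoids the weighting issue but reduces batch-level randomness.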
Related papers
- Long Context is Not Long at All: A Prospector of Long-Dependency Data for Large Language Models [13.091271774417867]
Long-context modeling capabilities are important for large language models (LLMs) in various applications.
We propose a data mining framework, ProLong, that assigns each training sample a long dependency score.
Comprehensive experiments on multiple benchmarks indicate that ProLong effectively identifies documents that carry long dependencies.
arXiv Detail & Related papers (2024-05-28T07:36:56Z)
- Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum [30.46329559544246]
We introduce dataset decomposition, a novel variable sequence length training technique.
We train an 8k context-length 1B model at the same cost as a 2k context-length model trained with the baseline approach.
Experiments on a web-scale corpus demonstrate that our approach significantly enhances performance on standard language evaluations and long-context benchmarks.
arXiv Detail & Related papers (2024-05-21T22:26:01Z)
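As a rough illustration of the variable sequence length idea in the entry above, the following hypothetical sketch splits each tokenized document into power-of-two-length chunks and groups equal-length chunks into buckets, so each optimization step can draw a batch with one uniform sequence length. The published dataset-decomposition procedure may choose chunk boundaries and curricula differently.

```python
def decompose(doc_tokens, max_len=8192, min_len=256):
    """Split one tokenized document into chunks whose lengths are powers of two.

    Returns a dict mapping chunk length -> list of chunks. Equal-length chunks
    can be batched together, so no step mixes sequence lengths. A hedged sketch,
    not the paper's implementation.
    """
    buckets, pos, n = {}, 0, len(doc_tokens)
    while n - pos >= min_len:
        size = max_len
        while size > n - pos:
            size //= 2
        buckets.setdefault(size, []).append(doc_tokens[pos:pos + size])
        pos += size
    return buckets

# e.g. a 10,000-token document yields chunks of 8192, 1024, 512, and 256 tokens
# (the trailing 16 tokens below min_len are dropped in this sketch).
```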
- Long Context Alignment with Short Instructions and Synthesized Positions [56.1267385315404]
This paper introduces Step-Skipping Alignment (SkipAlign), a new technique designed to enhance the long-context capabilities of Large Language Models (LLMs).
With a careful selection of the base model and alignment datasets, SkipAlign with only 6B parameters achieves its best performance, comparable to strong baselines such as GPT-3.5-Turbo-16K on LongBench.
arXiv Detail & Related papers (2024-05-07T01:56:22Z)
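The "synthesized positions" idea in the SkipAlign entry above can be pictured with a small, assumed sketch: the consecutive position ids of a short training sample are stretched with a few random jumps so the model observes long-range relative distances without ever reading a long input. This is only one plausible reading of "step-skipping"; the authors' actual strategy may differ.

```python
import random

def synthesize_positions(seq_len, target_len, n_skips=8, seed=0):
    """Map a short sample's positions 0..seq_len-1 onto a longer position range.

    Inserts a few random jumps into the otherwise consecutive position ids so
    that relative distances up to roughly target_len appear while training on a
    short sequence. A hedged reading of SkipAlign; details may differ.
    """
    rng = random.Random(seed)
    skip_points = sorted(rng.sample(range(1, seq_len), k=min(n_skips, seq_len - 1)))
    extra_total = max(target_len - seq_len, 0)
    jump = extra_total // len(skip_points)
    positions, offset, j = [], 0, 0
    for i in range(seq_len):
        if j < len(skip_points) and i == skip_points[j]:
            offset += jump
            j += 1
        positions.append(i + offset)
    return positions  # same length as the sample, but spanning ~target_len ids

# e.g. synthesize_positions(seq_len=1024, target_len=16384) yields 1024 position
# ids whose maximum is close to 16k, simulating long-range distances.
```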
- LongHeads: Multi-Head Attention is Secretly a Long Context Processor [49.1661870007655]
LongHeads is a training-free framework that enhances large language models' long context ability.
Instead of allowing each head to attend to the full sentence, we allow each head to process an in-distribution length by selecting and attending to context chunks.
LongHeads achieves 100% accuracy at the 128k length on the passkey retrieval task.
arXiv Detail & Related papers (2024-02-16T13:39:34Z)
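A simplified sketch of the chunk-selection idea behind LongHeads appears below; the chunk scoring used here (the query against each chunk's mean key) is an assumption for illustration, not the paper's exact selection rule.

```python
import torch

def select_chunks_per_head(q, k, chunk_size=256, top_k=4):
    """Pick the most relevant context chunks for one attention head.

    q: (d,) current query vector; k: (seq_len, d) cached keys for this head.
    Scores each chunk by the similarity between the query and the chunk's mean
    key, then keeps only the top_k chunks so the head attends to an
    in-distribution amount of context. A hedged sketch, not the released code.
    """
    seq_len, d = k.shape
    n_chunks = (seq_len + chunk_size - 1) // chunk_size
    chunk_keys = torch.stack([
        k[i * chunk_size:(i + 1) * chunk_size].mean(dim=0) for i in range(n_chunks)
    ])                                   # (n_chunks, d): one summary key per chunk
    scores = chunk_keys @ q              # (n_chunks,): relevance of each chunk
    keep = scores.topk(min(top_k, n_chunks)).indices
    token_ids = torch.cat([
        torch.arange(i * chunk_size, min((i + 1) * chunk_size, seq_len))
        for i in keep.tolist()
    ])
    return token_ids  # positions this head actually attends to
```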
- E^2-LLM: Efficient and Extreme Length Extension of Large Language Models [74.1254067728251]
We propose an Efficient and Extreme length extension method for Large Language Models, called E^2-LLM, with only one training procedure and dramatically reduced cost.
Comprehensive experimental results on multiple benchmark datasets demonstrate the effectiveness of E^2-LLM on challenging long-context tasks.
arXiv Detail & Related papers (2024-01-13T02:11:20Z)
- Effective Long-Context Scaling of Foundation Models [90.57254298730923]
We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens.
Our models achieve consistent improvements on most regular tasks and significant improvements on long-context tasks over Llama 2.
arXiv Detail & Related papers (2023-09-27T21:41:49Z)
- LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding [58.20031627237889]
LongBench is the first bilingual, multi-task benchmark for long context understanding.
It comprises 21 datasets across 6 task categories in both English and Chinese, with an average length of 6,711 words (English) and 13,386 characters (Chinese).
arXiv Detail & Related papers (2023-08-28T11:53:40Z)