An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding
- URL: http://arxiv.org/abs/2406.07138v2
- Date: Thu, 10 Oct 2024 07:46:14 GMT
- Title: An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding
- Authors: Tong Wu, Yanpeng Zhao, Zilong Zheng
- Abstract summary: We propose a method to extend the context length of pre-trained large language models (LLMs).
$\texttt{CREAM}$ interpolates positional encodings by manipulating position indices.
Experiments show that $\texttt{CREAM}$ successfully extends LLMs to the target length for both the Base and Chat versions of $\texttt{Llama2-7B}$ with "Never Miss A Beat".
- Score: 25.20222970947923
- License:
- Abstract: Recently, many methods have been developed to extend the context length of pre-trained large language models (LLMs), but they often require fine-tuning at the target length ($\gg4K$) and struggle to effectively utilize information from the middle part of the context. To address these issues, we propose $\textbf{C}$ontinuity-$\textbf{R}$elativity ind$\textbf{E}$xing with g$\textbf{A}$ussian $\textbf{M}$iddle ($\texttt{CREAM}$), which interpolates positional encodings by manipulating position indices. Apart from being simple, $\texttt{CREAM}$ is training-efficient: it only requires fine-tuning at the pre-trained context window (e.g., Llama 2-4K) and can extend LLMs to a much longer target context length (e.g., 256K). To ensure that the model focuses more on the information in the middle, we introduce a truncated Gaussian to encourage sampling from the middle part of the context during fine-tuning, thus alleviating the "Lost-in-the-Middle" problem faced by long-context LLMs. Experimental results show that $\texttt{CREAM}$ successfully extends LLMs to the target length for both Base and Chat versions of $\texttt{Llama2-7B}$ with "Never Miss A Beat". Our code is publicly available at https://github.com/bigai-nlco/cream.
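The core idea, manipulating position indices so that a short fine-tuning window covers a much longer target range while a truncated Gaussian biases sampling toward middle positions, can be sketched as follows. This is a simplified illustration with an assumed head/middle/tail split and placeholder sizes (`train_len`, `head`, `tail`), not CREAM's reference implementation; see the linked repository for the actual indexing scheme.

```python
import numpy as np

def truncated_gaussian(lo, hi, loc, scale, rng):
    """Rejection-sample one value from a Gaussian truncated to [lo, hi]."""
    while True:
        x = rng.normal(loc, scale)
        if lo <= x <= hi:
            return x

def cream_style_position_ids(train_len=4096, target_len=262144,
                             head=1024, tail=1024, rng=None):
    """Position ids for a train_len window that span a target_len range.

    Head and tail tokens keep contiguous ids at the two ends of the target
    range, while the middle chunk is placed at an offset drawn from a
    truncated Gaussian centred on the middle of the target range, so
    fine-tuning sees middle positions more often.  Illustrative only;
    not the exact CREAM indexing.
    """
    rng = rng or np.random.default_rng()
    mid_len = train_len - head - tail
    lo, hi = head, target_len - tail - mid_len          # valid middle offsets
    loc, scale = (lo + hi) / 2.0, (hi - lo) / 6.0
    mid_start = int(truncated_gaussian(lo, hi, loc, scale, rng))

    return np.concatenate([
        np.arange(0, head),                              # head: 0 .. head-1
        np.arange(mid_start, mid_start + mid_len),       # middle: Gaussian-placed
        np.arange(target_len - tail, target_len),        # tail: ends at target_len-1
    ])

pos_ids = cream_style_position_ids()
assert len(pos_ids) == 4096 and pos_ids[-1] == 262143
```

The indices stay strictly increasing, so the ends of the target range are always covered, while the middle offset varies across fine-tuning steps with a bias toward central positions.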
Related papers
- FLARE: Faithful Logic-Aided Reasoning and Exploration [50.9814063216852]
We introduce a novel approach for traversing the problem space using task decompositions.
We use Large Language Models to plan a solution and soft-formalise the query into facts and predicates using logic programming code.
Our method allows us to compute the faithfulness of the reasoning process w.r.t. the generated code and analyse the steps of the multi-hop search without relying on external solvers.
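As a toy illustration of the facts-and-predicates view (not FLARE's actual soft-formalisation or search procedure), the sketch below derives multi-hop facts by forward chaining in plain Python and scores how many of a model's stated reasoning steps are supported by the derived set:

```python
# Toy illustration only: facts and a single hand-written rule, a simple
# multi-hop forward chain, and a crude faithfulness score.
from itertools import product

facts = {("parent", "ada", "byron"), ("parent", "byron", "carla")}

def forward_chain(facts):
    """Apply rule grandparent(X, Z) :- parent(X, Y), parent(Y, Z) to a fixpoint."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        parents = [f for f in derived if f[0] == "parent"]
        for (_, x, y), (_, y2, z) in product(parents, parents):
            if y == y2 and ("grandparent", x, z) not in derived:
                derived.add(("grandparent", x, z))
                changed = True
    return derived

def faithfulness(reasoning_steps, derived):
    """Fraction of the model's stated steps that are actually derivable."""
    supported = sum(step in derived for step in reasoning_steps)
    return supported / max(len(reasoning_steps), 1)

derived = forward_chain(facts)
steps = [("parent", "ada", "byron"), ("grandparent", "ada", "carla")]
print(faithfulness(steps, derived))  # 1.0
```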
arXiv Detail & Related papers (2024-10-14T19:39:11Z) - LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models [72.71150585370147]
LongRecipe is an efficient training strategy for extending the context window of large language models.
It simulates long-sequence inputs while maintaining training efficiency and significantly improves the model's understanding of long-range dependencies.
LongRecipe can utilize long sequences while requiring only 30% of the target context window size, and reduces computational training resources by over 85% compared with full-sequence training.
arXiv Detail & Related papers (2024-08-31T17:19:30Z) - Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training [42.89066583603415]
This work identifies three critical $\textit{O}$bstacles: ($\textit{O}$1) lack of comprehensive evaluation, ($\textit{O}$2) untested viability for scaling, and ($\textit{O}$3) lack of empirical guidelines.
We show that a depthwise stacking operator, called $G_{\text{stack}}$, exhibits remarkable acceleration in training, leading to decreased loss and improved overall performance.
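A generic sketch of depthwise stacking, assuming the model is exposed as a list of layers; the paper's exact $G_{\text{stack}}$ operator, layer ordering, and any re-scaling may differ:

```python
import copy
import torch.nn as nn

def g_stack_init(small_layers, growth_factor=2):
    """Build a deeper layer stack by repeating deep copies of a trained
    smaller model's layers `growth_factor` times (depthwise stacking)."""
    stacked = []
    for _ in range(growth_factor):
        stacked.extend(copy.deepcopy(layer) for layer in small_layers)
    return nn.ModuleList(stacked)

# Usage with a toy 4-layer stack of standard encoder layers:
small = nn.ModuleList([
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    for _ in range(4)
])
big = g_stack_init(small, growth_factor=2)
assert len(big) == 8
```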
arXiv Detail & Related papers (2024-05-24T08:00:00Z) - LLoCO: Learning Long Contexts Offline [63.3458260335454]
We propose LLoCO, a novel approach to processing long contexts.
LLoCO learns contexts offline through context compression and in-domain parameter-efficient finetuning with LoRA.
Our approach extends the effective context window of a 4k token LLaMA2-7B model to handle up to 128k tokens.
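The parameter-efficient finetuning half of this recipe can be sketched with the Hugging Face `peft` library; the LoRA rank, target modules, and model name below are illustrative placeholders, and the context-compression component is not shown:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model and LoRA hyperparameters for illustration.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
lora_cfg = LoraConfig(
    r=16,                                  # adapter rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections in Llama-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()         # only the LoRA adapters are trainable
# ... fine-tune `model` on in-domain data alongside a context compressor (not shown).
```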
arXiv Detail & Related papers (2024-04-11T17:57:22Z) - LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning [67.39585115936329]
We argue that LLMs have inherent capabilities to handle long contexts without fine-tuning.
We propose SelfExtend to extend the context window of LLMs by constructing bi-level attention information.
We conduct comprehensive experiments on multiple benchmarks and the results show that our SelfExtend can effectively extend existing LLMs' context window length.
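A minimal sketch of the bi-level idea described above: exact relative positions are kept inside a neighbour window, while distances beyond it are floor-divided into groups and shifted so the two levels meet at the window boundary. The group size, window size, and shift here are illustrative, not SelfExtend's reference code.

```python
import numpy as np

def self_extend_relative_positions(seq_len, group_size=8, neighbor_window=512):
    """Relative positions under a two-level scheme: exact distances within the
    neighbour window, grouped (floor-divided) distances outside it."""
    q = np.arange(seq_len)[:, None]
    k = np.arange(seq_len)[None, :]
    rel = q - k                                        # standard relative distance
    grouped = rel // group_size + neighbor_window - neighbor_window // group_size
    return np.where(rel <= neighbor_window, rel, grouped)

print(self_extend_relative_positions(8, group_size=2, neighbor_window=4))
```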
arXiv Detail & Related papers (2024-01-02T18:30:51Z) - M$^{2}$Chat: Empowering VLM for Multimodal LLM Interleaved Text-Image Generation [45.79215260916687]
We propose $M^{2}$Chat, a novel unified multimodal LLM framework for generating interleaved text-image conversations.
$M^{3}$Adapter integrates granular low-level visual information and high-level semantic features from multi-modality prompts.
The $M^{3}$FT fine-tuning strategy optimizes disjoint groups of parameters for image-text alignment and visual instruction, respectively.
arXiv Detail & Related papers (2023-11-29T11:30:33Z) - Retrieval meets Long Context Large Language Models [59.431200671427064]
Extending the context window of large language models (LLMs) has recently become popular.
Retrieval-augmentation versus long context window, which one is better for downstream tasks?
Can both methods be combined to get the best of both worlds?
Our best model, retrieval-augmented Llama2-70B with 32K context window, outperforms GPT-3.5-turbo-16k and Davinci003 in terms of average score on nine long context tasks.
arXiv Detail & Related papers (2023-10-04T17:59:41Z) - PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training [91.99700930388998]
We propose Positional Skip-wisE (PoSE) training, which simulates long inputs using a fixed context window.
PoSE greatly reduces memory and time overhead compared with full-length fine-tuning.
We have successfully extended the LLaMA model to 128k tokens using a 2k training context window.
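A minimal sketch of skip-wise position indices, assuming the training window is split into chunks whose indices are offset by random skips so they span the target length; the chunking and bias sampling details here are illustrative rather than PoSE's exact procedure.

```python
import numpy as np

def pose_style_position_ids(train_len=2048, target_len=131072, num_chunks=2, rng=None):
    """Split a train_len window into chunks and add non-decreasing random skips
    so the position ids cover distances up to target_len."""
    rng = rng or np.random.default_rng()
    bounds = np.linspace(0, train_len, num_chunks + 1, dtype=int)
    total_skip = rng.integers(0, target_len - train_len + 1)
    # Distribute the total skip across chunk boundaries (chunk 0 keeps offset 0).
    cuts = np.sort(rng.integers(0, total_skip + 1, size=num_chunks - 1))
    offsets = np.concatenate([[0], cuts])
    ids = []
    for i in range(num_chunks):
        start, end = bounds[i], bounds[i + 1]
        ids.append(np.arange(start, end) + offsets[i])
    return np.concatenate(ids)

pos = pose_style_position_ids()
assert len(pos) == 2048 and pos.max() < 131072
```

Each training step thus exposes the model to relative distances from the full target range while the attention computation still runs over only `train_len` tokens.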
arXiv Detail & Related papers (2023-09-19T08:03:38Z) - LeanContext: Cost-Efficient Domain-Specific Question Answering Using LLMs [1.9468358338146958]
Question-answering (QA) is a significant application of Large Language Models (LLMs).
In this paper, we shift from human-oriented summarizers to AI model-friendly summaries.
Our approach, LeanContext, efficiently extracts $k$ key sentences from the context that are closely aligned with the query.
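A minimal sketch of query-aligned sentence selection; TF-IDF cosine similarity stands in for LeanContext's own ranking, and $k$ is fixed here rather than adaptively chosen.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def top_k_sentences(context_sentences, query, k=3):
    """Return the k context sentences most similar to the query, in original order."""
    vec = TfidfVectorizer().fit(context_sentences + [query])
    sent_mat = vec.transform(context_sentences)
    query_vec = vec.transform([query])
    scores = cosine_similarity(query_vec, sent_mat).ravel()
    keep = sorted(scores.argsort()[::-1][:k])
    return [context_sentences[i] for i in keep]

sentences = [
    "The contract was signed in March 2021.",
    "Payment terms are net 30 days.",
    "The vendor is responsible for shipping costs.",
]
print(top_k_sentences(sentences, "Who pays for shipping?", k=1))
```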
arXiv Detail & Related papers (2023-09-02T06:33:18Z)