Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
- URL: http://arxiv.org/abs/2401.03462v2
- Date: Fri, 2 Feb 2024 12:34:25 GMT
- Title: Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
- Authors: Peitian Zhang, Zheng Liu, Shitao Xiao, Ninglu Shao, Qiwei Ye, Zhicheng Dou
- Abstract summary: We propose a new method called Activation Beacon, which condenses LLM's raw activations into compact forms.
Activation Beacon is introduced as a plug-in module, which fully preserves the LLM's original capability in short contexts.
Our experiments verify Activation Beacon's effectiveness for context extension: it achieves a high-quality $\times 100$ extension of Llama-2-7B's context (from 4K to 400K).
- Score: 23.369013431288998
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The utilization of long contexts poses a major challenge for LLMs due to
their limited context window size. Although the context window can be extended
through fine-tuning, doing so incurs considerable cost at both training and
inference time and adversely affects the LLM's original capabilities. In this
work, we propose a new method called Activation Beacon, which condenses the
LLM's raw activations into compact forms so that the LLM can perceive a longer
context with a limited context window. Activation Beacon is introduced as a
plug-in module, which fully preserves the LLM's original capability on short
contexts. It works with a sliding window to process the long context in a
streaming fashion, leading to competitive memory and time efficiency in both
training and inference. Activation Beacon is trained with short-sequence data
of diversified condensing ratios; thanks to this treatment, it learns to
support different context lengths at a small training cost. Our experiments
verify Activation Beacon's effectiveness for context extension: it achieves a
high-quality $\times 100$ extension of Llama-2-7B's context (from 4K to 400K),
while also delivering superior performance across a variety of long-context
language modeling and understanding tasks. The source code and model
checkpoint are available at \url{https://github.com/FlagOpen/FlagEmbedding}.
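To make the mechanism concrete, the following is a minimal, purely illustrative Python sketch of the sliding-window-plus-condensing idea from the abstract. The window size, the fixed condensing ratio, and the mean-pooling stand-in for the learned beacon module are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch only: Activation Beacon condenses key/value activations
# with learned beacon tokens inside the transformer; here mean-pooling stands
# in for that learned module so the control flow is visible.
from typing import List

WINDOW = 1024          # native context window of the LLM (assumed value)
CONDENSE_RATIO = 8     # e.g. 8 raw activations condensed into 1

def condense(chunk_acts: List[float], ratio: int) -> List[float]:
    """Stand-in for the beacon module: pool every `ratio` raw activations."""
    return [
        sum(chunk_acts[i:i + ratio]) / len(chunk_acts[i:i + ratio])
        for i in range(0, len(chunk_acts), ratio)
    ]

def stream_long_context(raw_acts: List[float]) -> List[float]:
    """Process a long sequence chunk by chunk, carrying only condensed memory."""
    memory: List[float] = []
    for start in range(0, len(raw_acts), WINDOW):
        chunk = raw_acts[start:start + WINDOW]
        # The LLM attends over `memory + chunk`, which stays far smaller than
        # the raw sequence; the chunk is then condensed and appended to memory.
        memory.extend(condense(chunk, CONDENSE_RATIO))
    return memory

if __name__ == "__main__":
    acts = [float(i) for i in range(400_000)]   # stand-in for 400K activations
    print(len(acts), "raw ->", len(stream_long_context(acts)), "condensed")
```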
Related papers
- LLoCO: Learning Long Contexts Offline [63.3458260335454]
We introduce LLoCO, a technique that combines context compression, retrieval, and parameter-efficient finetuning using LoRA.
We evaluate our approach on several long-context question-answering datasets, demonstrating that LLoCO significantly outperforms in-context learning.
arXiv Detail & Related papers (2024-04-11T17:57:22Z)
- InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory [93.20588235940453]
In this paper, we introduce a training-free memory-based method, InfLLM.
InfLLM stores distant contexts into additional memory units and employs an efficient mechanism to lookup token-relevant units for attention.
Even when the sequence length is scaled to $1,024$K, InfLLM still effectively captures long-distance dependencies.
arXiv Detail & Related papers (2024-02-07T06:50:42Z)
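A minimal sketch of the memory-lookup idea summarized above: distant-context keys are grouped into fixed-size units, each unit is represented by the mean of its keys, and the units most relevant to the current query are retrieved for attention. The unit size, top-k value, and dot-product scoring are illustrative assumptions rather than InfLLM's exact design.

```python
import numpy as np

UNIT_SIZE = 128   # tokens per memory unit (illustrative)
TOP_K = 4         # number of units retrieved per lookup (illustrative)

def build_units(distant_keys: np.ndarray) -> list:
    """Group distant-context key vectors into units with a representative vector."""
    units = []
    for start in range(0, len(distant_keys), UNIT_SIZE):
        block = distant_keys[start:start + UNIT_SIZE]
        units.append({"keys": block, "repr": block.mean(axis=0)})
    return units

def lookup(units: list, query: np.ndarray, top_k: int = TOP_K) -> np.ndarray:
    """Return keys of the units most relevant to the current query vector."""
    scores = np.array([unit["repr"] @ query for unit in units])
    best = np.argsort(scores)[-top_k:]
    return np.concatenate([units[i]["keys"] for i in best])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    distant = rng.normal(size=(100_000, 64))    # stand-in for a very long context
    units = build_units(distant)
    selected = lookup(units, rng.normal(size=64))
    print(len(units), "units; attending over", len(selected), "retrieved keys")
```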
- Extending LLMs' Context Window with 100 Samples [42.52554295241792]
Large Language Models (LLMs) are known to have limited extrapolation ability beyond their pre-trained context window.
Recent studies have sought to extend the context window by modifying rotary position embedding (RoPE).
We introduce a novel extension to RoPE that combines adjusting RoPE's base frequency and scaling the attention logits to help LLMs efficiently adapt to a larger context window.
arXiv Detail & Related papers (2024-01-13T07:57:01Z)
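The two knobs mentioned above can be sketched numerically: enlarging RoPE's base stretches the rotary frequencies, and a logarithmic multiplier on the attention logits keeps attention sharp at longer lengths. The concrete base, target length, and scaling rule below are illustrative choices, not the exact recipe from the paper.

```python
import math
import numpy as np

HEAD_DIM = 128
ORIG_BASE = 10_000.0     # RoPE base typical at pre-training time
NEW_BASE = 500_000.0     # enlarged base for the extended window (assumed value)
TRAIN_LEN = 4_096        # pre-training context length
TARGET_LEN = 32_768      # extended context length (assumed value)

def rope_inv_freq(base: float, dim: int = HEAD_DIM) -> np.ndarray:
    """Rotary inverse frequencies theta_i = base^(-2i/dim)."""
    return base ** (-np.arange(0, dim, 2) / dim)

def logit_scale(target_len: int = TARGET_LEN, train_len: int = TRAIN_LEN) -> float:
    """A commonly used entropy-style multiplier applied to attention logits."""
    return math.log(target_len) / math.log(train_len)

if __name__ == "__main__":
    print("slowest rotary frequency, original base:", rope_inv_freq(ORIG_BASE)[-1])
    print("slowest rotary frequency, enlarged base:", rope_inv_freq(NEW_BASE)[-1])
    print("attention logit multiplier:", round(logit_scale(), 3))
```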
- LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning [67.39585115936329]
We argue that LLMs have inherent capabilities to handle long contexts without fine-tuning.
We propose SelfExtend to extend the context window of LLMs by constructing bi-level attention information.
We conduct comprehensive experiments on multiple benchmarks and the results show that our SelfExtend can effectively extend existing LLMs' context window length.
arXiv Detail & Related papers (2024-01-02T18:30:51Z)
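A small sketch in the spirit of the bi-level attention summarized above: tokens inside a local neighbour window keep exact relative positions, while more distant tokens share coarse, floor-divided positions so they stay within the range seen during pre-training. The group size, neighbour-window size, and merging offset are illustrative assumptions.

```python
GROUP_SIZE = 8          # how many distant positions share one coarse position
NEIGHBOR_WINDOW = 512   # radius in which exact positions are kept

def bi_level_rel_pos(query_pos: int, key_pos: int) -> int:
    """Map a raw relative distance to the position actually fed to the model."""
    dist = query_pos - key_pos
    if dist <= NEIGHBOR_WINDOW:
        return dist                       # nearby tokens: exact positions
    grouped = dist // GROUP_SIZE          # distant tokens: coarse positions
    # shift so coarse positions continue smoothly after the neighbour window
    return grouped + NEIGHBOR_WINDOW - NEIGHBOR_WINDOW // GROUP_SIZE

if __name__ == "__main__":
    for d in (10, 512, 513, 4_096, 16_000):
        print(f"raw distance {d:>6} -> effective position {bi_level_rel_pos(d, 0)}")
```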
- CLEX: Continuous Length Extrapolation for Large Language Models [68.43814043853347]
We propose Continuous Length EXtrapolation (CLEX) for Large Language Models (LLMs).
CLEX extends the context window to over 4x or almost 8x the training length, with no deterioration in performance.
Our model trained on a 4k length exhibits competitive performance against state-of-the-art open-source models trained on context lengths up to 32k.
arXiv Detail & Related papers (2023-10-25T08:13:02Z)
- Retrieval meets Long Context Large Language Models [59.431200671427064]
Extending the context window of large language models (LLMs) has recently become popular.
Which is better for downstream tasks: retrieval augmentation or a long context window?
Can both methods be combined to get the best of both worlds?
Our best model, retrieval-augmented Llama2-70B with 32K context window, outperforms GPT-3.5-turbo-16k and Davinci003 in terms of average score on nine long context tasks.
arXiv Detail & Related papers (2023-10-04T17:59:41Z)
- PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training [91.99700930388998]
We propose Positional Skip-wisE (PoSE) training, which simulates long inputs using a fixed context window.
PoSE greatly reduces memory and time overhead compared with full-length fine-tuning.
We have successfully extended the LLaMA model to 128k tokens using a 2k training context window.
arXiv Detail & Related papers (2023-09-19T08:03:38Z)
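A toy sketch of the skip-wise idea: position ids for a short training window are split into two chunks, and the second chunk is shifted by a random skip so that position values from the full simulated range are observed during training, while the actual input stays short. The two-chunk split and uniform skip sampling are simplifying assumptions, not necessarily PoSE's exact scheme.

```python
import random

TRAIN_WINDOW = 2_048     # actual sequence length used in training
TARGET_LEN = 128_000     # context length being simulated (128k, per the summary)

def skip_wise_position_ids(train_window: int = TRAIN_WINDOW,
                           target_len: int = TARGET_LEN,
                           seed: int = 0) -> list:
    """Return position ids for one training example of length `train_window`."""
    rng = random.Random(seed)
    split = rng.randrange(1, train_window)                    # chunk boundary
    skip = rng.randrange(0, target_len - train_window + 1)    # random positional jump
    first = list(range(split))                                # chunk 1: 0..split-1
    second = [split + skip + i for i in range(train_window - split)]
    return first + second

if __name__ == "__main__":
    ids = skip_wise_position_ids()
    print("sequence length:", len(ids), "| max position id:", max(ids))
```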