Related papers: Farewell to Length Extrapolation, a Training-Free Infinite Context with Finite Attention Scope

URL: http://arxiv.org/abs/2407.15176v1
Date: Sun, 21 Jul 2024 14:23:37 GMT
Title: Farewell to Length Extrapolation, a Training-Free Infinite Context with Finite Attention Scope
Authors: Xiaoran Liu, Qipeng Guo, Yuerong Song, Zhigeng Liu, Kai Lv, Hang Yan, Linlin Li, Qun Liu, Xipeng Qiu
Abstract summary: LongCache is a training-free approach that enables LLMs to support an infinite context with finite context scope. We validate LongCache on the LongBench and L-Eval and demonstrate its performance is on par with traditional full-attention mechanisms. We will improve the efficiency of LongCache by GPU-aware optimization soon.
Score: 68.10585571422929
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The maximum supported context length is a critical bottleneck limiting the practical application of the Large Language Model (LLM). Although existing length extrapolation methods can extend the context of LLMs to millions of tokens, these methods all have an explicit upper bound. In this work, we propose LongCache, a training-free approach that enables LLM to support an infinite context with finite context scope, through full-context cache selection and training-free integration. This effectively frees LLMs from the length extrapolation issue. We validate LongCache on the LongBench and L-Eval and demonstrate its performance is on par with traditional full-attention mechanisms. Furthermore, we have applied LongCache on mainstream LLMs, including LLaMA3 and Mistral-v0.3, enabling them to support context lengths of at least 400K in Needle-In-A-Haystack tests. We will improve the efficiency of LongCache by GPU-aware optimization soon.

Related papers

InfiniteICL: Breaking the Limit of Context Window Size via Long Short-term Memory Transformation [57.310236384112834]
In-context learning (ICL) is critical for large language models (LLMs) but its effectiveness is constrained by finite context windows. We introduce InfiniteICL, a framework that parallels context and parameters in LLMs with short- and long-term memory. We demonstrate that our method reduces context length by 90% while achieving 103% average performance of full-context prompting.
arXiv Detail & Related papers (2025-04-02T13:15:44Z)
Infinite Retrieval: Attention Enhanced LLMs in Long-Context Processing [19.577278316436807]
Large Language Models (LLMs) are limited by the context window size. We propose a novel method that leverages the LLMs' own attention information to enable accurate retrieval. InfiniRetri achieves 100% accuracy in the Needle-In-a-Haystack (NIH) test over 1M tokens using a 0.5B parameter model.
arXiv Detail & Related papers (2025-02-18T15:45:36Z)
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU [48.105361428245736]
We introduce InfiniteHiP, an inference framework for large language models (LLMs). We dynamically eliminate irrelevant context tokens through a modular hierarchical token pruning algorithm. Our framework achieves an 18.95x speedup in attention decoding for a 1 million token context without requiring additional training.
arXiv Detail & Related papers (2025-02-13T02:52:01Z)
Why Does the Effective Context Length of LLMs Fall Short? [68.34573617977013]
In this work, we introduce ShifTed Rotary position embeddING (STRING). STRING shifts well-trained positions to overwrite the original ineffective positions during inference, enhancing performance within their existing training lengths. Experimental results show that STRING dramatically improves the performance of the latest large-scale models.
arXiv Detail & Related papers (2024-10-24T13:51:50Z)
Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs [12.250524667536606]
Large language models (LLMs) still suffer from the challenging extrapolation problem. We conduct a theoretical analysis to better understand why No Position Encoding (NoPE) fails outside its effective range, as well as examining the power of Position Encoding (PE) in this context. We introduce a novel weave PE method, Mesa-Extrapolation, which utilizes a chunk-based triangular attention matrix and applies Stair-Extrapolation to manage the final chunk.
arXiv Detail & Related papers (2024-10-21T10:39:05Z)
InfiniPot: Infinite Context Processing on Memory-Constrained LLMs [17.111422610001227]
InfiniPot is a novel KV cache control framework designed to enable pre-trained Large Language Models to manage extensive sequences efficiently. InfiniPot effectively maintains critical data even without access to future context. This work represents a substantial advancement toward making Large Language Models applicable to a broader range of real-world scenarios.
arXiv Detail & Related papers (2024-10-02T13:09:41Z)
SirLLM: Streaming Infinite Retentive LLM [74.40196814292426]
Large Language Models (LLMs) process inputs of any length and maintain a degree of memory. Recent efforts have employed streaming inputs to alleviate the pressure of excessively long text inputs. We introduce Streaming Infinite Retentive LLM (SirLLM), which allows LLMs to maintain longer memory during infinite-length dialogues.
arXiv Detail & Related papers (2024-05-21T06:37:03Z)
An Empirical Study of LLaMA3 Quantization: From LLMs to MLLMs [54.91212829143966]
This study explores LLaMA3's capabilities when quantized to low bit-width. We evaluate 10 existing post-training quantization and LoRA-finetuning methods of LLaMA3 on 1-8 bits and diverse datasets. Our experimental results indicate that LLaMA3 still suffers non-negligible degradation in linguistic and visual contexts.
arXiv Detail & Related papers (2024-04-22T10:03:03Z)
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding [78.36702055076456]
This paper introduces Multi-scale Positional Encoding (Ms-PoE), a simple yet effective plug-and-play approach to enhance the capacity of LLMs to handle relevant information located in the middle of the context.
arXiv Detail & Related papers (2024-03-05T04:58:37Z)
InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory [93.20588235940453]
In this paper, we introduce a training-free memory-based method, InfLLM. InfLLM stores distant contexts in additional memory units and employs an efficient mechanism to look up token-relevant units for attention. Even when the sequence length is scaled to 1,024K, InfLLM still effectively captures long-distance dependencies.
arXiv Detail & Related papers (2024-02-07T06:50:42Z)
CLEX: Continuous Length Extrapolation for Large Language Models [68.43814043853347]
We propose Continuous Length EXtrapolation (CLEX) for Large Language Models (LLMs). CLEX extends the context window to over 4x or almost 8x training length, with no deterioration in performance. Our model trained on a 4k length exhibits competitive performance against state-of-the-art open-source models trained on context lengths up to 32k.
arXiv Detail & Related papers (2023-10-25T08:13:02Z)
Giraffe: Adventures in Expanding Context Lengths in LLMs [7.8327063299618]
We show that linear scaling is the best method for extending context length. We also discover promising extrapolation capabilities in the truncated basis. To support further research in this area, we release three new 13B parameter long-context models.
arXiv Detail & Related papers (2023-08-21T17:30:16Z)

This list is automatically generated from the titles and abstracts of the papers on this site.