InfiniteICL: Breaking the Limit of Context Window Size via Long Short-term Memory Transformation
- URL: http://arxiv.org/abs/2504.01707v2
- Date: Thu, 03 Apr 2025 08:53:06 GMT
- Title: InfiniteICL: Breaking the Limit of Context Window Size via Long Short-term Memory Transformation
- Authors: Bowen Cao, Deng Cai, Wai Lam
- Abstract summary: In-context learning (ICL) is critical for large language models (LLMs) but its effectiveness is constrained by finite context windows. We introduce InfiniteICL, a framework that parallels context and parameters in LLMs with short- and long-term memory. We demonstrate that our method reduces context length by 90% while achieving 103% average performance of full-context prompting.
- Score: 57.310236384112834
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In-context learning (ICL) is critical for large language models (LLMs), but its effectiveness is constrained by finite context windows, particularly in ultra-long contexts. To overcome this, we introduce InfiniteICL, a framework that parallels context and parameters in LLMs with short- and long-term memory in human cognitive systems, focusing on transforming temporary context knowledge into permanent parameter updates. This approach significantly reduces memory usage, maintains robust performance across varying input lengths, and theoretically enables infinite context integration through the principles of context knowledge elicitation, selection, and consolidation. Evaluations demonstrate that our method reduces context length by 90% while achieving 103% average performance of full-context prompting across fact recall, grounded reasoning, and skill acquisition tasks. When conducting sequential multi-turn transformations on complex, real-world contexts (with length up to 2M tokens), our approach surpasses full-context prompting while using only 0.4% of the original contexts. These findings highlight InfiniteICL's potential to enhance the scalability and efficiency of LLMs by breaking the limitations of conventional context window sizes.
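A minimal sketch of the context-to-parameter idea described in the abstract, in Python. The `elicit_knowledge` and `consolidate` helpers, the placeholder base model name, and the use of a LoRA adapter as the trainable "long-term memory" are all assumptions for illustration; the paper's actual elicitation, selection, and consolidation procedures are not reproduced here.

```python
# Hedged sketch: turn long-context knowledge into parameter updates, chunk by chunk.
# Assumptions (not from the paper): a LoRA adapter as the "long-term memory" and a
# simple self-generated summary as the elicited knowledge.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder base model
tok = AutoTokenizer.from_pretrained(model_name)
base = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)

# Long-term memory: a small set of trainable adapter weights.
cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(base, cfg)
opt = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=1e-4)

def elicit_knowledge(chunk: str) -> str:
    """Elicitation (hypothetical): ask the model to restate the chunk's key facts."""
    prompt = f"Summarize the key facts in the following text:\n{chunk}\nKey facts:"
    ids = tok(prompt, return_tensors="pt", truncation=True, max_length=2048)
    out = model.generate(**ids, max_new_tokens=128, do_sample=False)
    return tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True)

def consolidate(knowledge: str, steps: int = 4) -> None:
    """Consolidation: a few language-modeling steps on the elicited knowledge update the adapter."""
    ids = tok(knowledge, return_tensors="pt", truncation=True, max_length=1024)["input_ids"]
    for _ in range(steps):
        loss = model(input_ids=ids, labels=ids).loss
        loss.backward()
        opt.step()
        opt.zero_grad()

def infinite_icl(long_context: str, query: str, chunk_chars: int = 4000) -> str:
    # Sequential multi-turn transformation: absorb the context chunk by chunk,
    # so the prompt at answer time stays short regardless of context length.
    for i in range(0, len(long_context), chunk_chars):
        consolidate(elicit_knowledge(long_context[i:i + chunk_chars]))
    ids = tok(query, return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=64, do_sample=False)
    return tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True)
```

At answer time only the short query is prompted, so the effective prompt length no longer grows with the original context.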
Related papers
- Scaling Instruction-Tuned LLMs to Million-Token Contexts via Hierarchical Synthetic Data Generation [15.975325252309554]
We introduce a novel post-training synthetic data generation strategy designed to efficiently extend the context window of Large Language Models.
Our approach scalably extends to arbitrarily long context lengths, unconstrained by the length of available real-world data.
We demonstrate that our model, with a context length of up to 1M tokens, performs well on the RULER benchmark and InfiniteBench.
arXiv Detail & Related papers (2025-04-17T04:46:57Z)
- LIFT: Improving Long Context Understanding of Large Language Models through Long Input Fine-Tuning [45.30182393918228]
Long Input Fine-Tuning (LIFT) is a novel framework for long-context modeling. LIFT dynamically adapts model parameters based on the long input. Gated Memory is a specialized attention adapter that automatically balances long input memorization and ICL.
arXiv Detail & Related papers (2025-02-20T15:32:24Z)
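The LIFT entry above describes a Gated Memory adapter that balances memorized long-input knowledge against ordinary in-context attention. Below is a minimal sketch of one way such a gate could be wired, assuming a per-token sigmoid gate over hidden states; `GatedMemoryAdapter` and its branches are illustrative, not the paper's design.

```python
# Hedged sketch of a gated memory adapter: a learned gate mixes a "memory" branch
# (knowledge absorbed from the long input) with the usual in-context attention output.
# The module shape and gating rule are assumptions, not the LIFT implementation.
import torch
import torch.nn as nn

class GatedMemoryAdapter(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.memory_proj = nn.Linear(d_model, d_model)  # stand-in for the memorization branch
        self.gate = nn.Linear(d_model, 1)

    def forward(self, icl_out: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # icl_out: output of ordinary in-context attention, (batch, seq, d_model)
        # hidden:  current hidden states used to decide how much memory to trust
        mem_out = self.memory_proj(hidden)
        g = torch.sigmoid(self.gate(hidden))          # per-token gate in [0, 1]
        return g * mem_out + (1.0 - g) * icl_out      # balance memorization vs. ICL

# Toy usage
adapter = GatedMemoryAdapter(d_model=64)
h = torch.randn(2, 10, 64)
attn = torch.randn(2, 10, 64)
print(adapter(attn, h).shape)  # torch.Size([2, 10, 64])
```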
- InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU [48.105361428245736]
We introduce InfiniteHiP, an inference framework for large language models (LLMs).
We dynamically eliminate irrelevant context tokens through a modular hierarchical token pruning algorithm.
Our framework achieves an 18.95x speedup in attention decoding for a 1 million token context without requiring additional training.
arXiv Detail & Related papers (2025-02-13T02:52:01Z)
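The InfiniteHiP entry above prunes irrelevant context tokens with a hierarchical algorithm at inference time. Below is a coarse-to-fine toy sketch of that idea, assuming chunk scores from plain query-key dot products; the paper's modular pruning algorithm is considerably more involved.

```python
# Hedged sketch of hierarchical token pruning: score coarse chunks of the context
# against the current query, keep only the best chunks, then re-score inside them.
# Scoring by a plain query-key dot product is an assumption for illustration.
import torch

def hierarchical_prune(keys: torch.Tensor, query: torch.Tensor,
                       chunk: int = 64, keep_chunks: int = 8,
                       keep_tokens: int = 128) -> torch.Tensor:
    """Return indices of context tokens kept for attention.

    keys:  (seq, d) cached key vectors for the long context
    query: (d,) current decoding query vector
    """
    seq, _ = keys.shape
    n_chunks = (seq + chunk - 1) // chunk
    # Stage 1 (coarse): score each chunk by its mean key's similarity to the query.
    chunk_scores = torch.empty(n_chunks)
    for c in range(n_chunks):
        chunk_scores[c] = keys[c * chunk:(c + 1) * chunk].mean(dim=0) @ query
    top_chunks = torch.topk(chunk_scores, k=min(keep_chunks, n_chunks)).indices
    # Stage 2 (fine): within surviving chunks, keep the individually best tokens.
    candidate = torch.cat([torch.arange(c * chunk, min((c + 1) * chunk, seq))
                           for c in top_chunks.tolist()])
    token_scores = keys[candidate] @ query
    best = torch.topk(token_scores, k=min(keep_tokens, candidate.numel())).indices
    return candidate[best]

# Toy usage: prune a 4096-token cache down to at most 128 tokens for one query.
kept = hierarchical_prune(torch.randn(4096, 32), torch.randn(32))
print(kept.shape)
```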
- InfiniPot: Infinite Context Processing on Memory-Constrained LLMs [17.111422610001227]
InfiniPot is a novel KV cache control framework designed to enable pre-trained Large Language Models to manage extensive sequences efficiently.
InfiniPot effectively maintains critical data even without access to future context.
This work represents a substantial advancement toward making Large Language Models applicable to a broader range of real-world scenarios.
arXiv Detail & Related papers (2024-10-02T13:09:41Z)
- ReAttention: Training-Free Infinite Context with Finite Attention Scope [65.91272939057592]
ReAttention is a training-free approach to support infinite context with finite attention scope under sufficient memory resources. We validate the performance of ReAttention on the LongBench, L-Eval, and InfiniteBench benchmarks and demonstrate that it is on par with traditional methods. We also apply ReAttention to mainstream LLMs, including LLaMA3.1-8B and Mistral-v0.3-7B, enabling them to support context lengths of at least 1M tokens and even expanding the context length of LLaMA3.2-3B-chat by 128x to 4M without any further training in the Needle-In-A-Haystack test.
arXiv Detail & Related papers (2024-07-21T14:23:37Z)
- KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches [52.02764371205856]
Long context capability is a crucial competency for large language models (LLMs).
This work provides a taxonomy of current methods and evaluates 10+ state-of-the-art approaches across seven categories of long context tasks.
arXiv Detail & Related papers (2024-07-01T17:59:47Z)
- UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs [111.12010207132204]
UIO-LLMs is an incremental optimization approach for memory-enhanced transformers under long-context settings.
We refine the training process using the Truncated Backpropagation Through Time (TBPTT) algorithm.
UIO-LLMs successfully handle long contexts, for example extending the context window of Llama2-7b-chat from 4K to 100K tokens with only 2% additional parameters.
arXiv Detail & Related papers (2024-06-26T08:44:36Z)
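The UIO-LLMs entry above trains memory-enhanced transformers with Truncated Backpropagation Through Time (TBPTT). Below is a minimal sketch of TBPTT itself on a toy recurrent model, using an `nn.GRU` as a stand-in memory module; the incremental-optimization details of UIO-LLMs are not reproduced here.

```python
# Hedged sketch of truncated backpropagation through time (TBPTT): process a long
# sequence in windows, carry the memory state forward, but detach it between windows
# so gradients only flow within each truncation window.
import torch
import torch.nn as nn

torch.manual_seed(0)
d_in, d_hid, seq_len, window = 16, 32, 1024, 64
model = nn.GRU(d_in, d_hid, batch_first=True)   # toy stand-in for a memory module
head = nn.Linear(d_hid, d_in)
opt = torch.optim.Adam(list(model.parameters()) + list(head.parameters()), lr=1e-3)

x = torch.randn(1, seq_len, d_in)               # a long input sequence
target = torch.roll(x, shifts=-1, dims=1)       # toy next-step prediction target

state = None
for start in range(0, seq_len, window):
    xw = x[:, start:start + window]
    tw = target[:, start:start + window]
    out, state = model(xw, state)
    loss = nn.functional.mse_loss(head(out), tw)
    loss.backward()                              # gradients stop at the window boundary
    opt.step()
    opt.zero_grad()
    state = state.detach()                       # truncate: carry the value, drop the graph
print("final state shape:", state.shape)
```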
- Retrieval Meets Reasoning: Dynamic In-Context Editing for Long-Text Understanding [11.5386284281652]
We introduce a novel approach that re-imagines information retrieval through dynamic in-context editing.
By treating lengthy contexts as malleable external knowledge, our method interactively gathers and integrates relevant information.
Experimental results demonstrate that our method effectively empowers context-limited LLMs to engage in multi-hop reasoning with improved performance.
arXiv Detail & Related papers (2024-06-18T06:54:28Z)
- CLEX: Continuous Length Extrapolation for Large Language Models [68.43814043853347]
We propose Continuous Length EXtrapolation (CLEX) for Large Language Models (LLMs).
CLEX extends the context window to over 4x, or almost 8x, the training length with no deterioration in performance.
Our model trained on a 4k length exhibits competitive performance against state-of-the-art open-source models trained on context lengths up to 32k.
arXiv Detail & Related papers (2023-10-25T08:13:02Z)
- Parallel Context Windows for Large Language Models [52.965170346907904]
We present Parallel Context Windows (PCW), a method that alleviates the context window restriction for any off-the-shelf LLM without further training.
Our main results test the PCW approach on in-context learning with models that range in size between 750 million and 178 billion parameters.
We show additional benefits in other settings where long context windows may be beneficial: multi-hop questions and retrieval-augmented question answering with multiple retrieved documents.
arXiv Detail & Related papers (2022-12-21T11:38:51Z)
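The Parallel Context Windows entry above lets each context window attend only within itself while reusing the same position range, so task tokens can read many windows at once without exceeding the trained window size. Below is a minimal sketch of the mask and position-id construction under that reading; the authors' implementation may differ.

```python
# Hedged sketch of Parallel Context Windows (PCW)-style inputs: every context window
# reuses the same position ids and is masked off from the other windows, while the
# task tokens at the end attend to all windows plus themselves (causally).
import torch

def pcw_mask_and_positions(window_lens: list, task_len: int):
    total = sum(window_lens) + task_len
    allowed = torch.zeros(total, total, dtype=torch.bool)   # allowed[i, j]: token i may attend to token j
    positions = torch.zeros(total, dtype=torch.long)

    offset = 0
    for w in window_lens:
        idx = torch.arange(offset, offset + w)
        # causal attention inside this window only
        allowed[offset:offset + w, offset:offset + w] = torch.tril(torch.ones(w, w, dtype=torch.bool))
        positions[idx] = torch.arange(w)                     # every window reuses positions 0..w-1
        offset += w

    task_idx = torch.arange(offset, offset + task_len)
    allowed[offset:, :offset] = True                         # task tokens see all context windows
    allowed[offset:, offset:] = torch.tril(torch.ones(task_len, task_len, dtype=torch.bool))
    positions[task_idx] = max(window_lens) + torch.arange(task_len)  # task tokens continue the range
    return allowed, positions

# Toy usage: three 4-token context windows followed by 3 task tokens.
mask, pos = pcw_mask_and_positions([4, 4, 4], task_len=3)
print(pos.tolist())   # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6]
print(mask.int())
```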
This list is automatically generated from the titles and abstracts of the papers on this site.