RePo: Language Models with Context Re-Positioning
- URL: http://arxiv.org/abs/2512.14391v1
- Date: Tue, 16 Dec 2025 13:30:30 GMT
- Title: RePo: Language Models with Context Re-Positioning
- Authors: Huayang Li, Tianyu Zhao, Richard Sproat,
- Abstract summary: In-context learning is fundamental to modern Large Language Models (LLMs); however, prevailing architectures impose a rigid and fixed contextual structure by assigning linear or constant positional indices. We propose RePo, a novel mechanism that reduces extraneous load via context re-positioning.
- Score: 10.269249887819988
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-context learning is fundamental to modern Large Language Models (LLMs); however, prevailing architectures impose a rigid and fixed contextual structure by assigning linear or constant positional indices. Drawing on Cognitive Load Theory (CLT), we argue that this uninformative structure increases extraneous cognitive load, consuming finite working memory capacity that should be allocated to deep reasoning and attention allocation. To address this, we propose RePo, a novel mechanism that reduces extraneous load via context re-positioning. Unlike standard approaches, RePo utilizes a differentiable module, $f_φ$, to assign token positions that capture contextual dependencies, rather than relying on a pre-defined integer range. By continually pre-training on the OLMo-2 1B backbone, we demonstrate that RePo significantly enhances performance on tasks involving noisy contexts, structured data, and longer context lengths, while maintaining competitive performance on general short-context tasks. Detailed analysis reveals that RePo successfully allocates higher attention to distant but relevant information, assigns positions in a dense and non-linear space, and captures the intrinsic structure of the input context. Our code is available at https://github.com/SakanaAI/repo.
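The abstract describes $f_φ$ only at a high level. As a rough, hedged sketch of the idea (not the authors' implementation), a content-conditioned module can emit continuous, monotone positions that feed a RoPE-style rotation instead of the usual integer indices; the class and function names, shapes, and the cumulative-sum parameterization below are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextRePositioner(nn.Module):
    """Illustrative sketch of a differentiable position assigner (f_phi).

    Instead of handing the attention layer integer indices 0..T-1, the module
    derives a continuous position for every token from its hidden state, so
    the spacing between tokens is learned and content-dependent. This is a
    toy parameterization, not the RePo architecture from the paper.
    """

    def __init__(self, d_model: int):
        super().__init__()
        self.step_proj = nn.Linear(d_model, 1)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model)
        # Non-negative per-token step sizes keep positions monotone,
        # so the causal ordering of the sequence is preserved.
        steps = F.softplus(self.step_proj(hidden)).squeeze(-1)  # (batch, seq_len)
        return torch.cumsum(steps, dim=-1)                       # continuous positions


def rope_angles(positions: torch.Tensor, head_dim: int, base: float = 10000.0) -> torch.Tensor:
    """Convert (possibly fractional) positions into RoPE rotation angles."""
    inv_freq = base ** (-torch.arange(0, head_dim, 2, dtype=positions.dtype) / head_dim)
    return positions[..., None] * inv_freq  # (batch, seq_len, head_dim // 2)


# Usage sketch:
# hidden = torch.randn(2, 16, 768)
# pos = ContextRePositioner(768)(hidden)   # learned, non-integer positions
# angles = rope_angles(pos, head_dim=64)   # feed into a RoPE-style rotation
```

The cumulative sum keeps positions ordered while letting the model compress or stretch spacing between tokens, which is consistent with the abstract's observation that the learned positions occupy a dense, non-linear space.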
Related papers
- From Context to EDUs: Faithful and Structured Context Compression via Elementary Discourse Unit Decomposition [46.36937947958481]
We introduce a novel explicit compression framework designed to preserve both global structure and fine-grained details. Our approach reformulates structural context compression as a structure-then-select process. Our method achieves state-of-the-art structural prediction accuracy and significantly outperforms frontier LLMs.
arXiv Detail & Related papers (2025-12-16T09:52:58Z) - Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs [72.8830548005884]
Rotary Position Embeddings (RoPE) have become a standard for encoding sequence order in Large Language Models. Standard implementations use only the real component of the complex-valued dot product when computing attention scores. We propose an extension that re-incorporates the discarded imaginary component.
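For intuition only: in the complex view of RoPE, the two components are easy to see. The sketch below (a hypothetical helper, in numpy) rotates query/key feature pairs as complex numbers and returns both the real part, which is exactly the standard RoPE attention score, and the imaginary part; how the cited paper folds that imaginary part back into attention is not reproduced here.

```python
import numpy as np

def rope_complex_parts(q, k, pos_q, pos_k, base=10000.0):
    """View consecutive (even, odd) feature pairs of q and k as complex numbers,
    rotate them by their positions, and take the complex inner product.

    The real part equals the standard RoPE attention score; the imaginary part
    is the quantity the cited paper proposes to re-use. Helper name and
    signature are illustrative only.
    """
    d = q.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)               # (d/2,) rotation frequencies
    qc = (q[0::2] + 1j * q[1::2]) * np.exp(1j * pos_q * inv_freq)
    kc = (k[0::2] + 1j * k[1::2]) * np.exp(1j * pos_k * inv_freq)
    dot = np.sum(qc * np.conj(kc))
    return dot.real, dot.imag

# e.g. real, imag = rope_complex_parts(np.random.randn(64), np.random.randn(64), pos_q=7, pos_k=3)
```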
arXiv Detail & Related papers (2025-12-08T12:59:54Z) - Concept than Document: Context Compression via AMR-based Conceptual Entropy [21.954536296551]
Large Language Models (LLMs) face information overload when handling long contexts, particularly in Retrieval-Augmented Generation (RAG). We propose an unsupervised context compression framework that exploits Abstract Meaning Representation (AMR) graphs to preserve semantically essential information while filtering out irrelevant text.
arXiv Detail & Related papers (2025-11-24T07:08:02Z) - Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models [18.829572148850563]
We introduce ACE (Agentic Context Engineering), a framework that treats contexts as evolving playbooks. Across agent and domain-specific benchmarks, ACE consistently outperforms strong baselines. ACE adapts effectively without labeled supervision, instead leveraging natural execution feedback.
arXiv Detail & Related papers (2025-10-06T09:30:18Z) - CoT Referring: Improving Referring Expression Tasks with Grounded Reasoning [67.18702329644526]
CoT Referring enhances model reasoning across modalities through a structured chain-of-thought training data format. We restructure the training data to enforce a new output form, providing new annotations for existing datasets. We also integrate detection and segmentation capabilities into a unified MLLM framework, training it with a novel adaptive weighted loss to optimize performance.
arXiv Detail & Related papers (2025-10-03T08:50:21Z) - Scalable In-Context Q-Learning [68.9917436397079]
We propose Scalable In-Context Q-Learning (SICQL) to steer in-context reinforcement learning. SICQL harnesses dynamic programming and world modeling to steer ICRL toward efficient reward and task generalization.
arXiv Detail & Related papers (2025-06-02T04:21:56Z) - ELITE: Embedding-Less retrieval with Iterative Text Exploration [5.8851517822935335]
Large Language Models (LLMs) have achieved impressive progress in natural language processing. However, their limited ability to retain long-term context constrains performance on document-level or multi-turn tasks.
arXiv Detail & Related papers (2025-05-17T08:48:43Z) - PICASO: Permutation-Invariant Context Composition with State Space Models [98.91198288025117]
State Space Models (SSMs) offer a promising solution by allowing a database of contexts to be mapped onto fixed-dimensional states. We propose a simple mathematical relation derived from SSM dynamics to compose multiple states into one that efficiently approximates the effect of concatenating raw context tokens. We evaluate the resulting method on WikiText and MSMARCO in both zero-shot and fine-tuned settings, and show that it matches the strongest-performing baseline while enjoying an average 5.4x speedup.
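The composability rests on a general property of linear SSMs: a whole context acts on the incoming state as an affine map, so a fixed-size summary can stand in for the raw tokens. The toy sketch below shows only that property; PICASO's actual permutation-invariant composition rule is derived in the paper and not reproduced here, and all names are illustrative.

```python
import numpy as np

def summarize_context(tokens, A, B):
    """Toy linear SSM, s_t = A @ s_{t-1} + B @ u_t.

    Running it over a context starting from any state s_in yields
    s_out = M @ s_in + b, so the fixed-size pair (M, b) summarizes the
    whole context. Illustrative only; PICASO's composition rule differs.
    """
    d = A.shape[0]
    M, b = np.eye(d), np.zeros(d)
    for u in tokens:
        M = A @ M            # accumulate the transition over the context
        b = A @ b + B @ u    # accumulate the contribution of the tokens
    return M, b

def concat_contexts(summary1, summary2):
    """Effect of feeding context 1 then context 2, using only their summaries:
    s_out = M2 @ (M1 @ s_in + b1) + b2."""
    (M1, b1), (M2, b2) = summary1, summary2
    return M2 @ M1, M2 @ b1 + b2
```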
arXiv Detail & Related papers (2025-02-24T19:48:00Z) - Core Context Aware Transformers for Long Context Language Modeling [50.774702091154204]
We propose a plug-and-play Core Context Aware (CCA) Attention for efficient long-context modeling. Our method automatically focuses on and strengthens the core context while diminishing redundancy during the learning process. It can replace the self-attention module in existing Large Language Models at minimal fine-tuning cost.
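As a generic, hedged illustration of the "keep the core, compress the rest" idea (not the CCA formulation): distant keys and values can be mean-pooled group-wise into a few summary vectors, and each query then attends over those summaries plus a recent local window. Function name and parameters below are assumptions.

```python
import torch
import torch.nn.functional as F

def pooled_long_context_attention(q, k, v, group_size=64, local_window=128):
    """Generic sketch: keys/values outside a recent local window are mean-pooled
    into one summary per group, and attention runs over [summaries + local tokens].
    Causal masking is omitted for brevity; this is not the CCA formulation.

    q, k, v: (seq_len, d)
    """
    seq_len, d = k.shape
    # Number of full groups that lie entirely outside the recent local window.
    n_groups = max(seq_len - local_window, 0) // group_size
    pooled_len = n_groups * group_size
    # Mean-pool each distant group into a single "core" key/value.
    k_core = k[:pooled_len].reshape(n_groups, group_size, d).mean(dim=1)
    v_core = v[:pooled_len].reshape(n_groups, group_size, d).mean(dim=1)
    # Recent (and any leftover) tokens stay at full resolution.
    k_mix = torch.cat([k_core, k[pooled_len:]], dim=0)
    v_mix = torch.cat([v_core, v[pooled_len:]], dim=0)
    scores = (q @ k_mix.T) / d ** 0.5
    return F.softmax(scores, dim=-1) @ v_mix
```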
arXiv Detail & Related papers (2024-12-17T01:54:08Z) - Recognizing Long Grammatical Sequences Using Recurrent Networks Augmented With An External Differentiable Stack [73.48927855855219]
Recurrent neural networks (RNNs) are a widely used deep architecture for sequence modeling, generation, and prediction.
RNNs generalize poorly over very long sequences, which limits their applicability to many important temporal processing and time series forecasting problems.
One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack.
In this paper, we improve the memory-augmented RNN with important architectural and state updating mechanisms.
arXiv Detail & Related papers (2020-04-04T14:19:15Z)
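The entry above couples an RNN with an external differentiable stack. Below is a minimal sketch of a soft (fully differentiable) stack update, in the general spirit of memory-augmented RNNs rather than the cited paper's exact state-update rules; all names are illustrative.

```python
import torch
import torch.nn as nn

class SoftStack(nn.Module):
    """Minimal differentiable stack: PUSH / POP / NO-OP are soft (mixture)
    operations, so gradients flow through the memory. Illustrative of the
    general technique, not the cited paper's exact update mechanisms."""

    def __init__(self, hidden_size: int, depth: int = 16):
        super().__init__()
        self.action = nn.Linear(hidden_size, 3)          # push / pop / no-op logits
        self.write = nn.Linear(hidden_size, hidden_size)  # value to push
        self.depth = depth

    def forward(self, h: torch.Tensor, stack: torch.Tensor) -> torch.Tensor:
        # h: (batch, hidden)   stack: (batch, depth, hidden)
        a = torch.softmax(self.action(h), dim=-1)               # soft action weights
        push_w, pop_w, keep_w = (a[:, i].view(-1, 1, 1) for i in range(3))
        new_top = torch.tanh(self.write(h)).unsqueeze(1)         # (batch, 1, hidden)
        pushed = torch.cat([new_top, stack[:, :-1]], dim=1)      # shift everything down
        popped = torch.cat([stack[:, 1:], torch.zeros_like(new_top)], dim=1)
        return push_w * pushed + pop_w * popped + keep_w * stack

# At each time step, the recurrent controller can read the soft stack top,
# stack[:, 0], as extra input, while its hidden state h drives the stack update.
```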