Related papers: Short-Context Dominance: How Much Local Context Natural Language Actually Needs?

URL: http://arxiv.org/abs/2512.08082v1
Date: Mon, 08 Dec 2025 22:25:00 GMT
Title: Short-Context Dominance: How Much Local Context Natural Language Actually Needs?
Authors: Vala Vakilian, Zimeng Wang, Ankit Singh Rawat, Christos Thrampoulidis
Abstract summary: We measure the minimum context length needed to reproduce accurate full-context predictions. For sequences with 1-7k tokens from long-context documents, we consistently find that 75-80% require only the last 96 tokens at most. We introduce a practical proxy to MCL, called Distributionally Aware MCL (DaMCL), that does not require knowledge of the actual next-token.
Score: 48.429870236229696
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We investigate the short-context dominance hypothesis: that for most sequences, a small local prefix suffices to predict their next tokens. Using large language models as statistical oracles, we measure the minimum context length (MCL) needed to reproduce accurate full-context predictions across datasets with sequences of varying lengths. For sequences with 1-7k tokens from long-context documents, we consistently find that 75-80% require only the last 96 tokens at most. Given the dominance of short-context tokens, we then ask whether it is possible to detect challenging long-context sequences for which a short local prefix does not suffice for prediction. We introduce a practical proxy to MCL, called Distributionally Aware MCL (DaMCL), that does not require knowledge of the actual next-token and is compatible with sampling strategies beyond greedy decoding. Our experiments validate that simple thresholding of the metric defining DaMCL achieves high performance in detecting long vs. short context sequences. Finally, to counter the bias that short-context dominance induces in LLM output distributions, we develop an intuitive decoding algorithm that leverages our detector to identify and boost tokens that are long-range-relevant. Across Q&A tasks and model architectures, we confirm that mitigating the bias improves performance.
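The MCL and DaMCL ideas in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: `toy_next_token_dist` is a hypothetical stand-in for an LLM oracle, the candidate suffix lengths are arbitrary, and Jensen-Shannon divergence with a threshold of 0.1 is just one plausible choice for the thresholded metric, which the abstract does not specify.

```python
import math
from collections import Counter


def toy_next_token_dist(context):
    """Hypothetical oracle (NOT an LLM): next-token distribution estimated
    from what followed earlier occurrences of the context's last token."""
    if not context:
        return {"<unk>": 1.0}
    last = context[-1]
    followers = Counter(
        context[i + 1] for i in range(len(context) - 1) if context[i] == last
    )
    if not followers:
        return {"<unk>": 1.0}
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}


def greedy(dist):
    """Most probable token; ties broken alphabetically for determinism."""
    return max(sorted(dist), key=dist.get)


def suffix(tokens, k):
    """Last k tokens (the whole sequence if k exceeds its length)."""
    return tokens[max(0, len(tokens) - k):]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, between 0 and 1 (assumed metric)."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def mcl(tokens, next_token, lengths, dist_fn=toy_next_token_dist):
    """Smallest candidate suffix length whose greedy prediction matches the
    true next token. Returns None if the full context itself is wrong."""
    if greedy(dist_fn(tokens)) != next_token:
        return None
    for k in sorted(lengths):
        if greedy(dist_fn(suffix(tokens, k))) == next_token:
            return k
    return len(tokens)


def damcl(tokens, lengths, threshold=0.1, dist_fn=toy_next_token_dist):
    """Smallest candidate suffix length whose distribution is within
    `threshold` of the full-context one; needs no true next token."""
    full = dist_fn(tokens)
    for k in sorted(lengths):
        if js_divergence(dist_fn(suffix(tokens, k)), full) <= threshold:
            return k
    return len(tokens)


# The relevant earlier occurrence is close by: a short suffix suffices.
short_range = "a b c d the cat saw the".split()
# The relevant earlier occurrence is at the very start: full context needed.
long_range = "alice met bob . later that day in the park , alice".split()

print(mcl(short_range, "cat", [2, 4, 8]), damcl(short_range, [2, 4, 8]))
print(mcl(long_range, "met", [2, 4, 8, 12]), damcl(long_range, [2, 4, 8, 12]))
```

In this toy setting, a sequence whose DaMCL value exceeds some cutoff would be flagged as long-context, which mirrors the thresholding-based detection the abstract describes. With a real model, `dist_fn` would be replaced by a call that returns the model's next-token distribution for a truncated prompt.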

Related papers

The Limits of Long-Context Reasoning in Automated Bug Fixing [4.853967615615349]
Large language models (LLMs) can directly reason over entire contexts. Recent advances in LLMs have enabled strong performance on software engineering benchmarks. We systematically evaluate whether current LLMs can reliably perform long-context code and patch generation.
arXiv Detail & Related papers (2026-02-17T22:51:40Z)
Beyond the Needle's Illusion: Decoupled Evaluation of Evidence Access and Use under Semantic Interference at 326M-Token Scale [18.13756357502514]
We introduce EverMemBench-S (EMB-S), an adversarial NIAH-style benchmark built on a 326M-token MemoryBank. While the full MemoryBank spans 326M tokens for retrieval-based (RAG) evaluation, we evaluate native long-context models only at scales that fit within each model's context window.
arXiv Detail & Related papers (2026-01-28T05:44:00Z)
Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection [49.15148871877941]
Next-token distribution outputs offer a theoretically appealing approach for detection of large language models (LLMs). We propose the Perplexity Attention Weighted Network (PAWN), which uses the last hidden states of the LLM and positions to weight the sum of a series of features based on metrics from the next-token distribution across the sequence length. PAWN shows competitive and even better performance in-distribution than the strongest baselines with a fraction of their trainable parameters.
arXiv Detail & Related papers (2025-01-07T17:00:49Z)
What is Wrong with Perplexity for Long-context Language Modeling? [71.34933096461124]
Long-context inputs are crucial for large language models (LLMs) in tasks such as extended conversations, document summarization, and many-shot in-context learning. Perplexity (PPL) has proven unreliable for assessing long-context capabilities. We propose LongPPL, a novel metric that focuses on key tokens by employing a long-short context contrastive method to identify them.
arXiv Detail & Related papers (2024-10-31T09:39:28Z)
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution [87.3259169631789]
Nearest Neighbor Speculative Decoding (NEST) is capable of incorporating real-world text spans of arbitrary length into the LM generations and providing attribution to their sources. NEST significantly enhances the generation quality and attribution rate of the base LM across a variety of knowledge-intensive tasks. In addition, NEST substantially improves the generation speed, achieving a 1.8x speedup in inference time when applied to Llama-2-Chat 70B.
arXiv Detail & Related papers (2024-05-29T17:55:03Z)
XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference [25.669630896777484]
We propose an efficient training-free framework, named XL3M, which enables LLMs trained on short sequences to reason over extremely long sequences without any further training or fine-tuning. Evaluations on comprehensive benchmarks show the superiority of XL3M.
arXiv Detail & Related papers (2024-05-28T02:12:35Z)
MinPrompt: Graph-based Minimal Prompt Data Augmentation for Few-shot Question Answering [64.6741991162092]
We present MinPrompt, a minimal data augmentation framework for open-domain question answering. We transform the raw text into a graph structure to build connections between different factual sentences. We then apply graph algorithms to identify the minimal set of sentences needed to cover the most information in the raw text. We generate QA pairs based on the identified sentence subset and train the model on the selected sentences to obtain the final model.
arXiv Detail & Related papers (2023-10-08T04:44:36Z)
KNN-LM Does Not Improve Open-ended Text Generation [34.86733697757264]
We study the generation quality of retrieval-augmented language models (LMs). We find that interpolating with a retrieval distribution actually increases perplexity compared to a baseline Transformer LM. We discover that the entropy of the retrieval distribution increases faster than that of the base LM as the generated sequence becomes longer.
arXiv Detail & Related papers (2023-05-24T01:48:33Z)
Understanding Emergent In-Context Learning from a Kernel Regression Perspective [55.95455089638838]
Large language models (LLMs) have initiated a paradigm shift in transfer learning. This paper proposes a kernel-regression perspective for understanding LLMs' ICL behaviors when faced with in-context examples. We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
arXiv Detail & Related papers (2023-05-22T06:45:02Z)