CCF: A Context Compression Framework for Efficient Long-Sequence Language Modeling
- URL: http://arxiv.org/abs/2509.09199v1
- Date: Thu, 11 Sep 2025 07:13:49 GMT
- Title: CCF: A Context Compression Framework for Efficient Long-Sequence Language Modeling
- Authors: Wenhao Li, Bangcheng Sun, Weihao Ye, Tianyi Zhang, Daohai Yu, Fei Chao, Rongrong Ji,
- Abstract summary: CCF is a novel context compression framework designed to enable efficient long-context modeling.<n>CCF integrates segment-wise semantic aggregation with key-value memory encoding, forming compact representations.<n> Empirical results on multiple long-context language modeling benchmarks demonstrate that CCF achieves competitive perplexity under high compression ratios.
- Score: 52.05149789178508
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scaling language models to longer contexts is essential for capturing rich dependencies across extended discourse. However, na\"ive context extension imposes significant computational and memory burdens, often resulting in inefficiencies during both training and inference. In this work, we propose CCF, a novel context compression framework designed to enable efficient long-context modeling by learning hierarchical latent representations that preserve global semantics while aggressively reducing input redundancy. CCF integrates segment-wise semantic aggregation with key-value memory encoding, forming compact representations that support accurate reconstruction and long-range understanding. To further enhance scalability, we introduce a training-efficient optimization strategy that couples incremental segment decoding with sparse reservoir sampling, substantially reducing memory overhead without degrading performance. Empirical results on multiple long-context language modeling benchmarks demonstrate that CCF achieves competitive perplexity under high compression ratios, and significantly improves throughput and memory efficiency compared to existing approaches. These findings highlight the potential of structured compression for scalable and effective long-context language modeling.
Related papers
- Short Chains, Deep Thoughts: Balancing Reasoning Efficiency and Intra-Segment Capability via Split-Merge Optimization [68.89915707647138]
Large Reasoning Models (LRMs) have demonstrated impressive capabilities in solving complex tasks through the generation of long reasoning chains.<n>We propose textbfCoSMo (textbfSplit-textbfMerge textbfOptimization), a framework designed to eliminate structural redundancy rather than indiscriminately restricting token volume.
arXiv Detail & Related papers (2026-02-03T05:54:28Z) - Training-free Context-adaptive Attention for Efficient Long Context Modeling [57.703159205740185]
Training-free Context-adaptive Attention (TCA-Attention) is a training-free sparse attention mechanism that selectively attends to only the informative tokens for efficient long-context inference.<n>TCA-Attention achieves a 2.8$times$ speedup and reduces KV cache by 61% at 128K context length while maintaining performance comparable to full attention.
arXiv Detail & Related papers (2025-12-10T01:54:57Z) - Rethinking Autoregressive Models for Lossless Image Compression via Hierarchical Parallelism and Progressive Adaptation [75.58269386927076]
Autoregressive (AR) models are often dismissed as impractical due to prohibitive computational cost.<n>This work re-thinks this paradigm, introducing a framework built on hierarchical parallelism and progressive adaptation.<n> Experiments on diverse datasets (natural, satellite, medical) validate that our method achieves new state-of-the-art compression.
arXiv Detail & Related papers (2025-11-14T06:27:58Z) - Scaling Linear Attention with Sparse State Expansion [58.161410995744596]
Transformer architecture struggles with long-context scenarios due to quadratic computation and linear memory growth.<n>We introduce a row-sparse update formulation for linear attention by conceptualizing state updating as information classification.<n>Second, we present Sparse State Expansion (SSE) within the sparse framework, which expands the contextual state into multiple partitions.
arXiv Detail & Related papers (2025-07-22T13:27:31Z) - Compress, Gather, and Recompute: REFORMing Long-Context Processing in Transformers [58.98923344096319]
REFORM is a novel inference framework that efficiently handles long contexts through a two-phase approach.<n>It achieves over 50% and 27% performance gains on RULER and BABILong respectively at 1M context length.<n>It also outperforms baselines on Infinite-Bench and MM-NIAH, demonstrating flexibility across diverse tasks and domains.
arXiv Detail & Related papers (2025-06-01T23:49:14Z) - LeMoRe: Learn More Details for Lightweight Semantic Segmentation [48.81126061219231]
We introduce an efficient paradigm by synergizing explicit and implicit modeling to balance computational efficiency with representational fidelity.<n>Our method combines well-defined Cartesian directions with explicitly modeled views and implicitly inferred intermediate representations, efficiently capturing global dependencies.
arXiv Detail & Related papers (2025-05-29T04:55:10Z) - MambaIC: State Space Models for High-Performance Learned Image Compression [40.155314987485376]
A high-performance image compression algorithm is crucial for real-time information transmission across numerous fields.<n>Inspired by the effectiveness of state space models (SSMs) in capturing long-range dependencies, we leverage SSMs to address computational inefficiency in existing methods.<n>We propose an enhanced image compression approach through refined context modeling, which we term MambaIC.
arXiv Detail & Related papers (2025-03-16T11:32:34Z) - Contextual Compression Encoding for Large Language Models: A Novel Framework for Multi-Layered Parameter Space Pruning [0.0]
Contextual Compression.<n>(CCE) introduced a multi-stage encoding mechanism that dynamically restructured parameter distributions.<n>CCE retained linguistic expressivity and coherence, maintaining accuracy across a range of text generation and classification tasks.
arXiv Detail & Related papers (2025-02-12T11:44:19Z) - Structured Token Retention and Computational Memory Paths in Large Language Models [0.0]
This paper introduces a probabilistic selection framework that dynamically adjusts token persistence based on contextual significance.<n>It is extended through hierarchical memory allocation, refining retention efficiency through structured reallocation of token embeddings.<n>The integration of STR and CMP into an open-source model illustrates the adaptability of structured memory retention methodologies.
arXiv Detail & Related papers (2025-02-05T11:59:22Z) - Context-Preserving Tensorial Reconfiguration in Large Language Model Training [0.0]
Context-Preservingial Reconfiguration (CPTR) enables dynamic complexity of weight tensors through structured factorization and adaptive contraction.<n> Empirical evaluations demonstrate that CPTR improves coherence retention across extended sequences.<n>Performance comparisons reveal that CPTR-enhanced models exhibit greater computational efficiency and reduced memory consumption.
arXiv Detail & Related papers (2025-02-01T00:55:19Z) - Framework for Progressive Knowledge Fusion in Large Language Models Through Structured Conceptual Redundancy Analysis [0.0]
The organization of latent knowledge within large-scale models poses unique challenges when addressing overlapping representations and optimizing contextual accuracy.<n>A framework was proposed to restructure these redundancies through advanced clustering techniques and dynamic thresholding.<n> Evaluations revealed improved memory efficiency and faster inference times, alongside better alignment in latent knowledge clusters that enhanced interpretability.
arXiv Detail & Related papers (2025-01-23T11:34:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.