Context Cascade Compression: Exploring the Upper Limits of Text Compression
- URL: http://arxiv.org/abs/2511.15244v1
- Date: Wed, 19 Nov 2025 09:02:56 GMT
- Title: Context Cascade Compression: Exploring the Upper Limits of Text Compression
- Authors: Fanfan Liu, Haibo Qiu
- Abstract summary: We introduce Context Cascade Compression (C3) to explore the upper limits of text compression. At a 20x compression ratio, our model achieves 98% decoding accuracy, compared to approximately 60% for DeepSeek-OCR. This indicates that in the domain of context compression, C3 demonstrates superior performance and feasibility over optical character compression.
- Score: 3.013064618174921
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Million-level token inputs in long-context tasks pose significant computational and memory challenges for Large Language Models (LLMs). Recently, DeepSeek-OCR conducted research into the feasibility of Contexts Optical Compression and achieved preliminary results. Inspired by this, we introduce Context Cascade Compression (C3) to explore the upper limits of text compression. Our method cascades two LLMs of different sizes to handle the compression and decoding tasks. Specifically, a small LLM, acting as the first stage, performs text compression by condensing a long context into a set of latent tokens (e.g., 32 or 64 in length), achieving a high ratio of text tokens to latent tokens. A large LLM, as the second stage, then executes the decoding task on this compressed context. Experiments show that at a 20x compression ratio (where the number of text tokens is 20 times the number of latent tokens), our model achieves 98% decoding accuracy, compared to approximately 60% for DeepSeek-OCR. When we further increase the compression ratio to 40x, the accuracy is maintained at around 93%. This indicates that in the domain of context compression, C3 demonstrates superior performance and feasibility over optical character compression. C3 uses a simpler, pure-text pipeline that sidesteps factors like layout, color, and information loss from a visual encoder. This also suggests a potential upper bound for compression ratios in future work on optical character compression, OCR, and related fields. Code and model weights are publicly accessible at https://github.com/liufanfanlff/C3-Context-Cascade-Compression
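The abstract describes the cascade only at a high level, so below is a minimal, self-contained PyTorch sketch of the idea: a small model condenses a long token sequence into a fixed number of latent tokens, and a larger model decodes conditioned on them. The module names, layer counts, hidden sizes, and the linear projection that bridges the two model widths are illustrative assumptions for this sketch, not the released C3 architecture; see the linked repository for the authors' implementation.

```python
# Minimal sketch of a two-stage compress-then-decode cascade (illustrative only).
import torch
import torch.nn as nn


class SmallCompressor(nn.Module):
    """Stage 1: condense a long token sequence into a fixed set of latent tokens."""

    def __init__(self, vocab_size=32000, d_model=512, n_latent=32, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Learnable queries that will carry the compressed representation.
        self.latent_queries = nn.Parameter(torch.randn(n_latent, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, input_ids):                        # (B, T_text)
        x = self.embed(input_ids)                        # (B, T_text, d)
        q = self.latent_queries.expand(x.size(0), -1, -1)
        # Append the latent queries; after encoding, keep only their states.
        h = self.encoder(torch.cat([x, q], dim=1))
        return h[:, -q.size(1):, :]                      # (B, n_latent, d)


class LargeDecoder(nn.Module):
    """Stage 2: decode (e.g., reconstruct the text) conditioned on the latents."""

    def __init__(self, vocab_size=32000, d_small=512, d_model=1024, n_layers=8):
        super().__init__()
        self.project = nn.Linear(d_small, d_model)       # bridge the two model widths
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, target_ids, latents):              # (B, T_tgt), (B, n_latent, d_small)
        memory = self.project(latents)
        tgt = self.embed(target_ids)
        # Standard causal mask so each position only attends to earlier targets.
        mask = torch.triu(
            torch.full((tgt.size(1), tgt.size(1)), float("-inf")), diagonal=1
        )
        out = self.decoder(tgt, memory, tgt_mask=mask)
        return self.lm_head(out)                          # (B, T_tgt, vocab)


# 640 text tokens compressed into 32 latent tokens corresponds to a 20x ratio.
text_ids = torch.randint(0, 32000, (1, 640))
latents = SmallCompressor(n_latent=32)(text_ids)
logits = LargeDecoder()(text_ids, latents)
print(latents.shape, logits.shape)                        # (1, 32, 512) (1, 640, 32000)
```

With 640 input tokens and 32 latents, the toy example mirrors the paper's 20x setting; keeping 32 latents while feeding 1,280 text tokens would correspond to the 40x setting.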
Related papers
- Glyph: Scaling Context Windows via Visual-Text Compression [91.20717058018745]
Glyph is a framework that renders long texts into images and processes them with vision-language models. Our method achieves 3-4x token compression while maintaining accuracy comparable to leading long-context models. Under extreme compression, a 128K-context VLM could scale to handle 1M-token-level text tasks.
arXiv Detail & Related papers (2025-10-20T17:58:56Z)
- Compressing Many-Shots in In-Context Learning [61.231471139896506]
We study an approach to improve the memory and computational efficiency of ICL inference by compressing the many-shot prompts. We first show that existing prompt compression methods are ineffective for many-shot compression. We propose MemCom, a layer-wise compression method.
arXiv Detail & Related papers (2025-10-17T16:57:42Z)
- Autoencoding-Free Context Compression for LLMs via Contextual Semantic Anchors [43.02557489472655]
Current context compression methods rely on autoencoding tasks to train context-agnostic compression tokens to compress contextual semantics. We propose Semantic-Anchor Compression (SAC), a novel method that shifts from autoencoding-task-based compression to an architecture inherently equipped with this compression capability. SAC consistently outperforms existing context compression methods across various compression ratios.
arXiv Detail & Related papers (2025-10-10T01:42:14Z)
- CompLLM: Compression for Long Context Q&A [47.90063873976842]
We introduce CompLLM, a soft compression technique designed for practical deployment. Instead of processing the context holistically, CompLLM divides it into segments and compresses each one independently. Our experiments show that with a 2x compression rate, at high context lengths CompLLM speeds up Time To First Token (TTFT) by up to 4x and reduces the KV cache size by 50%.
arXiv Detail & Related papers (2025-09-23T16:49:43Z)
- R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search [61.4807238517108]
Chain-of-Thought (CoT) reasoning enhances large language models (LLMs) by enabling step-by-step problem-solving. CoT's extension to Long-CoT introduces substantial computational overhead due to increased token length. We propose R1-Compress, a two-stage chunk-level compression framework that preserves both local information and coherence.
arXiv Detail & Related papers (2025-05-22T16:06:59Z)
- Vision-centric Token Compression in Large Language Model [51.92055188780033]
Vision Centric Token Compression (Vist) is a slow-fast compression framework that mirrors human reading. On eleven in-context learning benchmarks, Vist achieves the same accuracy with 2.3 times fewer tokens, cutting FLOPs by 16% and memory by 50%.
arXiv Detail & Related papers (2025-02-02T13:10:06Z)
- L3TC: Leveraging RWKV for Learned Lossless Low-Complexity Text Compression [23.179381396167084]
We introduce a novel Learned Lossless Low-complexity Text Compression method (L3TC). RWKV models achieve the fastest decoding speed with a moderate compression ratio. We propose an outlier-aware tokenizer that uses a limited vocabulary to cover frequent tokens.
arXiv Detail & Related papers (2024-12-21T14:24:32Z)
- Training LLMs over Neurally Compressed Text [55.11828645767342]
This paper explores the idea of training large language models (LLMs) over highly compressed text. We propose Equal-Info Windows, a novel compression technique whereby text is segmented into blocks that each compress to the same bit length. We demonstrate effective learning over neurally compressed text that improves with scale, and outperforms byte-level baselines by a wide margin on perplexity and inference speed benchmarks.
arXiv Detail & Related papers (2024-04-04T17:48:28Z)
- Long Context Compression with Activation Beacon [22.054232261437186]
Activation Beacon is a plug-in module for transformer-based LLMs.
It targets effective, efficient, and flexible compression of long contexts.
It achieves a 2x acceleration in inference time and an 8x reduction of memory costs for KV cache.
arXiv Detail & Related papers (2024-01-07T11:57:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.