ATACompressor: Adaptive Task-Aware Compression for Efficient Long-Context Processing in LLMs
- URL: http://arxiv.org/abs/2602.03226v1
- Date: Tue, 03 Feb 2026 07:53:29 GMT
- Title: ATACompressor: Adaptive Task-Aware Compression for Efficient Long-Context Processing in LLMs
- Authors: Xuancheng Li, Haitao Li, Yujia Zhou, Qingyao Ai, Yiqun Liu
- Abstract summary: We propose the Adaptive Task-Aware Compressor (ATACompressor), which adjusts compression based on the specific requirements of a task. ATACompressor employs a selective encoder that compresses only the task-relevant portions of long contexts, ensuring that essential information is preserved while reducing unnecessary content. We evaluate ATACompressor on three QA datasets: HotpotQA, MSMARCO, and SQuAD, showing that it outperforms existing methods in terms of both compression efficiency and task performance.
- Score: 28.55805086141996
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Long-context inputs in large language models (LLMs) often suffer from the "lost in the middle" problem, where critical information becomes diluted or ignored due to excessive length. Context compression methods aim to address this by reducing input size, but existing approaches struggle with balancing information preservation and compression efficiency. We propose Adaptive Task-Aware Compressor (ATACompressor), which dynamically adjusts compression based on the specific requirements of the task. ATACompressor employs a selective encoder that compresses only the task-relevant portions of long contexts, ensuring that essential information is preserved while reducing unnecessary content. Its adaptive allocation controller perceives the length of relevant content and adjusts the compression rate accordingly, optimizing resource utilization. We evaluate ATACompressor on three QA datasets: HotpotQA, MSMARCO, and SQuAD, showing that it outperforms existing methods in terms of both compression efficiency and task performance. Our approach provides a scalable solution for long-context processing in LLMs. Furthermore, we perform a range of ablation studies and analysis experiments to gain deeper insights into the key components of ATACompressor.
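The abstract's two mechanisms (a selective encoder that keeps only task-relevant segments, and an allocation controller that sets the compression rate from how much relevant content remains) can be pictured with a minimal sketch. The relevance scorer, threshold, token budget, and all names below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of adaptive, task-aware compression as described in the abstract.
# NOT the authors' implementation; scorer, budget, and names are assumptions.
from typing import Callable, List, Tuple


def adaptive_compress(
    segments: List[str],
    query: str,
    score_relevance: Callable[[str, str], float],  # e.g. a retriever or cross-encoder
    relevance_threshold: float = 0.5,
    budget_tokens: int = 512,
) -> Tuple[List[str], float]:
    """Keep only task-relevant segments, then pick a compression rate
    from how much relevant content survives relative to the budget."""
    # 1) Selective step: drop segments the task does not need.
    relevant = [s for s in segments if score_relevance(query, s) >= relevance_threshold]

    # 2) Adaptive step: the more relevant text remains, the harder we compress.
    relevant_len = sum(len(s.split()) for s in relevant)
    compression_rate = min(1.0, budget_tokens / max(relevant_len, 1))

    return relevant, compression_rate


# Example with a trivial keyword-overlap scorer (purely illustrative).
def keyword_overlap(query: str, segment: str) -> float:
    q, s = set(query.lower().split()), set(segment.lower().split())
    return len(q & s) / max(len(q), 1)


kept, rate = adaptive_compress(
    ["Paris is the capital of France.", "Bananas are yellow."],
    "What is the capital of France?",
    keyword_overlap,
)
print(kept, rate)
```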
Related papers
- Detecting Overflow in Compressed Token Representations for Retrieval-Augmented Generation [49.48204107529758]
We define token overflow as a regime in which compressed representations no longer contain sufficient information to answer a given query. In this paper, we find that query-agnostic saturation statistics reliably separate compressed from uncompressed token representations. Lightweight probing classifiers over both query and context xRAG representations detect overflow with 0.72 AUC-ROC on average. These results advance from query-independent diagnostics to query-aware detectors, enabling low-cost pre-LLM gating to mitigate compression-induced errors.
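For intuition, the lightweight probing classifier summarized above can be approximated by a linear probe over pooled representations. The synthetic features, labels, and dimensions below are assumptions for illustration only; the paper's xRAG setup is not reproduced here.

```python
# Sketch of a lightweight "overflow" probe: a linear classifier over
# representation vectors. Data here is synthetic, purely for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
dim = 64
# Pretend these are pooled compressed-context (+ query) representations,
# labeled 1 when the compressed context could not answer the query.
X = rng.normal(size=(1000, dim))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("AUC-ROC:", roc_auc_score(y_te, probe.predict_proba(X_te)[:, 1]))
```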
arXiv Detail & Related papers (2026-02-12T18:15:08Z) - Arbitrary Ratio Feature Compression via Next Token Prediction [52.10426317889982]
Arbitrary Ratio Feature Compression (ARFC) framework supports any compression ratio with a single model. ARFC is an auto-regressive model that performs compression via next-token prediction. The MoS module refines the compressed tokens by utilizing multiple compression results. ERGC is integrated into the training process to preserve semantic and structural relationships during compression.
arXiv Detail & Related papers (2026-02-12T02:38:57Z) - Rethinking Soft Compression in Retrieval-Augmented Generation: A Query-Conditioned Selector Perspective [21.41673002861847]
Retrieval-Augmented Generation (RAG) effectively grounds Large Language Models (LLMs) with external knowledge. Recent research on soft context compression aims to address this by encoding long documents into compact embeddings. We introduce SeleCom, a selector-based soft compression framework for RAG that redefines the encoder's role as a query-conditioned information selector.
arXiv Detail & Related papers (2026-01-25T09:06:24Z) - AttnComp: Attention-Guided Adaptive Context Compression for Retrieval-Augmented Generation [27.480791258325066]
We introduce AttnComp, an adaptive, efficient and context-aware compression framework. AttnComp employs a Top-P compression algorithm to retain the minimal set of documents. In addition to compression, AttnComp estimates response confidence by assessing the overall relevance of the retrieved content.
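The Top-P idea mentioned above (retain the minimal set of documents covering most of the relevance mass) admits a simple sketch; the scores and threshold below are assumptions, not AttnComp's exact algorithm.

```python
# Sketch of Top-P style document selection over relevance scores (illustrative only).
from typing import List, Tuple


def top_p_select(docs_with_scores: List[Tuple[str, float]], p: float = 0.9) -> List[str]:
    """Keep the smallest prefix of score-sorted docs whose normalized scores sum to >= p."""
    ranked = sorted(docs_with_scores, key=lambda ds: ds[1], reverse=True)
    total = sum(score for _, score in ranked) or 1.0
    kept, mass = [], 0.0
    for doc, score in ranked:
        kept.append(doc)
        mass += score / total
        if mass >= p:
            break
    return kept


print(top_p_select([("doc A", 0.6), ("doc B", 0.3), ("doc C", 0.1)], p=0.8))
```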
arXiv Detail & Related papers (2025-09-22T08:18:50Z) - UniGist: Towards General and Hardware-aligned Sequence-level Long Context Compression [86.33995240043936]
UniGist is a sequence-level long-context compression framework for large language models. It efficiently preserves context information by replacing raw tokens with special compression tokens (gists) in a fine-grained manner. Our scheme also supports flexible inference by allowing the actual removal of compressed tokens, resulting in real-time memory savings.
arXiv Detail & Related papers (2025-09-19T08:47:37Z) - CORE-RAG: Lossless Compression for Retrieval-Augmented LLMs via Reinforcement Learning [22.93037884068796]
Retrieval-Augmented Generation (RAG) has emerged as a promising approach to enhance the timeliness of knowledge updates and the factual accuracy of responses in large language models. Existing approaches to document compression tailored for RAG often degrade task performance. We propose CORE, a novel method for lossless context compression in RAG.
arXiv Detail & Related papers (2025-08-24T12:21:50Z) - DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression [63.83422894663496]
We propose a dynamic attention-aware approach for task-agnostic prompt compression (DAC). This approach effectively integrates entropy and attention information, dynamically sensing entropy shifts during compression to achieve fine-grained prompt compression. Extensive experiments across various domains, including LongBench, GSM8K, and BBH, show that DAC consistently yields robust and substantial improvements.
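A rough sketch of mixing entropy (self-information) and attention signals for token-level compression, in the spirit of the DAC summary above; the scoring and mixing weight are assumptions, not the paper's method.

```python
# Illustrative token-keeping rule combining self-information and attention.
import math
from typing import List


def compress_tokens(
    tokens: List[str],
    token_probs: List[float],       # model probability of each token (hypothetical input)
    attention_scores: List[float],  # attention received by each token (hypothetical input)
    keep_ratio: float = 0.5,
    alpha: float = 0.5,
) -> List[str]:
    # Surprising (high self-information) and highly attended tokens are kept.
    info = [-math.log(max(p, 1e-9)) for p in token_probs]
    score = [alpha * i + (1 - alpha) * a for i, a in zip(info, attention_scores)]
    k = max(1, int(len(tokens) * keep_ratio))
    keep_idx = sorted(sorted(range(len(tokens)), key=lambda i: score[i], reverse=True)[:k])
    return [tokens[i] for i in keep_idx]


print(compress_tokens(["the", "capital", "of", "France", "is", "Paris"],
                      [0.9, 0.2, 0.8, 0.1, 0.7, 0.05],
                      [0.1, 0.6, 0.1, 0.9, 0.2, 0.95]))
```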
arXiv Detail & Related papers (2025-07-16T06:16:06Z) - MOOSComp: Improving Lightweight Long-Context Compressor via Mitigating Over-Smoothing and Incorporating Outlier Scores [5.893964327109089]
MOOSComp is a token-classification-based long-context compression method. We introduce outlier scores to preserve rare but critical tokens that are prone to be discarded in task-agnostic compression. Our method obtains a speedup of 3.3x at a 4x compression ratio on a resource-constrained mobile device.
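The outlier-score idea above (protect rare but critical tokens from being dropped) can be sketched by blending a classifier's keep probability with a rarity term; the rarity measure and mixing rule below are assumptions, not MOOSComp's formulation.

```python
# Illustrative rule: rare tokens get an "outlier" bonus so they survive compression.
import math
from collections import Counter
from typing import List


def keep_tokens(tokens: List[str], keep_probs: List[float],
                corpus_freq: Counter, beta: float = 0.4, threshold: float = 0.5) -> List[str]:
    total = sum(corpus_freq.values()) or 1
    kept = []
    for tok, p in zip(tokens, keep_probs):
        # Rare tokens get a high outlier score even if the classifier is unsure.
        rarity = -math.log((corpus_freq.get(tok, 0) + 1) / (total + 1))
        outlier = rarity / 10.0  # crude normalization, purely illustrative
        if (1 - beta) * p + beta * outlier >= threshold:
            kept.append(tok)
    return kept


freq = Counter({"the": 1000, "of": 800, "ATACompressor": 1})
print(keep_tokens(["the", "ATACompressor", "of"], [0.4, 0.4, 0.2], freq))
```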
arXiv Detail & Related papers (2025-04-23T15:02:53Z) - Understanding and Improving Information Preservation in Prompt Compression for LLMs [15.797246416590339]
In information-intensive tasks, the prompt length can grow fast, leading to increased computational requirements, performance degradation, and induced biases from irrelevant or redundant information. We propose a holistic evaluation framework that allows for in-depth analysis of prompt compression methods.
arXiv Detail & Related papers (2025-03-24T20:06:11Z) - LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy [59.1298692559785]
The Key-Value (KV) cache is a crucial component in serving transformer-based autoregressive large language models (LLMs).
Existing approaches to mitigate the cache's growing memory footprint include: (1) efficient attention variants integrated in upcycling stages; and (2) KV cache compression at test time.
We propose a low-rank approximation of KV weight matrices, allowing plug-in integration with existing transformer-based LLMs without model retraining.
Our method is designed to function without model tuning in upcycling stages or task-specific profiling in test stages.
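The low-rank approximation of KV weight matrices described above can be illustrated with a plain SVD factorization; the shapes, the rank, and the omission of LoRC's progressive per-layer strategy are simplifying assumptions.

```python
# Minimal sketch: factor a key projection weight into two low-rank matrices.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_head, rank = 512, 64, 16

W_k = rng.normal(size=(d_model, d_head))          # a key projection weight (synthetic)
U, S, Vt = np.linalg.svd(W_k, full_matrices=False)
A = U[:, :rank] * S[:rank]                        # (d_model, rank)
B = Vt[:rank, :]                                  # (rank, d_head)

x = rng.normal(size=(1, d_model))                 # one token's hidden state
k_full = x @ W_k
k_lowrank = (x @ A) @ B                           # caching x @ A stores rank dims instead of d_head
print("relative error:", np.linalg.norm(k_full - k_lowrank) / np.linalg.norm(k_full))
```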
arXiv Detail & Related papers (2024-10-04T03:10:53Z) - Concise and Precise Context Compression for Tool-Using Language Models [60.606281074373136]
We propose two strategies for compressing tool documentation into concise and precise summary sequences for tool-using language models.
Results on API-Bank and APIBench show that our approach reaches performance comparable to the upper-bound baseline at compression ratios of up to 16x.
arXiv Detail & Related papers (2024-07-02T08:17:00Z) - In-Context Former: Lightning-fast Compressing Context for Large Language Model [48.831304302467004]
In this paper, we propose a new approach to compress the long input contexts of Transformer-based large language models (LLMs).
We use the cross-attention mechanism and a small number of learnable digest tokens to condense information from the contextual word embeddings.
Experimental results indicate that our method requires only 1/32 of the floating-point operations of the baseline during compression and improves processing speed by 68 to 112 times.
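Condensing contextual embeddings into a few learnable digest tokens via cross-attention, as described above, can be sketched as follows; module sizes and names are assumptions rather than the paper's architecture.

```python
# Sketch: learnable digest tokens query the context via cross-attention.
import torch
import torch.nn as nn


class DigestCompressor(nn.Module):
    def __init__(self, d_model: int = 256, n_digest: int = 8, n_heads: int = 4):
        super().__init__()
        self.digest = nn.Parameter(torch.randn(n_digest, d_model) * 0.02)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, context_emb: torch.Tensor) -> torch.Tensor:
        # context_emb: (batch, seq_len, d_model) -> (batch, n_digest, d_model)
        batch = context_emb.size(0)
        queries = self.digest.unsqueeze(0).expand(batch, -1, -1)
        compressed, _ = self.cross_attn(queries, context_emb, context_emb)
        return compressed


emb = torch.randn(2, 1024, 256)       # fake contextual word embeddings
print(DigestCompressor()(emb).shape)  # torch.Size([2, 8, 256])
```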
arXiv Detail & Related papers (2024-06-19T15:14:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.