ArcAligner: Adaptive Recursive Aligner for Compressed Context Embeddings in RAG
- URL: http://arxiv.org/abs/2601.05038v1
- Date: Thu, 08 Jan 2026 15:44:52 GMT
- Title: ArcAligner: Adaptive Recursive Aligner for Compressed Context Embeddings in RAG
- Authors: Jianbo Li, Yi Jiang, Sendong Zhao, Bairui Hu, Haochun Wang, Bing Qin,
- Abstract summary: ArcAligner is a lightweight module integrated into the language model layers.<n>It uses an adaptive ''gating'' system that only adds extra processing power when the information is complex.
- Score: 46.14646374046088
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrieval-Augmented Generation (RAG) helps LLMs stay accurate, but feeding long documents into a prompt makes the model slow and expensive. This has motivated context compression, ranging from token pruning and summarization to embedding-based compression. While researchers have tried ''compressing'' these documents into smaller summaries or mathematical embeddings, there is a catch: the more you compress the data, the more the LLM struggles to understand it. To address this challenge, we propose ArcAligner (Adaptive recursive context *Aligner*), a lightweight module integrated into the language model layers to help the model better utilize highly compressed context representations for downstream generation. It uses an adaptive ''gating'' system that only adds extra processing power when the information is complex, keeping the system fast. Across knowledge-intensive QA benchmarks, ArcAligner consistently beats compression baselines at comparable compression rates, especially on multi-hop and long-tail settings. The source code is publicly available.
Related papers
- Multi-Vector Index Compression in Any Modality [73.7330345057813]
Late interaction has emerged as a dominant paradigm for information retrieval in text, images, visual documents, and videos.<n>We introduce four approaches for index compression: sequence resizing, memory tokens, hierarchical pooling, and a novel attention-guided clustering (AGC)<n>AGC uses an attention-guided mechanism to identify the most semantically salient regions of a document as cluster centroids and to weight token aggregation.
arXiv Detail & Related papers (2026-02-24T18:57:33Z) - Revisiting Data Compression with Language Modeling [0.0]
We investigate the potential use of large language models (LLM's) in the task of data compression.<n>We achieve a new state-of-the-art (SOTA) adjusted compression rate of around $18%$ on the enwik9 dataset.<n>We show that while LLM's excel in compressing data in text-dominant domains, their ability in compressing non-natural text sequences still remain competitive if configured in the right way.
arXiv Detail & Related papers (2026-01-06T10:03:33Z) - BRIEF-Pro: Universal Context Compression with Short-to-Long Synthesis for Fast and Accurate Multi-Hop Reasoning [86.4235795435618]
BRIEF-Pro is a lightweight compressor that distills relevant evidence for a given query from retrieved documents into a concise summary.<n>It is trained to perform abstractive compression of extended contexts exceeding 10k words across a wide range of scenarios.<n> Experiments show that BRIEF-Pro generates more concise and relevant summaries, enhancing performance across small, large, and proprietary language models.
arXiv Detail & Related papers (2025-10-15T17:57:45Z) - AttnComp: Attention-Guided Adaptive Context Compression for Retrieval-Augmented Generation [27.480791258325066]
We introduce AttnComp, an adaptive, efficient and context-aware compression framework.<n>AttnComp employs a Top-P compression algorithm to retain the minimal set of documents.<n>In addition to compression, AttnComp estimates response confidence by assessing the overall relevance of the retrieved content.
arXiv Detail & Related papers (2025-09-22T08:18:50Z) - Sentinel: Attention Probing of Proxy Models for LLM Context Compression with an Understanding Perspective [29.50363211934763]
Retrieval-augmented generation (RAG) enhances large language models with external context, but retrieved passages are often lengthy, noisy, or exceed input limits.<n>We propose Sentinel, a lightweight sentence-level compression framework that reframes context filtering as an attention-based understanding task.
arXiv Detail & Related papers (2025-05-29T09:24:12Z) - ICPC: In-context Prompt Compression with Faster Inference [0.0]
We propose I CPC (In-context Prompt Compression), a novel and scalable prompt compression method that adaptively reduces the prompt length.<n>The key idea of I CPC is to calculate the probability of each word appearing in the prompt using encoders and calculate information carried by each word through the information function.<n> Empirically, we demonstrate that I CPC can effectively compress long texts of different categories and thus achieve better performance and speed on different types of NLP tasks.
arXiv Detail & Related papers (2025-01-03T03:46:51Z) - Efficient Long Context Language Model Retrieval with Compression [57.09163579304332]
Long Context Language Models (LCLMs) have emerged as a new paradigm to perform Information Retrieval (IR)<n>We propose a new compression approach tailored for LCLM retrieval, which is trained to maximize the retrieval performance while minimizing the length of the compressed passages.<n>We show that CoLoR improves the retrieval performance by 6% while compressing the in-context size by a factor of 1.91.
arXiv Detail & Related papers (2024-12-24T07:30:55Z) - BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression [91.23933111083389]
Retrieval-augmented generation (RAG) can supplement large language models (LLMs) by integrating external knowledge.<n>This paper presents BRIEF, a lightweight approach that performs query-aware multi-hop reasoning.<n>Based on our synthetic data built entirely by open-source models, BRIEF generates more concise summaries.
arXiv Detail & Related papers (2024-10-20T04:24:16Z) - AdaComp: Extractive Context Compression with Adaptive Predictor for Retrieval-Augmented Large Language Models [15.887617654762629]
Retrieved documents containing noise will hinder RAG from detecting answer clues and make the inference process slow and expensive.
We introduce AdaComp, a low-cost extractive context compression method that adaptively determines the compression rate based on both query complexity and retrieval quality.
arXiv Detail & Related papers (2024-09-03T03:25:59Z) - Concise and Precise Context Compression for Tool-Using Language Models [60.606281074373136]
We propose two strategies for compressing tool documentation into concise and precise summary sequences for tool-using language models.
Results on API-Bank and APIBench show that our approach reaches a performance comparable to the upper-bound baseline under up to 16x compression ratio.
arXiv Detail & Related papers (2024-07-02T08:17:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.