CoRECT: A Framework for Evaluating Embedding Compression Techniques at Scale
- URL: http://arxiv.org/abs/2510.19340v2
- Date: Thu, 23 Oct 2025 11:43:17 GMT
- Title: CoRECT: A Framework for Evaluating Embedding Compression Techniques at Scale
- Authors: L. Caspari, M. Dinzinger, K. Ghosh Dastidar, C. Fellicious, J. Mitrović, M. Granitzer
- Abstract summary: CoRECT is a framework for large-scale evaluation of embedding compression methods. We show that non-learned compression achieves substantial index size reduction, even on up to 100M passages.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dense retrieval systems have proven to be effective across various benchmarks, but require substantial memory to store large search indices. Recent advances in embedding compression show that index sizes can be greatly reduced with minimal loss in ranking quality. However, existing studies often overlook the role of corpus complexity -- a critical factor, as recent work shows that both corpus size and document length strongly affect dense retrieval performance. In this paper, we introduce CoRECT (Controlled Retrieval Evaluation of Compression Techniques), a framework for large-scale evaluation of embedding compression methods, supported by a newly curated dataset collection. To demonstrate its utility, we benchmark eight representative types of compression methods. Notably, we show that non-learned compression achieves substantial index size reduction, even on up to 100M passages, with statistically insignificant performance loss. However, selecting the optimal compression method remains challenging, as performance varies across models. Such variability highlights the necessity of CoRECT to enable consistent comparison and informed selection of compression methods. All code, data, and results are available on GitHub and HuggingFace.
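As a concrete illustration of the non-learned compression family the abstract highlights, here is a minimal sketch of per-dimension scalar quantization of float32 embeddings to int8, which shrinks the index by 4x. This is a generic example of the technique class, not necessarily the exact variant CoRECT benchmarks; all names and parameters are illustrative.

```python
import numpy as np

def quantize_int8(embeddings: np.ndarray):
    """Map each embedding dimension linearly onto the int8 range [-127, 127]."""
    scale = np.abs(embeddings).max(axis=0) + 1e-12   # per-dimension scale factor
    codes = np.round(embeddings / scale * 127).astype(np.int8)
    return codes, scale

def dequantize(codes: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Approximately reconstruct the original float32 embeddings."""
    return codes.astype(np.float32) * scale / 127

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 768)).astype(np.float32)  # stand-in corpus embeddings
codes, scale = quantize_int8(X)
print("index size ratio:", codes.nbytes / X.nbytes)  # 0.25, i.e. a 4x reduction
```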
Related papers
- Multi-Vector Index Compression in Any Modality [73.7330345057813]
Late interaction has emerged as a dominant paradigm for information retrieval in text, images, visual documents, and videos. We introduce four approaches for index compression: sequence resizing, memory tokens, hierarchical pooling, and a novel attention-guided clustering (AGC). AGC uses an attention-guided mechanism to identify the most semantically salient regions of a document as cluster centroids and to weight token aggregation.
arXiv Detail & Related papers (2026-02-24T18:57:33Z)
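A rough sketch of what such attention-guided clustering could look like, based only on the abstract's description: the highest-attention tokens serve as centroids, and the remaining token embeddings are pooled into them with attention weighting. Every detail here (function name, scoring, assignment rule) is an assumption, not the paper's algorithm.

```python
import numpy as np

def agc_compress(tok_emb: np.ndarray, attn: np.ndarray, n_clusters: int = 8):
    """Compress (T, d) token embeddings down to (n_clusters, d) vectors."""
    centroid_ids = np.argsort(attn)[-n_clusters:]           # most salient tokens
    assign = (tok_emb @ tok_emb[centroid_ids].T).argmax(1)  # nearest centroid
    out = np.empty((n_clusters, tok_emb.shape[1]), dtype=tok_emb.dtype)
    for c in range(n_clusters):
        members = assign == c
        if not members.any():
            out[c] = tok_emb[centroid_ids[c]]               # empty-cluster fallback
            continue
        w = attn[members] / (attn[members].sum() + 1e-12)   # attention-weighted pooling
        out[c] = w @ tok_emb[members]
    return out
```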
- Arbitrary Ratio Feature Compression via Next Token Prediction [52.10426317889982]
The Arbitrary Ratio Feature Compression (ARFC) framework supports any compression ratio with a single model. ARFC is an auto-regressive model that performs compression via next-token prediction. A MoS module refines the compressed tokens by utilizing multiple compression results, and ERGC is integrated into the training process to preserve semantic and structural relationships during compression.
arXiv Detail & Related papers (2026-02-12T02:38:57Z)
- Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods [54.4711434793961]
We show that simple image downsampling consistently outperforms many advanced compression methods across multiple widely used benchmarks. Motivated by these findings, we introduce VTC-Bench, an evaluation framework that incorporates a data filtering mechanism to denoise existing benchmarks.
arXiv Detail & Related papers (2025-10-08T15:44:28Z)
- Challenges and Solutions in Selecting Optimal Lossless Data Compression Algorithms [0.9883261192383612]
We present a framework that integrates compression ratio, encoding time, and decoding time into a unified performance score. We show that it reliably identifies the most suitable compressor for different priority settings. Results also reveal that while modern learning-based codecs often provide superior compression ratios, classical algorithms remain advantageous when speed is paramount.
arXiv Detail & Related papers (2025-09-23T22:30:55Z)
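The abstract does not give the scoring formula; a minimal sketch of such a unified score, with illustrative (assumed) weights and normalization, might look like:

```python
def unified_score(ratio: float, enc_s: float, dec_s: float,
                  w_ratio: float = 0.5, w_enc: float = 0.25,
                  w_dec: float = 0.25) -> float:
    """Higher is better: reward compression ratio, penalize slow encode/decode.

    Times are inverted so that faster codecs score higher; the weights encode
    the user's priority setting.
    """
    return w_ratio * ratio + w_enc / (1.0 + enc_s) + w_dec / (1.0 + dec_s)

# Default priorities vs. a speed-first setting for the same codec measurements.
print(unified_score(ratio=3.2, enc_s=0.8, dec_s=0.1))
print(unified_score(ratio=3.2, enc_s=0.8, dec_s=0.1,
                    w_ratio=0.2, w_enc=0.4, w_dec=0.4))
```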
- CORE-RAG: Lossless Compression for Retrieval-Augmented LLMs via Reinforcement Learning [22.93037884068796]
Retrieval-Augmented Generation (RAG) has emerged as a promising approach to enhance the timeliness of knowledge updates and the factual accuracy of responses in large language models. Existing approaches to document compression tailored for RAG often degrade task performance. We propose CORE, a novel method for lossless context compression in RAG.
arXiv Detail & Related papers (2025-08-24T12:21:50Z)
- An Enhancement of Jiang, Z., et al.'s Compression-Based Classification Algorithm Applied to News Article Categorization [0.0]
This study enhances Jiang et al.'s compression-based classification algorithm by addressing its limitations in detecting semantic similarities between text documents. The proposed improvements focus on unigram extraction and optimized concatenation, eliminating reliance on entire document compression. Experimental results across datasets of varying sizes and complexities demonstrate an average accuracy improvement of 5.73%, with gains of up to 11% on datasets containing longer documents.
arXiv Detail & Related papers (2025-02-20T10:50:59Z)
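For context, the baseline this paper enhances is Jiang et al.'s compressor-based classifier: gzip compression lengths define a normalized compression distance (NCD), and a k-nearest-neighbor vote over those distances assigns the label. The sketch below shows that baseline; the paper's unigram-extraction and optimized-concatenation refinements are not reproduced here.

```python
import gzip
from collections import Counter

def clen(text: str) -> int:
    """Compressed length of a string under gzip."""
    return len(gzip.compress(text.encode("utf-8")))

def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two documents."""
    cx, cy = clen(x), clen(y)
    return (clen(x + " " + y) - min(cx, cy)) / max(cx, cy)

def classify(doc: str, train_docs: list[str], train_labels: list[str],
             k: int = 3) -> str:
    """k-NN vote over NCD distances to the training documents."""
    nearest = sorted(zip((ncd(doc, d) for d in train_docs), train_labels))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```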
- ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference [61.412894960600205]
Large Language Models (LLMs) require significant GPU memory when processing long texts. ChunkKV reimagines KV cache compression by treating semantic chunks as basic compression units, and outperforms state-of-the-art methods by up to 8.7% in precision.
arXiv Detail & Related papers (2025-02-01T03:49:47Z)
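Based only on this one-line description, chunk-level KV cache pruning could look roughly like the following: score fixed-size chunks (here by mean attention mass, an assumed stand-in for whatever saliency the paper uses) and keep or drop whole chunks rather than individual tokens, preserving local semantics. All names and details are illustrative.

```python
import numpy as np

def prune_kv_by_chunks(keys: np.ndarray, values: np.ndarray,
                       attn_mass: np.ndarray, chunk: int = 16,
                       keep_ratio: float = 0.5):
    """Keep the highest-scoring chunks of a (T, d) KV cache, in sequence order."""
    T = keys.shape[0]
    n_chunks = -(-T // chunk)                        # ceiling division
    scores = np.array([attn_mass[i * chunk:(i + 1) * chunk].mean()
                       for i in range(n_chunks)])
    n_keep = max(1, round(n_chunks * keep_ratio))
    kept = np.sort(np.argsort(scores)[-n_keep:])     # chunk ids, original order
    idx = np.concatenate([np.arange(i * chunk, min((i + 1) * chunk, T))
                          for i in kept])
    return keys[idx], values[idx]
```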
- Lightweight Correlation-Aware Table Compression [58.50312417249682]
Virtual is a framework that integrates seamlessly with existing open formats.
Experiments on data-gov datasets show that Virtual reduces file sizes by up to 40% compared to Apache Parquet.
arXiv Detail & Related papers (2024-10-17T22:28:07Z)
- Characterizing Prompt Compression Methods for Long Context Inference [36.9745587176401]
Long context inference presents challenges at the system level with increased compute and memory requirements.
Several methods have been proposed to compress the prompt to reduce the context length.
We perform a comprehensive characterization and evaluation of different prompt compression methods.
arXiv Detail & Related papers (2024-07-11T23:34:32Z)
- Extreme Image Compression using Fine-tuned VQGANs [43.43014096929809]
We introduce vector quantization (VQ)-based generative models into the image compression domain.
The codebook learned by the VQGAN model yields a strong expressive capacity.
The proposed framework outperforms state-of-the-art codecs in terms of perceptual quality-oriented metrics.
arXiv Detail & Related papers (2023-07-17T06:14:19Z)
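The quantization step underlying VQ-based compression is standard: each latent vector is replaced by the index of its nearest codebook entry, so only small integer indices (plus the shared codebook) need to be stored. A minimal sketch of that step follows; the VQGAN fine-tuning the paper describes is omitted, and all names here are illustrative.

```python
import numpy as np

def vq_encode(latents: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map (N, d) latents to (N,) indices of their nearest codebook entries."""
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def vq_decode(indices: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Reconstruct latents by codebook lookup."""
    return codebook[indices]

rng = np.random.default_rng(0)
codebook = rng.normal(size=(1024, 256)).astype(np.float32)  # K=1024 entries
latents = rng.normal(size=(64, 256)).astype(np.float32)     # encoder output
codes = vq_encode(latents, codebook)                        # 64 small integers
recon = vq_decode(codes, codebook)                          # lossy reconstruction
```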
- Analyzing and Mitigating JPEG Compression Defects in Deep Learning [69.04777875711646]
We present a unified study of the effects of JPEG compression on a range of common tasks and datasets.
We show that there is a significant penalty on common performance metrics for high compression.
arXiv Detail & Related papers (2020-11-17T20:32:57Z)