ARC-Encoder: learning compressed text representations for large language models
- URL: http://arxiv.org/abs/2510.20535v1
- Date: Thu, 23 Oct 2025 13:20:57 GMT
- Title: ARC-Encoder: learning compressed text representations for large language models
- Authors: Hippolyte Pilchen, Edouard Grave, Patrick Pérez
- Abstract summary: ARC-Encoder is an encoder that compresses the context into continuous representations. We show that ARC-Encoder achieves state-of-the-art performance on several benchmarks.
- Score: 24.079338539315398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent techniques such as retrieval-augmented generation or chain-of-thought reasoning have led to longer contexts and increased inference costs. Context compression techniques can reduce these costs, but the most effective approaches require fine-tuning the target model or even modifying its architecture. This can degrade its general abilities when it is not used for this specific purpose. Here we explore an alternative approach: an encoder that compresses the context into continuous representations which replace token embeddings in decoder LLMs. First, we perform a systematic study of training strategies and architecture choices for the encoder. Our findings led to the design of an Adaptable text Representations Compressor, named ARC-Encoder, which outputs $x$-times fewer continuous representations (typically $x\!\in\!\{4,8\}$) than text tokens. We evaluate ARC-Encoder across a variety of LLM usage scenarios, ranging from in-context learning to context window extension, on both instruct and base decoders. Results show that ARC-Encoder achieves state-of-the-art performance on several benchmarks while improving computational efficiency at inference. Finally, we demonstrate that our models can be adapted to multiple decoders simultaneously, allowing a single encoder to generalize across different decoder LLMs. This makes ARC-Encoder a flexible and efficient solution for portable encoders that work seamlessly with multiple LLMs. We release the training code at https://github.com/kyutai-labs/ARC-Encoder . The fine-tuning dataset and pretrained models are available at https://huggingface.co/collections/kyutai/arc-encoders-68ee18787301407d60a57047 .
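The compression mechanism described in the abstract can be made concrete with a small sketch. The following hypothetical PyTorch snippet illustrates the core idea only: a context of N token embeddings is encoded and pooled into N/x continuous vectors (x = 8 here) that are projected into the decoder's embedding space, where they would stand in for the original context token embeddings. The module name, layer sizes, and average-pooling scheme are illustrative assumptions, not the ARC-Encoder architecture itself.

```python
# Minimal sketch of the compression idea (not the authors' code): N context
# token embeddings are mapped to N / ratio continuous vectors that replace
# the context tokens at the decoder's input. All names and sizes are
# illustrative assumptions.
import torch
import torch.nn as nn


class ToyContextCompressor(nn.Module):
    def __init__(self, vocab_size=32000, enc_dim=512, dec_dim=4096, ratio=8):
        super().__init__()
        self.ratio = ratio  # x in the paper, typically 4 or 8
        self.embed = nn.Embedding(vocab_size, enc_dim)
        layer = nn.TransformerEncoderLayer(enc_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Project pooled encoder states into the decoder's embedding space so
        # they can be fed in place of ordinary token embeddings.
        self.proj = nn.Linear(enc_dim, dec_dim)

    def forward(self, context_ids: torch.Tensor) -> torch.Tensor:
        # context_ids: (batch, n_tokens); n_tokens assumed divisible by ratio.
        h = self.encoder(self.embed(context_ids))       # (B, N, enc_dim)
        b, n, d = h.shape
        # Average-pool every `ratio` consecutive states -> N / ratio vectors.
        h = h.view(b, n // self.ratio, self.ratio, d).mean(dim=2)
        return self.proj(h)                             # (B, N/ratio, dec_dim)


if __name__ == "__main__":
    compressor = ToyContextCompressor()
    ctx = torch.randint(0, 32000, (1, 256))             # 256 context tokens
    soft_ctx = compressor(ctx)
    print(soft_ctx.shape)  # torch.Size([1, 32, 4096]) -> 8x fewer positions
    # These 32 continuous vectors would be concatenated with the decoder's own
    # token embeddings (e.g. via inputs_embeds in Hugging Face transformers)
    # instead of the 256 original context token embeddings.
```

The average pooling and linear projection above are only one plausible way to realize the x-fold reduction; the paper's systematic study of training strategies and architecture choices is what determines the actual design.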
Related papers
- Decoupling Vision and Language: Codebook Anchored Visual Adaptation [20.393987361723724]
Large Vision-Language Models (LVLMs) use their vision encoders to translate images into representations for downstream reasoning. Existing adaptation methods modify the continuous feature interface between encoder and language model through projector tuning or other parameter-efficient updates. We introduce CRAFT, a lightweight method that fine-tunes the encoder using a discrete codebook that anchors visual representations to a stable token space.
arXiv Detail & Related papers (2026-02-23T02:39:26Z) - CoPE-VideoLM: Codec Primitives For Efficient Video Language Models [56.76440182038839]
Video Language Models (VideoLMs) empower AI systems to understand temporal dynamics in videos. Current methods use sampling which can miss both macro-level events and micro-level details due to the sparse temporal coverage. We propose to leverage video primitives which encode video redundancy and sparsity without requiring expensive full-image encoding for most frames.
arXiv Detail & Related papers (2026-02-13T18:57:31Z) - Encoder-Decoder or Decoder-Only? Revisiting Encoder-Decoder Large Language Model [30.945523139748634]
We revisit the encoder-decoder LLM (RedLLM), enhancing it with recent recipes from decoder-only LLMs (DecLLM). We conduct a comprehensive comparison between RedLLM, pretrained with prefix language modeling (LM), and DecLLM, pretrained with causal LM, at different model scales. Using RedPajama V1 (1.6T tokens) for pretraining and FLAN for instruction tuning, our experiments show that RedLLM produces compelling scaling properties and surprisingly strong performance.
arXiv Detail & Related papers (2025-10-30T15:48:28Z) - METEOR: Multi-Encoder Collaborative Token Pruning for Efficient Vision Language Models [92.37117312251755]
We propose a progressive pruning framework, namely Multi-Encoder collaboraTivE tOken pRuning (METEOR). For multi-vision encoding, we discard redundant tokens within each encoder via a rank-guided collaborative token assignment strategy. For multi-vision fusion, we combine the visual features from different encoders while reducing cross-encoder redundancy with cooperative pruning.
arXiv Detail & Related papers (2025-07-28T13:50:53Z) - Leveraging Decoder Architectures for Learned Sparse Retrieval [26.483483554222012]
Learned Sparse Retrieval (LSR) has traditionally focused on small-scale encoder-only transformer architectures. This study investigates the effectiveness of LSR across different transformer-based architectures.
arXiv Detail & Related papers (2025-04-25T08:04:52Z) - Return of the Encoder: Maximizing Parameter Efficiency for SLMs [4.246337121596753]
Encoder-decoder architectures achieve 47% lower first-token latency and 4.7x higher throughput compared to decoder-only models on edge devices. We introduce a novel knowledge distillation framework that enables encoder-decoder models to leverage capabilities from large scalable decoder-only teachers.
arXiv Detail & Related papers (2025-01-27T18:06:36Z) - Better Prompt Compression Without Multi-Layer Perceptrons [33.53334153279698]
We show that the encoder does not need to keep the original language model's architecture to achieve useful compression. We introduce a prompt compression encoder after removing the multilayer perceptron (MLP) layers in the Transformer blocks of a language model (see the sketch after this list).
arXiv Detail & Related papers (2025-01-12T06:57:06Z) - Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs [57.27982780697922]
Large language models have demonstrated exceptional capability in natural language understanding and generation.
However, their generation speed is limited by the inherently sequential nature of their decoding process.
This paper introduces Lexical Unit Decoding, a novel decoding methodology implemented in a data-driven manner.
arXiv Detail & Related papers (2024-05-24T04:35:13Z) - Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models [59.57732929473519]
We apply multiple frame reduction layers in the encoder to compress encoder outputs into a small number of output frames.
We demonstrate that we can generate one encoder output frame for every 2.56 sec of input speech, without significantly affecting word error rate on a large-scale voice search task.
arXiv Detail & Related papers (2024-02-27T03:40:44Z) - Triple-Encoders: Representations That Fire Together, Wire Together [51.15206713482718]
Contrastive Learning is a representation learning method that encodes relative distances between utterances into the embedding space via a bi-encoder.
This study introduces triple-encoders, which efficiently compute distributed utterance mixtures from these independently encoded utterances.
We find that triple-encoders lead to a substantial improvement over bi-encoders, and even to better zero-shot generalization than single-vector representation models.
arXiv Detail & Related papers (2024-02-19T18:06:02Z) - UniXcoder: Unified Cross-Modal Pre-training for Code Representation [65.6846553962117]
We present UniXcoder, a unified cross-modal pre-trained model for programming language.
We propose a one-to-one mapping method to transform AST in a sequence structure that retains all structural information from the tree.
We evaluate UniXcoder on five code-related tasks over nine datasets.
arXiv Detail & Related papers (2022-03-08T04:48:07Z) - Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring [83.32560748324667]
This article describes an efficient end-to-end speech translation (E2E-ST) framework based on non-autoregressive (NAR) models.
We propose a unified NAR E2E-ST framework called Orthros, which has an NAR decoder and an auxiliary shallow AR decoder on top of the shared encoder.
arXiv Detail & Related papers (2021-09-09T16:50:16Z)
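As a side note on the "Better Prompt Compression Without Multi-Layer Perceptrons" entry above, the sketch below shows what an attention-only compression encoder could look like: a stack of Transformer blocks with the MLP sub-layers removed. The block structure, dimensions, and depth are assumptions for illustration, not that paper's implementation.

```python
# Hedged sketch of an attention-only compression encoder: Transformer blocks
# with the MLP sub-layer dropped, keeping only (pre-norm) self-attention.
# All dimensions and the depth are illustrative assumptions.
import torch
import torch.nn as nn


class AttentionOnlyBlock(nn.Module):
    """A Transformer block with the MLP removed: residual self-attention only."""

    def __init__(self, dim=768, n_heads=12):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        return x + attn_out  # residual connection, no MLP sub-layer


if __name__ == "__main__":
    encoder = nn.Sequential(*[AttentionOnlyBlock() for _ in range(4)])
    prompt_states = torch.randn(1, 128, 768)   # 128 prompt token states
    print(encoder(prompt_states).shape)        # torch.Size([1, 128, 768])
```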