CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning
- URL: http://arxiv.org/abs/2511.18659v2
- Date: Tue, 25 Nov 2025 22:02:20 GMT
- Title: CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning
- Authors: Jie He, Richard He Bai, Sinead Williamson, Jeff Z. Pan, Navdeep Jaitly, Yizhe Zhang
- Abstract summary: CLaRa is a unified framework that performs embedding-based compression and joint optimization in a shared continuous space. Experiments show that CLaRa achieves state-of-the-art compression and reranking performance, often surpassing text-based fine-tuned baselines.
- Score: 34.38636514331703
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) with external knowledge but still suffers from long contexts and disjoint retrieval-generation optimization. In this work, we propose CLaRa (Continuous Latent Reasoning), a unified framework that performs embedding-based compression and joint optimization in a shared continuous space. To obtain semantically rich and retrievable compressed vectors, we introduce SCP, a key-preserving data synthesis framework using QA and paraphrase supervision. CLaRa then trains the reranker and generator end-to-end via a single language modeling loss, with gradients flowing through both modules using a differentiable top-k estimator. Theoretically, this unified optimization aligns retrieval relevance with answer quality. Experiments across multiple QA benchmarks show that CLaRa achieves state-of-the-art compression and reranking performance, often surpassing text-based fine-tuned baselines.
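The load-bearing mechanism in this abstract is that gradients from a single language-modeling loss reach the reranker through a discrete top-k selection. Below is a minimal PyTorch sketch of that idea, assuming an iterative-softmax relaxation of top-k; `SoftTopK`, `ClaraSketch`, the module choices, and all dimensions are illustrative assumptions, not the authors' implementation (the abstract does not say which differentiable top-k estimator is used).

```python
# Minimal sketch of joint reranker+generator training through a
# differentiable top-k, in the spirit of the CLaRa abstract. All names,
# dimensions, and the specific relaxation are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftTopK(nn.Module):
    """Relaxed top-k: k softmax rounds, down-weighting already-picked items."""
    def __init__(self, k: int, tau: float = 0.5):
        super().__init__()
        self.k, self.tau = k, tau

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        weights = torch.zeros_like(scores)
        logits = scores / self.tau
        for _ in range(self.k):
            p = F.softmax(logits, dim=-1)
            weights = weights + p
            # suppress mass already assigned so the next round picks a new item
            logits = logits + torch.log1p(-p.clamp(max=1.0 - 1e-6))
        return weights.clamp(max=1.0)  # soft indicator of the selected set

class ClaraSketch(nn.Module):
    def __init__(self, hidden: int = 64, vocab: int = 1000, k: int = 2):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab, hidden)
        self.compress = nn.Linear(hidden, hidden)    # stands in for embedding-based compression
        self.score = nn.Bilinear(hidden, hidden, 1)  # reranker: query-passage relevance
        self.topk = SoftTopK(k)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.lm_head = nn.Linear(hidden, vocab)

    def forward(self, query_emb, passage_embs, answer_ids):
        z = self.compress(passage_embs)                        # (D, H) compressed vectors
        s = self.score(query_emb.expand_as(z), z).squeeze(-1)  # (D,) relevance scores
        w = self.topk(s)                                       # (D,) soft top-k weights
        ctx = (w.unsqueeze(-1) * z).sum(dim=0, keepdim=True)   # selected context, (1, H)
        inp = self.tok_emb(answer_ids[:, :-1])                 # teacher forcing
        out, _ = self.decoder(inp, ctx.unsqueeze(0))           # context as initial state
        logits = self.lm_head(out)
        # one LM loss; its gradient reaches self.score via the soft weights w
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               answer_ids[:, 1:].reshape(-1))

model = ClaraSketch()
loss = model(torch.randn(1, 64), torch.randn(8, 64), torch.randint(0, 1000, (1, 12)))
loss.backward()  # reranker parameters receive gradient through the relaxed top-k
```

Gumbel-perturbed or straight-through estimators are common alternatives to this relaxation; whichever is used, the effect is the same: answer-quality gradients adjust retrieval scores directly, which is the alignment the abstract claims theoretically.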
Related papers
- Latent Context Compilation: Distilling Long Context into Compact Portable Memory [13.768393657432027]
We propose Latent Context Compilation, a framework that shifts context processing from adaptation to compilation. By utilizing a disposable LoRA module as a compiler, we distill long contexts into compact buffer tokens. Experiments with Llama-3.1-8B demonstrate that Latent Context Compilation preserves fine-grained details and reasoning capabilities.
arXiv Detail & Related papers (2026-01-31T08:38:07Z)
- GrepRAG: An Empirical Study and Optimization of Grep-Like Retrieval for Code Completion [32.17127975368661]
Repository-level code completion remains challenging for large language models. We investigate lightweight, index-free, intent-aware lexical retrieval. We introduce Naive GrepRAG, a baseline framework in which LLMs autonomously generate ripgrep commands to retrieve relevant context (a minimal sketch appears after this list).
arXiv Detail & Related papers (2026-01-30T18:22:15Z)
- Rethinking Autoregressive Models for Lossless Image Compression via Hierarchical Parallelism and Progressive Adaptation [75.58269386927076]
Autoregressive (AR) models are often dismissed as impractical due to prohibitive computational cost. This work rethinks this paradigm, introducing a framework built on hierarchical parallelism and progressive adaptation. Experiments on diverse datasets (natural, satellite, medical) validate that our method achieves new state-of-the-art compression.
arXiv Detail & Related papers (2025-11-14T06:27:58Z)
- RAGBoost: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse [39.76548092849437]
Retrieval-augmented generation (RAG) enhances large language models (LLMs) with retrieved context. Existing caching techniques either preserve accuracy with low cache reuse or improve reuse at the cost of degraded reasoning quality. We present RAGBoost, an efficient RAG system that achieves high cache reuse without sacrificing accuracy through accuracy-preserving context reuse.
arXiv Detail & Related papers (2025-11-05T13:59:01Z)
- CCF: A Context Compression Framework for Efficient Long-Sequence Language Modeling [52.05149789178508]
CCF is a novel context compression framework designed to enable efficient long-context modeling. CCF integrates segment-wise semantic aggregation with key-value memory encoding, forming compact representations. Empirical results on multiple long-context language modeling benchmarks demonstrate that CCF achieves competitive perplexity under high compression ratios.
arXiv Detail & Related papers (2025-09-11T07:13:49Z)
- Retrieval-augmented reasoning with lean language models [5.615564811138556]
We develop a retrieval-augmented conversational agent capable of interpreting complex, domain-specific queries. Our system integrates a dense retriever with fine-tuned Qwen2.5-Instruct models. All implementation details and code are publicly released to support adaptation across domains.
arXiv Detail & Related papers (2025-08-15T10:38:15Z)
- Scalable In-Context Q-Learning [68.9917436397079]
We propose Scalable In-Context Q-Learning (SICQL) to steer in-context reinforcement learning. SICQL harnesses dynamic programming and world modeling to steer ICRL toward efficient reward and task generalization.
arXiv Detail & Related papers (2025-06-02T04:21:56Z)
- Chain-of-Retrieval Augmented Generation [91.02950964802454]
This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer. Our proposed method, CoRAG, allows the model to dynamically reformulate the query based on the evolving state (a loop-style sketch follows this list).
arXiv Detail & Related papers (2025-01-24T09:12:52Z)
- Don't Do RAG: When Cache-Augmented Generation is All You Need for Knowledge Tasks [11.053340674721005]
Retrieval-augmented generation (RAG) has gained traction as a powerful approach for enhancing language models by integrating external knowledge sources. This paper proposes an alternative paradigm, cache-augmented generation (CAG), that bypasses real-time retrieval.
arXiv Detail & Related papers (2024-12-20T06:58:32Z)
- COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks. We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome these challenges. Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z)
- xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token [108.7069350303884]
xRAG is an innovative context compression method tailored for retrieval-augmented generation. xRAG seamlessly integrates document embeddings into the language model representation space. Experimental results demonstrate that xRAG achieves an average improvement of over 10% across six knowledge-intensive tasks (a one-token projection sketch follows this list).
arXiv Detail & Related papers (2024-05-22T16:15:17Z)
- Multiscale Latent-Guided Entropy Model for LiDAR Point Cloud Compression [18.897023700334458]
The non-uniform distribution and extremely sparse nature of the LiDAR point cloud (LPC) pose significant challenges for its efficient compression. This paper proposes a novel end-to-end, fully-factorized deep framework that encodes the original LPC into an octree structure and hierarchically decomposes the octree entropy model in layers.
arXiv Detail & Related papers (2022-09-26T08:36:11Z)
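The Naive GrepRAG summary above is concrete enough to sketch: the model proposes a ripgrep invocation, the matches become retrieval context, and completion proceeds with that context prepended. In the sketch below, the `llm` callable, the prompt wording, and the `run_ripgrep` validation step are hypothetical; only the `rg` flags mentioned in the comment (`-n`, `-C`) are standard ripgrep.

```python
# Hypothetical sketch of grep-like, index-free retrieval for code
# completion, in the spirit of the Naive GrepRAG summary.
import shlex
import subprocess

ALLOWED = {"rg"}  # only permit ripgrep; never run arbitrary model output

def run_ripgrep(command: str, repo_root: str, timeout: float = 5.0) -> str:
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED:
        raise ValueError(f"refusing to run: {command!r}")
    # models typically emit flags like -n (line numbers) and -C (context lines)
    result = subprocess.run(argv, cwd=repo_root, capture_output=True,
                            text=True, timeout=timeout)
    return result.stdout[:4000]  # truncate to fit the prompt budget

def complete_with_greprag(llm, repo_root: str, file_prefix: str) -> str:
    # Step 1: ask the model for a single search command
    cmd = llm(f"Suggest one ripgrep command to find code relevant to "
              f"completing:\n{file_prefix}\nAnswer with the command only.")
    # Step 2: retrieve lexical matches, no index required
    context = run_ripgrep(cmd.strip(), repo_root)
    # Step 3: complete with the retrieved context prepended
    return llm(f"Relevant repository context:\n{context}\n\n"
               f"Complete the following code:\n{file_prefix}")
```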
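The CoRAG entry describes a retrieve-reason-reformulate loop. The control flow can be illustrated as below; `llm` and `retrieve` are hypothetical stand-ins and the prompts are invented, so this is a shape of the idea rather than the paper's training recipe (which concerns how such models are trained, not just run).

```python
# Loop-style sketch of chain-of-retrieval generation, in the spirit of
# the CoRAG summary: retrieve, summarize evidence, reformulate the query,
# repeat, then answer. All prompts and helpers are illustrative.
def corag_answer(llm, retrieve, question: str, max_steps: int = 3) -> str:
    query, notes = question, []
    for _ in range(max_steps):
        passages = retrieve(query)
        notes.append(llm(f"Question: {question}\nEvidence: {passages}\n"
                         f"Summarize what this evidence establishes."))
        # dynamically reformulate the query based on the evolving state
        query = llm(f"Question: {question}\nKnown so far: {' '.join(notes)}\n"
                    f"Write the next search query, or 'DONE' if sufficient.")
        if query.strip() == "DONE":
            break
    return llm(f"Question: {question}\nNotes: {' '.join(notes)}\nFinal answer:")
```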
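The xRAG entry also lends itself to a sketch: a dense retriever's document embedding is mapped by a small projector into the language model's token-embedding space and spliced in as a single pseudo-token. The `Projector` architecture, dimensions, and splicing point below are assumptions consistent with the summary, not the released implementation.

```python
# Sketch of xRAG-style extreme compression: one retrieval embedding is
# projected into the LM's embedding space and prepended as a single
# pseudo-token. Dimensions and module names are illustrative assumptions.
import torch
import torch.nn as nn

class Projector(nn.Module):
    """Maps a retriever embedding (d_ret) into the LM hidden space (d_lm)."""
    def __init__(self, d_ret: int = 768, d_lm: int = 1024):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_ret, d_lm), nn.GELU(),
                                 nn.Linear(d_lm, d_lm))

    def forward(self, doc_emb: torch.Tensor) -> torch.Tensor:
        return self.net(doc_emb)

def splice_document_token(tok_embs: torch.Tensor, doc_emb: torch.Tensor,
                          projector: Projector) -> torch.Tensor:
    """Prepend the projected document vector as one extra 'token' embedding."""
    doc_tok = projector(doc_emb).unsqueeze(1)      # (B, 1, d_lm)
    return torch.cat([doc_tok, tok_embs], dim=1)   # (B, 1 + T, d_lm)

projector = Projector()
prompt_embs = torch.randn(2, 16, 1024)   # embeddings of the tokenized prompt
doc_embs = torch.randn(2, 768)           # dense-retriever outputs
inputs_embeds = splice_document_token(prompt_embs, doc_embs, projector)
# feed inputs_embeds to a decoder that accepts embedding inputs, so the
# whole retrieved document costs exactly one token of context
print(inputs_embeds.shape)  # torch.Size([2, 17, 1024])
```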
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.