Knowledge Compression via Question Generation: Enhancing Multihop Document Retrieval without Fine-tuning
- URL: http://arxiv.org/abs/2506.13778v1
- Date: Mon, 09 Jun 2025 16:15:11 GMT
- Title: Knowledge Compression via Question Generation: Enhancing Multihop Document Retrieval without Fine-tuning
- Authors: Anvi Alex Eponon, Moein Shahiki-Tash, Ildar Batyrshin, Christian E. Maldonado-Sifuentes, Grigori Sidorov, Alexander Gelbukh
- Abstract summary: This study presents a question-based knowledge encoding approach that improves retrieval-augmented generation (RAG) systems without requiring fine-tuning or traditional chunking. We encode textual content using generated questions that span the lexical and semantic space, creating targeted retrieval cues combined with a custom syntactic reranking method. In single-hop retrieval over 109 scientific papers, our approach achieves a Recall@3 of 0.84, outperforming traditional chunking methods by 60 percent.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study presents a question-based knowledge encoding approach that improves retrieval-augmented generation (RAG) systems without requiring fine-tuning or traditional chunking. We encode textual content using generated questions that span the lexical and semantic space, creating targeted retrieval cues combined with a custom syntactic reranking method. In single-hop retrieval over 109 scientific papers, our approach achieves a Recall@3 of 0.84, outperforming traditional chunking methods by 60 percent. We also introduce "paper-cards", concise paper summaries under 300 characters, which enhance BM25 retrieval, increasing MRR@3 from 0.56 to 0.85 on simplified technical queries. For multihop tasks, our reranking method reaches an F1 score of 0.52 with LLaMA2-Chat-7B on the LongBench 2WikiMultihopQA dataset, surpassing chunking and fine-tuned baselines, which score 0.328 and 0.412 respectively. This method eliminates fine-tuning requirements, reduces retrieval latency, enables intuitive question-driven knowledge access, and decreases vector storage demands by 80%, positioning it as a scalable and efficient RAG alternative.
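The encoding idea in the abstract can be sketched in a few lines: each document is indexed by its generated questions, and retrieval matches the user query against those questions rather than against raw text chunks. The helper names below are illustrative only; the paper generates questions with an LLM and ranks with BM25 plus a syntactic reranker, which this toy token-overlap scorer merely approximates.

```python
# Sketch of question-based knowledge encoding (hypothetical names).
# In the paper, questions are LLM-generated; here they are given directly.
from collections import Counter


def tokenize(text):
    return text.lower().split()


def encode_corpus(doc_questions):
    """Map each doc id to the token counts of its generated questions."""
    return {doc_id: Counter(t for q in qs for t in tokenize(q))
            for doc_id, qs in doc_questions.items()}


def retrieve(query, index, k=3):
    """Rank documents by token overlap with the query (a stand-in for BM25)."""
    q_tokens = tokenize(query)
    scores = {d: sum(counts[t] for t in q_tokens) for d, counts in index.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Because the index stores only short questions (and, analogously, sub-300-character paper-cards) instead of full chunk embeddings, the stored representation is far smaller, which is the intuition behind the reported 80% storage reduction.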
Related papers
- Single-Turn LLM Reformulation Powered Multi-Stage Hybrid Re-Ranking for Tip-of-the-Tongue Known-Item Retrieval [3.976291254896486]
We propose using a single call to a generic 8B-parameter LLM for query reformulation. Our method is particularly effective where standard Pseudo-Relevance Feedback fails due to poor initial recall. Experiments on 2025 TREC-ToT datasets show that while raw queries yield poor performance, our lightweight pre-retrieval transformation improves Recall by 20.61%.
arXiv Detail & Related papers (2026-02-10T21:59:10Z) - Cross-Document Topic-Aligned Chunking for Retrieval-Augmented Generation [0.0]
Cross-Document Topic-Aligned chunking reconstructs knowledge at the corpus level. It first identifies topics across documents, maps segments to each topic, and synthesizes them into unified chunks.
arXiv Detail & Related papers (2025-11-08T11:45:45Z) - TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework [62.66056331998838]
TeaRAG is a token-efficient agentic RAG framework capable of compressing both retrieval content and reasoning steps. Our reward function evaluates knowledge sufficiency by a knowledge matching mechanism, while penalizing excessive reasoning steps.
arXiv Detail & Related papers (2025-11-07T16:08:34Z) - MLP Memory: A Retriever-Pretrained Memory for Large Language Models [39.890098399493986]
MLP Memory is a lightweight parametric module that learns to internalize retrieval patterns without explicit document access. Our architecture integrates this pretrained memory with Transformer decoders through simple probability interpolation, yielding 17.5% and 24.1% scaling gains on WikiText-103 and Web datasets, respectively.
arXiv Detail & Related papers (2025-08-03T16:40:53Z) - Hierarchical Patch Compression for ColPali: Efficient Multi-Vector Document Retrieval with Dynamic Pruning and Quantization [0.0]
Multi-vector document retrieval systems, such as ColPali, excel in fine-grained matching for complex queries but incur significant storage and computational costs. We propose HPC-ColPali, a Hierarchical Patch Compression framework that enhances the efficiency of ColPali while preserving its retrieval accuracy. Our approach integrates three techniques: (1) K-Means quantization, which compresses patch embeddings into 1-byte centroid indices, achieving up to 32$\times$ storage reduction; (2) attention-guided dynamic pruning, utilizing Vision-Language Model attention weights to retain only the top-$p\%$ most
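The K-Means quantization step from point (1) can be sketched as follows: with at most 256 centroids, every patch embedding collapses to a single uint8 index, which is where the storage reduction comes from (a 128-dim float32 vector costs 512 bytes; its code costs 1). This is a generic k-means sketch under those assumptions, not the authors' implementation.

```python
# Generic k-means codebook quantization of patch embeddings (illustrative,
# not the HPC-ColPali implementation). k <= 256 so codes fit in one byte.
import numpy as np


def kmeans_quantize(patches, k=16, iters=10, seed=0):
    """Return (codes, centroids): each patch mapped to a uint8 centroid index."""
    assert k <= 256, "codes must fit in one byte"
    rng = np.random.default_rng(seed)
    centroids = patches[rng.choice(len(patches), k, replace=False)].copy()
    for _ in range(iters):
        # Assign each patch to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(patches[:, None, :] - centroids[None, :, :], axis=-1)
        codes = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned patches.
        for c in range(k):
            members = patches[codes == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return codes.astype(np.uint8), centroids
```

At query time a compressed passage is reconstructed (approximately) as `centroids[codes]`, trading a small quantization error for the 4·dim-to-1-byte storage ratio per patch.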
arXiv Detail & Related papers (2025-06-19T08:45:52Z) - TreeHop: Generate and Filter Next Query Embeddings Efficiently for Multi-hop Question Answering [27.37434534716611]
TreeHop is an embedding-level framework for multi-hop question answering. It dynamically updates query embeddings by fusing semantic information from prior queries, offering a faster and more cost-effective solution for deployment in a range of knowledge-intensive applications.
arXiv Detail & Related papers (2025-04-28T01:56:31Z) - Efficient Long Context Language Model Retrieval with Compression [57.09163579304332]
Long Context Language Models (LCLMs) have emerged as a new paradigm for performing Information Retrieval (IR). We propose CoLoR, a compression approach tailored for LCLM retrieval, trained to maximize retrieval performance while minimizing the length of the compressed passages. We show that CoLoR improves retrieval performance by 6% while compressing the in-context size by a factor of 1.91.
arXiv Detail & Related papers (2024-12-24T07:30:55Z) - AUEB-Archimedes at RIRAG-2025: Is obligation concatenation really all you need? [11.172264842171682]
This paper presents the systems we developed for RIRAG-2025, a shared task that requires answering regulatory questions by retrieving relevant passages. The generated answers are evaluated using RePASs, a reference-free, model-based metric. We show that by exploiting a neural component of RePASs that extracts important sentences ('obligations') from the retrieved passages, we achieve a dubiously high score (0.947). We then show that by selecting the answer with the best RePASs among a few generated alternatives, we can generate readable, coherent answers that achieve a more plausible and relatively high score.
arXiv Detail & Related papers (2024-12-16T08:54:21Z) - BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression [91.23933111083389]
Retrieval-augmented generation (RAG) can supplement large language models (LLMs) by integrating external knowledge. This paper presents BRIEF, a lightweight approach that performs query-aware multi-hop reasoning. Based on our synthetic data built entirely by open-source models, BRIEF generates more concise summaries.
arXiv Detail & Related papers (2024-10-20T04:24:16Z) - xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token [108.7069350303884]
xRAG is an innovative context compression method tailored for retrieval-augmented generation. It seamlessly integrates document embeddings into the language model representation space. Experimental results demonstrate that xRAG achieves an average improvement of over 10% across six knowledge-intensive tasks.
arXiv Detail & Related papers (2024-05-22T16:15:17Z) - Semi-Parametric Retrieval via Binary Bag-of-Tokens Index [71.78109794895065]
SemI-parametric Disentangled Retrieval (SiDR) is a bi-encoder retrieval framework that decouples the retrieval index from neural parameters. SiDR supports a non-parametric tokenization index for search, achieving BM25-like indexing complexity with significantly better effectiveness.
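A binary bag-of-tokens index of the kind SiDR describes can be sketched with plain integer bitmasks: each passage becomes one bitmask over a shared vocabulary, and scoring is a popcount of the bitwise AND with the query's bitmask, which is why indexing cost resembles BM25's. The names below are illustrative, not SiDR's API.

```python
# Illustrative binary bag-of-tokens index (not SiDR's actual implementation):
# one integer bitmask per passage over a shared vocabulary.

def build_binary_index(passages):
    """Return (index, vocab): each passage packed into a single int bitmask."""
    vocab = {t: i for i, t in enumerate(
        sorted({t for text in passages.values() for t in text.lower().split()}))}
    index = {pid: sum(1 << vocab[t] for t in set(text.lower().split()))
             for pid, text in passages.items()}
    return index, vocab


def binary_search(query, index, vocab, k=3):
    """Score = popcount of (query bitmask AND passage bitmask)."""
    q_bits = sum(1 << vocab[t] for t in set(query.lower().split()) if t in vocab)
    scores = {pid: bin(q_bits & bits).count("1") for pid, bits in index.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Because indexing is pure tokenization (no encoder forward pass), new passages can be added as cheaply as in BM25; the neural parameters only enter on the query side in the full semi-parametric setup.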
arXiv Detail & Related papers (2024-05-03T08:34:13Z) - LOCR: Location-Guided Transformer for Optical Character Recognition [55.195165959662795]
We propose LOCR, a model that integrates location guiding into the transformer architecture during autoregression.
We train the model on a dataset comprising over 77M text-location pairs from 125K academic document pages, including bounding boxes for words, tables and mathematical symbols.
It outperforms all existing methods in our test set constructed from arXiv, as measured by edit distance, BLEU, METEOR and F-measure.
arXiv Detail & Related papers (2024-03-04T15:34:12Z) - Differentiable Reasoning over a Virtual Knowledge Base [156.94984221342716]
We consider the task of answering complex multi-hop questions using a corpus as a virtual knowledge base (KB).
In particular, we describe a neural module, DrKIT, that traverses textual data like a KB, softly following paths of relations between mentions of entities in the corpus.
DrKIT is very efficient, processing 10-100x more queries per second than existing multi-hop systems.
arXiv Detail & Related papers (2020-02-25T03:13:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.