PropRAG: Guiding Retrieval with Beam Search over Proposition Paths
- URL: http://arxiv.org/abs/2504.18070v1
- Date: Fri, 25 Apr 2025 04:47:34 GMT
- Title: PropRAG: Guiding Retrieval with Beam Search over Proposition Paths
- Authors: Jingjin Wang
- Abstract summary: PropRAG is a framework leveraging contextually rich propositions and a novel beam search algorithm over proposition paths. PropRAG's online retrieval process operates entirely without invoking generative Large Language Models. PropRAG achieves state-of-the-art zero-shot Recall@5 results on PopQA (55.3%), 2Wiki (93.7%), HotpotQA (97.0%), and MuSiQue (77.3%), alongside top F1 scores (e.g., 52.4% on MuSiQue).
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval Augmented Generation (RAG) has become the standard non-parametric approach for equipping Large Language Models (LLMs) with up-to-date knowledge and mitigating catastrophic forgetting common in continual learning. However, standard RAG, relying on independent passage retrieval, fails to capture the interconnected nature of human memory crucial for complex reasoning (associativity) and contextual understanding (sense-making). While structured RAG methods like HippoRAG utilize knowledge graphs (KGs) built from triples, the inherent context loss limits fidelity. We introduce PropRAG, a framework leveraging contextually rich propositions and a novel beam search algorithm over proposition paths to explicitly discover multi-step reasoning chains. Crucially, PropRAG's online retrieval process operates entirely without invoking generative LLMs, relying instead on efficient graph traversal and pre-computed embeddings. This avoids online LLM inference costs and potential inconsistencies during evidence gathering. LLMs are used effectively offline for high-quality proposition extraction and post-retrieval for answer generation. PropRAG achieves state-of-the-art zero-shot Recall@5 results on PopQA (55.3%), 2Wiki (93.7%), HotpotQA (97.0%), and MuSiQue (77.3%), alongside top F1 scores (e.g., 52.4% on MuSiQue). By improving evidence retrieval through richer representation and explicit, LLM-free online path finding, PropRAG advances non-parametric continual learning.
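Although the abstract gives no pseudocode, the retrieval step it describes, beam search over proposition paths scored with pre-computed embeddings, can be pictured with a short sketch. The snippet below is a hypothetical illustration under stated assumptions (propositions as graph nodes, edges between propositions that share an entity, path score as the mean query similarity); every identifier is ours, not the paper's.

```python
# Minimal sketch of beam search over proposition paths (illustrative only,
# not the authors' implementation). Assumes proposition embeddings are
# pre-computed offline, so retrieval needs no generative LLM calls.
import numpy as np

def cosine(a, b):
    return float(a @ b) / (float(np.linalg.norm(a) * np.linalg.norm(b)) + 1e-9)

def beam_search_paths(query_emb, prop_embs, adjacency, beam_width=4, max_hops=3):
    """Find high-scoring multi-hop chains of propositions for a query.

    prop_embs: {prop_id: np.ndarray}  pre-computed proposition embeddings
    adjacency: {prop_id: set of prop_ids}  e.g., propositions sharing an entity
    Returns (path, score) pairs; score is the mean query similarity of the path.
    """
    # Seed the beam with the propositions most similar to the query.
    seeds = sorted(prop_embs, key=lambda p: cosine(query_emb, prop_embs[p]),
                   reverse=True)[:beam_width]
    beam = [([p], cosine(query_emb, prop_embs[p])) for p in seeds]
    for _ in range(max_hops - 1):
        candidates = list(beam)  # shorter paths stay competitive
        for path, score in beam:
            for nxt in adjacency.get(path[-1], ()):
                if nxt in path:
                    continue  # avoid cycles
                new_score = (score * len(path)
                             + cosine(query_emb, prop_embs[nxt])) / (len(path) + 1)
                candidates.append((path + [nxt], new_score))
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beam
```

Because scoring only touches cached embeddings and the adjacency structure, the whole loop stays LLM-free at query time, which is the property the abstract emphasizes.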
Related papers
- Beyond Chunks and Graphs: Retrieval-Augmented Generation through Triplet-Driven Thinking [31.73448933991891]
Retrieval-augmented generation (RAG) is critical for reducing hallucinations and incorporating external knowledge into Large Language Models (LLMs). We propose T$^2$RAG, a novel framework that operates on a simple, graph-free knowledge base of atomic triplets. Empirical results show that T$^2$RAG significantly outperforms state-of-the-art multi-round and Graph RAG methods.
arXiv Detail & Related papers (2025-08-04T13:50:44Z)
- PrismRAG: Boosting RAG Factuality with Distractor Resilience and Strategized Reasoning [57.89188317734747]
PrismRAG trains the model with distractor-aware QA pairs mixing gold evidence with subtle distractor passages. It instills reasoning-centric habits that make the LLM plan, rationalize, and synthesize without relying on extensive human-engineered instructions.
arXiv Detail & Related papers (2025-07-25T00:15:31Z)
- Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs [69.10441885629787]
Retrieval-Augmented Generation (RAG) lifts the factuality of Large Language Models (LLMs) by injecting external knowledge. It falls short on problems that demand multi-step inference; conversely, purely reasoning-oriented approaches often hallucinate or mis-ground facts. This survey synthesizes both strands under a unified reasoning-retrieval perspective.
arXiv Detail & Related papers (2025-07-13T03:29:41Z)
- Clue-RAG: Towards Accurate and Cost-Efficient Graph-based RAG via Multi-Partite Graph and Query-Driven Iterative Retrieval [7.542076325904203]
Retrieval-Augmented Generation (RAG) addresses the limitations of LLMs' static knowledge by incorporating external information, often from graph-structured data. We propose Clue-RAG, a novel approach that introduces a multi-partite graph index and a query-driven iterative retrieval strategy. Experiments on three QA benchmarks show that Clue-RAG significantly outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2025-07-11T09:36:45Z)
- Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering [75.12322966980003]
Large Language Models (LLMs) have shown strong inductive reasoning ability across various domains. Most existing RAG pipelines rely on unstructured text, limiting interpretability and structured reasoning. Recent studies have explored integrating knowledge graphs with LLMs for knowledge graph question answering (KGQA). We propose RAPL, a novel framework for efficient and effective graph retrieval in KGQA.
arXiv Detail & Related papers (2025-06-11T12:03:52Z)
- R3-RAG: Learning Step-by-Step Reasoning and Retrieval for LLMs via Reinforcement Learning [62.742230250513025]
Retrieval-Augmented Generation (RAG) integrates external knowledge with Large Language Models (LLMs) to enhance factual correctness and reduce hallucination. We propose $\textbf{R}^3$-RAG, which uses $\textbf{R}$einforcement learning to make the LLM learn how to $\textbf{R}$eason and $\textbf{R}$etrieve step by step, thus retrieving comprehensive external knowledge and leading to correct answers.
arXiv Detail & Related papers (2025-05-26T12:25:37Z)
- AlignRAG: An Adaptable Framework for Resolving Misalignments in Retrieval-Aware Reasoning of RAG [61.28113271728859]
Retrieval-augmented generation (RAG) has emerged as a foundational paradigm for knowledge-grounded text generation. Existing RAG pipelines often fail to ensure that the reasoning trajectories align with the evidential constraints imposed by retrieved content. We propose AlignRAG, a novel test-time framework that mitigates reasoning misalignment through iterative Critique-Driven Alignment steps.
arXiv Detail & Related papers (2025-04-21T04:56:47Z)
- Self-Routing RAG: Binding Selective Retrieval with Knowledge Verbalization [97.72503890388866]
We propose Self-Routing RAG (SR-RAG), a novel framework that binds selective retrieval with knowledge verbalization. SR-RAG enables an LLM to dynamically decide between external retrieval and verbalizing its own parametric knowledge. We introduce dynamic knowledge source inference via nearest neighbor search to improve the accuracy of knowledge source decisions.
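The "knowledge source inference via nearest neighbor search" can be read as a k-NN vote over previously routed queries. A minimal sketch under that assumption (all names hypothetical; SR-RAG's actual decision procedure may differ):

```python
# Hypothetical k-NN routing between retrieval and parametric verbalization
# (illustrative only, not the paper's implementation).
import numpy as np

def decide_source(query_emb, past_embs, past_labels, k=5):
    """past_embs: (n, d) embeddings of previously routed queries;
    past_labels[i] is 'retrieve' or 'verbalize'."""
    dists = np.linalg.norm(past_embs - query_emb, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of the k closest queries
    votes = [past_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)  # majority vote
```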
arXiv Detail & Related papers (2025-04-01T17:59:30Z)
- HopRAG: Multi-Hop Reasoning for Logic-Aware Retrieval-Augmented Generation [28.69822159828129]
We propose HopRAG, a novel RAG framework that augments retrieval with logical reasoning. During indexing, HopRAG constructs a passage graph, with text chunks as vertices and logical connections established via LLM-generated pseudo-queries as edges. During retrieval, it employs a retrieve-reason-prune mechanism, starting with lexically or semantically similar passages. Experiments demonstrate HopRAG's superiority, achieving 76.78% higher answer accuracy and 65.07% improved retrieval F1 score compared to conventional methods.
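The retrieve-reason-prune traversal summarized above can be sketched as a bounded graph expansion in which each hop is gated by how well an edge's pseudo-query matches the question. Everything below is an illustrative stand-in with hypothetical names; the paper's "reason" step is richer than a similarity test:

```python
# Illustrative retrieve-reason-prune over a passage graph (hypothetical names).
def jaccard(a, b):
    """Toy lexical similarity between two strings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / (len(wa | wb) or 1)

def retrieve_reason_prune(question, graph, seeds_k=5, hops=2, keep_k=8):
    """graph: {passage_text: [(pseudo_query, neighbor_passage), ...]}"""
    # Retrieve: seed with passages lexically similar to the question.
    frontier = sorted(graph, key=lambda p: jaccard(question, p),
                      reverse=True)[:seeds_k]
    scored = {p: jaccard(question, p) for p in frontier}
    for _ in range(hops):
        nxt = []
        for p in frontier:
            for pseudo_q, nbr in graph.get(p, []):
                # Reason: gate the hop by pseudo-query / question relatedness.
                s = jaccard(question, pseudo_q)
                if s > scored.get(nbr, 0.0):
                    scored[nbr] = s
                    nxt.append(nbr)
        frontier = nxt
    # Prune: keep only the top-scoring passages as evidence.
    return sorted(scored, key=scored.get, reverse=True)[:keep_k]
```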
arXiv Detail & Related papers (2025-02-18T02:24:42Z)
- LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs -- No Silver Bullet for LC or RAG Routing [70.35888047551643]
We present LaRA, a novel benchmark specifically designed to rigorously compare RAG and long-context (LC) LLMs. LaRA encompasses 2,326 test cases across four practical QA task categories and three types of naturally occurring long texts. We find that the optimal choice between RAG and LC depends on a complex interplay of factors, including the model's parameter size, long-text capabilities, context length, task type, and the characteristics of the retrieved chunks.
arXiv Detail & Related papers (2025-02-14T08:04:22Z)
- DeepRAG: Thinking to Retrieval Step by Step for Large Language Models [92.87532210660456]
We propose DeepRAG, a framework that models retrieval-augmented reasoning as a Markov Decision Process (MDP). By iteratively decomposing queries, DeepRAG dynamically determines whether to retrieve external knowledge or rely on parametric reasoning at each step. Experiments show that DeepRAG improves retrieval efficiency while raising answer accuracy by 21.99%, demonstrating its effectiveness in optimizing retrieval-augmented reasoning.
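The MDP framing amounts to a loop: decompose the query, then at each step choose between retrieval and parametric reasoning. A hedged skeleton in which every callable is a stub standing in for components DeepRAG would learn or prompt:

```python
# Hypothetical skeleton of DeepRAG-style stepwise retrieval decisions
# (decompose/policy/retrieve/reader are stubs, not the paper's components).
def deeprag_loop(question, decompose, policy, retrieve, reader, max_steps=4):
    state = {"question": question, "evidence": []}
    for _ in range(max_steps):
        subquery = decompose(state)       # next atomic sub-question, or None
        if subquery is None:
            break                         # decomposition finished
        if policy(state, subquery) == "retrieve":
            state["evidence"].append(retrieve(subquery))  # external knowledge
        else:
            state["evidence"].append(reader(subquery))    # parametric answer
    return reader(state)  # final answer conditioned on gathered evidence
```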
arXiv Detail & Related papers (2025-02-03T08:22:45Z)
- SFR-RAG: Towards Contextually Faithful LLMs [57.666165819196486]
Retrieval Augmented Generation (RAG) is a paradigm that integrates external contextual information with large language models (LLMs) to enhance factual accuracy and relevance.
We introduce SFR-RAG, a small LLM that is instruction-tuned with an emphasis on context-grounded generation and hallucination minimization.
We also present ContextualBench, a new evaluation framework compiling multiple popular and diverse RAG benchmarks.
arXiv Detail & Related papers (2024-09-16T01:08:18Z)
- W-RAG: Weakly Supervised Dense Retrieval in RAG for Open-domain Question Answering [28.79851078451609]
We propose W-RAG, a method that draws weak training signals from the downstream task and fine-tunes the retriever to prioritize passages that most benefit the task.
We conduct comprehensive experiments across four publicly available OpenQA datasets to demonstrate that our approach enhances both retrieval and OpenQA performance.
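One plausible reading of the weak supervision signal described above is to rank passages by the reader's likelihood of producing the gold answer, with top-ranked passages becoming positives for retriever fine-tuning. A sketch under that assumption (the `answer_logprob` scorer and all names are hypothetical):

```python
# Hypothetical weak labeling for retriever fine-tuning (illustrative only).
def weak_labels(question, passages, gold_answer, answer_logprob):
    """answer_logprob(question, passage, answer) -> float log-likelihood
    from any reader LM; the highest-scoring passage becomes the positive."""
    scores = [answer_logprob(question, p, gold_answer) for p in passages]
    best = max(range(len(passages)), key=lambda i: scores[i])
    return [(p, int(i == best)) for i, p in enumerate(passages)]
```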
arXiv Detail & Related papers (2024-08-15T22:34:44Z)
- Prompt Perturbation in Retrieval-Augmented Generation based Large Language Models [9.688626139309013]
Retrieval-Augmented Generation is regarded as a means of improving the trustworthiness of text generation from large language models.
In this work, we find that inserting even a short prefix into the prompt leads to the generation of outputs far removed from factually correct answers.
We introduce a novel optimization technique called Gradient Guided Prompt Perturbation.
arXiv Detail & Related papers (2024-02-11T12:25:41Z)
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection [74.51523859064802]
We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG).
Self-RAG enhances an LM's quality and factuality through retrieval and self-reflection.
It significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks.
arXiv Detail & Related papers (2023-10-17T18:18:32Z)