Beyond Independent Passages: Adaptive Passage Combination Retrieval for Retrieval Augmented Open-Domain Question Answering
- URL: http://arxiv.org/abs/2507.04069v1
- Date: Sat, 05 Jul 2025 15:10:12 GMT
- Title: Beyond Independent Passages: Adaptive Passage Combination Retrieval for Retrieval Augmented Open-Domain Question Answering
- Authors: Ting-Wen Ko, Jyun-Yu Jiang, Pu-Jen Cheng
- Abstract summary: Adaptive Passage Combination Retrieval (AdaPCR) is a novel framework for open-domain question answering with black-box LMs. AdaPCR explicitly models dependencies between passages by considering passage combinations as units for retrieval and reranking.
- Score: 7.468615741572889
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external documents at inference time, enabling up-to-date knowledge access without costly retraining. However, conventional RAG methods retrieve passages independently, often leading to redundant, noisy, or insufficiently diverse context, which is particularly problematic in noisy corpora and for multi-hop questions. To address this, we propose Adaptive Passage Combination Retrieval (AdaPCR), a novel framework for open-domain question answering with black-box LMs. AdaPCR explicitly models dependencies between passages by considering passage combinations as units for retrieval and reranking. It consists of a context-aware query reformulation using concatenated passages, and a reranking step trained with a predictive objective aligned with downstream answer likelihood. Crucially, AdaPCR adaptively selects the number of retrieved passages without additional stopping modules. Experiments across several QA benchmarks show that AdaPCR outperforms baselines, particularly in multi-hop reasoning, demonstrating the effectiveness of modeling inter-passage dependencies for improved retrieval.
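The combination-as-unit idea can be illustrated with a minimal sketch: enumerate small passage combinations, score each combination as a whole, and let the best-scoring combination determine how many passages are kept. The `score_combination` heuristic below is a hypothetical stand-in for AdaPCR's learned reranker (which the paper trains against downstream answer likelihood); only the control flow reflects the abstract.

```python
from itertools import combinations

def score_combination(question: str, passages: tuple[str, ...]) -> float:
    # Toy proxy for the learned reranker: reward passages that cover new
    # question terms, penalize overlap between passages (redundancy).
    q_terms = set(question.lower().split())
    covered: set[str] = set()
    score = 0.0
    for p in passages:
        terms = set(p.lower().split())
        score += len((terms & q_terms) - covered)  # newly covered question terms
        score -= 0.5 * len(terms & covered)        # redundancy penalty
        covered |= terms
    return score

def adapcr_select(question: str, candidates: list[str], max_k: int = 3) -> list[str]:
    # Combinations of every size up to max_k compete under one score, so the
    # number of passages kept falls out of the ranking itself, with no
    # separate stopping module, mirroring the adaptivity claim above.
    best = max(
        (combo for k in range(1, max_k + 1)
               for combo in combinations(candidates, k)),
        key=lambda combo: score_combination(question, combo),
    )
    return list(best)

passages = [
    "Paris is the capital of France.",
    "France is a country in Europe.",
    "The capital of France is Paris, on the Seine.",
]
print(adapcr_select("What is the capital of France?", passages))
```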
Related papers
- PrismRAG: Boosting RAG Factuality with Distractor Resilience and Strategized Reasoning [57.89188317734747]
PrismRAG trains the model with distractor-aware QA pairs mixing gold evidence with subtle distractor passages. It instills reasoning-centric habits that make the LLM plan, rationalize, and synthesize without relying on extensive human-engineered instructions.
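A hedged sketch of what distractor-aware training data might look like, assuming the pairs mix gold evidence with sampled distractors in a shuffled context; the field names and sampling scheme here are illustrative, not PrismRAG's actual format.

```python
import random

def make_distractor_aware_example(question, answer, gold_passages,
                                  distractors, n_distractors=2, seed=0):
    rng = random.Random(seed)
    context = gold_passages + rng.sample(distractors, n_distractors)
    rng.shuffle(context)  # gold evidence should not sit in a fixed position
    return {"question": question, "context": context, "answer": answer}

example = make_distractor_aware_example(
    question="Who wrote The Old Man and the Sea?",
    answer="Ernest Hemingway",
    gold_passages=["Ernest Hemingway wrote The Old Man and the Sea (1952)."],
    distractors=[
        "The Sea-Wolf is a 1904 novel by Jack London.",
        "Moby-Dick was written by Herman Melville.",
        "The Sound and the Fury is by William Faulkner.",
    ],
)
print(example["context"])
```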
arXiv Detail & Related papers (2025-07-25T00:15:31Z) - Question Decomposition for Retrieval-Augmented Generation [2.6409776648054764]
We propose a RAG pipeline that incorporates question decomposition into sub-questions. We show that question decomposition effectively assembles complementary documents, while reranking reduces noise. Although reranking itself is standard, we show that pairing an off-the-shelf cross-encoder reranker with LLM-driven question decomposition bridges the retrieval gap on multi-hop questions.
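The decompose-retrieve-rerank control flow reads roughly as below. `decompose`, `retrieve`, and `rerank` are toy stand-ins (the paper uses an LLM decomposer and an off-the-shelf cross-encoder reranker); the sketch only shows how sub-question retrieval pools complementary evidence before a single rerank.

```python
def decompose(question: str) -> list[str]:
    # An LLM would split a multi-hop question into sub-questions.
    return [question]  # toy: no decomposition

def retrieve(query: str, k: int = 5) -> list[str]:
    return [f"passage about: {query} #{i}" for i in range(k)]  # toy retriever

def rerank(question: str, passages: list[str], top_k: int = 5) -> list[str]:
    # A cross-encoder would score (question, passage) pairs; sorting by
    # length is only a placeholder for that score.
    return sorted(set(passages), key=len)[:top_k]

def answer_with_decomposition(question: str) -> list[str]:
    pool: list[str] = []
    for sub_q in decompose(question):
        pool.extend(retrieve(sub_q))   # gather complementary evidence
    return rerank(question, pool)      # reranking filters the noise

print(answer_with_decomposition("Who directed the film that won Best Picture in 1998?"))
```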
arXiv Detail & Related papers (2025-07-01T01:01:54Z) - Optimizing Question Semantic Space for Dynamic Retrieval-Augmented Multi-hop Question Answering [28.09833765246606]
Q-DREAM consists of three key modules: (1) the Question Decomposition Module (QDM), which decomposes interdependent subquestions; (2) the Subquestion Dependency Module (SDOM), which models the relations of subquestions for better understanding; and (3) the Dynamic Passage Retrieval Module (DPRM), which aligns subquestions with relevant passages by optimizing the semantic embeddings. Experimental results across various benchmarks demonstrate that Q-DREAM significantly outperforms existing RAG methods, achieving state-of-the-art performance in both in-domain and out-of-domain settings.
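A skeleton of the three-module flow, with toy bodies so it runs; only the QDM to SDOM to DPRM wiring reflects the description above, and the module internals are placeholders rather than the paper's learned components.

```python
def qdm(question: str) -> list[str]:
    # Question Decomposition Module: an LLM would do this; toy split on "and".
    return question.split(" and ")

def sdom(subqs: list[str]) -> list[tuple[int, int]]:
    # Subquestion Dependency Module: toy assumption of a linear chain,
    # where each subquestion depends on the previous one.
    return [(i, i - 1) for i in range(1, len(subqs))]

def dprm(subq: str, context: list[str]) -> list[str]:
    # Dynamic Passage Retrieval Module: placeholder for embedding-based retrieval.
    return [f"passage for '{subq}' given {len(context)} prior answers"]

def q_dream(question: str) -> list[str]:
    subqs = qdm(question)
    deps = sdom(subqs)
    answers, passages = [], []
    for i, sq in enumerate(subqs):
        ctx = [answers[j] for (k, j) in deps if k == i]  # injected dependencies
        hits = dprm(sq, ctx)
        passages.extend(hits)
        answers.append(f"answer({sq})")  # an LLM would answer from `hits`
    return passages

print(q_dream("Who founded SpaceX and where was he born"))
```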
arXiv Detail & Related papers (2025-05-31T09:57:07Z) - ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning [45.37734114816888]
We present ConvSearch-R1, a framework that eliminates dependency on external rewrite supervision by leveraging reinforcement learning to optimize reformulation directly through retrieval signals. Our novel two-stage approach combines Self-Driven Policy Warm-Up to address the cold-start problem through retrieval-guided self-distillation, followed by Retrieval-Guided Reinforcement Learning with a specially designed rank-incentive reward shaping mechanism that addresses the sparsity issue in conventional retrieval metrics.
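One plausible reading of rank-incentive reward shaping is a dense reward that decays with the rank of the gold passage, rather than a sparse hit/miss signal; the exact shaping in ConvSearch-R1 may differ, so treat this as an assumption-labeled sketch.

```python
def rank_incentive_reward(ranked_ids: list[str], gold_id: str, cutoff: int = 10) -> float:
    # Dense reward: higher when the gold passage ranks higher, zero when it
    # is missing or below the cutoff, unlike binary recall@k.
    try:
        rank = ranked_ids.index(gold_id) + 1  # 1-based rank of gold passage
    except ValueError:
        return 0.0                            # gold not retrieved at all
    if rank > cutoff:
        return 0.0
    return 1.0 / rank

print(rank_incentive_reward(["p3", "p7", "p1"], gold_id="p7"))  # 0.5
```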
arXiv Detail & Related papers (2025-05-21T17:27:42Z) - QPaug: Question and Passage Augmentation for Open-Domain Question Answering of LLMs [5.09189220106765]
We propose a simple yet efficient method called question and passage augmentation (QPaug) via large language models (LLMs) for open-domain question-answering tasks.
Experimental results show that QPaug outperforms the previous state-of-the-art and achieves significant performance gain over existing RAG methods.
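A minimal sketch of the augmentation step, assuming the LLM both reformulates the question and drafts a synthetic passage from parametric knowledge that is appended to the retrieved context; `call_llm` is a hypothetical client, not an API from the paper.

```python
def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM call.
    return f"[LLM output for: {prompt[:40]}...]"

def qpaug(question: str, retrieved: list[str]) -> dict:
    aug_question = call_llm(f"Decompose and clarify this question: {question}")
    aug_passage = call_llm(f"Write a short passage answering: {question}")
    return {
        "question": f"{question}\n{aug_question}",   # question augmentation
        "context": retrieved + [aug_passage],        # passage augmentation
    }

print(qpaug("When was the Eiffel Tower built?",
            ["The Eiffel Tower opened in 1889."]))
```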
arXiv Detail & Related papers (2024-06-20T12:59:27Z) - Passage-specific Prompt Tuning for Passage Reranking in Question Answering with Large Language Models [11.716595438057997]
We propose passage-specific prompt tuning for reranking in open-domain question answering (PSPT).
PSPT is a parameter-efficient method that fine-tunes learnable passage-specific soft prompts.
We conducted extensive experiments utilizing the Llama-2-chat-7B model across three publicly available open-domain question answering datasets.
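A hedged sketch of a passage-specific soft prompt in PyTorch: a small learnable module maps a passage embedding to a few virtual tokens prepended to a frozen LM's input embeddings. The shapes and wiring are illustrative; only the learnable-prompt, frozen-backbone idea comes from the summary above.

```python
import torch
import torch.nn as nn

class PassageSoftPrompt(nn.Module):
    def __init__(self, emb_dim: int = 4096, n_virtual_tokens: int = 8):
        super().__init__()
        # Maps one passage embedding to n_virtual_tokens prompt embeddings.
        self.proj = nn.Linear(emb_dim, n_virtual_tokens * emb_dim)
        self.n, self.d = n_virtual_tokens, emb_dim

    def forward(self, passage_emb: torch.Tensor) -> torch.Tensor:
        # passage_emb: (batch, emb_dim) -> (batch, n_virtual_tokens, emb_dim)
        return self.proj(passage_emb).view(-1, self.n, self.d)

prompt_gen = PassageSoftPrompt(emb_dim=32, n_virtual_tokens=4)
virtual_tokens = prompt_gen(torch.randn(2, 32))
print(virtual_tokens.shape)  # torch.Size([2, 4, 32])
# Only `prompt_gen` is trained; the backbone LM stays frozen, which is what
# makes the method parameter-efficient.
```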
arXiv Detail & Related papers (2024-05-31T07:43:42Z) - Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA).
We propose a novel adaptive QA framework that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on the query complexity.
We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that our approach enhances the overall efficiency and accuracy of QA systems.
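The routing idea can be shown with a toy complexity classifier that dispatches between no retrieval, single-step retrieval, and iterative retrieval; the paper trains this classifier, whereas the keyword heuristic below is purely illustrative.

```python
def classify_complexity(question: str) -> str:
    # Toy stand-in for the learned complexity classifier.
    q = question.lower()
    if any(w in q for w in ("before", "after", "both", "and then")):
        return "multi_hop"
    if q.startswith(("who", "when", "where")):
        return "single_hop"
    return "no_retrieval"

def adaptive_answer(question: str) -> str:
    strategy = {
        "no_retrieval": "answer directly from the LM",
        "single_hop":   "retrieve once, then answer",
        "multi_hop":    "retrieve and reason iteratively",
    }[classify_complexity(question)]
    return f"{question!r} -> {strategy}"

print(adaptive_answer("Who painted the ceiling of the Sistine Chapel?"))
print(adaptive_answer("Which director was born before the author of Dune?"))
```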
arXiv Detail & Related papers (2024-03-21T13:52:30Z) - Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection [74.51523859064802]
We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG)
Self-RAG enhances an LM's quality and factuality through retrieval and self-reflection.
It significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks.
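A loose sketch of retrieve-then-self-critique: draft an answer, have the model judge whether it is supported, and retry with retrieval if not. Self-RAG itself learns special reflection tokens during training, so this loop only mimics the high-level behavior described above.

```python
def generate(question: str, context: list[str] | None = None) -> str:
    # Placeholder for an LLM generation call.
    return f"draft answer to {question!r} (used evidence: {bool(context)})"

def critique_supported(answer: str, context: list[str] | None) -> bool:
    # A reflection step would judge factual support; toy rule: supported
    # only if evidence was actually used.
    return context is not None

def self_reflective_answer(question: str, retriever) -> str:
    draft = generate(question)
    if critique_supported(draft, None):
        return draft                       # confident without retrieval
    evidence = retriever(question)         # fall back to retrieval
    return generate(question, context=evidence)

print(self_reflective_answer("What year did Apollo 11 land?",
                             retriever=lambda q: ["Apollo 11 landed in 1969."]))
```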
arXiv Detail & Related papers (2023-10-17T18:18:32Z) - Modeling Uncertainty and Using Post-fusion as Fallback Improves Retrieval Augmented Generation with LLMs [80.74263278847063]
The integration of retrieved passages and large language models (LLMs) has significantly contributed to improving open-domain question answering.
This paper investigates different methods of combining retrieved passages with LLMs to enhance answer generation.
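One of the combination strategies, post-fusion as a fallback, might look like the following: answer once over the concatenated passages, and if confidence is low, answer per passage and take a majority vote. The confidence threshold and the toy LLM stub are assumptions, not the paper's exact method.

```python
from collections import Counter

def answer_with_conf(question: str, context: list[str]) -> tuple[str, float]:
    # Stand-in for an LLM call returning (answer, confidence in [0, 1]).
    return f"answer|{hash((question, tuple(context))) % 3}", 0.4

def post_fusion_fallback(question: str, passages: list[str],
                         threshold: float = 0.7) -> str:
    ans, conf = answer_with_conf(question, passages)  # concatenation first
    if conf >= threshold:
        return ans
    # Low confidence: answer per passage and fuse by majority vote.
    votes = Counter(answer_with_conf(question, [p])[0] for p in passages)
    return votes.most_common(1)[0][0]

print(post_fusion_fallback("toy question", ["p1", "p2", "p3"]))
```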
arXiv Detail & Related papers (2023-08-24T05:26:54Z) - Joint Passage Ranking for Diverse Multi-Answer Retrieval [56.43443577137929]
We study multi-answer retrieval, an under-explored problem that requires retrieving passages to cover multiple distinct answers for a question.
This task requires joint modeling of retrieved passages, as models should not repeatedly retrieve passages containing the same answer at the cost of missing a different valid answer.
In this paper, we introduce JPR, a joint passage retrieval model focusing on reranking. To model the joint probability of the retrieved passages, JPR makes use of an autoregressive reranker that selects a sequence of passages, equipped with novel training and decoding algorithms.
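Autoregressive reranking can be sketched as greedy sequential selection in which each step scores candidates conditioned on the passages already chosen, so repeats of an already-covered answer are penalized; the `conditional_score` heuristic is a toy proxy for JPR's learned model and decoding algorithms.

```python
def conditional_score(question: str, passage: str, selected: list[str]) -> float:
    # Relevance minus redundancy with respect to already-selected passages.
    terms = set(passage.lower().split())
    seen: set[str] = set()
    for s in selected:
        seen |= set(s.lower().split())
    relevance = len(terms & set(question.lower().split()))
    return relevance - 0.5 * len(terms & seen)

def autoregressive_rerank(question: str, candidates: list[str], k: int = 2) -> list[str]:
    selected: list[str] = []
    pool = list(candidates)
    for _ in range(min(k, len(pool))):
        best = max(pool, key=lambda p: conditional_score(question, p, selected))
        selected.append(best)
        pool.remove(best)
    return selected

docs = [
    "Insulin is produced in the pancreas.",
    "The pancreas produces insulin.",
    "Insulin is also synthesized in some neurons.",
]
print(autoregressive_rerank("Where is insulin produced?", docs))
```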
arXiv Detail & Related papers (2021-04-17T04:48:36Z) - Answering Any-hop Open-domain Questions with Iterative Document Reranking [62.76025579681472]
We propose a unified QA framework to answer any-hop open-domain questions.
Our method consistently achieves performance comparable to or better than the state-of-the-art on both single-hop and multi-hop open-domain QA datasets.
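A minimal sketch of a unified any-hop loop: retrieve, rerank against the question plus the evidence gathered so far, and stop once an answerability check passes, so single-hop and multi-hop questions share one code path. All three callbacks are toy stand-ins for learned components, assumed for illustration.

```python
def iterative_rerank_qa(question, retrieve, rerank, answerable, max_hops=3):
    evidence = []
    for _ in range(max_hops):
        candidates = retrieve(question, evidence)
        evidence.append(rerank(question, evidence, candidates))
        if answerable(question, evidence):  # unified stop for any hop count
            break
    return evidence

# Toy components so the loop runs; real versions are learned models.
ev = iterative_rerank_qa(
    "Which country is the birthplace of the director of Jaws?",
    retrieve=lambda q, ev: [f"doc{len(ev)}a", f"doc{len(ev)}b"],
    rerank=lambda q, ev, cands: cands[0],
    answerable=lambda q, ev: len(ev) >= 2,
)
print(ev)  # ['doc0a', 'doc1a']
```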
arXiv Detail & Related papers (2020-09-16T04:31:38Z)