Related papers: Passage Segmentation of Documents for Extractive Question Answering

Related papers

RouteRAG: Efficient Retrieval-Augmented Generation from Text and Graph via Reinforcement Learning [69.87510139069218]
Retrieval-Augmented Generation (RAG) integrates non-parametric knowledge into Large Language Models (LLMs)<n>Recent progress has advanced text-based RAG to multi-turn reasoning through Reinforcement Learning (RL)<n>We introduce model, an RL-based framework that enables LLMs to perform multi-turn and adaptive graph-text hybrid RAG.
arXiv Detail & Related papers (2025-12-10T10:05:31Z)
Relevance to Utility: Process-Supervised Rewrite for RAG [38.81331265140413]
We show how "bridge" modules fail to capture true document utility.<n>We propose R2U, with a key distinction of directly optimizing to maximize the probability of generating a correct answer.
arXiv Detail & Related papers (2025-09-19T04:24:57Z)
HiChunk: Evaluating and Enhancing Retrieval-Augmented Generation with Hierarchical Chunking [23.982187522815536]
HiChunk is a multi-level document structuring framework based on fine-tuned LLMs and the Auto-Merge retrieval algorithm.<n>HiChunk achieves better chunking quality within reasonable time consumption, thereby enhancing the overall performance of RAG systems.
arXiv Detail & Related papers (2025-09-15T03:32:50Z)
REFRAG: Rethinking RAG based Decoding [67.4862300145604]
REFRAG is an efficient decoding framework that compresses, senses, and expands to improve latency in RAG applications.<n>We provide rigorous validation of REFRAG across diverse long-context tasks, including RAG, multi-turn conversations, and long document summarization.
arXiv Detail & Related papers (2025-09-01T03:31:44Z)
On the Reproducibility of Learned Sparse Retrieval Adaptations for Long Documents [2.186901738997927]
We reproduce and examine the mechanisms of adapting Learned Sparse Retrieval (LSR) for long documents. Our experiments confirmed the importance of specific segments, with the first segment consistently dominating document retrieval performance. We re-evaluated recently proposed methods -- ExactSDM and SoftSDM -- across varying document lengths.
arXiv Detail & Related papers (2025-03-31T08:19:31Z)
Emulating Retrieval Augmented Generation via Prompt Engineering for Enhanced Long Context Comprehension in LLMs [23.960451986662996]
This paper proposes a method that emulates Retrieval Augmented Generation (RAG) through specialized prompt engineering and chain-of-thought reasoning. We evaluate our approach on selected tasks from BABILong, which interleaves standard bAbI QA problems with large amounts of distractor text.
arXiv Detail & Related papers (2025-02-18T02:49:40Z)
Knowledge Graph-Guided Retrieval Augmented Generation [34.83235788116369]
We propose a Knowledge Graph-Guided Retrieval Augmented Generation framework. KG$2$RAG provides fact-level relationships between chunks, improving the diversity and coherence of the retrieved results.
arXiv Detail & Related papers (2025-02-08T02:14:31Z)
Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning [88.55095746156428]
Retrieval-augmented generation (RAG) is widely utilized to incorporate external knowledge into large language models.<n>A standard RAG pipeline consists of several components, such as query rewriting, document retrieval, document filtering, and answer generation.<n>We propose treating the complex RAG pipeline with multiple components as a multi-agent cooperative task, in which each component can be regarded as an RL agent.
arXiv Detail & Related papers (2025-01-25T14:24:50Z)
Chain-of-Retrieval Augmented Generation [72.06205327186069]
This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer. Our proposed method, CoRAG, allows the model to dynamically reformulate the query based on the evolving state.
arXiv Detail & Related papers (2025-01-24T09:12:52Z)
Toward Optimal Search and Retrieval for RAG [39.69494982983534]
Retrieval-augmented generation (RAG) is a promising method for addressing some of the memory-related challenges associated with Large Language Models (LLMs) Here, we work towards the goal of understanding how retrievers can be optimized for RAG pipelines for common tasks such as Question Answering (QA)
arXiv Detail & Related papers (2024-11-11T22:06:51Z)
Is Semantic Chunking Worth the Computational Cost? [0.0]
This study systematically evaluates the effectiveness of semantic chunking using three common retrieval-related tasks. The results show that the computational costs associated with semantic chunking are not justified by consistent performance gains.
arXiv Detail & Related papers (2024-10-16T21:53:48Z)
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents [66.42579289213941]
Retrieval-augmented generation (RAG) is an effective technique that enables large language models to utilize external knowledge sources for generation. In this paper, we introduce VisRAG, which tackles this issue by establishing a vision-language model (VLM)-based RAG pipeline. In this pipeline, instead of first parsing the document to obtain text, the document is directly embedded using a VLM as an image and then retrieved to enhance the generation of a VLM.
arXiv Detail & Related papers (2024-10-14T15:04:18Z)
MemoRAG: Boosting Long Context Processing with Global Memory-Enhanced Retrieval Augmentation [60.04380907045708]
Retrieval-Augmented Generation (RAG) is considered a promising strategy to address this problem. We propose MemoRAG, a novel RAG framework empowered by global memory-augmented retrieval. MemoRAG achieves superior performances across a variety of long-context evaluation tasks.
arXiv Detail & Related papers (2024-09-09T13:20:31Z)
QPaug: Question and Passage Augmentation for Open-Domain Question Answering of LLMs [5.09189220106765]
We propose a simple yet efficient method called question and passage augmentation (QPaug) via large language models (LLMs) for open-domain question-answering tasks. Experimental results show that QPaug outperforms the previous state-of-the-art and achieves significant performance gain over existing RAG methods.
arXiv Detail & Related papers (2024-06-20T12:59:27Z)
CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models [49.16989035566899]
Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources. This paper constructs a large-scale and more comprehensive benchmark, and evaluates all the components of RAG systems in various RAG application scenarios.
arXiv Detail & Related papers (2024-01-30T14:25:32Z)
Corrective Retrieval Augmented Generation [36.04062963574603]
Retrieval-augmented generation (RAG) relies heavily on relevance of retrieved documents, raising concerns about how the model behaves if retrieval goes wrong. We propose the Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation. CRAG is plug-and-play and can be seamlessly coupled with various RAG-based approaches.
arXiv Detail & Related papers (2024-01-29T04:36:39Z)
Modeling Uncertainty and Using Post-fusion as Fallback Improves Retrieval Augmented Generation with LLMs [80.74263278847063]
The integration of retrieved passages and large language models (LLMs) has significantly contributed to improving open-domain question answering. This paper investigates different methods of combining retrieved passages with LLMs to enhance answer generation.
arXiv Detail & Related papers (2023-08-24T05:26:54Z)
Fine-Grained Distillation for Long Document Retrieval [86.39802110609062]
Long document retrieval aims to fetch query-relevant documents from a large-scale collection. Knowledge distillation has become de facto to improve a retriever by mimicking a heterogeneous yet powerful cross-encoder. We propose a new learning framework, fine-grained distillation (FGD), for long-document retrievers.
arXiv Detail & Related papers (2022-12-20T17:00:36Z)
Generation-Augmented Retrieval for Open-domain Question Answering [134.27768711201202]
Generation-Augmented Retrieval (GAR) for answering open-domain questions. We show that generating diverse contexts for a query is beneficial as fusing their results consistently yields better retrieval accuracy. GAR achieves state-of-the-art performance on Natural Questions and TriviaQA datasets under the extractive QA setup when equipped with an extractive reader.
arXiv Detail & Related papers (2020-09-17T23:08:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.