Parallel Key-Value Cache Fusion for Position Invariant RAG
- URL: http://arxiv.org/abs/2501.07523v2
- Date: Thu, 23 Jan 2025 06:48:22 GMT
- Title: Parallel Key-Value Cache Fusion for Position Invariant RAG
- Authors: Philhoon Oh, Jinwoo Shin, James Thorne,
- Abstract summary: Large Language Models (LLMs) are sensitive to the position of relevant information within contexts.
We introduce a framework that generates consistent outputs for decoder-only models, irrespective of the input context order.
- Score: 55.9809686190244
- License:
- Abstract: Recent advancements in Large Language Models (LLMs) underscore the necessity of Retrieval Augmented Generation (RAG) to leverage external information. However, LLMs are sensitive to the position of relevant information within contexts and tend to generate incorrect responses when such information is placed in the middle, known as `Lost in the Middle' phenomenon. In this paper, we introduce a framework that generates consistent outputs for decoder-only models, irrespective of the input context order. Experimental results for three open domain question answering tasks demonstrate position invariance, where the model is not sensitive to input context order, and superior robustness to irrelevent passages compared to prevailing approaches for RAG pipelines.
Related papers
- Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding [58.364933651703524]
We show that concentrated massive values consistently emerge in specific regions of attention queries.
These massive values play a critical role in interpreting contextual knowledge.
We trace the emergence of massive values and find that such concentration is caused by Rotary Positional.
arXiv Detail & Related papers (2025-02-03T17:47:03Z) - Context Awareness Gate For Retrieval Augmented Generation [2.749898166276854]
Retrieval Augmented Generation (RAG) has emerged as a widely adopted approach to mitigate the limitations of large language models (LLMs) in answering domain-specific questions.
Previous research has predominantly focused on improving the accuracy and quality of retrieved data chunks to enhance the overall performance of the generation pipeline.
We investigate the impact of retrieving irrelevant information in open-domain question answering, highlighting its significant detrimental effect on the quality of LLM outputs.
arXiv Detail & Related papers (2024-11-25T06:48:38Z) - Invar-RAG: Invariant LLM-aligned Retrieval for Better Generation [43.630437906898635]
We propose a novel two-stage fine-tuning architecture called Invar-RAG.
In the retrieval stage, an LLM-based retriever is constructed by integrating LoRA-based representation learning.
In the generation stage, a refined fine-tuning method is employed to improve LLM accuracy in generating answers based on retrieved information.
arXiv Detail & Related papers (2024-11-11T14:25:37Z) - Exploring Language Model Generalization in Low-Resource Extractive QA [57.14068405860034]
We investigate Extractive Question Answering (EQA) with Large Language Models (LLMs) under domain drift.
We devise a series of experiments to explain the performance gap empirically.
arXiv Detail & Related papers (2024-09-27T05:06:43Z) - SFR-RAG: Towards Contextually Faithful LLMs [57.666165819196486]
Retrieval Augmented Generation (RAG) is a paradigm that integrates external contextual information with large language models (LLMs) to enhance factual accuracy and relevance.
We introduce SFR-RAG, a small LLM that is instruction-textual with an emphasis on context-grounded generation and hallucination.
We also present ConBench, a new evaluation framework compiling multiple popular and diverse RAG benchmarks.
arXiv Detail & Related papers (2024-09-16T01:08:18Z) - Enhancing Robustness of Retrieval-Augmented Language Models with In-Context Learning [5.053086684547045]
This study introduces an in-context learning-based approach to enhance the reasoning capabilities of RALMs.
Our approach increases accuracy in identifying unanswerable and conflicting scenarios without requiring additional fine-tuning.
arXiv Detail & Related papers (2024-08-08T12:42:43Z) - Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation [8.975024781390077]
We present MIRAGE --Model Internals-based RAG Explanations -- a plug-and-play approach using model internals for faithful answer attribution in question answering applications.
We evaluate our proposed approach on a multilingual QA dataset, finding high agreement with human answer attribution.
arXiv Detail & Related papers (2024-06-19T16:10:26Z) - RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation [42.82192656794179]
Large Language Models (LLMs) exhibit remarkable capabilities but are prone to generating inaccurate or hallucinatory responses.
This limitation stems from their reliance on vast pretraining datasets, making them susceptible to errors in unseen scenarios.
Retrieval-Augmented Generation (RAG) addresses this by incorporating external, relevant documents into the response generation process.
arXiv Detail & Related papers (2024-03-31T08:58:54Z) - Building Interpretable and Reliable Open Information Retriever for New
Domains Overnight [67.03842581848299]
Information retrieval is a critical component for many down-stream tasks such as open-domain question answering (QA)
We propose an information retrieval pipeline that uses entity/event linking model and query decomposition model to focus more accurately on different information units of the query.
We show that, while being more interpretable and reliable, our proposed pipeline significantly improves passage coverages and denotation accuracies across five IR and QA benchmarks.
arXiv Detail & Related papers (2023-08-09T07:47:17Z) - Generation-Augmented Retrieval for Open-domain Question Answering [134.27768711201202]
Generation-Augmented Retrieval (GAR) for answering open-domain questions.
We show that generating diverse contexts for a query is beneficial as fusing their results consistently yields better retrieval accuracy.
GAR achieves state-of-the-art performance on Natural Questions and TriviaQA datasets under the extractive QA setup when equipped with an extractive reader.
arXiv Detail & Related papers (2020-09-17T23:08:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.