LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering
- URL: http://arxiv.org/abs/2410.18050v2
- Date: Fri, 01 Nov 2024 15:36:59 GMT
- Title: LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering
- Authors: Qingfei Zhao, Ruobing Wang, Yukuo Cen, Daren Zha, Shicheng Tan, Yuxiao Dong, Jie Tang,
- Abstract summary: LongRAG is a general, dual-perspective, and robust LLM-based RAG system paradigm for LCQA.
LongRAG significantly outperforms long-context LLMs (up by 6.94%), advanced RAG (up by 6.16%), and Vanilla RAG (up by 17.25%)
- Score: 27.114593394058144
- License:
- Abstract: Long-Context Question Answering (LCQA), a challenging task, aims to reason over long-context documents to yield accurate answers to questions. Existing long-context Large Language Models (LLMs) for LCQA often struggle with the "lost in the middle" issue. Retrieval-Augmented Generation (RAG) mitigates this issue by providing external factual evidence. However, its chunking strategy disrupts the global long-context information, and its low-quality retrieval in long contexts hinders LLMs from identifying effective factual details due to substantial noise. To this end, we propose LongRAG, a general, dual-perspective, and robust LLM-based RAG system paradigm for LCQA to enhance RAG's understanding of complex long-context knowledge (i.e., global information and factual details). We design LongRAG as a plug-and-play paradigm, facilitating adaptation to various domains and LLMs. Extensive experiments on three multi-hop datasets demonstrate that LongRAG significantly outperforms long-context LLMs (up by 6.94%), advanced RAG (up by 6.16%), and Vanilla RAG (up by 17.25%). Furthermore, we conduct quantitative ablation studies and multi-dimensional analyses, highlighting the effectiveness of the system's components and fine-tuning strategies. Data and code are available at https://github.com/QingFei1/LongRAG.
Related papers
- Long Context RAG Performance of Large Language Models [29.7557824450885]
Retrieval Augmented Generation (RAG) has emerged as a crucial technique for enhancing the accuracy of Large Language Models (LLMs)
This paper presents a study of the impact of increased context length on RAG performance across 20 popular open source and commercial LLMs.
arXiv Detail & Related papers (2024-11-05T22:37:43Z) - ALR$^2$: A Retrieve-then-Reason Framework for Long-context Question Answering [42.146660039671076]
We develop a retrieve-then-reason framework for large language models (LLMs)
We find that modern LLMs struggle to accurately retrieve relevant facts and instead, often hallucinate "retrieved facts"
We introduce ALR$2$, a method that augments the long-context reasoning capability of LLMs via an explicit two-stage procedure.
arXiv Detail & Related papers (2024-10-04T08:29:12Z) - In Defense of RAG in the Era of Long-Context Language Models [17.397639724806364]
Retrieval-augmented generation (RAG) has been a reliable solution for context-based answer generation in the past.
Recent studies show that long-context LLMs significantly outperform RAG in long-context applications.
We propose an order-preserve retrieval-augmented generation (OP-RAG) mechanism, which significantly improves the performance of RAG for long-context question-answer applications.
arXiv Detail & Related papers (2024-09-03T07:17:41Z) - Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach [26.02167477129771]
Retrieval Augmented Generation (RAG) has been a powerful tool for Large Language Models (LLMs) to efficiently process overly lengthy contexts.
We compare RAG and long-context (LC) LLMs, aiming to leverage the strengths of both.
We propose Self-Route, a simple yet effective method that routes queries to RAG or LC based on model self-reflection.
arXiv Detail & Related papers (2024-07-23T20:51:52Z) - Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA [71.04146366608904]
Long-context modeling capabilities have garnered widespread attention, leading to the emergence of Large Language Models (LLMs) with ultra-context windows.
We propose a novel long-context benchmark, Loong, aligning with realistic scenarios through extended multi-document question answering (QA)
Loong introduces four types of tasks with a range of context lengths: Spotlight Locating, Comparison, Clustering, and Chain of Reasoning.
arXiv Detail & Related papers (2024-06-25T09:42:56Z) - Prompting Large Language Models with Knowledge Graphs for Question Answering Involving Long-tail Facts [50.06633829833144]
Large Language Models (LLMs) are effective in performing various NLP tasks, but struggle to handle tasks that require extensive, real-world knowledge.
We propose a benchmark that requires knowledge of long-tail facts for answering the involved questions.
Our experiments show that LLMs alone struggle with answering these questions, especially when the long-tail level is high or rich knowledge is required.
arXiv Detail & Related papers (2024-05-10T15:10:20Z) - A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models [71.25225058845324]
Large Language Models (LLMs) have demonstrated revolutionary abilities in language understanding and generation.
Retrieval-Augmented Generation (RAG) can offer reliable and up-to-date external knowledge.
RA-LLMs have emerged to harness external and authoritative knowledge bases, rather than relying on the model's internal knowledge.
arXiv Detail & Related papers (2024-05-10T02:48:45Z) - Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks [76.43527940649939]
We introduce Ada-LEval, a benchmark for evaluating the long-context understanding of large language models (LLMs)
Ada-LEval includes two challenging subsets, TSort and BestAnswer, which enable a more reliable evaluation of LLMs' long context capabilities.
We evaluate 4 state-of-the-art closed-source API models and 6 open-source models with Ada-LEval.
arXiv Detail & Related papers (2024-04-09T17:30:48Z) - Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation [128.01050030936028]
We propose an information refinement training method named InFO-RAG.
InFO-RAG is low-cost and general across various tasks.
It improves the performance of LLaMA2 by an average of 9.39% relative points.
arXiv Detail & Related papers (2024-02-28T08:24:38Z) - LooGLE: Can Long-Context Language Models Understand Long Contexts? [46.143956498529796]
LooGLE is a benchmark for large language models' long context understanding.
It features relatively new documents post-2022, with over 24,000 tokens per document and 6,000 newly generated questions spanning diverse domains.
The evaluation of eight state-of-the-art LLMs on LooGLE revealed key findings.
arXiv Detail & Related papers (2023-11-08T01:45:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.