Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG
- URL: http://arxiv.org/abs/2410.05983v1
- Date: Tue, 8 Oct 2024 12:30:07 GMT
- Title: Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG
- Authors: Bowen Jin, Jinsung Yoon, Jiawei Han, Sercan O. Arik,
- Abstract summary: Retrieval-augmented generation (RAG) empowers large language models (LLMs) to utilize external knowledge sources.
This paper investigates the detrimental impact of retrieved "hard negatives" as a key contributor.
To mitigate this and enhance the robustness of long-context LLM-based RAG, we propose both training-free and training-based approaches.
- Score: 36.754491649652664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrieval-augmented generation (RAG) empowers large language models (LLMs) to utilize external knowledge sources. The increasing capacity of LLMs to process longer input sequences opens up avenues for providing more retrieved information, to potentially enhance the quality of generated outputs. It is plausible to assume that a larger retrieval set would contain more relevant information (higher recall), that might result in improved performance. However, our empirical findings demonstrate that for many long-context LLMs, the quality of generated output initially improves first, but then subsequently declines as the number of retrieved passages increases. This paper investigates this phenomenon, identifying the detrimental impact of retrieved "hard negatives" as a key contributor. To mitigate this and enhance the robustness of long-context LLM-based RAG, we propose both training-free and training-based approaches. We first showcase the effectiveness of retrieval reordering as a simple yet powerful training-free optimization. Furthermore, we explore training-based methods, specifically RAG-specific implicit LLM fine-tuning and RAG-oriented fine-tuning with intermediate reasoning, demonstrating their capacity for substantial performance gains. Finally, we conduct a systematic analysis of design choices for these training-based methods, including data distribution, retriever selection, and training context length.
Related papers
- Fine-Grained Guidance for Retrievers: Leveraging LLMs' Feedback in Retrieval-Augmented Generation [20.420575358183687]
Retrieval-Augmented Generation (RAG) has proven to be an effective method for mitigating hallucination issues inherent in large language models (LLMs)
Previous approaches typically train retrievers based on semantic similarity, lacking optimization for RAG.
We propose a novel framework, FiGRet, which leverages the language capabilities of LLMs to construct examples from a more granular, information-centric perspective.
arXiv Detail & Related papers (2024-11-06T14:42:39Z) - Dynamic Uncertainty Ranking: Enhancing In-Context Learning for Long-Tail Knowledge in LLMs [50.29035873837]
Large language models (LLMs) can learn vast amounts of knowledge from diverse domains during pre-training.
Long-tail knowledge from specialized domains is often scarce and underrepresented, rarely appearing in the models' memorization.
We propose a reinforcement learning-based dynamic uncertainty ranking method for ICL that accounts for the varying impact of each retrieved sample on LLM predictions.
arXiv Detail & Related papers (2024-10-31T03:42:17Z) - W-RAG: Weakly Supervised Dense Retrieval in RAG for Open-domain Question Answering [28.79851078451609]
Large Language Models (LLMs) often struggle to generate factual answers relying solely on their internal (parametric) knowledge.
To address this limitation, Retrieval-Augmented Generation (RAG) systems enhance LLMs by retrieving relevant information from external sources.
We propose W-RAG by utilizing the ranking capabilities of LLMs to create weakly labeled data for training dense retrievers.
arXiv Detail & Related papers (2024-08-15T22:34:44Z) - Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting [68.90949377014742]
Speculative RAG is a framework that leverages a larger generalist LM to efficiently verify multiple RAG drafts produced in parallel by a smaller, distilled specialist LM.
Our method accelerates RAG by delegating drafting to the smaller specialist LM, with the larger generalist LM performing a single verification pass over the drafts.
It notably enhances accuracy by up to 12.97% while reducing latency by 51% compared to conventional RAG systems on PubHealth.
arXiv Detail & Related papers (2024-07-11T06:50:19Z) - Improving Retrieval for RAG based Question Answering Models on Financial Documents [0.046603287532620746]
This paper explores the existing constraints of RAG pipelines and introduces methodologies for enhancing text retrieval.
It delves into strategies such as sophisticated chunking techniques, query expansion, the incorporation of metadata annotations, the application of re-ranking algorithms, and the fine-tuning of embedding algorithms.
arXiv Detail & Related papers (2024-03-23T00:49:40Z) - Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation [128.01050030936028]
We propose an information refinement training method named InFO-RAG.
InFO-RAG is low-cost and general across various tasks.
It improves the performance of LLaMA2 by an average of 9.39% relative points.
arXiv Detail & Related papers (2024-02-28T08:24:38Z) - Take the Bull by the Horns: Hard Sample-Reweighted Continual Training
Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
arXiv Detail & Related papers (2024-02-22T04:10:57Z) - Prompt Perturbation in Retrieval-Augmented Generation based Large Language Models [9.688626139309013]
Retrieval-Augmented Generation is considered as a means to improve the trustworthiness of text generation from large language models.
In this work, we find that the insertion of even a short prefix to the prompt leads to the generation of outputs far away from factually correct answers.
We introduce a novel optimization technique called Gradient Guided Prompt Perturbation.
arXiv Detail & Related papers (2024-02-11T12:25:41Z) - LaGR-SEQ: Language-Guided Reinforcement Learning with Sample-Efficient
Querying [71.86163159193327]
Large language models (LLMs) have recently demonstrated their impressive ability to provide context-aware responses via text.
This ability could potentially be used to predict plausible solutions in sequential decision making tasks pertaining to pattern completion.
We introduce LaGR, which uses this predictive ability of LLMs to propose solutions to tasks that have been partially completed by a primary reinforcement learning (RL) agent.
arXiv Detail & Related papers (2023-08-21T02:07:35Z) - To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
Second, we examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.