Lighter And Better: Towards Flexible Context Adaptation For Retrieval Augmented Generation
- URL: http://arxiv.org/abs/2409.15699v1
- Date: Tue, 24 Sep 2024 03:25:36 GMT
- Title: Lighter And Better: Towards Flexible Context Adaptation For Retrieval Augmented Generation
- Authors: Zheng Liu, Chenyuan Wu, Ninglu Shao, Shitao Xiao, Chaozhuo Li, Defu Lian,
- Abstract summary: We introduce a novel approach called FlexRAG (Flexible Context Adaptation for RAG).
In this approach, the retrieved contexts are compressed into compact embeddings before being encoded by the Large Language Models (LLMs).
A key feature of FlexRAG is its flexibility, which enables effective support for diverse compression ratios and selective preservation of important contexts.
- Score: 32.26885597587913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The existing Retrieval-Augmented Generation (RAG) systems face significant challenges in terms of cost and effectiveness. On one hand, they need to encode the lengthy retrieved contexts before responding to the input tasks, which imposes substantial computational overhead. On the other hand, directly using generic Large Language Models (LLMs) often leads to sub-optimal answers, while task-specific fine-tuning may compromise the LLMs' general capabilities. To address these challenges, we introduce a novel approach called FlexRAG (Flexible Context Adaptation for RAG). In this approach, the retrieved contexts are compressed into compact embeddings before being encoded by the LLMs. Simultaneously, these compressed embeddings are optimized to enhance downstream RAG performance. A key feature of FlexRAG is its flexibility, which enables effective support for diverse compression ratios and selective preservation of important contexts. Thanks to these technical designs, FlexRAG achieves superior generation quality while significantly reducing running costs. Comprehensive experiments on various question-answering datasets validate our approach as a cost-effective and flexible solution for RAG systems.
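The two flexibility properties named in the abstract, variable compression ratios and selective preservation of important contexts, can be illustrated with a toy sketch. This is not the FlexRAG implementation: the abstract describes embeddings that are optimized for downstream RAG performance, whereas the sketch below substitutes plain mean pooling. The function names (`mean_pool`, `compress`, `compress_selectively`) and the boolean importance flags are hypothetical, introduced only for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average a non-empty group of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def compress(embeddings: Sequence[Vector], ratio: int) -> List[Vector]:
    """Pool every `ratio` consecutive vectors into one.

    Assumption: mean pooling stands in for the learned compression
    described in the abstract. The last group may be shorter than `ratio`.
    """
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    return [
        mean_pool(embeddings[i:i + ratio])
        for i in range(0, len(embeddings), ratio)
    ]


def compress_selectively(
    embeddings: Sequence[Vector],
    important: Sequence[bool],
    ratio: int,
) -> List[Vector]:
    """Keep vectors flagged as important unchanged; pool the rest.

    Assumption: importance is given as a per-vector boolean flag.
    Runs of unimportant vectors between important ones are pooled
    with `compress`, so order is preserved.
    """
    out: List[Vector] = []
    run: List[Vector] = []
    for vec, keep in zip(embeddings, important):
        if keep:
            out.extend(compress(run, ratio))  # flush the pending run
            run = []
            out.append(list(vec))
        else:
            run.append(vec)
    out.extend(compress(run, ratio))  # flush whatever remains
    return out


if __name__ == "__main__":
    context = [[1.0, 1.0], [3.0, 3.0], [5.0, 5.0], [7.0, 7.0]]
    print(compress(context, ratio=2))
    print(compress_selectively(context, [False, False, True, False], ratio=2))
```

Changing `ratio` trades context length against fidelity, and the flags let selected positions bypass compression entirely; a real system would have to learn or estimate both the compressed representation and the importance signal.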
Related papers
- InvFussion: Bridging Supervised and Zero-shot Diffusion for Inverse Problems [76.39776789410088]
This work introduces a framework that combines the strong performance of supervised approaches and the flexibility of zero-shot methods.
A novel architectural design seamlessly integrates the degradation operator directly into the denoiser.
Experimental results on the FFHQ and ImageNet datasets demonstrate state-of-the-art posterior-sampling performance.
arXiv Detail & Related papers (2025-04-02T12:40:57Z)
- Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding [0.0]
We present a framework for enhancing Retrieval-Augmented Generation (RAG) systems through dynamic retrieval strategies and reinforcement fine-tuning.
Our framework integrates two complementary techniques: Policy-Optimized Retrieval-Augmented Generation (PORAG) and Adaptive Token-Layer Attention Scoring (ATLAS).
Our framework reduces hallucinations, strengthens domain-specific reasoning, and achieves significant efficiency and scalability gains over traditional RAG systems.
arXiv Detail & Related papers (2025-04-02T01:16:10Z)
- Fast or Better? Balancing Accuracy and Cost in Retrieval-Augmented Generation with Flexible User Control [52.405085773954596]
Retrieval-Augmented Generation (RAG) has emerged as a powerful approach to mitigate large language model hallucinations.
Existing RAG frameworks often apply retrieval indiscriminately, leading to inefficiencies such as over-retrieving.
We introduce a novel user-controllable RAG framework that enables dynamic adjustment of the accuracy-cost trade-off.
arXiv Detail & Related papers (2025-02-17T18:56:20Z)
- RoseRAG: Robust Retrieval-augmented Generation with Small-scale LLMs via Margin-aware Preference Optimization [53.63439735067081]
Large language models (LLMs) have achieved impressive performance but face high computational costs and latency.
Retrieval-augmented generation (RAG) helps by integrating external knowledge, but imperfect retrieval can introduce distracting noise that misleads small-scale LLMs (SLMs).
We propose RoseRAG, a robust RAG framework for SLMs via Margin-aware Preference Optimization.
arXiv Detail & Related papers (2025-02-16T04:56:53Z)
- FRAG: A Flexible Modular Framework for Retrieval-Augmented Generation based on Knowledge Graphs [17.477161619378332]
We propose a novel flexible modular KG-RAG framework, termed FRAG, which synergizes the advantages of both approaches.
By using the query text instead of the Knowledge Graph, FRAG improves retrieval quality while maintaining flexibility.
arXiv Detail & Related papers (2025-01-17T05:19:14Z)
- A Survey of Query Optimization in Large Language Models [10.255235456427037]
RAG mitigates the limitations of Large Language Models by dynamically retrieving and leveraging up-to-date relevant information.
Query optimization (QO) has emerged as a critical element, playing a pivotal role in determining the effectiveness of RAG's retrieval stage.
arXiv Detail & Related papers (2024-12-23T13:26:04Z)
- Adapting to Non-Stationary Environments: Multi-Armed Bandit Enhanced Retrieval-Augmented Generation on Knowledge Graphs [23.357843519762483]
Recent studies have demonstrated that leveraging the Retrieval-Augmented Generation framework, combined with Knowledge Graphs, robustly enhances the reasoning capabilities of Large Language Models.
We introduce a Multi-objective Multi-Armed Bandit enhanced RAG framework, supported by multiple retrieval methods with diverse capabilities.
Our method significantly outperforms baseline methods in non-stationary settings while achieving state-of-the-art performance in stationary environments.
arXiv Detail & Related papers (2024-12-10T15:56:03Z)
- mR$^2$AG: Multimodal Retrieval-Reflection-Augmented Generation for Knowledge-Based VQA [78.45521005703958]
Multimodal Retrieval-Augmented Generation (mRAG) is naturally introduced to provide MLLMs with comprehensive and up-to-date knowledge.
We propose a novel framework called Retrieval-Reflection-Augmented Generation (mR$^2$AG), which achieves adaptive retrieval and useful information localization.
mR$^2$AG significantly outperforms state-of-the-art MLLMs on INFOSEEK and Encyclopedic-VQA.
arXiv Detail & Related papers (2024-11-22T16:15:50Z)
- LightRAG: Simple and Fast Retrieval-Augmented Generation [12.86888202297654]
Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by integrating external knowledge sources.
Existing RAG systems have significant limitations, including reliance on flat data representations and inadequate contextual awareness.
We propose LightRAG, which incorporates graph structures into text indexing and retrieval processes.
arXiv Detail & Related papers (2024-10-08T08:00:12Z)
- Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
- Large Language Model Empowered Embedding Generator for Sequential Recommendation [57.49045064294086]
Large Language Model (LLM) has the potential to understand the semantic connections between items, regardless of their popularity.
We present LLMEmb, an innovative technique that harnesses LLM to create item embeddings that bolster the performance of Sequential Recommender Systems.
arXiv Detail & Related papers (2024-09-30T03:59:06Z)
- Efficient In-Domain Question Answering for Resource-Constrained Environments [0.07499722271664146]
Retrieval Augmented Generation (RAG) is a method for integrating external knowledge into pretrained Large Language Models (LLMs).
Recent studies have shown success in using fine tuning to address these problems.
In this work, we combine Retrieval Augmented Fine Tuning (RAFT) with Low-Rank Adaptation (LoRA) to reduce fine-tuning and storage requirements and gain faster inference times.
arXiv Detail & Related papers (2024-09-26T08:55:21Z)
- SMART-RAG: Selection using Determinantal Matrices for Augmented Retrieval [40.17823569905232]
Retrieval-Augmented Generation (RAG) has greatly improved large language models (LLMs) by enabling them to generate accurate, contextually grounded responses.
RAG approaches, which prioritize top-ranked documents based solely on query-context relevance, often introduce redundancy and conflicting information.
We propose Selection using Matrices for Augmented Retrieval (SMART) in question answering tasks, a fully unsupervised and training-free framework designed to optimize context selection in RAG.
arXiv Detail & Related papers (2024-09-21T03:03:09Z)
- Flexora: Flexible Low Rank Adaptation for Large Language Models [12.696136981847438]
Large Language Models (LLMs) are driving advancements in artificial intelligence by increasing the scale of model parameters.
Their performance in specific downstream tasks is usually hindered by their knowledge boundaries on these tasks.
We propose the flexible low rank adaptation (Flexora) method to automatically and flexibly select the most important layers.
arXiv Detail & Related papers (2024-08-20T12:13:04Z)
- A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems [67.52782366565658]
State-of-the-art recommender systems (RSs) depend on categorical features, which are encoded by embedding vectors, resulting in excessively large embedding tables.
Despite the prosperity of lightweight embedding-based RSs (LERSs), a wide diversity is seen in evaluation protocols.
This study investigates the performance, efficiency, and cross-task transferability of various LERSs via a thorough benchmarking process.
arXiv Detail & Related papers (2024-06-25T07:45:00Z)
- One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs).
We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
arXiv Detail & Related papers (2024-05-30T03:44:54Z)
- Improving Retrieval for RAG based Question Answering Models on Financial Documents [0.046603287532620746]
This paper explores the existing constraints of RAG pipelines and introduces methodologies for enhancing text retrieval.
It delves into strategies such as sophisticated chunking techniques, query expansion, the incorporation of metadata annotations, the application of re-ranking algorithms, and the fine-tuning of embedding algorithms.
arXiv Detail & Related papers (2024-03-23T00:49:40Z)
- RAGGED: Towards Informed Design of Retrieval Augmented Generation Systems [51.171355532527365]
Retrieval-augmented generation (RAG) can significantly improve the performance of language models (LMs).
RAGGED is a framework for analyzing RAG configurations across various document-based question answering tasks.
arXiv Detail & Related papers (2024-03-14T02:26:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.