Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study
- URL: http://arxiv.org/abs/2409.13694v3
- Date: Sun, 16 Feb 2025 11:07:13 GMT
- Title: Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study
- Authors: Shuo Yu, Mingyue Cheng, Jiqian Yang, Jie Ouyang, Yucong Luo, Chenyi Lei, Qi Liu, Enhong Chen,
- Abstract summary: Retrieval-augmented generation (RAG) is increasingly recognized as an effective approach to mitigating the hallucination of large language models (LLMs)
We develop PruningRAG, a plug-and-play RAG framework that uses multi-granularity pruning strategies to more effectively incorporate relevant context and mitigate the negative impact of misleading information.
- Score: 46.55831783809377
- License:
- Abstract: Retrieval-augmented generation (RAG) is increasingly recognized as an effective approach to mitigating the hallucination of large language models (LLMs) through the integration of external knowledge. While numerous efforts, most studies focus on a single type of external knowledge source. In contrast, most real-world applications involve diverse knowledge from various sources, a scenario that has been relatively underexplored. The main dilemma is the lack of a suitable dataset incorporating multiple knowledge sources and pre-exploration of the associated issues. To address these challenges, we standardize a benchmark dataset that combines structured and unstructured knowledge across diverse and complementary domains. Building upon the dataset, we identify the limitations of existing methods under such conditions. Therefore, we develop PruningRAG, a plug-and-play RAG framework that uses multi-granularity pruning strategies to more effectively incorporate relevant context and mitigate the negative impact of misleading information. Extensive experimental results demonstrate superior performance of PruningRAG and our insightful findings are also reported. Our dataset and code are publicly available\footnote{https://github.com/USTCAGI/PruningRAG}.
Related papers
- Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation [2.549112678136113]
Retrieval-Augmented Generation (RAG) mitigates issues by integrating external dynamic information enhancing factual and updated grounding.
Cross-modal alignment and reasoning introduce unique challenges to Multimodal RAG, distinguishing it from traditional unimodal RAG.
This survey lays the foundation for developing more capable and reliable AI systems.
arXiv Detail & Related papers (2025-02-12T22:33:41Z) - CoFE-RAG: A Comprehensive Full-chain Evaluation Framework for Retrieval-Augmented Generation with Enhanced Data Diversity [23.48167670445722]
Retrieval-Augmented Generation (RAG) aims to generate more accurate and reliable answers with the help of the retrieved context from external knowledge sources.
evaluating these systems remains a crucial research area due to the following issues.
We propose a Comprehensive Full-chain Evaluation (CoFE-RAG) framework to facilitate thorough evaluation across the entire RAG pipeline.
arXiv Detail & Related papers (2024-10-16T05:20:32Z) - Contextual Compression in Retrieval-Augmented Generation for Large Language Models: A Survey [0.0]
Large Language Models (LLMs) showcase remarkable abilities, yet they struggle with limitations such as hallucinations, outdated knowledge, opacity, and inexplicable reasoning.
Retrieval-Augmented Generation (RAG) has proven to be a viable solution, leveraging external databases to improve the consistency and coherence of generated content.
arXiv Detail & Related papers (2024-09-20T10:36:49Z) - DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation [19.907074685082]
Retrieval-Augmented Generation offers a promising solution to address various limitations of Large Language Models.
Current studies often rely on general knowledge sources like Wikipedia to assess the models' abilities in solving common-sense problems.
We identified six required abilities for RAG models, including the ability in conversational RAG.
arXiv Detail & Related papers (2024-06-09T05:33:51Z) - A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods.
The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics.
We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z) - REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering [115.72130322143275]
REAR is a RElevance-Aware Retrieval-augmented approach for open-domain question answering (QA)
We develop a novel architecture for LLM-based RAG systems, by incorporating a specially designed assessment module.
Experiments on four open-domain QA tasks show that REAR significantly outperforms previous a number of competitive RAG approaches.
arXiv Detail & Related papers (2024-02-27T13:22:51Z) - Multi-modal Causal Structure Learning and Root Cause Analysis [67.67578590390907]
We propose Mulan, a unified multi-modal causal structure learning method for root cause localization.
We leverage a log-tailored language model to facilitate log representation learning, converting log sequences into time-series data.
We also introduce a novel key performance indicator-aware attention mechanism for assessing modality reliability and co-learning a final causal graph.
arXiv Detail & Related papers (2024-02-04T05:50:38Z) - CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models [49.16989035566899]
Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources.
This paper constructs a large-scale and more comprehensive benchmark, and evaluates all the components of RAG systems in various RAG application scenarios.
arXiv Detail & Related papers (2024-01-30T14:25:32Z) - Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset
and Comprehensive Framework [51.44863255495668]
Multimodal reasoning is a critical component in the pursuit of artificial intelligence systems that exhibit human-like intelligence.
We present Multi-Modal Reasoning(COCO-MMR) dataset, a novel dataset that encompasses an extensive collection of open-ended questions.
We propose innovative techniques, including multi-hop cross-modal attention and sentence-level contrastive learning, to enhance the image and text encoders.
arXiv Detail & Related papers (2023-07-24T08:58:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.