Retrieval-Augmented Generation with Estimation of Source Reliability
- URL: http://arxiv.org/abs/2410.22954v1
- Date: Wed, 30 Oct 2024 12:09:29 GMT
- Title: Retrieval-Augmented Generation with Estimation of Source Reliability
- Authors: Jeongyeon Hwang, Junyoung Park, Hyejin Park, Sangdon Park, Jungseul Ok
- Abstract summary: Reliability-Aware RAG (RA-RAG) estimates the reliability of multiple sources and incorporates this information into both retrieval and aggregation processes.
We introduce a benchmark designed to reflect real-world scenarios with heterogeneous source reliability.
- Score: 15.69681944254975
- Abstract: Retrieval-augmented generation (RAG) addresses key limitations of large language models (LLMs), such as hallucinations and outdated knowledge, by incorporating external databases. These databases typically consult multiple sources to encompass up-to-date and diverse information. However, standard RAG methods often overlook the heterogeneous source reliability in the multi-source database and retrieve documents solely based on relevance, making them prone to propagating misinformation. To address this, we propose Reliability-Aware RAG (RA-RAG), which estimates the reliability of multiple sources and incorporates this information into both retrieval and aggregation processes. Specifically, it iteratively estimates source reliability and true answers for a set of queries with no labelling. Then, it selectively retrieves relevant documents from a few reliable sources and aggregates them using weighted majority voting, where selective retrieval ensures scalability without compromising performance. We also introduce a benchmark designed to reflect real-world scenarios with heterogeneous source reliability and demonstrate the effectiveness of RA-RAG compared to a set of baselines.
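As a rough illustration of the pipeline described in the abstract, the sketch below jointly estimates source reliability and consensus answers on unlabeled queries, then answers a new query by retrieving from only the top-k most reliable sources and taking a weighted majority vote. This is a minimal sketch, not the authors' implementation: the agreement-rate reliability update, the `retrieve(query, source)` callback, and the top-k selection rule are illustrative assumptions.

```python
from collections import defaultdict

def estimate_reliability(answers, n_iters=10):
    """Iteratively estimate source reliability and consensus answers (no labels).

    answers: dict mapping query -> {source: answer string}.
    Returns (reliability, consensus), where reliability maps source -> weight in [0, 1].
    """
    sources = {s for per_query in answers.values() for s in per_query}
    reliability = {s: 1.0 for s in sources}  # start by trusting every source equally
    consensus = {}
    for _ in range(n_iters):
        # Step 1: weighted majority vote per query under the current reliability.
        for q, per_query in answers.items():
            votes = defaultdict(float)
            for s, a in per_query.items():
                votes[a] += reliability[s]
            consensus[q] = max(votes, key=votes.get)
        # Step 2: reliability = each source's agreement rate with the consensus.
        for s in sources:
            answered = [q for q, per_query in answers.items() if s in per_query]
            if answered:
                reliability[s] = sum(
                    answers[q][s] == consensus[q] for q in answered) / len(answered)
    return reliability, consensus

def answer_query(query, retrieve, reliability, k_sources=3):
    """Selective retrieval from the k most reliable sources + weighted majority voting.

    retrieve(query, source) -> answer string (a hypothetical per-source RAG pipeline).
    """
    top_sources = sorted(reliability, key=reliability.get, reverse=True)[:k_sources]
    votes = defaultdict(float)
    for s in top_sources:
        votes[retrieve(query, s)] += reliability[s]
    return max(votes, key=votes.get)
```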
Related papers
- Towards Multi-Source Retrieval-Augmented Generation via Synergizing Reasoning and Preference-Driven Retrieval [4.862780562808097]
Existing Adaptive RAG (ARAG) systems struggle to effectively explore multiple retrieval sources due to their inability to select the right source at the right time.
We propose a multi-source ARAG framework, termed MSPR, which synergizes reasoning and preference-driven retrieval to adaptively decide "when and what to retrieve" and "which retrieval source to use".
arXiv Detail & Related papers (2024-11-01T15:50:58Z)
- Probing-RAG: Self-Probing to Guide Language Models in Selective Document Retrieval [3.9639424852746274]
We propose Probing-RAG, which utilizes the hidden-state representations from the intermediate layers of language models to adaptively determine whether additional retrieval is needed for a given query (a minimal probe sketch follows this entry).
Probing-RAG effectively captures the model's internal cognition, enabling reliable decision-making about retrieving external documents.
arXiv Detail & Related papers (2024-10-17T08:48:54Z)
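The probing idea above can be pictured with a minimal sketch: a small feed-forward probe reads an intermediate hidden state of the language model and classifies whether external retrieval is needed. The probe architecture, the choice of layer, and the `needs_retrieval` helper are illustrative assumptions, and the model is assumed to expose hidden states in the Hugging Face style; this is not the paper's implementation.

```python
import torch
import torch.nn as nn

class RetrievalProbe(nn.Module):
    """Small feed-forward probe over one intermediate hidden state (illustrative)."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(hidden_size, 256), nn.ReLU(), nn.Linear(256, 2))

    def forward(self, h):  # h: (batch, hidden_size)
        return self.net(h)

@torch.no_grad()
def needs_retrieval(model, tokenizer, probe, query, layer=-2, device="cpu"):
    """Run the LM once, probe a mid-layer hidden state, and decide whether to retrieve."""
    inputs = tokenizer(query, return_tensors="pt").to(device)
    out = model(**inputs, output_hidden_states=True)
    h = out.hidden_states[layer][:, -1, :]  # last-token state at the chosen layer
    return probe(h).argmax(dim=-1).item() == 1  # class 1 = "retrieve documents"
```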
- MATTER: Memory-Augmented Transformer Using Heterogeneous Knowledge Sources [12.783393023641505]
We introduce an efficient memory-augmented transformer called MATTER.
MATTER retrieves and reads from both unstructured sources (paragraphs) and semi-structured sources (QA pairs) in the form of fixed-length neural memories (a toy reader sketch follows this entry).
We demonstrate that our model outperforms existing efficient retrieval-augmented models on popular QA benchmarks in terms of both accuracy and speed.
arXiv Detail & Related papers (2024-06-07T06:35:37Z)
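As a toy picture of reading from fixed-length neural memories built from heterogeneous sources, the sketch below cross-attends over pre-encoded paragraph and QA-pair memory vectors. The cross-attention reader and every name here are illustrative assumptions, not MATTER's actual architecture.

```python
import torch
import torch.nn as nn

class NeuralMemoryReader(nn.Module):
    """Toy reader over fixed-length memories from heterogeneous stores (illustrative)."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, query_states, paragraph_mem, qa_mem):
        # query_states: (batch, q_len, d_model); each memory: (batch, n_mem, d_model).
        # Both memory types are assumed pre-encoded into fixed-length vectors offline.
        memories = torch.cat([paragraph_mem, qa_mem], dim=1)
        fused, _ = self.attn(query_states, memories, memories)
        return fused  # memory-conditioned query representations
```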
- DREW: Towards Robust Data Provenance by Leveraging Error-Controlled Watermarking [58.37644304554906]
We propose Data Retrieval with Error-corrected codes and Watermarking (DREW).
DREW randomly clusters the reference dataset and injects unique error-controlled watermark keys into each cluster.
After locating the relevant cluster, embedding-vector similarity retrieval is performed within that cluster to find the most accurate matches (a two-stage retrieval sketch follows this entry).
arXiv Detail & Related papers (2024-06-05T01:19:44Z)
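The two-stage retrieval just described can be sketched as follows, with the watermark encoding and decoding abstracted away (the recovered cluster id is taken as given). The random clustering, the key naming, and the cosine-similarity search are illustrative assumptions rather than DREW's actual procedure.

```python
import numpy as np

def build_index(reference_vectors, n_clusters, seed=0):
    """Stage 0: randomly cluster the reference set and assign each cluster a watermark key."""
    rng = np.random.default_rng(seed)
    cluster_of = rng.integers(0, n_clusters, size=len(reference_vectors))
    # Placeholder keys; DREW derives error-controlled keys from an error-correcting code.
    keys = {c: f"key-{c:04d}" for c in range(n_clusters)}
    return cluster_of, keys

def retrieve_in_cluster(query_vec, decoded_cluster, reference_vectors, cluster_of, top_k=5):
    """Stage 2: cosine-similarity search restricted to the cluster recovered from the watermark."""
    idx = np.where(cluster_of == decoded_cluster)[0]
    cands = reference_vectors[idx]
    sims = cands @ query_vec / (
        np.linalg.norm(cands, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return idx[np.argsort(-sims)[:top_k]]  # indices of the closest matches
```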
- REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering [122.62012375722124]
In existing methods, large language models (LLMs) cannot precisely assess the relevance of retrieved documents.
We propose REAR, a RElevance-Aware Retrieval-augmented approach for open-domain question answering (QA).
arXiv Detail & Related papers (2024-02-27T13:22:51Z)
- CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models [49.16989035566899]
Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources.
This paper constructs a large-scale and more comprehensive benchmark, and evaluates all the components of RAG systems in various RAG application scenarios.
arXiv Detail & Related papers (2024-01-30T14:25:32Z)
- Towards Realistic Low-resource Relation Extraction: A Benchmark with Empirical Baseline Study [51.33182775762785]
This paper presents an empirical study to build relation extraction systems in low-resource settings.
We investigate three schemes to evaluate the performance in low-resource settings: (i) different types of prompt-based methods with few-shot labeled data; (ii) diverse balancing methods to address the long-tailed distribution issue; and (iii) data augmentation technologies and self-training to generate more labeled in-domain data.
arXiv Detail & Related papers (2022-10-19T15:46:37Z)
- k-Rater Reliability: The Correct Unit of Reliability for Aggregated Human Annotations [2.538209532048867]
We propose that k-rater reliability (kRR) be used as the correct unit of reliability for aggregated datasets.
We present empirical, analytical, and bootstrap-based methods for computing kRR on WordSim-353 (an illustrative bootstrap sketch follows this entry).
arXiv Detail & Related papers (2022-03-24T08:05:06Z)
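One plausible bootstrap-style reading of kRR is sketched below: repeatedly split the raters into two disjoint groups of k, aggregate each group by the mean, and correlate the aggregated scores across items. The mean aggregation and Pearson correlation are illustrative assumptions; the paper's exact empirical, analytical, and bootstrap estimators may differ.

```python
import numpy as np

def bootstrap_krr(ratings, k, n_boot=1000, seed=0):
    """Illustrative bootstrap-style estimate of k-rater reliability (kRR).

    ratings: (n_items, n_raters) array of per-rater scores, e.g. WordSim-353 judgments.
    """
    rng = np.random.default_rng(seed)
    n_items, n_raters = ratings.shape
    assert 2 * k <= n_raters, "need at least 2k raters to form two disjoint groups"
    corrs = []
    for _ in range(n_boot):
        perm = rng.permutation(n_raters)
        a = ratings[:, perm[:k]].mean(axis=1)        # aggregate of group A (k raters)
        b = ratings[:, perm[k:2 * k]].mean(axis=1)   # aggregate of group B (k raters)
        corrs.append(np.corrcoef(a, b)[0, 1])
    return float(np.mean(corrs))
```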
- Deep Transfer Learning for Multi-source Entity Linkage via Domain Adaptation [63.24594955429465]
Multi-source entity linkage is critical in high-impact applications such as data cleaning and user stitching.
AdaMEL is a deep transfer learning framework that learns generic high-level knowledge to perform multi-source entity linkage.
Our framework achieves state-of-the-art results with 8.21% improvement on average over methods based on supervised learning.
arXiv Detail & Related papers (2021-10-27T15:20:41Z)
- Unsupervised Multi-source Domain Adaptation Without Access to Source Data [58.551861130011886]
Unsupervised Domain Adaptation (UDA) aims to learn a predictor model for an unlabeled domain by transferring knowledge from a separate labeled source domain.
We propose a novel and efficient algorithm which automatically combines the source models with suitable weights such that the combination performs at least as well as the best source model (a sketch of the combination step follows below).
arXiv Detail & Related papers (2021-04-05T10:45:12Z)
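The combination step of such a weighted ensemble can be sketched as below, assuming the per-source models' class probabilities on target samples and the source weights (which the paper estimates automatically from unlabeled target data) are already available; the weight estimation itself is not shown, and all names here are illustrative.

```python
import numpy as np

def combine_source_models(probs_per_source, weights):
    """Weighted average of per-source predicted class probabilities on target data.

    probs_per_source: list of (n_samples, n_classes) arrays, one per source model.
    weights: non-negative array of shape (n_sources,) summing to 1.
    """
    stacked = np.stack(probs_per_source)               # (n_sources, n_samples, n_classes)
    combined = np.tensordot(weights, stacked, axes=1)  # (n_samples, n_classes)
    return combined.argmax(axis=1)                     # predicted class per target sample
```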