DIRAS: Efficient LLM Annotation of Document Relevance in Retrieval Augmented Generation
- URL: http://arxiv.org/abs/2406.14162v4
- Date: Thu, 23 Jan 2025 08:41:05 GMT
- Title: DIRAS: Efficient LLM Annotation of Document Relevance in Retrieval Augmented Generation
- Authors: Jingwei Ni, Tobias Schimanski, Meihong Lin, Mrinmaya Sachan, Elliott Ash, Markus Leippold
- Abstract summary: Domain-specific queries usually need nuanced definitions of relevance beyond shallow semantic relevance. Human or GPT-4 annotation is costly and cannot cover all (query, document) pairs. We propose DIRAS (Domain-specific Information Retrieval Annotation with Scalability), a manual-annotation-free schema.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval Augmented Generation (RAG) is widely employed to ground responses to queries on domain-specific documents. But do RAG implementations leave out important information when answering queries that need an integrated analysis of information (e.g., Tell me good news in the stock market today.)? To address these concerns, RAG developers need to annotate information retrieval (IR) data for their domain of interest, which is challenging because (1) domain-specific queries usually need nuanced definitions of relevance beyond shallow semantic relevance; and (2) human or GPT-4 annotation is costly and cannot cover all (query, document) pairs (i.e., annotation selection bias), thus harming the effectiveness in evaluating IR recall. To address these challenges, we propose DIRAS (Domain-specific Information Retrieval Annotation with Scalability), a manual-annotation-free schema that fine-tunes open-sourced LLMs to consider nuanced relevance definition and annotate (partial) relevance labels with calibrated relevance scores. Extensive evaluation shows that DIRAS enables smaller (8B) LLMs to achieve GPT-4-level performance on annotating and ranking unseen (query, document) pairs, and is helpful for real-world RAG development. All code, LLM generations, and human annotations can be found at https://github.com/EdisonNi-hku/DIRAS.
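The annotation recipe described above lends itself to a short sketch: prompt an LLM with the query, the nuanced relevance definition, and the document, then derive a calibrated relevance score from the probability of the predicted label token. The prompt wording and the `llm_top_token_logprob` backend below are illustrative assumptions, not the paper's exact implementation.

```python
import math

# Hypothetical backend: returns the model's top output token and its
# log-probability. Any API or open 8B model exposing logprobs would work.
def llm_top_token_logprob(prompt: str) -> tuple[str, float]:
    raise NotImplementedError("plug in an LLM backend here")

PROMPT = """You annotate document relevance for retrieval evaluation.
Question: {question}
Relevance definition: {definition}
Document: {document}
Answer with one token, Yes or No: is the document (partially) relevant?"""

def annotate_relevance(question: str, definition: str, document: str):
    token, logprob = llm_top_token_logprob(
        PROMPT.format(question=question, definition=definition, document=document)
    )
    p = math.exp(logprob)                     # probability of the predicted token
    score = p if token.strip() == "Yes" else 1.0 - p
    return token.strip() == "Yes", score      # binary label + calibrated score
```

Sorting all (query, document) pairs by `score` then supports ranking evaluation (and IR recall estimation) without exhaustively annotating every pair.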
Related papers
- Leveraging LLMs for Utility-Focused Annotation: Reducing Manual Effort for Retrieval and RAG [69.51637252264277]
We investigate whether Large Language Models (LLMs) can effectively replace human annotations in training retrieval models.
Our experiments show that retrievers trained on utility-focused annotations significantly outperform those trained on human annotations in the out-of-domain setting.
Just 20% human-annotated data enables retrievers trained with utility-focused annotations to match the performance of models trained entirely with human annotations.
arXiv Detail & Related papers (2025-04-07T16:05:52Z) - Insight-RAG: Enhancing LLMs with Insight-Driven Augmentation [4.390998479503661]
We propose Insight-RAG, a novel framework designed to retrieve documents based on insights.
In the initial stage of Insight-RAG, instead of using traditional retrieval methods, we employ an LLM to analyze the input query and task.
By integrating the original query with the retrieved insights, similar to conventional RAG approaches, we employ a final LLM to generate a contextually enriched and accurate response.
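Read as a pipeline, Insight-RAG swaps the retrieval key from the raw query to LLM-extracted insights. A minimal sketch, with prompts and function names as assumptions:

```python
# Insight-driven RAG loop; prompt wording is illustrative, not the paper's code.
def llm(prompt: str) -> str:
    raise NotImplementedError  # any chat LLM

def retrieve(query: str, k: int = 5) -> list[str]:
    raise NotImplementedError  # any retriever (BM25, dense, ...)

def insight_rag(query: str) -> str:
    # Stage 1: ask an LLM what underlying information the task needs.
    insights = llm(f"List the key insights needed to answer: {query}")
    # Stage 2: retrieve documents for the insights rather than the raw query.
    docs = retrieve(insights)
    # Stage 3: generate from the original query plus the retrieved evidence.
    context = "\n\n".join(docs)
    return llm(f"Question: {query}\nInsights and evidence:\n{context}\nAnswer:")
```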
arXiv Detail & Related papers (2025-03-31T19:50:27Z) - Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation [81.18701211912779]
We introduce an Adaptive Multi-Aspect Retrieval-augmented over KGs (Amar) framework.
This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings.
Our method has achieved state-of-the-art performance on two common datasets.
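One plausible reading of "converts each piece of retrieved text into prompt embeddings" is a soft-prompt scheme: encode each verbalized entity/relation/subgraph and project it into the LLM's embedding space as a virtual token. A hedged sketch (dimensions and module names assumed):

```python
import torch
import torch.nn as nn

class KGPromptEncoder(nn.Module):
    """Map retrieved KG snippets (already embedded by any sentence encoder)
    into the LLM's embedding space, one soft prompt vector per snippet."""

    def __init__(self, enc_dim: int = 768, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(enc_dim, llm_dim)  # encoder space -> LLM space

    def forward(self, snippet_embs: torch.Tensor) -> torch.Tensor:
        # snippet_embs: (num_snippets, enc_dim)
        return self.proj(snippet_embs)           # (num_snippets, llm_dim)

# The soft prompts are prepended to the LLM's input token embeddings, so each
# retrieved piece occupies one virtual token instead of many context tokens.
```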
arXiv Detail & Related papers (2024-12-24T16:38:04Z) - SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text [0.848663031844483]
This paper identifies the need to develop robust evaluation approaches for natural language generation, wherein references/ground labels don't exist or aren't amply available.
We show that the critiquing Agent is able to rectify scores from LLM evaluators, thereby reducing the need for labeled data even for complex NLG evaluation scenarios.
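The evaluator-plus-critic loop can be sketched as two LLM calls, where the second audits and, if needed, rectifies the first's scores. The prompts and JSON contract here are assumptions:

```python
import json

def llm(prompt: str) -> str:
    raise NotImplementedError  # any chat LLM

def sage_style_eval(task: str, output: str) -> dict:
    # First pass: an LLM evaluator scores the output without references.
    scores = llm(f"Score this output from 1-5 for coherence, relevance, and "
                 f"fluency, as JSON.\nTask: {task}\nOutput: {output}")
    # Second pass: a critiquing agent audits and corrects unjustified scores.
    critique = llm(f"You audit an LLM judge. Task: {task}\nOutput: {output}\n"
                   f"Scores: {scores}\nIf any score is unjustified, return "
                   f"corrected JSON scores; otherwise return them unchanged.")
    return json.loads(critique)  # rectified, reference-free scores
```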
arXiv Detail & Related papers (2024-11-25T04:07:16Z) - Invar-RAG: Invariant LLM-aligned Retrieval for Better Generation [43.630437906898635]
We propose a novel two-stage fine-tuning architecture called Invar-RAG.
In the retrieval stage, an LLM-based retriever is constructed by integrating LoRA-based representation learning.
In the generation stage, a refined fine-tuning method is employed to improve LLM accuracy in generating answers based on retrieved information.
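A hedged sketch of the retrieval stage: attach LoRA adapters to an LLM backbone and mean-pool its hidden states into embeddings, so representation learning only updates the adapter weights. The checkpoint and hyperparameters below are placeholders, not the paper's setup:

```python
import torch
from transformers import AutoModel, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"       # any decoder LLM works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token     # Llama ships without a pad token
backbone = AutoModel.from_pretrained(model_name)
retriever = get_peft_model(backbone, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],      # adapt only attention projections
))

def embed(texts: list[str]) -> torch.Tensor:
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = retriever(**batch).last_hidden_state       # (batch, seq, hidden)
    mask = batch["attention_mask"].unsqueeze(-1)        # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)         # mean pooling

# Training would then optimize a contrastive loss over (query, positive,
# negatives) triples, moving only the LoRA parameters.
```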
arXiv Detail & Related papers (2024-11-11T14:25:37Z) - Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback [17.986392250269606]
We introduce Real Document Embeddings from Relevance Feedback (ReDE-RF)
Inspired by relevance feedback, ReDE-RF proposes to re-frame hypothetical document generation as a relevance estimation task.
Our experiments show that ReDE-RF consistently surpasses state-of-the-art zero-shot dense retrieval methods.
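On the summary's description, ReDE-RF replaces HyDE-style hypothetical document generation with relevance estimation over real documents: a cheap first pass retrieves candidates, an LLM marks the relevant ones, and their real embeddings are averaged into a refined query vector. A sketch with function names assumed:

```python
import numpy as np

def llm_is_relevant(query: str, doc: str) -> bool:
    raise NotImplementedError  # zero-shot LLM relevance judgment

def rede_rf(query_emb: np.ndarray, query: str,
            doc_embs: np.ndarray, docs: list[str], k: int = 10) -> np.ndarray:
    first_pass = np.argsort(doc_embs @ query_emb)[::-1][:k]  # initial top-k
    relevant = [i for i in first_pass if llm_is_relevant(query, docs[i])]
    if not relevant:                             # fall back to the raw query
        return query_emb
    return doc_embs[relevant].mean(axis=0)       # refined query vector from real docs
```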
arXiv Detail & Related papers (2024-10-28T17:40:40Z) - Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation [72.70046559930555]
We propose a generic RAG approach called Adaptive Note-Enhanced RAG (Adaptive-Note) for complex QA tasks.
Specifically, Adaptive-Note introduces an overarching view of knowledge growth, iteratively gathering new information in the form of notes.
In addition, we employ an adaptive, note-based stop-exploration strategy to decide "what to retrieve and when to stop" to encourage sufficient knowledge exploration.
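The note-taking loop is easy to sketch: keep a growing note, let the LLM decide what to retrieve next, and stop when it judges the note sufficient. The prompts and stop token are illustrative assumptions:

```python
def llm(prompt: str) -> str:
    raise NotImplementedError  # any chat LLM

def retrieve(query: str) -> str:
    raise NotImplementedError  # any retriever

def adaptive_note(question: str, max_steps: int = 5) -> str:
    note = ""                                   # growing memory of gathered knowledge
    for _ in range(max_steps):
        query = llm(f"Question: {question}\nNotes so far: {note}\n"
                    f"What should be retrieved next? Reply NONE if the notes suffice.")
        if "NONE" in query:
            break                               # adaptive stop: enough knowledge
        passages = retrieve(query)
        note = llm(f"Merge the new passages into the notes, keeping only what "
                   f"helps answer: {question}\nNotes: {note}\nPassages: {passages}")
    return llm(f"Question: {question}\nNotes: {note}\nAnswer:")
```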
arXiv Detail & Related papers (2024-10-11T14:03:29Z) - W-RAG: Weakly Supervised Dense Retrieval in RAG for Open-domain Question Answering [28.79851078451609]
Large Language Models (LLMs) often struggle to generate factual answers relying solely on their internal (parametric) knowledge.
To address this limitation, Retrieval-Augmented Generation (RAG) systems enhance LLMs by retrieving relevant information from external sources.
We propose W-RAG by utilizing the ranking capabilities of LLMs to create weakly labeled data for training dense retrievers.
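A sketch of the weak-labeling step: score each candidate passage by how much it raises the LLM's likelihood of the gold answer, then take the top passage as a positive and the rest as hard negatives. `answer_logprob` is an assumed helper, not the paper's code:

```python
def answer_logprob(question: str, passage: str, answer: str) -> float:
    raise NotImplementedError  # sum of answer-token log-probs under the LLM

def weak_labels(question: str, answer: str, passages: list[str]):
    scored = sorted(passages,
                    key=lambda p: answer_logprob(question, p, answer),
                    reverse=True)
    positive, hard_negatives = scored[0], scored[1:]
    return positive, hard_negatives  # training triples for the dense retriever
```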
arXiv Detail & Related papers (2024-08-15T22:34:44Z) - Evaluation of RAG Metrics for Question Answering in the Telecom Domain [0.650923326742559]
Retrieval Augmented Generation (RAG) is widely used to enable Large Language Models (LLMs) to perform Question Answering (QA) tasks.
This work modifies the RAGAS package for a few metrics (faithfulness, context relevance, answer relevance, answer correctness, answer similarity, and factual correctness) to expose the intermediate outputs of the prompts.
Next, we analyse the expert evaluations of the output of the modified RAGAS package and observe the challenges of using it in the telecom domain.
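A faithfulness-style metric in the spirit of the modified package, returning its intermediate prompt outputs for expert inspection, might look like the sketch below (prompt wording is illustrative, not RAGAS's):

```python
def llm(prompt: str) -> str:
    raise NotImplementedError  # any chat LLM

def faithfulness_with_trace(answer: str, contexts: list[str]) -> dict:
    claims = llm(f"Break this answer into atomic claims, one per line:\n{answer}")
    claim_list = [c for c in claims.splitlines() if c.strip()]
    ctx = "\n".join(contexts)
    verdicts = [llm(f"Context:\n{ctx}\nClaim: {c}\nSupported? Yes or No.")
                for c in claim_list]
    supported = sum(v.strip().lower().startswith("yes") for v in verdicts)
    return {
        "score": supported / max(len(claim_list), 1),  # fraction of grounded claims
        "claims": claim_list,                          # intermediate outputs ...
        "verdicts": verdicts,                          # ... exposed for inspection
    }
```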
arXiv Detail & Related papers (2024-07-15T17:40:15Z) - RaFe: Ranking Feedback Improves Query Rewriting for RAG [83.24385658573198]
We propose a framework for training query rewriting models free of annotations.
By leveraging a publicly available reranker, our framework provides feedback well aligned with the rewriting objectives.
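The annotation-free feedback loop can be sketched as: sample several rewrites, score the retrieval results of each with an off-the-shelf reranker, and keep the best and worst as a preference pair for training (e.g., with DPO). Both helper functions are assumptions:

```python
def sample_rewrites(query: str, n: int = 4) -> list[str]:
    raise NotImplementedError  # rewriting model, sampled with temperature

def rerank_score(query: str, rewrite: str) -> float:
    raise NotImplementedError  # mean cross-encoder score of docs retrieved via `rewrite`

def preference_pair(query: str):
    rewrites = sample_rewrites(query)
    ranked = sorted(rewrites, key=lambda r: rerank_score(query, r), reverse=True)
    return ranked[0], ranked[-1]  # (chosen, rejected) for preference training
```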
arXiv Detail & Related papers (2024-05-23T11:00:19Z) - Large Language Models for Data Annotation: A Survey [49.8318827245266]
The emergence of advanced Large Language Models (LLMs) presents an unprecedented opportunity to automate the complicated process of data annotation.
This survey includes an in-depth taxonomy of data types that LLMs can annotate, a review of learning strategies for models utilizing LLM-generated annotations, and a detailed discussion of the primary challenges and limitations associated with using LLMs for data annotation.
arXiv Detail & Related papers (2024-02-21T00:44:04Z) - Building Interpretable and Reliable Open Information Retriever for New Domains Overnight [67.03842581848299]
Information retrieval is a critical component for many downstream tasks such as open-domain question answering (QA).
We propose an information retrieval pipeline that uses entity/event linking model and query decomposition model to focus more accurately on different information units of the query.
We show that, while being more interpretable and reliable, our proposed pipeline significantly improves passage coverages and denotation accuracies across five IR and QA benchmarks.
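The pipeline's decompose-then-link idea, sketched with all function names as illustrative assumptions:

```python
def decompose(query: str) -> list[str]:
    raise NotImplementedError  # query decomposition model

def link_entities(unit: str) -> list[str]:
    raise NotImplementedError  # entity/event linker -> canonical IDs

def retrieve_for(unit: str, entities: list[str]) -> list[str]:
    raise NotImplementedError  # retrieval restricted/boosted by linked entities

def interpretable_retrieve(query: str) -> list[str]:
    passages = []
    for unit in decompose(query):      # one retrieval call per information unit
        passages += retrieve_for(unit, link_entities(unit))
    return passages                    # union of per-unit evidence
```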
arXiv Detail & Related papers (2023-08-09T07:47:17Z) - Generation-Augmented Retrieval for Open-domain Question Answering [134.27768711201202]
We propose Generation-Augmented Retrieval (GAR) for answering open-domain questions.
We show that generating diverse contexts for a query is beneficial as fusing their results consistently yields better retrieval accuracy.
GAR achieves state-of-the-art performance on Natural Questions and TriviaQA datasets under the extractive QA setup when equipped with an extractive reader.
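GAR-style expansion with sparse retrieval can be sketched as: generate several contexts for the query, issue one BM25 search per (query + context), and fuse the rankings. `generate_contexts` stands in for the trained generator:

```python
from rank_bm25 import BM25Okapi

def generate_contexts(query: str) -> list[str]:
    raise NotImplementedError  # e.g., a generated answer, title, and sentence

def gar_retrieve(query: str, corpus: list[str], k: int = 10) -> list[int]:
    bm25 = BM25Okapi([doc.split() for doc in corpus])
    fused = {}
    for ctx in generate_contexts(query):
        scores = bm25.get_scores((query + " " + ctx).split())
        for idx in scores.argsort()[::-1][:k]:   # fuse per-context rankings
            fused[idx] = max(fused.get(idx, 0.0), scores[idx])
    return sorted(fused, key=fused.get, reverse=True)[:k]
```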
arXiv Detail & Related papers (2020-09-17T23:08:01Z)