Retrieval-Augmented Generation for Domain-Specific Question Answering: A Case Study on Pittsburgh and CMU
- URL: http://arxiv.org/abs/2411.13691v1
- Date: Wed, 20 Nov 2024 20:10:43 GMT
- Title: Retrieval-Augmented Generation for Domain-Specific Question Answering: A Case Study on Pittsburgh and CMU
- Authors: Haojia Sun, Yaqi Wang, Shuting Zhang
- Abstract summary: We designed a Retrieval-Augmented Generation (RAG) system to provide large language models with relevant documents for answering domain-specific questions.
We extracted over 1,800 subpages using a greedy scraping strategy and employed a hybrid annotation process, combining manual and Mistral-generated question-answer pairs.
Our RAG framework integrates BM25 and FAISS retrievers, enhanced with a reranker for improved document retrieval accuracy.
- Score: 3.1787418271023404
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We designed a Retrieval-Augmented Generation (RAG) system to provide large language models with relevant documents for answering domain-specific questions about Pittsburgh and Carnegie Mellon University (CMU). We extracted over 1,800 subpages using a greedy scraping strategy and employed a hybrid annotation process, combining manual and Mistral-generated question-answer pairs, achieving an inter-annotator agreement (IAA) score of 0.7625. Our RAG framework integrates BM25 and FAISS retrievers, enhanced with a reranker for improved document retrieval accuracy. Experimental results show that the RAG system significantly outperforms a non-RAG baseline, particularly in time-sensitive and complex queries, with an F1 score improvement from 5.45% to 42.21% and recall of 56.18%. This study demonstrates the potential of RAG systems in enhancing answer precision and relevance, while identifying areas for further optimization in document retrieval and model training.
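The retrieval pipeline described in the abstract can be illustrated with a minimal sketch: BM25 supplies sparse lexical candidates, FAISS supplies dense semantic candidates, the two score lists are fused, and a reranker orders the surviving documents before they are passed to the generator. The toy corpus, model names, and score-fusion weights below are illustrative assumptions, not the authors' actual configuration.

```python
import numpy as np
import faiss
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, CrossEncoder

# Toy corpus standing in for the ~1,800 scraped subpages.
documents = [
    "Carnegie Mellon University was founded in 1900 by Andrew Carnegie.",
    "Pittsburgh is known as the City of Bridges.",
    "The Pittsburgh Steelers play their home games on the North Shore.",
]

# Sparse retriever: BM25 over whitespace-tokenized documents.
bm25 = BM25Okapi([doc.lower().split() for doc in documents])

# Dense retriever: embed documents and index them with FAISS
# (inner product on normalized vectors, i.e. cosine similarity).
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
doc_emb = encoder.encode(documents, normalize_embeddings=True)
index = faiss.IndexFlatIP(doc_emb.shape[1])
index.add(np.asarray(doc_emb, dtype="float32"))

# Reranker: a cross-encoder that scores (query, document) pairs.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumed model

def retrieve(query: str, k: int = 2, alpha: float = 0.5) -> list[str]:
    """Fuse normalized BM25 and dense scores, then rerank the top-k candidates."""
    sparse = bm25.get_scores(query.lower().split())
    sparse = sparse / (sparse.max() + 1e-9)  # scale sparse scores to [0, 1]

    q_emb = np.asarray(
        encoder.encode([query], normalize_embeddings=True), dtype="float32"
    )
    sims, ids = index.search(q_emb, len(documents))
    dense = np.zeros(len(documents))
    dense[ids[0]] = sims[0]  # align dense scores to document order

    fused = alpha * sparse + (1 - alpha) * dense
    candidates = [documents[i] for i in np.argsort(fused)[::-1][:k]]

    scores = reranker.predict([(query, doc) for doc in candidates])
    return [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]

print(retrieve("When was CMU founded?"))
```

In a full RAG system of this kind, the reranked documents would then be placed in the LLM prompt as context for answer generation.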
Related papers
- Evaluating Hybrid Retrieval Augmented Generation using Dynamic Test Sets: LiveRAG Challenge [8.680958290253914]
We present our submission to the LiveRAG Challenge 2025, which evaluates retrieval-augmented generation (RAG) systems on dynamic test sets.
Our final hybrid approach combines sparse (BM25) and dense (E5) retrieval methods.
We demonstrate that neural re-ranking with RankLLaMA improves MAP from 0.523 to 0.797 but introduces prohibitive computational costs.
arXiv Detail & Related papers (2025-06-27T21:20:43Z) - RAGentA: Multi-Agent Retrieval-Augmented Generation for Attributed Question Answering [8.846547396283832]
RAGentA is a multi-agent retrieval-augmented generation (RAG) framework for attributed question answering (QA).
Central to the framework is a hybrid retrieval strategy that combines sparse and dense methods, improving Recall@20 by 12.5%.
RAGentA outperforms standard RAG baselines, achieving gains of 1.09% in correctness and 10.72% in faithfulness.
arXiv Detail & Related papers (2025-06-20T13:37:03Z) - ESGenius: Benchmarking LLMs on Environmental, Social, and Governance (ESG) and Sustainability Knowledge [53.18163869901266]
ESGenius is a benchmark for evaluating and enhancing the proficiency of Large Language Models (LLMs) in Environmental, Social and Governance (ESG) and sustainability knowledge.
ESGenius comprises two key components: ESGenius-QA and ESGenius-Corpus.
arXiv Detail & Related papers (2025-06-02T13:19:09Z) - From Retrieval to Generation: Comparing Different Approaches [15.31883349259767]
We evaluate retrieval-based, generation-based, and hybrid models for knowledge-intensive tasks.
We show that dense retrievers, particularly DPR, achieve strong performance in ODQA with a top-1 accuracy of 50.17% on NQ.
We also analyze language modeling tasks using WikiText-103, showing that retrieval-based approaches like BM25 achieve lower perplexity compared to generative and hybrid methods.
arXiv Detail & Related papers (2025-02-27T16:29:14Z) - Chain-of-Retrieval Augmented Generation [72.06205327186069]
This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer.
Our proposed method, CoRAG, allows the model to dynamically reformulate the query based on the evolving state.
arXiv Detail & Related papers (2025-01-24T09:12:52Z) - MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation [34.66546005629471]
Large Language Models (LLMs) are essential tools for various natural language processing tasks but often suffer from generating outdated or incorrect information.
Retrieval-Augmented Generation (RAG) addresses this issue by incorporating external, real-time information retrieval to ground LLM responses.
To tackle this problem, we propose Multi-Agent Filtering Retrieval-Augmented Generation (MAIN-RAG).
MAIN-RAG is a training-free RAG framework that leverages multiple LLM agents to collaboratively filter and score retrieved documents.
arXiv Detail & Related papers (2024-12-31T08:07:26Z) - Unanswerability Evaluation for Retrieval Augmented Generation [74.3022365715597]
UAEval4RAG is a framework designed to evaluate whether RAG systems can handle unanswerable queries effectively.
We define a taxonomy with six unanswerable categories, and UAEval4RAG automatically synthesizes diverse and challenging queries.
arXiv Detail & Related papers (2024-12-16T19:11:55Z) - Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage [74.70255719194819]
We introduce a novel framework based on sub-question coverage, which measures how well a RAG system addresses different facets of a question.
We use this framework to evaluate three commercial generative answer engines: You.com, Perplexity AI, and Bing Chat.
We find that while all answer engines cover core sub-questions more often than background or follow-up ones, they still miss around 50% of core sub-questions.
arXiv Detail & Related papers (2024-10-20T22:59:34Z) - RAG-ConfusionQA: A Benchmark for Evaluating LLMs on Confusing Questions [52.33835101586687]
Conversational AI agents use Retrieval Augmented Generation (RAG) to provide verifiable document-grounded responses to user inquiries.
This paper presents a novel synthetic data generation method to efficiently create a diverse set of context-grounded confusing questions from a given document corpus.
arXiv Detail & Related papers (2024-10-18T16:11:29Z) - Enhanced Electronic Health Records Text Summarization Using Large Language Models [0.0]
This project builds on prior work by creating a system that generates clinician-preferred, focused summaries.
The proposed system leverages the Flan-T5 model to generate tailored EHR summaries based on clinician-specified topics.
arXiv Detail & Related papers (2024-10-12T19:36:41Z) - RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation [61.14660526363607]
We propose a fine-grained evaluation framework, RAGChecker, that incorporates a suite of diagnostic metrics for both the retrieval and generation modules.
RAGChecker has significantly better correlations with human judgments than other evaluation metrics.
The metrics of RAGChecker can guide researchers and practitioners in developing more effective RAG systems.
arXiv Detail & Related papers (2024-08-15T10:20:54Z) - RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework [69.4501863547618]
This paper introduces RAGEval, a framework designed to assess RAG systems across diverse scenarios.
With a focus on factual accuracy, we propose three novel metrics: Completeness, Hallucination, and Irrelevance.
Experimental results show that RAGEval outperforms zero-shot and one-shot methods in terms of clarity, safety, conformity, and richness of generated samples.
arXiv Detail & Related papers (2024-08-02T13:35:11Z) - RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering [61.19126689470398]
Long-form RobustQA (LFRQA) is a new dataset covering 26K queries and large corpora across seven different domains.
We show via experiments that RAG-QA Arena and human judgments on answer quality are highly correlated.
Only 41.3% of the most competitive LLM's answers are preferred to LFRQA's answers, demonstrating RAG-QA Arena as a challenging evaluation platform for future research.
arXiv Detail & Related papers (2024-07-19T03:02:51Z) - Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework [0.5897092980823265]
We propose a comprehensive framework to evaluate Retrieval-Augmented Generation (RAG) Question-Answering systems.
We use Large Language Models (LLMs) to generate large datasets of synthetic queries based on real user queries and in-domain documents.
We find that RAGElo positively aligns with the preferences of human annotators, though due caution is still required.
arXiv Detail & Related papers (2024-06-20T23:20:34Z) - CRAG -- Comprehensive RAG Benchmark [58.15980697921195]
Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution for alleviating the knowledge deficiencies of Large Language Models (LLMs).
Existing RAG datasets do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks.
To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG).
CRAG is a factual question answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search.
arXiv Detail & Related papers (2024-06-07T08:43:07Z) - ARAGOG: Advanced RAG Output Grading [44.99833362998488]
Retrieval-Augmented Generation (RAG) is essential for integrating external knowledge into Large Language Model (LLM) outputs.
This study assesses various RAG methods' impacts on retrieval precision and answer similarity.
arXiv Detail & Related papers (2024-04-01T10:43:52Z) - Blended RAG: Improving RAG (Retriever-Augmented Generation) Accuracy with Semantic Search and Hybrid Query-Based Retrievers [0.0]
Retrieval-Augmented Generation (RAG) is a prevalent approach for infusing Large Language Models (LLMs) with a private knowledge base of documents in order to build Generative Q&A (Question-Answering) systems.
We propose the 'Blended RAG' method of leveraging semantic search techniques, such as Vector indexes and Sparse indexes, blended with hybrid query strategies.
Our study achieves better retrieval results and sets new benchmarks on IR (Information Retrieval) datasets such as NQ and TREC-COVID.
arXiv Detail & Related papers (2024-03-22T17:13:46Z) - The Chronicles of RAG: The Retriever, the Chunk and the Generator [0.0]
This paper presents good practices to implement, optimize, and evaluate RAG for the Brazilian Portuguese language.
We explore a diverse set of methods to answer questions about the first Harry Potter book.
arXiv Detail & Related papers (2024-01-15T18:25:18Z)