Related papers: Engineering RAG Systems for Real-World Applications: Design, Development, and Evaluation

Engineering RAG Systems for Real-World Applications: Design, Development, and Evaluation

URL: http://arxiv.org/abs/2506.20869v1
Date: Wed, 25 Jun 2025 22:40:00 GMT
Title: Engineering RAG Systems for Real-World Applications: Design, Development, and Evaluation
Authors: Md Toufique Hasan, Muhammad Waseem, Kai-Kristian Kemell, Ayman Asad Khan, Mika Saari, Pekka Abrahamsson,
Abstract summary: Retrieval-Augmented Generation (RAG) systems are emerging as a key approach for grounding Large Language Models (LLMs) in external knowledge.<n>This paper presents five domain-specific RAG applications developed for real-world scenarios across governance, cybersecurity, agriculture, industrial research, and medical diagnostics.
Score: 4.448709087838503
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Retrieval-Augmented Generation (RAG) systems are emerging as a key approach for grounding Large Language Models (LLMs) in external knowledge, addressing limitations in factual accuracy and contextual relevance. However, there is a lack of empirical studies that report on the development of RAG-based implementations grounded in real-world use cases, evaluated through general user involvement, and accompanied by systematic documentation of lessons learned. This paper presents five domain-specific RAG applications developed for real-world scenarios across governance, cybersecurity, agriculture, industrial research, and medical diagnostics. Each system incorporates multilingual OCR, semantic retrieval via vector embeddings, and domain-adapted LLMs, deployed through local servers or cloud APIs to meet distinct user needs. A web-based evaluation involving a total of 100 participants assessed the systems across six dimensions: (i) Ease of Use, (ii) Relevance, (iii) Transparency, (iv) Responsiveness, (v) Accuracy, and (vi) Likelihood of Recommendation. Based on user feedback and our development experience, we documented twelve key lessons learned, highlighting technical, operational, and ethical challenges affecting the reliability and usability of RAG systems in practice.

Related papers

Can LLMs Be Trusted for Evaluating RAG Systems? A Survey of Methods and Datasets [0.0]
Retrieval-Augmented Generation (RAG) has advanced significantly in recent years.<n>RAG complexity poses substantial challenges for systematic evaluation and quality enhancement.<n>This study systematically reviews 63 academic articles to provide a comprehensive overview of state-of-the-art RAG evaluation methodologies.
arXiv Detail & Related papers (2025-04-28T08:22:19Z)
Retrieval Augmented Generation Evaluation in the Era of Large Language Models: A Comprehensive Survey [29.186229489968564]
Retrieval-Augmented Generation (RAG) has revolutionized natural language processing by integrating Large Language Models (LLMs) with external information retrieval.<n> evaluating RAG systems presents unique challenges due to their hybrid architecture that combines retrieval and generation components.
arXiv Detail & Related papers (2025-04-21T06:39:47Z)
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain [62.89809156574998]
We introduce an omnidirectional and automatic RAG benchmark, OmniEval, in the financial domain.<n>Our benchmark is characterized by its multi-dimensional evaluation framework.<n>Our experiments demonstrate the comprehensiveness of OmniEval, which includes extensive test datasets.
arXiv Detail & Related papers (2024-12-17T15:38:42Z)
Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering [70.44269982045415]
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs) We introduce Medical Retrieval-Augmented Generation Benchmark (MedRGB) that provides various supplementary elements to four medical QA datasets. Our experimental results reveals current models' limited ability to handle noise and misinformation in the retrieved documents.
arXiv Detail & Related papers (2024-11-14T06:19:18Z)
A Methodology for Evaluating RAG Systems: A Case Study On Configuration Dependency Validation [6.544757635738911]
Retrieval-augmented generation (RAG) is an umbrella of different components, design decisions, and domain-specific adaptations. There is currently no generally accepted methodology for RAG evaluation despite a growing interest in this technology. We propose a first blueprint of a methodology for a sound and reliable evaluation of RAG systems.
arXiv Detail & Related papers (2024-10-11T13:36:13Z)
Trustworthiness in Retrieval-Augmented Generation Systems: A Survey [59.26328612791924]
Retrieval-Augmented Generation (RAG) has quickly grown into a pivotal paradigm in the development of Large Language Models (LLMs) We propose a unified framework that assesses the trustworthiness of RAG systems across six key dimensions: factuality, robustness, fairness, transparency, accountability, and privacy.
arXiv Detail & Related papers (2024-09-16T09:06:44Z)
VERA: Validation and Evaluation of Retrieval-Augmented Systems [5.709401805125129]
VERA is a framework designed to enhance the transparency and reliability of outputs from large language models (LLMs) We show how VERA can strengthen decision-making processes and trust in AI applications.
arXiv Detail & Related papers (2024-08-16T21:59:59Z)
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework [66.93260816493553]
This paper introduces RAGEval, a framework designed to assess RAG systems across diverse scenarios.<n>With a focus on factual accuracy, we propose three novel metrics: Completeness, Hallucination, and Irrelevance.<n> Experimental results show that RAGEval outperforms zero-shot and one-shot methods in terms of clarity, safety, conformity, and richness of generated samples.
arXiv Detail & Related papers (2024-08-02T13:35:11Z)
LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application [54.984348122105516]
Llm-driven knowlEdge Adaptive RecommeNdation (LEARN) framework synergizes open-world knowledge with collaborative knowledge.<n>We propose an Llm-driven knowlEdge Adaptive RecommeNdation (LEARN) framework that synergizes open-world knowledge with collaborative knowledge.
arXiv Detail & Related papers (2024-05-07T04:00:30Z)
CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models [49.16989035566899]
Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources. This paper constructs a large-scale and more comprehensive benchmark, and evaluates all the components of RAG systems in various RAG application scenarios.
arXiv Detail & Related papers (2024-01-30T14:25:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.