Related papers: SpiritRAG: A Q&A System for Religion and Spirituality in the United Nations Archive

URL: http://arxiv.org/abs/2507.04395v1
Date: Sun, 06 Jul 2025 13:54:54 GMT
Title: SpiritRAG: A Q&A System for Religion and Spirituality in the United Nations Archive
Authors: Yingqiang Gao, Fabian Winiger, Patrick Montjourides, Anastassia Shaitarova, Nianlong Gu, Simon Peng-Keller, Gerold Schneider
Abstract summary: We present SpiritRAG, an interactive Question Answering (Q&A) system based on Retrieval-Augmented Generation (RAG). Built using 7,500 United Nations (UN) resolution documents related to R/S in the domains of health and education, SpiritRAG allows researchers and policymakers to conduct complex, context-sensitive database searches. A pilot test and evaluation with domain experts on 100 manually composed questions demonstrates the practical value and usefulness of SpiritRAG.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Religion and spirituality (R/S) are complex and highly domain-dependent concepts which have long confounded researchers and policymakers. Due to their context-specificity, R/S are difficult to operationalize in conventional archival search strategies, particularly when datasets are very large, poorly accessible, and marked by information noise. As a result, considerable time investment and specialist knowledge are often needed to extract actionable insights related to R/S from general archival sources, increasing reliance on published literature and manual desk reviews. To address this challenge, we present SpiritRAG, an interactive Question Answering (Q&A) system based on Retrieval-Augmented Generation (RAG). Built using 7,500 United Nations (UN) resolution documents related to R/S in the domains of health and education, SpiritRAG allows researchers and policymakers to conduct complex, context-sensitive database searches of very large datasets using an easily accessible, chat-based web interface. SpiritRAG is lightweight to deploy and leverages both UN documents and user-provided documents as source material. A pilot test and evaluation with domain experts on 100 manually composed questions demonstrates the practical value and usefulness of SpiritRAG.
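The abstract does not specify SpiritRAG's retriever, embedding model, or language model. The sketch below is only a rough illustration of the general retrieve-then-generate pattern a RAG system follows. It makes several assumptions:

- It ranks passages with a toy term-frequency cosine similarity. Deployed RAG systems commonly use dense embeddings instead.
- It merges hypothetical "UN" and "user-provided" documents into one searchable corpus.
- It assembles a grounded prompt that cites document IDs.

All document IDs and texts are invented for illustration, and the actual LLM call is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    q_vec = Counter(tokenize(query))
    scored = [(doc_id, cosine(q_vec, Counter(tokenize(text)))) for doc_id, text in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(query: str, corpus: dict[str, str], k: int = 2) -> str:
    """Assemble a grounded prompt from the top-k passages; the LLM call itself is omitted."""
    hits = retrieve(query, corpus, k)
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id, _ in hits)
    return (
        "Answer using only the sources below and cite their IDs.\n"
        f"{context}\n"
        f"Question: {query}"
    )


# Hypothetical stand-ins for UN resolution texts and a user-provided document.
UN_DOCS = {
    "UN-HEALTH-1": "Resolution on mental health services and the role of spiritual care in hospitals",
    "UN-EDU-1": "Resolution on freedom of religion or belief in school education curricula",
}
USER_DOCS = {"USER-1": "Notes on chaplaincy programs in public health systems"}
CORPUS = {**UN_DOCS, **USER_DOCS}  # both sources are searched together

if __name__ == "__main__":
    print(build_prompt("spiritual care in health services", CORPUS))
```

Because retrieval and prompt assembly are separate functions, the toy `cosine` scorer could be replaced by an embedding-based similarity without changing `build_prompt`.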

Related papers

A Systematic Review of FAIR-compliant Big Data Software Reference Architectures
The FAIR Principles emphasize the importance of making scientific data Findable, Accessible, Interoperable, and Reusable. This article conducts a systematic review of research efforts focused on architectural solutions for such repositories.
arXiv (2025-09-17T19:10:39Z)
AquiLLM: a RAG Tool for Capturing Tacit Knowledge in Research Groups
Research groups face persistent challenges in capturing, storing, and retrieving knowledge that is distributed across team members. AquiLLM is a lightweight, modular RAG system designed to meet the needs of research groups.
arXiv (2025-07-25T20:47:01Z)
Agentic RAG with Knowledge Graphs for Complex Multi-Hop Reasoning in Real-World Applications
INRAExplorer is an agentic RAG system for exploring the scientific data of INRAE (France's National Research Institute for Agriculture, Food and Environment).
arXiv (2025-07-22T12:03:10Z)
Benchmarking Deep Search over Heterogeneous Enterprise Data
We present a new benchmark for evaluating a form of retrieval-augmented generation (RAG) that requires source-aware, multi-hop reasoning over diverse, sparse, but related sources. We build it using a synthetic data pipeline that simulates business workflows across product planning, development, and support stages.
arXiv (2025-06-29T08:34:59Z)
RiTeK: A Dataset for Large Language Models Complex Reasoning over Textual Knowledge Graphs
We develop a dataset for complex reasoning by LLMs over Textual Knowledge Graphs (RiTeK) with broad coverage of topological structures. We synthesize realistic user queries that integrate diverse topological structures, annotated information, and complex textual descriptions. We introduce an enhanced Monte Carlo Tree Search (MCTS) method that automatically extracts relational path information from textual graphs for specific queries.
arXiv (2024-10-17T19:33:37Z)
Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study
Retrieval-augmented generation (RAG) is increasingly recognized as an effective approach to mitigating the hallucination of large language models (LLMs). We develop PruningRAG, a plug-and-play RAG framework that uses multi-granularity pruning strategies to more effectively incorporate relevant context and mitigate the negative impact of misleading information.
arXiv (2024-09-03T03:31:37Z)
RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation
We propose a novel RAG framework, namely RichRAG. It includes a sub-aspect explorer to identify potential sub-aspects of input questions, a retriever to build a candidate pool of diverse external documents related to these sub-aspects, and a generative list-wise ranker. Experimental results on two publicly available datasets show that our framework effectively and efficiently provides comprehensive and satisfying responses to users.
arXiv (2024-06-18T12:52:51Z)
A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models
Large Language Models (LLMs) have demonstrated revolutionary abilities in language understanding and generation. Retrieval-Augmented Generation (RAG) can offer reliable and up-to-date external knowledge. Retrieval-Augmented LLMs (RA-LLMs) have emerged to harness external and authoritative knowledge bases, rather than relying on the model's internal knowledge.
arXiv (2024-05-10T02:48:45Z)
STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases
We develop STaRK, a large-scale semi-structured retrieval benchmark on Textual and Relational Knowledge Bases. Our benchmark covers three domains: product search, academic paper search, and queries in precision medicine. We design a novel pipeline to synthesize realistic user queries that integrate diverse relational information and complex textual properties.
arXiv (2024-04-19T22:54:54Z)
REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering
REAR is a RElevance-Aware Retrieval-augmented approach for open-domain question answering (QA). We develop a novel architecture for LLM-based RAG systems by incorporating a specially designed assessment module. Experiments on four open-domain QA tasks show that REAR significantly outperforms a number of previous competitive RAG approaches.
arXiv (2024-02-27T13:22:51Z)
DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge. Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv (2023-10-31T04:37:57Z)
DORIS-MAE: Scientific Document Retrieval using Multi-level Aspect-based Queries
We propose a novel task, Scientific DOcument Retrieval using Multi-level Aspect-based quEries (DORIS-MAE). For each complex query, we assembled a collection of 100 relevant documents and produced annotated relevance scores for ranking them. Anno-GPT is a framework for validating the performance of Large Language Models (LLMs) on expert-level dataset annotation tasks.
arXiv (2023-10-07T03:25:06Z)
Building Interpretable and Reliable Open Information Retriever for New Domains Overnight
Information retrieval is a critical component for many downstream tasks such as open-domain question answering (QA). We propose an information retrieval pipeline that uses an entity/event linking model and a query decomposition model to focus more accurately on different information units of the query. We show that, while being more interpretable and reliable, our proposed pipeline significantly improves passage coverage and denotation accuracy across five IR and QA benchmarks.
arXiv (2023-08-09T07:47:17Z)
Query-Specific Knowledge Graphs for Complex Finance Topics
We focus on the CODEC dataset, where domain experts create challenging questions. We show that state-of-the-art ranking systems have headroom for improvement. We demonstrate that entity and document relevance are positively correlated.
arXiv (2022-11-08T10:21:13Z)
