Enhancing Health Information Retrieval with RAG by Prioritizing Topical Relevance and Factual Accuracy
- URL: http://arxiv.org/abs/2502.04666v1
- Date: Fri, 07 Feb 2025 05:19:13 GMT
- Title: Enhancing Health Information Retrieval with RAG by Prioritizing Topical Relevance and Factual Accuracy
- Authors: Rishabh Upadhyay, Marco Viviani
- Abstract summary: This paper introduces a solution driven by Retrieval-Augmented Generation (RAG) to enhance the retrieval of health-related documents grounded in scientific evidence. In particular, we propose a three-stage model: in the first stage, the user's query is employed to retrieve topically relevant passages with associated references from a knowledge base constituted by scientific literature. In the second stage, these passages, alongside the initial query, are processed by LLMs to generate a contextually relevant rich text (GenText). In the last stage, the documents to be retrieved are evaluated and ranked both from the point of view of topical relevance and factual accuracy by means of their comparison with GenText, either through stance detection or semantic similarity.
- Score: 0.7673339435080445
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The exponential surge in online health information, coupled with its increasing use by non-experts, highlights the pressing need for advanced Health Information Retrieval models that consider not only topical relevance but also the factual accuracy of the retrieved information, given the potential risks associated with health misinformation. To this aim, this paper introduces a solution driven by Retrieval-Augmented Generation (RAG), which leverages the capabilities of generative Large Language Models (LLMs) to enhance the retrieval of health-related documents grounded in scientific evidence. In particular, we propose a three-stage model: in the first stage, the user's query is employed to retrieve topically relevant passages with associated references from a knowledge base constituted by scientific literature. In the second stage, these passages, alongside the initial query, are processed by LLMs to generate a contextually relevant rich text (GenText). In the last stage, the documents to be retrieved are evaluated and ranked both from the point of view of topical relevance and factual accuracy by means of their comparison with GenText, either through stance detection or semantic similarity. In addition to calculating factual accuracy, GenText can offer a layer of explainability for it, aiding users in understanding the reasoning behind the retrieval. Experimental evaluation of our model on benchmark datasets and against baseline models demonstrates its effectiveness in enhancing the retrieval of both topically relevant and factually accurate health information, thus presenting a significant step forward in the health misinformation mitigation problem.
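The third stage described in the abstract (ranking documents by both topical relevance and factual accuracy through comparison with GenText) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the bag-of-words cosine similarity stands in for whatever semantic similarity or stance detection model the authors use, and the linear weighting with a hypothetical `alpha` parameter is only one plausible way to combine the two scores.

```python
import math
import re
from collections import Counter


def bow(text):
    """Lowercased bag-of-words term counts (a stand-in for a real text encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_documents(query, gen_text, documents, alpha=0.5):
    """Rank documents by alpha * topical + (1 - alpha) * factual.

    topical: similarity between the query and the document.
    factual: similarity between GenText and the document.
    Both the linear combination and alpha are illustrative assumptions.
    """
    q_vec, g_vec = bow(query), bow(gen_text)
    scored = []
    for doc in documents:
        d_vec = bow(doc)
        topical = cosine(q_vec, d_vec)
        factual = cosine(g_vec, d_vec)
        scored.append((alpha * topical + (1 - alpha) * factual, doc))
    # Highest combined score first.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    ranked = rank_documents(
        query="does vitamin c cure the common cold",
        gen_text="Evidence indicates vitamin C does not cure the common cold",
        documents=[
            "Bananas are rich in potassium",
            "Vitamin C does not cure the common cold according to trials",
        ],
    )
    for score, doc in ranked:
        print(f"{score:.3f}  {doc}")
```

In this sketch a document that agrees with GenText and overlaps with the query rises to the top, while an unrelated document scores zero. A real system would replace `bow` and `cosine` with a learned encoder or a stance classifier.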
Related papers
- Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents [64.43980129731587]
We propose a causal-inspired inference-time debiasing method called Causal Diagnosis and Correction (CDC).
CDC first diagnoses the bias effect of the perplexity and then separates the bias effect from the overall relevance score.
Experimental results across three domains demonstrate the superior debiasing effectiveness.
arXiv Detail & Related papers (2025-03-11T17:59:00Z)
- RGAR: Recurrence Generation-augmented Retrieval for Factual-aware Medical Question Answering [29.065294682044]
The current paradigm, Retrieval-Augmented Generation (RAG), acquires expert medical knowledge through large-scale corpus retrieval.
This paper introduces RGAR, a recurrence generation-augmented retrieval framework that retrieves both relevant factual and conceptual knowledge from dual sources.
arXiv Detail & Related papers (2025-02-19T01:50:10Z)
- Knowledge Graph-Driven Retrieval-Augmented Generation: Integrating Deepseek-R1 with Weaviate for Advanced Chatbot Applications [45.935798913942904]
We propose an innovative framework that combines structured biomedical knowledge with large language models (LLMs).
Our system develops a thorough knowledge graph by identifying and refining causal relationships and named entities from medical abstracts related to age-related macular degeneration (AMD).
Using a vector-based retrieval process and a locally deployed language model, our framework produces responses that are both contextually relevant and verifiable, with direct references to clinical evidence.
arXiv Detail & Related papers (2025-02-16T12:52:28Z)
- Pointwise Mutual Information as a Performance Gauge for Retrieval-Augmented Generation [78.28197013467157]
We show that the pointwise mutual information between a context and a question is an effective gauge for language model performance.
We propose two methods that use the pointwise mutual information between a document and a question as a gauge for selecting and constructing prompts that lead to better performance.
arXiv Detail & Related papers (2024-11-12T13:14:09Z)
- AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels [19.90354530235266]
We introduce a novel approach called Self-Learning Hypothetical Document Embeddings (SL-HyDE) to tackle this issue.
SL-HyDE leverages large language models (LLMs) as generators to generate hypothetical documents based on a given query.
We present the Chinese Medical Information Retrieval Benchmark (CMIRB), a comprehensive evaluation framework grounded in real-world medical scenarios.
arXiv Detail & Related papers (2024-10-26T02:53:20Z)
- Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval [61.70489848327436]
KARE is a novel framework that integrates knowledge graph (KG) community-level retrieval with large language model (LLM) reasoning.
Extensive experiments demonstrate that KARE outperforms leading models by up to 10.8-15.0% on MIMIC-III and 12.6-12.7% on MIMIC-IV for mortality and readmission predictions.
arXiv Detail & Related papers (2024-10-06T18:46:28Z)
- The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation [1.2839205715237014]
Large Language Models (LLMs) have the potential to significantly improve personal health management for chronic conditions.
LLMs generate responses based on patterns learned from diverse internet data.
Retrieval Augmented Generation (RAG) can help mitigate hallucinations and inaccuracies in LLM responses.
arXiv Detail & Related papers (2024-07-25T13:47:01Z)
- RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models [35.60385437194243]
Current Medical Large Vision Language Models (Med-LVLMs) frequently encounter factual issues.
RAG, which utilizes external knowledge, can improve the factual accuracy of these models but introduces two major challenges.
We propose RULE, which consists of two components. First, we introduce a provably effective strategy for controlling factuality risk through the selection of retrieved contexts.
Second, based on samples where over-reliance on retrieved contexts led to errors, we curate a preference dataset to fine-tune the model.
arXiv Detail & Related papers (2024-07-06T16:45:07Z)
- RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder for Language Modeling [79.56442336234221]
We introduce RegaVAE, a retrieval-augmented language model built upon the variational auto-encoder (VAE).
It encodes the text corpus into a latent space, capturing current and future information from both source and target text.
Experimental results on various datasets demonstrate significant improvements in text generation quality and hallucination removal.
arXiv Detail & Related papers (2023-10-16T16:42:01Z)
- A Review on Knowledge Graphs for Healthcare: Resources, Applications, and Promises [59.4999994297993]
This comprehensive review aims to provide an overview of the current state of Healthcare Knowledge Graphs (HKGs).
We thoroughly analyzed existing literature on HKGs, covering their construction methodologies, utilization techniques, and applications.
The review highlights the potential of HKGs to significantly impact biomedical research and clinical practice.
arXiv Detail & Related papers (2023-06-07T21:51:56Z)
- SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.