Enhancing Health Information Retrieval with RAG by Prioritizing Topical Relevance and Factual Accuracy
- URL: http://arxiv.org/abs/2502.04666v1
- Date: Fri, 07 Feb 2025 05:19:13 GMT
- Title: Enhancing Health Information Retrieval with RAG by Prioritizing Topical Relevance and Factual Accuracy
- Authors: Rishabh Uapadhyay, Marco Viviani,
- Abstract summary: This paper introduces a solution driven by Retrieval-Augmented Generation (RAG) to enhance the retrieval of health-related documents grounded in scientific evidence.
In particular, we propose a three-stage model: in the first stage, the user's query is employed to retrieve topically relevant passages with associated references from a knowledge base constituted by scientific literature.
In the second stage, these passages, alongside the initial query, are processed by LLMs to generate a contextually relevant rich text (GenText)
In the last stage, the documents to be retrieved are evaluated and ranked both from the point of
- Score: 0.7673339435080445
- License:
- Abstract: The exponential surge in online health information, coupled with its increasing use by non-experts, highlights the pressing need for advanced Health Information Retrieval models that consider not only topical relevance but also the factual accuracy of the retrieved information, given the potential risks associated with health misinformation. To this aim, this paper introduces a solution driven by Retrieval-Augmented Generation (RAG), which leverages the capabilities of generative Large Language Models (LLMs) to enhance the retrieval of health-related documents grounded in scientific evidence. In particular, we propose a three-stage model: in the first stage, the user's query is employed to retrieve topically relevant passages with associated references from a knowledge base constituted by scientific literature. In the second stage, these passages, alongside the initial query, are processed by LLMs to generate a contextually relevant rich text (GenText). In the last stage, the documents to be retrieved are evaluated and ranked both from the point of view of topical relevance and factual accuracy by means of their comparison with GenText, either through stance detection or semantic similarity. In addition to calculating factual accuracy, GenText can offer a layer of explainability for it, aiding users in understanding the reasoning behind the retrieval. Experimental evaluation of our model on benchmark datasets and against baseline models demonstrates its effectiveness in enhancing the retrieval of both topically relevant and factually accurate health information, thus presenting a significant step forward in the health misinformation mitigation problem.
Related papers
- RGAR: Recurrence Generation-augmented Retrieval for Factual-aware Medical Question Answering [29.065294682044]
The current paradigm, Retrieval-Augmented Generation (RAG), acquires expertise medical knowledge through large-scale corpus retrieval.
This paper introduces RGAR, a recurrence generation-augmented retrieval framework that retrieves both relevant factual and conceptual knowledge from dual sources.
arXiv Detail & Related papers (2025-02-19T01:50:10Z) - Knowledge Graph-Driven Retrieval-Augmented Generation: Integrating Deepseek-R1 with Weaviate for Advanced Chatbot Applications [45.935798913942904]
We propose an innovative framework that combines structured biomedical knowledge with large language models (LLMs)
Our system develops a thorough knowledge graph by identifying and refining causal relationships and named entities from medical abstracts related to age-related macular degeneration (AMD)
Using a vector-based retrieval process and a locally deployed language model, our framework produces responses that are both contextually relevant and verifiable, with direct references to clinical evidence.
arXiv Detail & Related papers (2025-02-16T12:52:28Z) - AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels [19.90354530235266]
We introduce a novel approach called Self-Learning Hypothetical Document Embeddings (SL-HyDE) to tackle this issue.
SL-HyDE leverages large language models (LLMs) as generators to generate hypothetical documents based on a given query.
We present the Chinese Medical Information Retrieval Benchmark (CMIRB), a comprehensive evaluation framework grounded in real-world medical scenarios.
arXiv Detail & Related papers (2024-10-26T02:53:20Z) - Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval [61.70489848327436]
KARE is a novel framework that integrates knowledge graph (KG) community-level retrieval with large language models (LLMs) reasoning.
Extensive experiments demonstrate that KARE outperforms leading models by up to 10.8-15.0% on MIMIC-III and 12.6-12.7% on MIMIC-IV for mortality and readmission predictions.
arXiv Detail & Related papers (2024-10-06T18:46:28Z) - The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation [1.2839205715237014]
Large Language Models (LLMs) have the potential to significantly improve personal health management for chronic conditions.
LLMs generate responses based on patterns learned from diverse internet data.
Retrieval Augmented Generation (RAG) can help mitigate hallucinations and inaccuracies in RAG responses.
arXiv Detail & Related papers (2024-07-25T13:47:01Z) - RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models [35.60385437194243]
Current Medical Large Vision Language Models (Med-LVLMs) frequently encounter factual issues.
RAG, which utilizes external knowledge, can improve the factual accuracy of these models but introduces two major challenges.
We propose RULE, which consists of two components. First, we introduce a provably effective strategy for controlling factuality risk through the selection of retrieved contexts.
Second, based on samples where over-reliance on retrieved contexts led to errors, we curate a preference dataset to fine-tune the model.
arXiv Detail & Related papers (2024-07-06T16:45:07Z) - RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder
for Language Modeling [79.56442336234221]
We introduce RegaVAE, a retrieval-augmented language model built upon the variational auto-encoder (VAE)
It encodes the text corpus into a latent space, capturing current and future information from both source and target text.
Experimental results on various datasets demonstrate significant improvements in text generation quality and hallucination removal.
arXiv Detail & Related papers (2023-10-16T16:42:01Z) - PathLDM: Text conditioned Latent Diffusion Model for Histopathology [62.970593674481414]
We introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images.
Our approach fuses image and textual data to enhance the generation process.
We achieved a SoTA FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor with FID 30.1.
arXiv Detail & Related papers (2023-09-01T22:08:32Z) - EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations.
Test data coming from a subset of DISNET and automatic association extractions from texts has been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z) - SAIS: Supervising and Augmenting Intermediate Steps for Document-Level
Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.