OntologyRAG: Better and Faster Biomedical Code Mapping with Retrieval-Augmented Generation (RAG) Leveraging Ontology Knowledge Graphs and Large Language Models
- URL: http://arxiv.org/abs/2502.18992v1
- Date: Wed, 26 Feb 2025 09:56:10 GMT
- Title: OntologyRAG: Better and Faster Biomedical Code Mapping with Retrieval-Augmented Generation (RAG) Leveraging Ontology Knowledge Graphs and Large Language Models
- Authors: Hui Feng, Yuntzu Yin, Emiliano Reynares, Jay Nanavati,
- Abstract summary: We create OntologyRAG, a retrieval-augmented generation (RAG) method for in-context-learning representations.<n>Our solution grounds LLMs to knowledge graphs with mappings between and processes questions.<n>Our solution doesn't require re-training LMs, as all ontology updates could be reflected by updating the knowledge graphs with a standard process.
- Score: 1.2941187430993801
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Biomedical ontologies, which comprehensively define concepts and relations for biomedical entities, are crucial for structuring and formalizing domain-specific information representations. Biomedical code mapping identifies similarity or equivalence between concepts from different ontologies. Obtaining high-quality mapping usually relies on automatic generation of unrefined mapping with ontology domain fine-tuned language models (LMs), followed by manual selections or corrections by coding experts who have extensive domain expertise and familiarity with ontology schemas. The LMs usually provide unrefined code mapping suggestions as a list of candidates without reasoning or supporting evidence, hence coding experts still need to verify each suggested candidate against ontology sources to pick the best matches. This is also a recurring task as ontology sources are updated regularly to incorporate new research findings. Consequently, the need of regular LM retraining and manual refinement make code mapping time-consuming and labour intensive. In this work, we created OntologyRAG, an ontology-enhanced retrieval-augmented generation (RAG) method that leverages the inductive biases from ontological knowledge graphs for in-context-learning (ICL) in large language models (LLMs). Our solution grounds LLMs to knowledge graphs with unrefined mappings between ontologies and processes questions by generating an interpretable set of results that include prediction rational with mapping proximity assessment. Our solution doesn't require re-training LMs, as all ontology updates could be reflected by updating the knowledge graphs with a standard process. Evaluation results on a self-curated gold dataset show promises of using our method to enable coding experts to achieve better and faster code mapping. The code is available at https://github.com/iqvianlp/ontologyRAG.
Related papers
- A Retrieval-Augmented Knowledge Mining Method with Deep Thinking LLMs for Biomedical Research and Clinical Support [4.663954774358363]
We introduce Integrated and Progressive Retrieval-Augmented Reasoning (IP-RAR) to enhance retrieval accuracy and knowledge reasoning.
IP-RAR maximizes information recall through Integrated Reasoning-based Retrieval and refines knowledge via Progressive Reasoning-based Generation.
This framework helps doctors efficiently integrate treatment evidence for personalized medication plans.
arXiv Detail & Related papers (2025-03-29T09:56:42Z) - Pathological Prior-Guided Multiple Instance Learning For Mitigating Catastrophic Forgetting in Breast Cancer Whole Slide Image Classification [50.899861205016265]
We propose a new framework PaGMIL to mitigate catastrophic forgetting in breast cancer WSI classification.
Our framework introduces two key components into the common MIL model architecture.
We evaluate the continual learning performance of PaGMIL across several public breast cancer datasets.
arXiv Detail & Related papers (2025-03-08T04:51:58Z) - Context-Augmented Code Generation Using Programming Knowledge Graphs [0.0]
Large Language Models (LLMs) and Code-LLMs (CLLMs) frequently face difficulties when dealing with challenging and complex problems.
We present a novel framework that leverages a Programming Knowledge Graph (PKG) to semantically represent and retrieve code.
arXiv Detail & Related papers (2024-10-09T16:35:41Z) - Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models [18.984165679347026]
Self-BioRAG is a framework reliable for biomedical text that specializes in generating explanations, retrieving domain-specific documents, and self-reflecting generated responses.
We utilize 84k filtered biomedical instruction sets to train Self-BioRAG that can assess its generated explanations with customized reflective tokens.
arXiv Detail & Related papers (2024-01-27T02:29:42Z) - Diversifying Knowledge Enhancement of Biomedical Language Models using
Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z) - MapperGPT: Large Language Models for Linking and Mapping Entities [1.5340902251924438]
We present MapperGPT, an approach that uses Large Language Models to review and refine mapping as a post-processing step.
We show that when used in combination with high-recall methods, MapperGPT can provide a substantial improvement in accuracy.
arXiv Detail & Related papers (2023-10-05T16:43:04Z) - An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT [80.33783969507458]
The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians.
Recent studies have achieved promising results in automatic impression generation using large-scale medical text data.
These models often require substantial amounts of medical text data and have poor generalization performance.
arXiv Detail & Related papers (2023-04-17T17:13:42Z) - HiPrompt: Few-Shot Biomedical Knowledge Fusion via Hierarchy-Oriented
Prompting [33.1455954220194]
HiPrompt is a supervision-efficient knowledge fusion framework.
It elicits the few-shot reasoning ability of large language models through hierarchy-oriented prompts.
Empirical results on the collected KG-Hi-BKF benchmark datasets demonstrate the effectiveness of HiPrompt.
arXiv Detail & Related papers (2023-04-12T16:54:26Z) - EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations.
Test data coming from a subset of DISNET and automatic association extractions from texts has been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z) - Scientific Language Models for Biomedical Knowledge Base Completion: An
Empirical Study [62.376800537374024]
We study scientific LMs for KG completion, exploring whether we can tap into their latent knowledge to enhance biomedical link prediction.
We integrate the LM-based models with KG embedding models, using a router method that learns to assign each input example to either type of model and provides a substantial boost in performance.
arXiv Detail & Related papers (2021-06-17T17:55:33Z) - A Meta-embedding-based Ensemble Approach for ICD Coding Prediction [64.42386426730695]
International Classification of Diseases (ICD) are the de facto codes used globally for clinical coding.
These codes enable healthcare providers to claim reimbursement and facilitate efficient storage and retrieval of diagnostic information.
Our proposed approach enhances the performance of neural models by effectively training word vectors using routine medical data as well as external knowledge from scientific articles.
arXiv Detail & Related papers (2021-02-26T17:49:58Z) - G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for
Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.