Document-level Clinical Entity and Relation Extraction via Knowledge Base-Guided Generation
- URL: http://arxiv.org/abs/2407.10021v1
- Date: Sat, 13 Jul 2024 22:45:46 GMT
- Title: Document-level Clinical Entity and Relation Extraction via Knowledge Base-Guided Generation
- Authors: Kriti Bhattarai, Inez Y. Oh, Zachary B. Abrams, Albert M. Lai,
- Abstract summary: We leverage the Unified Medical Language System (UMLS) knowledge base to accurately identify medical concepts.
Our framework selects UMLS concepts relevant to the text and combines them with prompts to guide language models in extracting entities.
- Score: 0.869967783513041
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative pre-trained transformer (GPT) models have shown promise in clinical entity and relation extraction tasks because of their precise extraction and contextual understanding capability. In this work, we further leverage the Unified Medical Language System (UMLS) knowledge base to accurately identify medical concepts and improve clinical entity and relation extraction at the document level. Our framework selects UMLS concepts relevant to the text and combines them with prompts to guide language models in extracting entities. Our experiments demonstrate that this initial concept mapping and the inclusion of these mapped concepts in the prompts improve extraction results compared with few-shot extraction on generic language models that do not leverage UMLS. Further, our results show that this approach is more effective than the standard Retrieval Augmented Generation (RAG) technique, where retrieved data is compared with prompt embeddings to generate results. Overall, we find that integrating UMLS concepts with GPT models significantly improves entity and relation identification, outperforming the baseline and RAG models. By combining the precise concept mapping capability of knowledge-based approaches like UMLS with the contextual understanding capability of GPT, our method highlights the potential of these approaches in specialized domains like healthcare.
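To make the workflow concrete, the following is a minimal sketch of the knowledge base-guided prompting idea described in the abstract. The `lookup_umls_concepts` helper, prompt wording, example CUIs, and model name are illustrative assumptions rather than the authors' released code; a real implementation would map text spans to UMLS concepts (e.g., via the UMLS REST API or a local QuickUMLS index) before prompting the model.

```python
# Minimal sketch of KB-guided extraction (hypothetical helper names,
# placeholder model; not the authors' released code).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def lookup_umls_concepts(text: str) -> list[dict]:
    """Hypothetical stand-in for UMLS concept mapping.

    A real implementation would query the UMLS Metathesaurus (e.g., via the
    UMLS REST API or a local QuickUMLS index) and return candidate concepts
    with CUIs and semantic types; here we return a fixed example.
    """
    return [
        {"cui": "C0011860", "name": "Diabetes Mellitus, Type 2",
         "semtype": "Disease or Syndrome"},        # example CUI for illustration
        {"cui": "C0025598", "name": "Metformin",
         "semtype": "Pharmacologic Substance"},    # example CUI for illustration
    ]


def extract_entities_and_relations(note: str) -> str:
    # Map the document to candidate UMLS concepts, then include them in the prompt.
    concepts = lookup_umls_concepts(note)
    concept_block = "\n".join(
        f"- {c['cui']} | {c['name']} | {c['semtype']}" for c in concepts
    )
    prompt = (
        "You are a clinical information extraction system.\n"
        "Candidate UMLS concepts mapped from the document:\n"
        f"{concept_block}\n\n"
        "Guided by these concepts, list all clinical entities in the document "
        "below and the relations between them.\n\n"
        f"Document:\n{note}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


print(extract_entities_and_relations(
    "Patient with type 2 diabetes mellitus started on metformin 500 mg daily."
))
```

In this sketch the concept mapping happens before generation, so the prompt carries explicit UMLS identifiers; this is the key difference from a standard RAG pipeline, where retrieved passages are selected by embedding similarity to the prompt.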
Related papers
- Unified Representation of Genomic and Biomedical Concepts through Multi-Task, Multi-Source Contrastive Learning [45.6771125432388]
We introduce GENomic REpresentation with Language Model (GENEREL), a framework designed to bridge genetic and biomedical knowledge bases.
Our experiments demonstrate GENEREL's ability to effectively capture the nuanced relationships between SNPs and clinical concepts.
arXiv Detail & Related papers (2024-10-14T04:19:52Z)
- LLaVA Needs More Knowledge: Retrieval Augmented Natural Language Generation with Knowledge Graph for Explaining Thoracic Pathologies [3.2221734920470797]
We propose a Vision-Language framework augmented with a Knowledge Graph (KG)-based datastore to generate Natural Language Explanations (NLEs) for medical images.
Our framework employs a KG-based retrieval mechanism that not only improves the precision of the generated explanations but also preserves data privacy by avoiding direct data retrieval.
These frameworks are validated on the MIMIC-NLE dataset, where they achieve state-of-the-art results.
arXiv Detail & Related papers (2024-10-07T04:59:08Z)
- Towards Ontology-Enhanced Representation Learning for Large Language Models [0.18416014644193066]
We propose a novel approach to improve an embedding-Large Language Model (embedding-LLM) of interest by infusing knowledge from a reference ontology.
The linguistic information (i.e. concept synonyms and descriptions) and structural information (i.e. is-a relations) are utilized to compile a comprehensive set of concept definitions.
These concept definitions are then employed to fine-tune the target embedding-LLM using a contrastive learning framework.
arXiv Detail & Related papers (2024-05-30T23:01:10Z)
- MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
arXiv Detail & Related papers (2024-02-03T05:48:50Z)
- Contextualization Distillation from Large Language Model for Knowledge Graph Completion [51.126166442122546]
We introduce the Contextualization Distillation strategy, a plug-in-and-play approach compatible with both discriminative and generative KGC frameworks.
Our method begins by instructing large language models to transform compact, structural triplets into context-rich segments.
Comprehensive evaluations across diverse datasets and KGC techniques highlight the efficacy and adaptability of our approach.
arXiv Detail & Related papers (2024-01-28T08:56:49Z)
- UMLS-KGI-BERT: Data-Centric Knowledge Integration in Transformers for Biomedical Entity Recognition [4.865221751784403]
This work contributes a data-centric paradigm for enriching the language representations of biomedical transformer-encoder LMs by extracting text sequences from the UMLS.
Preliminary results from experiments in the extension of pre-trained LMs as well as training from scratch show that this framework improves downstream performance on multiple biomedical and clinical Named Entity Recognition (NER) tasks.
arXiv Detail & Related papers (2023-07-20T18:08:34Z)
- Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-based hybrid medical image segmentation approach.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z)
- Customizing General-Purpose Foundation Models for Medical Report Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z)
- Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models [59.89454513692417]
Tabular data is often hidden in text, particularly in medical diagnostic reports.
We propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM; a minimal illustrative sketch of this style of prompt-based extraction appears after this list.
We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics.
arXiv Detail & Related papers (2023-06-08T09:12:28Z)
- Unifying Heterogeneous Electronic Health Records Systems via Text-Based Code Embedding [7.3394352452936085]
We introduce DescEmb, a code-agnostic description-based representation learning framework for predictive modeling on EHR.
We tested our model's capacity in various experiments, including prediction tasks, transfer learning, and pooled learning.
arXiv Detail & Related papers (2021-08-08T12:47:42Z)
- A Meta-embedding-based Ensemble Approach for ICD Coding Prediction [64.42386426730695]
International Classification of Diseases (ICD) codes are the de facto standard used globally for clinical coding.
These codes enable healthcare providers to claim reimbursement and facilitate efficient storage and retrieval of diagnostic information.
Our proposed approach enhances the performance of neural models by effectively training word vectors using routine medical data as well as external knowledge from scientific articles.
arXiv Detail & Related papers (2021-02-26T17:49:58Z)
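As a concrete illustration of the prompt-based structured extraction summarized in the TEMED-LLM entry above, here is a minimal sketch. The report text, prompt wording, JSON schema, and model name are illustrative assumptions, not the paper's actual methodology or code.

```python
# Hypothetical sketch of extracting structured (tabular) data from a textual
# medical report with an instruction-following LLM; not TEMED-LLM's own code.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

report = (
    "58-year-old male with chest pain. Troponin elevated at 0.8 ng/mL. "
    "ECG shows ST-segment depression in leads V4-V6."
)

prompt = (
    "Extract every finding in the report below as a JSON array of objects "
    'with keys "finding", "value", and "unit" (use null when a value or unit '
    "is absent). Return only the JSON array.\n\n"
    f"Report: {report}"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)

# Parse the model's JSON output into rows that a downstream classifier or
# rule-based diagnostic step could consume.
for row in json.loads(response.choices[0].message.content):
    print(row["finding"], row["value"], row["unit"])
```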