Structured Extraction of Real World Medical Knowledge using LLMs for Summarization and Search
- URL: http://arxiv.org/abs/2412.15256v1
- Date: Mon, 16 Dec 2024 02:57:00 GMT
- Title: Structured Extraction of Real World Medical Knowledge using LLMs for Summarization and Search
- Authors: Edward Kim, Manil Shrestha, Richard Foty, Tom DeLay, Vicki Seyfert-Margolis,
- Abstract summary: We propose creating patient knowledge graphs using large language model extraction techniques.
Our method maps to existing ontologies (MeSH, SNOMED-CT, RxNORM, HPO) to ground extracted entities.
We describe our construction of patient-specific knowledge graphs and symptom-based patient searches.
- Score: 1.3453658538563749
- License:
- Abstract: Creation and curation of knowledge graphs can accelerate disease discovery and analysis in real-world data. While disease ontologies aid in biological data annotation, codified categories (SNOMED-CT, ICD10, CPT) may not capture patient condition nuances or rare diseases. Multiple disease definitions across data sources complicate ontology mapping and disease clustering. We propose creating patient knowledge graphs using large language model extraction techniques, allowing data extraction via natural language rather than rigid ontological hierarchies. Our method maps to existing ontologies (MeSH, SNOMED-CT, RxNORM, HPO) to ground extracted entities. Using a large ambulatory care EHR database with 33.6M patients, we demonstrate our method through the patient search for Dravet syndrome, which received ICD10 recognition in October 2020. We describe our construction of patient-specific knowledge graphs and symptom-based patient searches. Using confirmed Dravet syndrome ICD10 codes as ground truth, we employ LLM-based entity extraction to characterize patients in grounded ontologies. We then apply this method to identify Beta-propeller protein-associated neurodegeneration (BPAN) patients, demonstrating real-world discovery where no ground truth exists.
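The abstract's pipeline (extract entities, ground them in an ontology, build a per-patient graph, then search by symptoms) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the triples stand in for the output of an LLM extraction step, the concept labels are placeholders rather than real HPO or RxNORM identifiers, and the Jaccard ranking is a simple stand-in, not the scoring the authors describe.

```python
from collections import defaultdict

# Hypothetical output of an LLM extraction step: (patient_id, relation, grounded_concept).
# The concept labels are illustrative placeholders, not real ontology identifiers.
TRIPLES = [
    ("p1", "has_symptom", "hpo:seizure"),
    ("p1", "has_symptom", "hpo:developmental_delay"),
    ("p1", "takes_drug", "rxnorm:example_anticonvulsant"),
    ("p2", "has_symptom", "hpo:seizure"),
    ("p3", "has_symptom", "hpo:headache"),
]


def build_patient_graphs(triples):
    """Group triples into one small graph per patient: relation -> set of concepts."""
    graphs = defaultdict(lambda: defaultdict(set))
    for patient, relation, concept in triples:
        graphs[patient][relation].add(concept)
    return graphs


def search_by_symptoms(graphs, query):
    """Rank patients by Jaccard overlap between their symptoms and the query set."""
    query = set(query)
    scored = []
    for patient, graph in graphs.items():
        symptoms = graph.get("has_symptom", set())
        union = symptoms | query
        score = len(symptoms & query) / len(union) if union else 0.0
        if score > 0:  # drop patients who share no symptom with the query
            scored.append((patient, score))
    # Highest score first; ties broken by patient id for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


graphs = build_patient_graphs(TRIPLES)
results = search_by_symptoms(graphs, ["hpo:seizure", "hpo:developmental_delay"])
print(results)
```

Keying each patient graph by relation keeps the symptom search separate from medication or procedure edges, so the same structure could serve other query types without being rebuilt.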
Related papers
- Inpainting Pathology in Lumbar Spine MRI with Latent Diffusion [4.410798232767917]
We propose an efficient method for inpainting pathological features onto healthy anatomy in MRI.
We evaluate the method's ability to insert disc herniation and central canal stenosis in lumbar spine sagittal T2 MRI.
arXiv Detail & Related papers (2024-06-04T16:47:47Z)
- README: Bridging Medical Jargon and Lay Understanding for Patient Education through Data-Centric NLP [9.432205523734707]
We introduce a new task of automatically generating lay definitions, aiming to simplify medical terms into patient-friendly lay language.
We first created the dataset, an extensive collection of over 50,000 unique (medical term, lay definition) pairs and 300,000 mentions.
We have also engineered a data-centric Human-AI pipeline that synergizes data filtering, augmentation, and selection to improve data quality.
arXiv Detail & Related papers (2023-12-24T23:01:00Z)
- Data and Knowledge Co-driving for Cancer Subtype Classification on Multi-Scale Histopathological Slides [4.22412600279685]
We propose a Data and Knowledge Co-driving (D&K) model to replicate the process of cancer subtype classification on a histological slide like a pathologist.
Specifically, in the data-driven module, the bagging mechanism in ensemble learning is leveraged to integrate the histological features from various bags extracted by the embedding representation unit.
arXiv Detail & Related papers (2023-04-18T21:57:37Z)
- EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations.
Test data coming from a subset of DISNET and automatic association extractions from texts has been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z)
- Multi-confound regression adversarial network for deep learning-based diagnosis on highly heterogenous clinical data [1.2891210250935143]
We developed a novel deep learning architecture, MUCRAN, to train a deep learning model on highly heterogeneous clinical data.
We trained MUCRAN using 16,821 clinical T1 Axial brain MRIs collected from Massachusetts General Hospital before 2019.
The model showed a robust performance of over 90% accuracy on newly collected data.
arXiv Detail & Related papers (2022-05-05T18:39:09Z)
- Variational Knowledge Distillation for Disease Classification in Chest X-Rays [102.04931207504173]
We propose variational knowledge distillation (VKD), which is a new probabilistic inference framework for disease classification based on X-rays.
We demonstrate the effectiveness of our method on three public benchmark datasets with paired X-ray images and EHRs.
arXiv Detail & Related papers (2021-03-19T14:13:56Z)
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
- Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on application of elastic principal graphs which can address simultaneously the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z)
- Interpretation of Disease Evidence for Medical Images Using Adversarial Deformation Fields [4.2739669051600275]
We propose a novel method for formulating and presenting spatial explanations of disease evidence.
An adversarially trained generator produces deformation fields that modify images of diseased patients to resemble images of healthy patients.
We validate the method studying chronic obstructive pulmonary disease (COPD) evidence in chest x-rays (CXRs) and Alzheimer's disease (AD) evidence in brain MRIs.
arXiv Detail & Related papers (2020-07-04T00:51:54Z)
- Finding Patient Zero: Learning Contagion Source with Graph Neural Networks [67.3415507211942]
Locating the source of an epidemic can provide critical insights into the infection's transmission course.
Existing methods use graph-theoretic measures and expensive message-passing algorithms.
We revisit this problem using graph neural networks (GNNs) to learn P0.
arXiv Detail & Related papers (2020-06-21T21:12:44Z)
- Learning Dynamic and Personalized Comorbidity Networks from Event Data using Deep Diffusion Processes [102.02672176520382]
Comorbid diseases co-occur and progress via complex temporal patterns that vary among individuals.
In electronic health records we can observe the different diseases a patient has, but can only infer the temporal relationship between each co-morbid condition.
We develop deep diffusion processes to model "dynamic comorbidity networks".
arXiv Detail & Related papers (2020-01-08T15:47:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.