KEEP: Integrating Medical Ontologies with Clinical Data for Robust Code Embeddings
- URL: http://arxiv.org/abs/2510.05049v1
- Date: Mon, 06 Oct 2025 17:27:54 GMT
- Title: KEEP: Integrating Medical Ontologies with Clinical Data for Robust Code Embeddings
- Authors: Ahmed Elhussein, Paul Meddeb, Abigail Newbury, Jeanne Mirone, Martin Stoll, Gamze Gursoy
- Abstract summary: KEEP (Knowledge preserving and Empirically refined Embedding Process) is an efficient framework that combines knowledge graph embeddings with adaptive learning from clinical data. We show KEEP outperforms both traditional and Language Model based approaches in capturing semantic relationships and predicting clinical outcomes.
- Score: 0.555923706082834
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning in healthcare requires effective representation of structured medical codes, but current methods face a trade-off: knowledge-graph-based approaches capture formal relationships but miss real-world patterns, while data-driven methods learn empirical associations but often overlook structured knowledge in medical terminologies. We present KEEP (Knowledge preserving and Empirically refined Embedding Process), an efficient framework that bridges this gap by combining knowledge graph embeddings with adaptive learning from clinical data. KEEP first generates embeddings from knowledge graphs, then employs regularized training on patient records to adaptively integrate empirical patterns while preserving ontological relationships. Importantly, KEEP produces final embeddings without task-specific auxiliary or end-to-end training, enabling KEEP to support multiple downstream applications and model architectures. Evaluations on structured EHR from UK Biobank and MIMIC-IV demonstrate that KEEP outperforms both traditional and language-model-based approaches in capturing semantic relationships and predicting clinical outcomes. Moreover, KEEP's minimal computational requirements make it particularly suitable for resource-constrained environments.
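The two-stage idea in the abstract (start from knowledge-graph embeddings, then refine them on patient records under a regularizer that keeps them close to the starting point) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `cooccurrence`, `fit_loss`, and `refine`, the squared-error fit to log(1 + co-occurrence count), the L2 anchor penalty weighted by `lam`, and plain gradient descent. The initial embeddings are simply passed in; how KEEP derives them from the ontology is not shown.

```python
import numpy as np

def cooccurrence(records, n_codes):
    """Count how often two distinct codes appear in the same patient record."""
    counts = np.zeros((n_codes, n_codes))
    for codes in records:
        unique = sorted(set(codes))
        for i in unique:
            for j in unique:
                if i != j:
                    counts[i, j] += 1.0
    return counts

def fit_loss(emb, cooc):
    """Mean squared error between dot products and log(1 + counts), off-diagonal only."""
    mask = 1.0 - np.eye(emb.shape[0])
    resid = (emb @ emb.T - np.log1p(cooc)) * mask
    return float(np.sum(resid ** 2) / mask.sum())

def refine(init_emb, cooc, lam=1.0, lr=0.05, steps=200):
    """Gradient descent on fit_loss(E) + lam * mean((E - init_emb)**2).

    Hypothetical objective: the second term pulls the refined embeddings
    back toward the ontology-derived starting point.
    """
    n = init_emb.shape[0]
    mask = 1.0 - np.eye(n)
    target = np.log1p(cooc)
    emb = init_emb.copy()
    for _ in range(steps):
        resid = (emb @ emb.T - target) * mask
        # Gradient of the off-diagonal squared error with respect to emb.
        grad_fit = 2.0 * (resid + resid.T) @ emb / mask.sum()
        # Gradient of the anchor penalty.
        grad_reg = 2.0 * lam * (emb - init_emb) / emb.size
        emb -= lr * (grad_fit + grad_reg)
    return emb
```

In this sketch, `lam=0` lets the embeddings follow the co-occurrence data freely, while a larger `lam` keeps them near the initial (ontology-derived) values. Because no task labels are involved, the output of `refine` could in principle be reused across downstream models, which is the property the abstract emphasizes.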
Related papers
- Utilizing Large Language Models for Zero-Shot Medical Ontology Extension from Clinical Notes [13.564947974902429]
We propose CLOZE, a novel framework that uses large language models (LLMs) to automatically extract medical entities from clinical notes. By capitalizing on the strong language understanding and extensive knowledge of pre-trained LLMs, CLOZE effectively identifies disease-related concepts and captures complex hierarchical relationships.
arXiv Detail & Related papers (2025-11-20T17:00:46Z) - Integrating Genomics into Multimodal EHR Foundation Models [56.31910745104141]
This paper introduces an innovative EHR foundation model that integrates Polygenic Risk Scores (PRS) as a foundational data modality. The framework aims to learn complex relationships between clinical data and genetic predispositions. This approach is pivotal for unlocking new insights into disease prediction, proactive health management, risk stratification, and personalized treatment strategies.
arXiv Detail & Related papers (2025-10-24T15:56:40Z) - SNOMED CT-powered Knowledge Graphs for Structured Clinical Data and Diagnostic Reasoning [10.805834750887966]
We present a knowledge-driven framework that integrates the standardized clinical terminology SNOMED CT with the Neo4j graph database to construct a structured medical knowledge graph. By extracting and standardizing entity-relationship pairs, we generate structured, formatted datasets that embed explicit diagnostic pathways. Experimental results demonstrate that our knowledge-guided approach enhances the validity and interpretability of AI-generated diagnostic reasoning.
arXiv Detail & Related papers (2025-10-19T15:50:33Z) - Automated Hierarchical Graph Construction for Multi-source Electronic Health Records [17.122817545326928]
We propose MASH, a fully automated framework that aligns medical codes across institutions using neural optimal transport. MASH integrates information from pre-trained language models, co-occurrence patterns, textual descriptions, and supervised labels. It produces interpretable hierarchical graphs that facilitate the navigation and understanding of heterogeneous clinical data.
arXiv Detail & Related papers (2025-09-08T11:45:59Z) - Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval [61.70489848327436]
KARE is a novel framework that integrates knowledge graph (KG) community-level retrieval with large language models (LLMs) reasoning. Extensive experiments demonstrate that KARE outperforms leading models by up to 10.8-15.0% on MIMIC-III and 12.6-12.7% on MIMIC-IV for mortality and readmission predictions.
arXiv Detail & Related papers (2024-10-06T18:46:28Z) - EMERGE: Enhancing Multimodal Electronic Health Records Predictive Modeling with Retrieval-Augmented Generation [22.94521527609479]
EMERGE is a Retrieval-Augmented Generation (RAG) driven framework to enhance multimodal EHR predictive modeling. We extract entities from time-series data and clinical notes by prompting Large Language Models (LLMs) and align them with professional PrimeKG. The extracted knowledge is then used to generate task-relevant summaries of patients' health statuses.
arXiv Detail & Related papers (2024-05-27T10:53:15Z) - REALM: RAG-Driven Enhancement of Multimodal Electronic Health Records Analysis via Large Language Models [19.62552013839689]
Existing models often lack the medical context relevant to clinical tasks, prompting the incorporation of external knowledge.
We propose REALM, a Retrieval-Augmented Generation (RAG) driven framework to enhance multimodal EHR representations.
Our experiments on MIMIC-III mortality and readmission tasks showcase the superior performance of our REALM framework over baselines.
arXiv Detail & Related papers (2024-02-10T18:27:28Z) - Knowledge Graph Representations to enhance Intensive Care Time-Series Predictions [4.660203987415476]
Our proposed methodology integrates medical knowledge with ICU data, improving clinical decision modeling.
It combines graph representations with vital signs and clinical reports, enhancing performance.
Our model includes an interpretability component to understand how knowledge graph nodes affect predictions.
arXiv Detail & Related papers (2023-11-13T09:11:55Z) - Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-based hybrid medical image segmentation approach.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z) - MIMO: Mutual Integration of Patient Journey and Medical Ontology for Healthcare Representation Learning [49.57261599776167]
We propose an end-to-end robust Transformer-based solution, Mutual Integration of patient journey and Medical Ontology (MIMO) for healthcare representation learning and predictive analytics.
arXiv Detail & Related papers (2021-07-20T07:04:52Z) - A Meta-embedding-based Ensemble Approach for ICD Coding Prediction [64.42386426730695]
International Classification of Diseases (ICD) codes are the de facto codes used globally for clinical coding.
These codes enable healthcare providers to claim reimbursement and facilitate efficient storage and retrieval of diagnostic information.
Our proposed approach enhances the performance of neural models by effectively training word vectors using routine medical data as well as external knowledge from scientific articles.
arXiv Detail & Related papers (2021-02-26T17:49:58Z) - Uncovering the structure of clinical EEG signals with self-supervised learning [64.4754948595556]
Supervised learning paradigms are often limited by the amount of labeled data that is available.
This phenomenon is particularly problematic in clinically-relevant data, such as electroencephalography (EEG).
By extracting information from unlabeled data, it might be possible to reach competitive performance with deep neural networks.
arXiv Detail & Related papers (2020-07-31T14:34:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.