Clinical Data Goes MEDS? Let's OWL make sense of it
- URL: http://arxiv.org/abs/2601.04164v1
- Date: Wed, 07 Jan 2026 18:25:02 GMT
- Title: Clinical Data Goes MEDS? Let's OWL make sense of it
- Authors: Alberto Marfoglia, Jong Ho Jhee, Adrien Coulet,
- Abstract summary: The application of machine learning on healthcare data is often hindered by the lack of standardized and semantically explicit representation. The Medical Event Data Standard (MEDS) addresses these issues by introducing a minimal, event-centric data model. We introduce MEDS-OWL, a lightweight ontology that provides formal concepts and relations to enable representing MEDS datasets as RDF graphs.
- Score: 0.3441021278275805
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The application of machine learning on healthcare data is often hindered by the lack of standardized and semantically explicit representation, leading to limited interoperability and reproducibility across datasets and experiments. The Medical Event Data Standard (MEDS) addresses these issues by introducing a minimal, event-centric data model designed for reproducible machine-learning workflows from health data. However, MEDS is defined as a data-format specification and does not natively provide integration with the Semantic Web ecosystem. In this article, we introduce MEDS-OWL, a lightweight OWL ontology that provides formal concepts and relations to enable representing MEDS datasets as RDF graphs. Additionally, we implemented meds2rdf, a Python conversion library that transforms MEDS events into RDF graphs, ensuring conformance with the ontology. We demonstrate the approach on a synthetic clinical dataset that describes patient care pathways for ruptured intracranial aneurysms and validate the resulting graph using SHACL constraints. The first release of MEDS-OWL comprises 13 classes, 10 object properties, 20 data properties, and 24 OWL axioms. Combined with meds2rdf, it enables data transformation into FAIR-aligned datasets, provenance-aware publishing, and interoperability of event-based clinical data. By bridging MEDS with the Semantic Web, this work contributes a reusable semantic layer for event-based clinical data and establishes a robust foundation for subsequent graph-based analytics.
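The abstract describes the MEDS-to-RDF conversion and SHACL validation only at a high level. The following is a minimal, dependency-free sketch of that idea. It is not meds2rdf and does not use the real MEDS-OWL vocabulary.

It makes several assumptions:
- Events use the core MEDS columns subject_id, time, code and numeric_value.
- The namespace http://example.org/meds-owl# is hypothetical, and so are the terms Event, Subject, hasSubject, code, time and numericValue.
- The function names (event_to_triples, to_ntriples, check_event_shape) are hypothetical too.

Each event row becomes a set of N-Triples. A hand-written check then mimics what a SHACL shape with sh:minCount 1 / sh:maxCount 1 would enforce.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# HYPOTHETICAL vocabulary for illustration only; these are not the published
# MEDS-OWL IRIs or term names.
MEDS = "http://example.org/meds-owl#"
DATA = "http://example.org/data/"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = tuple[str, str, str]


@dataclass
class MedsEvent:
    """One row of a MEDS-style event table (assumed core columns)."""
    subject_id: int
    time: Optional[datetime]  # None marks a static, time-independent event
    code: str
    numeric_value: Optional[float] = None


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value, datatype: str) -> str:
    # Typed N-Triples literal; escape backslash, double quote and newline.
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"^^<{XSD}{datatype}>'


def event_to_triples(index: int, ev: MedsEvent) -> set[Triple]:
    """Map one event row to triples in the hypothetical vocabulary."""
    event = iri(f"{DATA}event/{index}")
    subject = iri(f"{DATA}subject/{ev.subject_id}")
    triples = {
        (event, iri(RDF_TYPE), iri(MEDS + "Event")),
        (subject, iri(RDF_TYPE), iri(MEDS + "Subject")),
        (event, iri(MEDS + "hasSubject"), subject),
        (event, iri(MEDS + "code"), literal(ev.code, "string")),
    }
    # Optional columns only produce triples when they are present.
    if ev.time is not None:
        triples.add((event, iri(MEDS + "time"), literal(ev.time.isoformat(), "dateTime")))
    if ev.numeric_value is not None:
        triples.add((event, iri(MEDS + "numericValue"), literal(ev.numeric_value, "double")))
    return triples


def to_ntriples(graph: set[Triple]) -> str:
    """Serialize the triples as sorted N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in sorted(graph))


def check_event_shape(graph: set[Triple]) -> list[str]:
    """Plain-Python stand-in for a SHACL shape: every Event needs exactly
    one hasSubject and exactly one code (like sh:minCount 1 / sh:maxCount 1)."""
    events = {s for s, p, o in graph if p == iri(RDF_TYPE) and o == iri(MEDS + "Event")}
    errors = []
    for ev in sorted(events):
        for prop in ("hasSubject", "code"):
            count = sum(1 for s, p, _ in graph if s == ev and p == iri(MEDS + prop))
            if count != 1:
                errors.append(f"{ev}: expected exactly 1 {prop}, found {count}")
    return errors


# Two illustrative events for one subject (the codes are made up).
events = [
    MedsEvent(1, None, "GENDER//F"),
    MedsEvent(1, datetime(2024, 3, 1, 8, 30), "LAB//SODIUM", 138.0),
]
graph: set[Triple] = set()
for i, ev in enumerate(events):
    graph |= event_to_triples(i, ev)

print(to_ntriples(graph))
print(check_event_shape(graph))  # expected: []
```

Writing N-Triples by hand keeps the sketch self-contained. A real pipeline would more likely use an RDF library such as rdflib for serialization and a SHACL engine such as pySHACL for validation, with shapes written in Turtle.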
Related papers
- A Federated and Parameter-Efficient Framework for Large Language Model Training in Medicine [59.78991974851707]
Large language models (LLMs) have demonstrated strong performance on medical benchmarks, including question answering and diagnosis.
Most medical LLMs are trained on data from a single institution, which limits their generalizability and safety in heterogeneous systems.
We introduce a model-agnostic and parameter-efficient federated learning framework for adapting LLMs to medical applications.
arXiv Detail & Related papers (2026-01-29T18:48:21Z)
- Harnessing Large Language Models for Precision Querying and Retrieval-Augmented Knowledge Extraction in Clinical Data Science [3.4325249294405555]
This study applies Large Language Models (LLMs) to two foundational Electronic Health Record (EHR) data science tasks.
We test the ability of LLMs to interact accurately with large structured datasets for analytics.
We present a flexible evaluation framework that automatically generates synthetic question and answer pairs tailored to the characteristics of each dataset or task.
arXiv Detail & Related papers (2026-01-28T14:57:36Z)
- SIDEKICK: A Semantically Integrated Resource for Drug Effects, Indications, and Contraindications [11.439066289590878]
Sidekick is a knowledge graph that standardizes drug indications, contraindications, and adverse reactions from FDA Structured Product Labels.
We processed over 50,000 drug labels and mapped terms to the Human Phenotype Ontology (HPO), the MONDO Disease Ontology, and RxNorm.
Sidekick enables automated safety-based similarity analysis for drug repurposing.
arXiv Detail & Related papers (2025-12-06T17:35:07Z)
- Improving the Generation and Evaluation of Synthetic Data for Downstream Medical Causal Inference [89.5628648718851]
Causal inference is essential for developing and evaluating medical interventions.
Real-world medical datasets are often difficult to access due to regulatory barriers.
We present STEAM: a novel method for generating Synthetic data for Treatment Effect Analysis in Medicine.
arXiv Detail & Related papers (2025-10-21T16:16:00Z)
- Patient Trajectory Prediction: Integrating Clinical Notes with Transformers [0.0]
We propose an approach that integrates unstructured clinical notes into transformer-based deep learning models for sequential disease prediction.
Experiments on MIMIC-IV datasets demonstrate that the proposed approach outperforms traditional models relying solely on structured data.
arXiv Detail & Related papers (2025-02-25T09:14:07Z)
- Representation Learning of Lab Values via Masked AutoEncoders [2.785172582119726]
We propose Lab-MAE, a novel transformer-based masked autoencoder framework for imputation of sequential lab values.
Lab-MAE achieves equitable performance across demographic groups of patients, advancing fairness in clinical predictions.
arXiv Detail & Related papers (2025-01-05T20:26:49Z)
- ACES: Automatic Cohort Extraction System for Event-Stream Datasets [1.9338569571933975]
Reproducibility remains a significant challenge in machine learning (ML) for healthcare.
We introduce the Automatic Cohort Extraction System (ACES) for event-stream data.
ACES has the potential to significantly lower the barrier to entry for defining ML tasks in representation.
arXiv Detail & Related papers (2024-06-28T04:48:05Z)
- ArSDM: Colonoscopy Images Synthesis with Adaptive Refinement Semantic Diffusion Models [69.9178140563928]
Colonoscopy analysis is essential for assisting clinical diagnosis and treatment.
The scarcity of annotated data limits the effectiveness and generalization of existing methods.
We propose an Adaptive Refinement Semantic Diffusion Model (ArSDM) to generate colonoscopy images that benefit the downstream tasks.
arXiv Detail & Related papers (2023-09-03T07:55:46Z)
- Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models [59.89454513692417]
Tabular data is often hidden in text, particularly in medical diagnostic reports.
We propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM.
We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics.
arXiv Detail & Related papers (2023-06-08T09:12:28Z)
- Unsupervised pre-training of graph transformers on patient population graphs [48.02011627390706]
We propose a graph-transformer-based network to handle heterogeneous clinical data.
We show the benefit of our pre-training method in a self-supervised and a transfer learning setting.
arXiv Detail & Related papers (2022-07-21T16:59:09Z)
- MIMO: Mutual Integration of Patient Journey and Medical Ontology for Healthcare Representation Learning [49.57261599776167]
We propose an end-to-end robust Transformer-based solution, Mutual Integration of patient journey and Medical Ontology (MIMO) for healthcare representation learning and predictive analytics.
arXiv Detail & Related papers (2021-07-20T07:04:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.