Semantic NLP Pipelines for Interoperable Patient Digital Twins from Unstructured EHRs
- URL: http://arxiv.org/abs/2601.05847v1
- Date: Fri, 09 Jan 2026 15:20:11 GMT
- Title: Semantic NLP Pipelines for Interoperable Patient Digital Twins from Unstructured EHRs
- Authors: Rafael Brens, Yuqiao Meng, Luoxi Tang, Zhaohan Xi
- Abstract summary: This paper presents a semantic NLP-driven pipeline that transforms free-text EHR notes into digital twin representations. The pipeline leverages named entity recognition (NER) to extract clinical concepts, concept normalization to map entities to SNOMED-CT or ICD-10, and relation extraction to capture structured associations between conditions, medications, and observations.
- Score: 3.914632811815449
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Digital twins -- virtual replicas of physical entities -- are gaining traction in healthcare for personalized monitoring, predictive modeling, and clinical decision support. However, generating interoperable patient digital twins from unstructured electronic health records (EHRs) remains challenging due to variability in clinical documentation and lack of standardized mappings. This paper presents a semantic NLP-driven pipeline that transforms free-text EHR notes into FHIR-compliant digital twin representations. The pipeline leverages named entity recognition (NER) to extract clinical concepts, concept normalization to map entities to SNOMED-CT or ICD-10, and relation extraction to capture structured associations between conditions, medications, and observations. Evaluation on MIMIC-IV Clinical Database Demo with validation against MIMIC-IV-on-FHIR reference mappings demonstrates high F1-scores for entity and relation extraction, with improved schema completeness and interoperability compared to baseline methods.
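The three stages named in the abstract (entity recognition, concept normalization, relation extraction) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the lexicons, the same-sentence linking rule, and the names `CONDITIONS`, `MEDICATIONS`, `extract_entities`, and `note_to_resources` are hypothetical stand-ins, and the output is a set of simplified FHIR-like dictionaries that have not been validated against the FHIR specification.

```python
import re

# Hypothetical toy lexicons (assumption): a real pipeline would use a trained
# NER model and a terminology service instead of hard-coded dictionaries.
CONDITIONS = {
    "hypertension": ("I10", "Essential (primary) hypertension"),
    "type 2 diabetes": ("E11.9", "Type 2 diabetes mellitus without complications"),
}
MEDICATIONS = {"lisinopril", "metformin"}

ICD10_SYSTEM = "http://hl7.org/fhir/sid/icd-10"


def extract_entities(sentence):
    """Dictionary lookup standing in for NER: return (kind, term) pairs."""
    text = sentence.lower()
    found = [("condition", term) for term in CONDITIONS if term in text]
    found += [
        ("medication", term)
        for term in sorted(MEDICATIONS)
        if re.search(rf"\b{term}\b", text)
    ]
    return found


def note_to_resources(note, patient_id):
    """Turn a free-text note into simplified, FHIR-like resource dicts."""
    subject = {"reference": f"Patient/{patient_id}"}
    resources = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", note.strip()):
        entities = extract_entities(sentence)
        condition_refs = []
        # Concept normalization: map each condition mention to an ICD-10 code.
        for kind, term in entities:
            if kind == "condition":
                code, display = CONDITIONS[term]
                cond_id = f"cond-{len(resources)}"
                resources.append({
                    "resourceType": "Condition",
                    "id": cond_id,
                    "subject": subject,
                    "code": {"coding": [
                        {"system": ICD10_SYSTEM, "code": code, "display": display}
                    ]},
                })
                condition_refs.append({"reference": f"Condition/{cond_id}"})
        # Relation extraction (toy rule): a medication is linked to every
        # condition mentioned in the same sentence.
        for kind, term in entities:
            if kind == "medication":
                resources.append({
                    "resourceType": "MedicationStatement",
                    "id": f"med-{len(resources)}",
                    "subject": subject,
                    "medicationCodeableConcept": {"text": term},
                    "reasonReference": list(condition_refs),
                })
    return resources


note = ("Patient has hypertension treated with lisinopril. "
        "Type 2 diabetes managed on metformin.")
for resource in note_to_resources(note, "123"):
    print(resource["resourceType"], resource["id"])
```

The same-sentence rule is deliberately crude; it only shows where a learned relation extractor would plug in and how its output could be carried by a reference between resources.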
Related papers
- Multi-View Stenosis Classification Leveraging Transformer-Based Multiple-Instance Learning Using Real-World Clinical Data [76.89269238957593]
Coronary artery stenosis is a leading cause of cardiovascular disease, diagnosed by analyzing the coronary arteries from multiple angiography views. We propose SegmentMIL, a transformer-based multi-view multiple-instance learning framework for patient-level stenosis classification.
arXiv Detail & Related papers (2026-02-02T13:07:52Z)
- AgentsEval: Clinically Faithful Evaluation of Medical Imaging Reports via Multi-Agent Reasoning [73.50200033931148]
We introduce AgentsEval, a multi-agent stream reasoning framework that emulates the collaborative diagnostic workflow of radiologists. By dividing the evaluation process into interpretable steps including criteria definition, evidence extraction, alignment, and consistency scoring, AgentsEval provides explicit reasoning traces and structured clinical feedback. Experimental results demonstrate that AgentsEval delivers clinically aligned, semantically faithful, and interpretable evaluations that remain robust under paraphrastic, semantic, and stylistic perturbations.
arXiv Detail & Related papers (2026-01-23T11:59:13Z)
- OEMA: Ontology-Enhanced Multi-Agent Collaboration Framework for Zero-Shot Clinical Named Entity Recognition [5.790213951638059]
We propose a novel zero-shot clinical NER framework based on multi-agent collaboration. We show that OEMA achieves state-of-the-art performance under exact-match evaluation. Future work will focus on continual learning and open-domain adaptation to expand its applicability in clinical NLP.
arXiv Detail & Related papers (2025-11-19T08:02:55Z)
- SNOMED CT-powered Knowledge Graphs for Structured Clinical Data and Diagnostic Reasoning [10.805834750887966]
We present a knowledge-driven framework that integrates the standardized clinical terminology SNOMED CT with the Neo4j graph database to construct a structured medical knowledge graph. By extracting and standardizing entity-relationship pairs, we generate structured, formatted datasets that embed explicit diagnostic pathways. Experimental results demonstrate that our knowledge-guided approach enhances the validity and interpretability of AI-generated diagnostic reasoning.
arXiv Detail & Related papers (2025-10-19T15:50:33Z)
- A Semantic Framework for Patient Digital Twins in Chronic Care [0.0]
The Patient Medical Digital Twin (PMDT) integrates physiological, psychosocial, behavioral, and genomic information into a coherent model. The PMDT ensures semantic interoperability, supports automated reasoning, and enables reuse across diverse clinical contexts. By bridging gaps in data fragmentation and semantic standardization, the PMDT provides a validated foundation for next-generation digital health ecosystems.
arXiv Detail & Related papers (2025-10-10T08:34:55Z)
- Self-Supervised Anatomical Consistency Learning for Vision-Grounded Medical Report Generation [61.350584471060756]
Vision-grounded medical report generation aims to produce clinically accurate descriptions of medical images. We propose Self-Supervised Anatomical Consistency Learning (SS-ACL) to align generated reports with corresponding anatomical regions. SS-ACL constructs a hierarchical anatomical graph inspired by the invariant top-down inclusion structure of human anatomy.
arXiv Detail & Related papers (2025-09-30T08:59:06Z)
- Interpretable Clinical Classification with Kolgomorov-Arnold Networks [70.72819760172744]
Kolmogorov-Arnold Networks (KANs) offer intrinsic interpretability through transparent, symbolic representations. KANs support built-in patient-level insights, intuitive visualizations, and nearest-patient retrieval. These results position KANs as a promising step toward trustworthy AI that clinicians can understand, audit, and act upon.
arXiv Detail & Related papers (2025-09-20T17:21:58Z)
- Automated SNOMED CT Concept Annotation in Clinical Text Using Bi-GRU Neural Networks [0.31457219084519]
This study introduces a neural sequence labeling approach for SNOMED CT concept recognition using a Bidirectional GRU model. We preprocess text with domain-adapted SpaCy and SciBERT-based tokenization, segmenting sentences into overlapping 19-token chunks enriched with contextual, syntactic, and morphological features. The Bi-GRU model assigns IOB tags to identify concept spans and achieves strong performance with a 90 percent F1-score on the validation set.
arXiv Detail & Related papers (2025-08-04T16:08:49Z)
- Clinical NLP with Attention-Based Deep Learning for Multi-Disease Prediction [44.0876796031468]
This paper addresses the challenges posed by the unstructured nature and high-dimensional semantic complexity of electronic health record texts. A deep learning method based on attention mechanisms is proposed to achieve unified modeling for information extraction and multi-label disease prediction.
arXiv Detail & Related papers (2025-07-02T07:45:22Z)
- TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment.
In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials.
We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient trial matching.
arXiv Detail & Related papers (2023-07-19T12:35:09Z)
- MIMO: Mutual Integration of Patient Journey and Medical Ontology for Healthcare Representation Learning [49.57261599776167]
We propose an end-to-end robust Transformer-based solution, Mutual Integration of patient journey and Medical Ontology (MIMO) for healthcare representation learning and predictive analytics.
arXiv Detail & Related papers (2021-07-20T07:04:52Z)
- A Joint Network Optimization Framework to Predict Clinical Severity from Resting State Functional MRI Data [3.276067241408604]
We propose a novel framework to predict clinical severity from resting state fMRI (rs-fMRI) data.
We validate our framework on two separate datasets in a ten fold cross validation setting.
arXiv Detail & Related papers (2020-08-27T23:43:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.