Semantic Analysis of SNOMED CT Concept Co-occurrences in Clinical Documentation using MIMIC-IV
- URL: http://arxiv.org/abs/2509.03662v1
- Date: Wed, 03 Sep 2025 19:25:14 GMT
- Title: Semantic Analysis of SNOMED CT Concept Co-occurrences in Clinical Documentation using MIMIC-IV
- Authors: Ali Noori, Somya Mohanty, Prashanti Manda
- Abstract summary: We investigate the relationship between SNOMED CT concept co-occurrence patterns and embedding-based semantic similarity. Our analyses reveal that while co-occurrence and semantic similarity are weakly correlated, embeddings capture clinically meaningful associations.
- Score: 0.10499611180329803
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clinical notes contain rich clinical narratives but their unstructured format poses challenges for large-scale analysis. Standardized terminologies such as SNOMED CT improve interoperability, yet understanding how concepts relate through co-occurrence and semantic similarity remains underexplored. In this study, we leverage the MIMIC-IV database to investigate the relationship between SNOMED CT concept co-occurrence patterns and embedding-based semantic similarity. Using Normalized Pointwise Mutual Information (NPMI) and pretrained embeddings (e.g., ClinicalBERT, BioBERT), we examine whether frequently co-occurring concepts are also semantically close, whether embeddings can suggest missing concepts, and how these relationships evolve temporally and across specialties. Our analyses reveal that while co-occurrence and semantic similarity are weakly correlated, embeddings capture clinically meaningful associations not always reflected in documentation frequency. Embedding-based suggestions frequently matched concepts later documented, supporting their utility for augmenting clinical annotations. Clustering of concept embeddings yielded coherent clinical themes (symptoms, labs, diagnoses, cardiovascular conditions) that map to patient phenotypes and care patterns. Finally, co-occurrence patterns linked to outcomes such as mortality and readmission demonstrate the practical utility of this approach. Collectively, our findings highlight the complementary value of co-occurrence statistics and semantic embeddings in improving documentation completeness, uncovering latent clinical relationships, and informing decision support and phenotyping applications.
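As an illustration of the two measures the abstract compares, the following is a minimal sketch, not the authors' implementation. It assumes each note has already been reduced to a set of concept identifiers and that concept embeddings are plain lists of floats. The function names `npmi_scores` and `cosine` and the placeholder concept labels are hypothetical. NPMI is computed with the standard definition, PMI(x, y) divided by -log p(x, y), where probabilities are note-level document frequencies.

```python
import math
from collections import Counter
from itertools import combinations


def npmi_scores(notes):
    """Return NPMI for every concept pair that co-occurs in at least one note.

    `notes` is a list of sets of concept identifiers (one set per note).
    Probabilities are estimated as the fraction of notes containing a concept
    or a pair of concepts. This is an assumed setup for illustration only.
    """
    n = len(notes)
    single = Counter()
    pair = Counter()
    for concepts in notes:
        concepts = set(concepts)
        single.update(concepts)
        # Sort so that each unordered pair has one canonical key.
        pair.update(combinations(sorted(concepts), 2))

    scores = {}
    for (a, b), count_ab in pair.items():
        p_ab = count_ab / n
        p_a = single[a] / n
        p_b = single[b] / n
        if p_ab == 1.0:
            # Both concepts appear in every note: -log p(a, b) is 0,
            # so take the limiting value of 1 by convention.
            scores[(a, b)] = 1.0
            continue
        pmi = math.log(p_ab / (p_a * p_b))
        scores[(a, b)] = pmi / -math.log(p_ab)
    return scores


def cosine(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Placeholder labels, not real SNOMED CT codes.
    toy_notes = [{"concept_a", "concept_b"}, {"concept_a"}, {"concept_b"}, set()]
    print(npmi_scores(toy_notes))
    print(cosine([1.0, 0.0], [0.5, 0.5]))
```

NPMI ranges from -1 to 1, with 0 indicating statistical independence, so a pair can score near 0 on co-occurrence while still having high cosine similarity between its embeddings. That is the kind of divergence the abstract describes.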
Related papers
- Semantic NLP Pipelines for Interoperable Patient Digital Twins from Unstructured EHRs [3.914632811815449]
This paper presents a semantic NLP-driven pipeline that transforms free-text EHR notes into digital twin representations. The pipeline leverages named entity recognition (NER) to extract clinical concepts, concept normalization to map entities to SNOMED-CT or ICD-10, and relation extraction to capture structured associations between conditions, medications, and observations.
arXiv Detail & Related papers (2026-01-09T15:20:11Z) - SNOMED CT-powered Knowledge Graphs for Structured Clinical Data and Diagnostic Reasoning [10.805834750887966]
We present a knowledge-driven framework that integrates the standardized clinical terminology SNOMED CT with the Neo4j graph database to construct a structured medical knowledge graph. By extracting and standardizing entity-relationship pairs, we generate structured, formatted datasets that embed explicit diagnostic pathways. Experimental results demonstrate that our knowledge-guided approach enhances the validity and interpretability of AI-generated diagnostic reasoning.
arXiv Detail & Related papers (2025-10-19T15:50:33Z) - KEEP: Integrating Medical Ontologies with Clinical Data for Robust Code Embeddings [0.555923706082834]
KEEP (Knowledge preserving and Empirically refined Embedding Process) is an efficient framework that combines knowledge graph embeddings with adaptive learning from clinical data. We show KEEP outperforms both traditional and Language Model based approaches in capturing semantic relationships and predicting clinical outcomes.
arXiv Detail & Related papers (2025-10-06T17:27:54Z) - Interpretable Clinical Classification with Kolgomorov-Arnold Networks [70.72819760172744]
Kolmogorov-Arnold Networks (KANs) offer intrinsic interpretability through transparent, symbolic representations. KANs support built-in patient-level insights, intuitive visualizations, and nearest-patient retrieval. These results position KANs as a promising step toward trustworthy AI that clinicians can understand, audit, and act upon.
arXiv Detail & Related papers (2025-09-20T17:21:58Z) - Automated SNOMED CT Concept Annotation in Clinical Text Using Bi-GRU Neural Networks [0.31457219084519]
This study introduces a neural sequence labeling approach for SNOMED CT concept recognition using a Bidirectional GRU model. We preprocess text with domain-adapted SpaCy and SciBERT-based tokenization, segmenting sentences into overlapping 19-token chunks enriched with contextual, syntactic, and morphological features. The Bi-GRU model assigns IOB tags to identify concept spans and achieves strong performance with a 90 percent F1-score on the validation set.
arXiv Detail & Related papers (2025-08-04T16:08:49Z) - CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis [50.56875995511431]
We introduce a Cross-Modal Temporal Pattern Discovery (CTPD) framework, designed to efficiently extract meaningful cross-modal temporal patterns from multimodal EHR data. Our approach introduces shared initial temporal pattern representations which are refined using slot attention to generate temporal semantic embeddings.
arXiv Detail & Related papers (2024-11-01T15:54:07Z) - Deep State-Space Generative Model For Correlated Time-to-Event Predictions [54.3637600983898]
We propose a deep latent state-space generative model to capture the interactions among different types of correlated clinical events.
Our method also uncovers meaningful insights about the latent correlations among mortality and different types of organ failures.
arXiv Detail & Related papers (2024-07-28T02:42:36Z) - Efficient Biomedical Entity Linking: Clinical Text Standardization with Low-Resource Techniques [0.0]
Multiple terms can refer to the same core concept, which can be referred to as a clinical entity.
Ontologies like the Unified Medical Language System (UMLS) are developed and maintained to store millions of clinical entities.
We propose a suite of context-based and context-less remention techniques for performing the entity disambiguation.
arXiv Detail & Related papers (2024-05-24T01:14:33Z) - CoRelation: Boosting Automatic ICD Coding Through Contextualized Code Relation Learning [56.782963838838036]
We propose a novel approach, a contextualized and flexible framework, to enhance the learning of ICD code representations.
Our approach employs a dependent learning paradigm that considers the context of clinical notes in modeling all possible code relations.
arXiv Detail & Related papers (2024-02-24T03:25:28Z) - Informing clinical assessment by contextualizing post-hoc explanations of risk prediction models in type-2 diabetes [50.8044927215346]
We consider a comorbidity risk prediction scenario and focus on contexts regarding the patient's clinical state.
We employ several state-of-the-art LLMs to present contexts around risk prediction model inferences and evaluate their acceptability.
Our paper is one of the first end-to-end analyses identifying the feasibility and benefits of contextual explanations in a real-world clinical use case.
arXiv Detail & Related papers (2023-02-11T18:07:11Z) - Enriching Unsupervised User Embedding via Medical Concepts [51.17532619610099]
Unsupervised user embedding aims to encode patients into fixed-length vectors without human supervision.
Medical concepts extracted from the clinical notes contain rich connections between patients and their clinical categories.
We propose a concept-aware unsupervised user embedding that jointly leverages text documents and medical concepts from two clinical corpora.
arXiv Detail & Related papers (2022-03-20T18:54:05Z) - MIMICause: Defining, identifying and predicting types of causal relationships between biomedical concepts from clinical notes [0.0]
We propose annotation guidelines, develop an annotated corpus and provide baseline scores to identify types and direction of causal relations between a pair of biomedical concepts in clinical notes.
We annotate a total of 2714 de-identified examples sampled from the 2018 n2c2 shared task dataset and train four different language model based architectures.
The high inter-annotator agreement for clinical text shows the quality of our annotation guidelines while the provided baseline F1 score sets the direction for future research towards understanding narratives in clinical texts.
arXiv Detail & Related papers (2021-10-14T00:15:36Z) - Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z) - Biomedical Concept Relatedness -- A large EHR-based benchmark [10.133874724214984]
A promising application of AI to healthcare is the retrieval of information from electronic health records.
The suitability of AI methods for such applications is tested by predicting the relatedness of concepts with known relatedness scores.
All existing biomedical concept relatedness datasets are notoriously small and consist of hand-picked concept pairs.
We open-source a novel concept relatedness benchmark overcoming these issues.
arXiv Detail & Related papers (2020-10-30T12:20:18Z) - Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.