Related papers: Structured Semantics from Unstructured Notes: Language Model Approaches to EHR-Based Decision Support

Structured Semantics from Unstructured Notes: Language Model Approaches to EHR-Based Decision Support

URL: http://arxiv.org/abs/2506.06340v1
Date: Sun, 01 Jun 2025 05:07:51 GMT
Title: Structured Semantics from Unstructured Notes: Language Model Approaches to EHR-Based Decision Support
Authors: Wu Hao Ran, Xi Xi, Furong Li, Jingyi Lu, Jian Jiang, Hui Huang, Yuzhuan Zhang, Shi Li,
Abstract summary: This paper explores the application of advanced language models to leverage diverse data sources for improved clinical decision support.<n>We will discuss how text-based features, often overlooked in traditional high dimensional EHR analysis, can provide semantically rich representations.
Score: 11.927390747231588
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The advent of large language models (LLMs) has opened new avenues for analyzing complex, unstructured data, particularly within the medical domain. Electronic Health Records (EHRs) contain a wealth of information in various formats, including free text clinical notes, structured lab results, and diagnostic codes. This paper explores the application of advanced language models to leverage these diverse data sources for improved clinical decision support. We will discuss how text-based features, often overlooked in traditional high dimensional EHR analysis, can provide semantically rich representations and aid in harmonizing data across different institutions. Furthermore, we delve into the challenges and opportunities of incorporating medical codes and ensuring the generalizability and fairness of AI models in healthcare.

Related papers

A Comprehensive Survey of Electronic Health Record Modeling: From Deep Learning Approaches to Large Language Models [5.623574322477982]
This survey offers a comprehensive overview of recent advancements at the intersection of deep learning, large language models (LLMs), and EHR modeling.<n>We introduce a unified taxonomy that spans five key design dimensions: data-centric approaches, neural architecture design, learning-focused strategies, multimodal learning, and LLM-based modeling systems.<n>This survey aims to provide a structured roadmap for advancing AI-driven EHR modeling and clinical decision support.
arXiv Detail & Related papers (2025-07-17T04:31:55Z)
A Survey of Medical Vision-and-Language Applications and Their Techniques [48.268198631277315]
Medical vision-and-language models (MVLMs) have attracted substantial interest due to their capability to offer a natural language interface for interpreting complex medical data. Here, we provide a comprehensive overview of MVLMs and the various medical tasks to which they have been applied. We also examine the datasets used for these tasks and compare the performance of different models based on standardized evaluation metrics.
arXiv Detail & Related papers (2024-11-19T03:27:05Z)
STRICTA: Structured Reasoning in Critical Text Assessment for Peer Review and Beyond [68.47402386668846]
We introduce Structured Reasoning In Critical Text Assessment (STRICTA) to model text assessment as an explicit, step-wise reasoning process.<n>STRICTA breaks down the assessment into a graph of interconnected reasoning steps drawing on causality theory.<n>We apply STRICTA to a dataset of over 4000 reasoning steps from roughly 40 biomedical experts on more than 20 papers.
arXiv Detail & Related papers (2024-09-09T06:55:37Z)
Clinical Insights: A Comprehensive Review of Language Models in Medicine [1.5020330976600738]
This paper explores the advancements and applications of language models in healthcare, focusing on their clinical use cases.<n>It examines the evolution from early encoder-based systems requiring extensive fine-tuning to state-of-the-art large language and multimodal models capable of integrating text and visual data through in-context learning.<n>The analysis emphasizes locally deployable models, which enhance data privacy and operational autonomy, and their applications in tasks such as text generation, classification, information extraction, and conversational systems.
arXiv Detail & Related papers (2024-08-21T15:59:33Z)
A Hybrid Framework with Large Language Models for Rare Disease Phenotyping [4.550497164299771]
Rare diseases pose significant challenges in diagnosis and treatment due to their low prevalence and heterogeneous clinical presentations. This study aims to develop a hybrid approach combining dictionary-based natural language processing (NLP) tools with large language models (LLMs) We propose a novel hybrid framework that integrates the Orphanet Rare Disease Ontology (ORDO) and the Unified Medical Language System (UMLS) to create a comprehensive rare disease vocabulary.
arXiv Detail & Related papers (2024-05-16T20:59:28Z)
Multimodal Fusion of EHR in Structures and Semantics: Integrating Clinical Records and Notes with Hypergraph and LLM [39.25272553560425]
We propose a new framework called MINGLE, which integrates both structures and semantics in EHR effectively. Our framework uses a two-level infusion strategy to combine medical concept semantics and clinical note semantics into hypergraph neural networks. Experiment results on two EHR datasets, the public MIMIC-III and private CRADLE, show that MINGLE can effectively improve predictive performance by 11.83% relatively.
arXiv Detail & Related papers (2024-02-19T23:48:40Z)
On Preserving the Knowledge of Long Clinical Texts [0.0]
A bottleneck in using transformer encoders for processing clinical texts comes from the input length limit of these models.<n>This paper proposes a novel method to preserve the knowledge of long clinical texts in the models using aggregated ensembles of transformer encoders.
arXiv Detail & Related papers (2023-11-02T19:50:02Z)
Leveraging text data for causal inference using electronic health records [1.4182510510164876]
This paper presents a unified framework for leveraging text data to support causal inference with electronic health data. We show how incorporating text data in a traditional matching analysis can help strengthen the validity of an estimated treatment effect. We believe these methods have the potential to expand the scope of secondary analysis of clinical data to domains where structured EHR data is limited.
arXiv Detail & Related papers (2023-06-09T16:06:02Z)
Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models [59.89454513692417]
Tabular data is often hidden in text, particularly in medical diagnostic reports. We propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM. We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics.
arXiv Detail & Related papers (2023-06-08T09:12:28Z)
PMC-LLaMA: Towards Building Open-source Language Models for Medicine [62.39105735933138]
Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding. LLMs struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge. We describe the procedure for building a powerful, open-source language model specifically designed for medicine applications, termed as PMC-LLaMA.
arXiv Detail & Related papers (2023-04-27T18:29:05Z)
Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching. We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders. We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z)
A Meta-embedding-based Ensemble Approach for ICD Coding Prediction [64.42386426730695]
International Classification of Diseases (ICD) are the de facto codes used globally for clinical coding. These codes enable healthcare providers to claim reimbursement and facilitate efficient storage and retrieval of diagnostic information. Our proposed approach enhances the performance of neural models by effectively training word vectors using routine medical data as well as external knowledge from scientific articles.
arXiv Detail & Related papers (2021-02-26T17:49:58Z)
Learning Contextualized Document Representations for Healthcare Answer Retrieval [68.02029435111193]
Contextual Discourse Vectors (CDV) is a distributed document representation for efficient answer retrieval from long documents. Our model leverages a dual encoder architecture with hierarchical LSTM layers and multi-task training to encode the position of clinical entities and aspects alongside the document discourse. We show that our generalized model significantly outperforms several state-of-the-art baselines for healthcare passage ranking.
arXiv Detail & Related papers (2020-02-03T15:47:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.