Related papers: LLM-based Prompt Ensemble for Reliable Medical Entity Recognition from EHRs

URL: http://arxiv.org/abs/2505.08704v2
Date: Sun, 25 May 2025 21:34:53 GMT
Title: LLM-based Prompt Ensemble for Reliable Medical Entity Recognition from EHRs
Authors: K M Sajjadul Islam, Ayesha Siddika Nipu, Jiawei Wu, Praveen Madiraju,
Abstract summary: This paper explores prompt-based medical entity recognition using large language models (LLMs). GPT-4o with prompt ensemble achieved the highest classification performance with an F1-score of 0.95 and recall of 0.98. The ensemble method improved reliability by aggregating outputs through embedding-based similarity and majority voting.
Score: 4.262074310505135
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Electronic Health Records (EHRs) are digital records of patient information, often containing unstructured clinical text. Named Entity Recognition (NER) is essential in EHRs for extracting key medical entities like problems, tests, and treatments to support downstream clinical applications. This paper explores prompt-based medical entity recognition using large language models (LLMs), specifically GPT-4o and DeepSeek-R1, guided by various prompt engineering techniques, including zero-shot, few-shot, and an ensemble approach. Among all strategies, GPT-4o with prompt ensemble achieved the highest classification performance with an F1-score of 0.95 and recall of 0.98, outperforming DeepSeek-R1 on the task. The ensemble method improved reliability by aggregating outputs through embedding-based similarity and majority voting.
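
The aggregation step named in the abstract (grouping similar outputs, then majority voting) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `similarity` and `ensemble_vote`, the 0.8 threshold, and the sample entities are all hypothetical, and a character-level string ratio from Python's standard library stands in for the embedding-based similarity the paper describes.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Assumption: a string-overlap ratio stands in for the cosine similarity
    # of text embeddings, so that the sketch runs without an embedding model.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def ensemble_vote(runs, threshold=0.8):
    """Aggregate (entity_text, label) pairs produced by several prompts.

    runs: one list of (entity_text, label) pairs per prompt.
    An entity is kept only if more than half of the prompts produced it;
    its final label is the most common label among matching outputs.
    """
    clusters = []  # each cluster groups outputs that refer to the same entity
    for run_idx, run in enumerate(runs):
        for text, label in run:
            for cluster in clusters:
                if similarity(text, cluster["text"]) >= threshold:
                    cluster["labels"].append(label)
                    cluster["voters"].add(run_idx)
                    break
            else:
                # No sufficiently similar cluster exists yet: start a new one.
                clusters.append({"text": text, "labels": [label], "voters": {run_idx}})

    results = []
    for cluster in clusters:
        # Strict majority of prompts must agree that the entity exists.
        if len(cluster["voters"]) * 2 > len(runs):
            label, _ = Counter(cluster["labels"]).most_common(1)[0]
            results.append((cluster["text"], label))
    return results


# Hypothetical outputs from three differently worded prompts.
runs = [
    [("chest pain", "problem"), ("ECG", "test")],
    [("Chest pain", "problem"), ("ECG", "test"), ("aspirin", "treatment")],
    [("chest pain", "problem"), ("ecg", "treatment")],
]
print(ensemble_vote(runs))
```

In this sketch, an entity reported by only one of the three prompts is dropped, and a label that one prompt disagrees on is settled by the other two. Swapping `similarity` for a cosine similarity over real embeddings would bring it closer to the approach the abstract describes.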

Related papers

ICU-TSB: A Benchmark for Temporal Patient Representation Learning for Unsupervised Stratification into Patient Cohorts [0.055923945039144905]
We introduce ICU-TSB (Temporal Stratification Benchmark), the first benchmark for evaluating patient stratification based on temporal patient representation learning. A key contribution of our benchmark is a novel hierarchical evaluation framework utilizing disease to measure the alignment of discovered clusters with clinically validated disease groupings. Our results demonstrate that temporal representation learning can rediscover clinically meaningful patient cohorts.
arXiv Detail & Related papers (2025-06-06T15:52:50Z)
Ontology-based Semantic Similarity Measures for Clustering Medical Concepts in Drug Safety [0.0]
Six semantic similarity measures (SSMs) were evaluated for clustering MedDRA Preferred Terms (PTs) in drug safety data. We found that intrinsic information content (IC)-based measures, especially INTRINSIC-LIN and SOKAL, consistently yield better clustering accuracy. Our findings highlight the promise of IC-based SSMs in enhancing pharmacovigilance by improving early signal detection and reducing manual review.
arXiv Detail & Related papers (2025-03-26T17:19:00Z)
Revisiting Medical Image Retrieval via Knowledge Consolidation [46.6989555659494]
We propose a novel method to consolidate knowledge of hierarchical features and functions. We introduce Depth-aware Representation Fusion (DaRF) and Structure-aware Contrastive Hashing (SCH). Our method achieves a 5.6-38.9% improvement in mean Average Precision on the anatomical radiology dataset.
arXiv Detail & Related papers (2025-03-12T13:16:42Z)
Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases [48.87360916431396]
We introduce MedR-Bench, a benchmarking dataset of 1,453 structured patient cases, annotated with reasoning references. We propose a framework encompassing three critical stages (examination recommendation, diagnostic decision-making, and treatment planning), simulating the entire patient care journey. Using this benchmark, we evaluate five state-of-the-art reasoning LLMs, including DeepSeek-R1, OpenAI-o3-mini, and Gemini-2.0-Flash Thinking.
arXiv Detail & Related papers (2025-03-06T18:35:39Z)
High-precision medical speech recognition through synthetic data and semantic correction: UNITED-MEDASR [1.3810901729134184]
We introduce United-MedASR, a novel architecture that addresses challenges by integrating synthetic data generation, precision ASR fine-tuning, and semantic enhancement techniques. United-MedASR constructs a specialised medical vocabulary by synthesising data from authoritative sources such as ICD-10, MIMS, and FDA databases. To enhance processing speed, we incorporate Faster Whisper, ensuring streamlined and high-speed ASR performance.
arXiv Detail & Related papers (2024-11-24T17:02:48Z)
DIRI: Adversarial Patient Reidentification with Large Language Models for Evaluating Clinical Text Anonymization [13.038800602897354]
We develop an adversarial approach using a large language model to re-identify the patient corresponding to a redacted clinical note. Although ClinicalBERT was the most effective, masking all identified PII, our tool still reidentified 9% of clinical notes.
arXiv Detail & Related papers (2024-10-22T14:06:31Z)
LLMs in Biomedicine: A study on clinical Named Entity Recognition [42.71263594812782]
Large Language Models (LLMs) demonstrate remarkable versatility in various NLP tasks. This paper investigates strategies to enhance their performance for the NER task. Our proposed method, DiRAG, can boost the zero-shot F1 score of LLMs for biomedical NER.
arXiv Detail & Related papers (2024-04-10T22:26:26Z)
Multimodal Pretraining of Medical Time Series and Notes [45.89025874396911]
Deep learning models show promise in extracting meaningful patterns, but they require extensive labeled data. We propose a novel approach employing self-supervised pretraining, focusing on the alignment of clinical measurements and notes. In downstream tasks, including in-hospital mortality prediction and phenotyping, our model outperforms baselines in settings where only a fraction of the data is labeled.
arXiv Detail & Related papers (2023-12-11T21:53:40Z)
Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training [75.40980802817349]
Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area. We introduce a Decoupled Hard Label Correction (DHLC) strategy that considers the imbalanced distribution and fuzzy boundaries of MS lesions. We also introduce a Centrally Enhanced Label Correction (CELC) strategy, which leverages the aggregated central model as a correction teacher for all sites.
arXiv Detail & Related papers (2023-08-31T00:36:10Z)
Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain. We annotated a corpus of clinical documents according to 12 types of identifying entities. We build a hybrid system, merging the results of a deep learning model as well as manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
A Marker-based Neural Network System for Extracting Social Determinants of Health [12.6970199179668]
The impact of social determinants of health (SDoH) on patients' healthcare quality and on disparities is well known. Many SDoH items are not coded in structured forms in electronic health records. We explore a multi-stage pipeline involving named entity recognition (NER), relation classification (RC), and text classification methods to extract SDoH information from clinical notes automatically.
arXiv Detail & Related papers (2022-12-24T18:40:23Z)
Collaborative residual learners for automatic icd10 prediction using prescribed medications [45.82374977939355]
We propose a novel collaborative residual learning based model to automatically predict ICD10 codes employing only prescriptions data. We obtain multi-label classification scores of 0.71 and 0.57 for average precision, 0.57 and 0.38 for F1-score, and 0.73 and 0.44 for accuracy in predicting the principal diagnosis, for the inpatient and outpatient datasets respectively.
arXiv Detail & Related papers (2020-12-16T07:07:27Z)
DeepEnroll: Patient-Trial Matching with Deep Embedding and Entailment Prediction [67.91606509226132]
Clinical trials are essential for drug development but often suffer from expensive, inaccurate and insufficient patient recruitment. DeepEnroll is a cross-modal inference learning model to jointly encode enrollment criteria (tabular data) into a shared latent space for matching inference.
arXiv Detail & Related papers (2020-01-22T17:51:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.