Ensemble BERT for Medication Event Classification on Electronic Health Records (EHRs)
- URL: http://arxiv.org/abs/2506.23315v1
- Date: Sun, 29 Jun 2025 16:17:17 GMT
- Title: Ensemble BERT for Medication Event Classification on Electronic Health Records (EHRs)
- Authors: Shouvon Sarker, Xishuang Dong, Lijun Qian
- Abstract summary: n2c2 2022 provided shared tasks on challenges in natural language processing for clinical data analytics on electronic health records (EHRs). This study focuses on subtask 2 in Track 1 of this challenge, which is to detect and classify medication events from clinical notes by building a novel BERT-based ensemble model. It started with pretraining BERT models on different types of big data such as Wikipedia and MIMIC; these pretrained BERT models were then fine-tuned on CMED training data. Experimental results demonstrated that BERT-based ensemble models can effectively improve the strict Micro-F score by about 5% and the strict Macro-F score by about 6%.
- Score: 1.5211498825393837
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Identification of key variables such as medications, diseases, and relations from health records and clinical notes has a wide range of applications in the clinical domain. n2c2 2022 provided shared tasks on challenges in natural language processing for clinical data analytics on electronic health records (EHRs), for which it built a comprehensively annotated clinical dataset, the Contextualized Medication Event Dataset (CMED). This study focuses on subtask 2 in Track 1 of this challenge, which is to detect and classify medication events from clinical notes by building a novel BERT-based ensemble model. It started with pretraining BERT models on different types of big data such as Wikipedia and MIMIC. Afterwards, these pretrained BERT models were fine-tuned on CMED training data. The fine-tuned BERT models were then employed to classify medication events on CMED testing data, producing multiple predictions. These multiple predictions were integrated with voting strategies to build the final prediction. Experimental results demonstrated that BERT-based ensemble models can effectively improve the strict Micro-F score by about 5% and the strict Macro-F score by about 6%, respectively.
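The voting step described in the abstract can be sketched as a simple majority vote over the label sequences emitted by each fine-tuned model. The label names below (Disposition, NoDisposition, Undetermined) follow the CMED event classes, but the function and variable names are illustrative assumptions, not the authors' code:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model labels for one medication mention via majority vote.

    predictions: list of labels, one per ensemble member.
    Ties are broken by the order in which models appear in the list
    (Counter preserves insertion order for equal counts).
    """
    return Counter(predictions).most_common(1)[0][0]

def ensemble_predict(per_model_preds):
    """per_model_preds: list of equal-length label sequences, one per model."""
    return [majority_vote(labels) for labels in zip(*per_model_preds)]

# Hypothetical outputs from three fine-tuned BERT variants on three mentions.
model_a = ["Disposition", "NoDisposition", "Undetermined"]
model_b = ["Disposition", "NoDisposition", "NoDisposition"]
model_c = ["NoDisposition", "NoDisposition", "Undetermined"]

print(ensemble_predict([model_a, model_b, model_c]))
# ['Disposition', 'NoDisposition', 'Undetermined']
```

The paper evaluates multiple voting strategies; this hard majority vote is just the simplest variant, shown to make the integration step concrete.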
Related papers
- MedicalBERT: enhancing biomedical natural language processing using pretrained BERT-based model [0.0]
MedicalBERT is a pretrained BERT model trained on a large biomedical dataset. It is equipped with domain-specific vocabulary that enhances the comprehension of biomedical terminology. MedicalBERT surpasses the general-purpose BERT model by 5.67% on average across all the tasks evaluated.
arXiv Detail & Related papers (2025-07-06T03:38:05Z) - Improving Extraction of Clinical Event Contextual Properties from Electronic Health Records: A Comparative Study [2.0884301753594334]
This study performs a comparative analysis of various natural language models for medical text classification.
BERT outperforms Bi-LSTM models by up to 28% and the baseline BERT model by up to 16% for recall of the minority classes.
arXiv Detail & Related papers (2024-08-30T10:28:49Z) - FEDMEKI: A Benchmark for Scaling Medical Foundation Models via Federated Knowledge Injection [83.54960238236548]
FEDMEKI not only preserves data privacy but also enhances the capability of medical foundation models.
FEDMEKI allows medical foundation models to learn from a broader spectrum of medical knowledge without direct data exposure.
arXiv Detail & Related papers (2024-08-17T15:18:56Z) - MEDFuse: Multimodal EHR Data Fusion with Masked Lab-Test Modeling and Large Language Models [11.798375238713488]
MEDFuse is a framework that integrates structured and unstructured medical data.
It achieves over 90% F1 score in the 10-disease multi-label classification task.
arXiv Detail & Related papers (2024-07-17T04:17:09Z) - Bio+Clinical BERT, BERT Base, and CNN Performance Comparison for Predicting Drug-Review Satisfaction [0.0]
We implement and evaluate several classification models, including a BERT base model, Bio+Clinical BERT, and a simpler CNN.
Results indicate that the medical domain-specific Bio+Clinical BERT model significantly outperformed the general domain base BERT model.
Future research could explore how to capitalize on the specific strengths of each model.
arXiv Detail & Related papers (2023-08-02T20:01:38Z) - Medical Data Augmentation via ChatGPT: A Case Study on Medication Identification and Medication Event Classification [2.980018103007841]
In the N2C2 2022 competitions, various tasks were presented to promote the identification of key factors in electronic health records.
Pretrained large language models (LLMs) demonstrated exceptional performance in these tasks.
This study aims to explore the utilization of LLMs, specifically ChatGPT, for data augmentation to overcome the limited availability of annotated data.
arXiv Detail & Related papers (2023-06-10T20:55:21Z) - Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study [60.56194508762205]
We reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models.
We show that several models underperform due to weak configurations, poorly sampled train-test splits, and insufficient evaluation.
We present the first comprehensive results on the newly released MIMIC-IV dataset using the reproduced models.
arXiv Detail & Related papers (2023-04-21T11:54:44Z) - Unsupervised pre-training of graph transformers on patient population graphs [48.02011627390706]
We propose a graph-transformer-based network to handle heterogeneous clinical data.
We show the benefit of our pre-training method in a self-supervised and a transfer learning setting.
arXiv Detail & Related papers (2022-07-21T16:59:09Z) - Does BERT Pretrained on Clinical Notes Reveal Sensitive Data? [70.3631443249802]
We design a battery of approaches intended to recover Personal Health Information from a trained BERT.
Specifically, we attempt to recover patient names and conditions with which they are associated.
We find that simple probing methods are not able to meaningfully extract sensitive information from BERT trained over the MIMIC-III corpus of EHRs.
arXiv Detail & Related papers (2021-04-15T20:40:05Z) - An Interpretable End-to-end Fine-tuning Approach for Long Clinical Text [72.62848911347466]
Unstructured clinical text in EHRs contains crucial information for applications including decision support, trial matching, and retrospective research.
Recent work has applied BERT-based models to clinical information extraction and text classification, given these models' state-of-the-art performance in other NLP domains.
In this work, we propose a novel fine-tuning approach called SnipBERT. Instead of using entire notes, SnipBERT identifies crucial snippets and feeds them into a truncated BERT-based model in a hierarchical manner.
arXiv Detail & Related papers (2020-11-12T17:14:32Z) - Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem to the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)
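Several results above are reported as micro- and macro-averaged F scores. As a minimal illustration of the difference (the labels and data below are hypothetical, not drawn from any of the papers): micro-F1 aggregates counts over all classes, while macro-F1 averages per-class F1 scores, so a missed minority class hurts macro-F1 far more:

```python
def f1_per_class(y_true, y_pred, label):
    # Per-class F1 from true-positive, false-positive, and false-negative counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def macro_f1(y_true, y_pred):
    # Unweighted mean of per-class F1: every class counts equally.
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_per_class(y_true, y_pred, lab) for lab in labels) / len(labels)

def micro_f1(y_true, y_pred):
    # With exactly one predicted label per example, micro-F1 reduces to accuracy.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical imbalanced data: the single minority-class "B" example is missed.
y_true = ["A", "A", "A", "A", "B"]
y_pred = ["A", "A", "A", "A", "A"]
print(round(micro_f1(y_true, y_pred), 3))  # 0.8
print(round(macro_f1(y_true, y_pred), 3))  # 0.444
```

This is why macro-F improvements (like the ~6% reported in the main paper) are often read as gains on rarer event classes.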
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.