Detection of Adverse Drug Events in Dutch clinical free text documents using Transformer Models: benchmark study
- URL: http://arxiv.org/abs/2507.19396v2
- Date: Mon, 28 Jul 2025 09:02:30 GMT
- Title: Detection of Adverse Drug Events in Dutch clinical free text documents using Transformer Models: benchmark study
- Authors: Rachel M. Murphy, Nishant Mishra, Nicolette F. de Keizer, Dave A. Dongelmans, Kitty J. Jager, Ameen Abu-Hanna, Joanna E. Klopotowska, Iacer Calixto,
- Abstract summary: We establish a benchmark for adverse drug event (ADE) detection in Dutch clinical free-text documents.<n>We trained a Bidirectional Long Short-Term Memory (Bi-LSTM) model and four transformer-based Dutch and/or multilingual encoder models.<n>We evaluated our ADE RC models internally using the gold standard (two-step task) and predicted entities.
- Score: 2.6758306285161075
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this study, we establish a benchmark for adverse drug event (ADE) detection in Dutch clinical free-text documents using several transformer models, clinical scenarios, and fit-for-purpose performance measures. We trained a Bidirectional Long Short-Term Memory (Bi-LSTM) model and four transformer-based Dutch and/or multilingual encoder models (BERTje, RobBERT, MedRoBERTa(.)nl, and NuNER) for the tasks of named entity recognition (NER) and relation classification (RC) using 102 richly annotated Dutch ICU clinical progress notes. Anonymized free-text clinical progress notes of patients admitted to the intensive care unit (ICU) of one academic hospital and discharge letters of patients admitted to Internal Medicine wards of two non-academic hospitals were reused. We evaluated our ADE RC models internally using the gold standard (two-step task) and predicted entities (end-to-end task). In addition, all models were externally validated for detecting ADEs at the document level. We report both micro- and macro-averaged F1 scores, given the dataset imbalance in ADEs. Although differences for the ADE RC task between the models were small, MedRoBERTa(.)nl was the best performing model with a macro-averaged F1 score of 0.63 using the gold standard and 0.62 using predicted entities. The MedRoBERTa(.)nl models also performed the best in our external validation and achieved a recall of between 0.67 to 0.74 using predicted entities, meaning between 67 to 74% of discharge letters with ADEs were detected. Our benchmark study presents a robust and clinically meaningful approach for evaluating language models for ADE detection in clinical free-text documents. Our study highlights the need to use appropriate performance measures fit for the task of ADE detection in clinical free-text documents and envisioned future clinical use.
Related papers
- Towards Efficient Patient Recruitment for Clinical Trials: Application of a Prompt-Based Learning Model [0.7373617024876725]
Clinical trials are essential for advancing pharmaceutical interventions, but they face a bottleneck in selecting eligible participants.
The complex nature of unstructured medical texts presents challenges in efficiently identifying participants.
In this study, we aimed to evaluate the performance of a prompt-based large language model for the cohort selection task.
arXiv Detail & Related papers (2024-04-24T20:42:28Z) - Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
Training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
The inference of LlaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z) - Leveraging deep active learning to identify low-resource mobility
functioning information in public clinical notes [0.157286095422595]
First public annotated dataset specifically on the Mobility domain of the International Classification of Functioning, Disability and Health (ICF)
We utilize the National NLP Clinical Challenges (n2c2) research dataset to construct a pool of candidate sentences using keyword expansion.
Our final dataset consists of 4,265 sentences with a total of 11,784 entities, including 5,511 Action entities, 5,328 Mobility entities, 306 Assistance entities, and 639 Quantification entities.
arXiv Detail & Related papers (2023-11-27T15:53:11Z) - A Federated Learning Framework for Stenosis Detection [70.27581181445329]
This study explores the use of Federated Learning (FL) for stenosis detection in coronary angiography images (CA)
Two heterogeneous datasets from two institutions were considered: dataset 1 includes 1219 images from 200 patients, which we acquired at the Ospedale Riuniti of Ancona (Italy)
dataset 2 includes 7492 sequential images from 90 patients from a previous study available in the literature.
arXiv Detail & Related papers (2023-10-30T11:13:40Z) - TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic
Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment.
In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials.
We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient trial matching.
arXiv Detail & Related papers (2023-07-19T12:35:09Z) - Development and validation of a natural language processing algorithm to
pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system, merging the results of a deep learning model as well as manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z) - Learning structures of the French clinical language:development and
validation of word embedding models using 21 million clinical reports from
electronic health records [2.5709272341038027]
Methods based on transfer learning using pre-trained language models have achieved state-of-the-art results in most NLP applications.
We aimed to evaluate the impact of adapting a language model to French clinical reports on downstream medical NLP tasks.
arXiv Detail & Related papers (2022-07-26T14:46:34Z) - Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of
Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named recognition (NER) and apply it to solve a low-resource and real-world challenge of code-mixed (Spanish-Catalan) clinical notes de-identification in the stroke.
arXiv Detail & Related papers (2022-04-10T21:46:52Z) - Enhancing Clinical Information Extraction with Transferred Contextual
Embeddings [9.143551270841858]
Bidirectional Representations from Transformers (BERT) model has achieved the state-of-the-art performance for many natural language processing (NLP) tasks.
We show that BERT based pre-training models can be transferred to health-related documents under mild conditions.
arXiv Detail & Related papers (2021-09-15T12:22:57Z) - Predicting Clinical Diagnosis from Patients Electronic Health Records
Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in medical community.
We present a modification of Bidirectional Representations from Transformers (BERT) model for classification sequence.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z) - The Medical Scribe: Corpus Development and Model Performance Analyses [19.837396601641117]
Motivated by this goal, we developed an annotation scheme to extract relevant clinical concepts.
We used this annotation scheme to label a corpus of about 6k clinical encounters.
This was used to train a state-of-the-art tagging model.
arXiv Detail & Related papers (2020-03-12T03:10:25Z) - Med7: a transferable clinical natural language processing model for
electronic health records [6.935142529928062]
We introduce a named-entity recognition model for clinical natural language processing.
The model is trained to recognise seven categories: drug names, route, frequency, dosage, strength, form, duration.
We evaluate the transferability of the developed model using the data from the Intensive Care Unit in the US to secondary care mental health records (CRIS) in the UK.
arXiv Detail & Related papers (2020-03-03T00:55:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.