Related papers: A Comparative Evaluation Of Transformer Models For De-Identification Of Clinical Text Data

URL: http://arxiv.org/abs/2204.07056v1
Date: Fri, 25 Mar 2022 19:42:03 GMT
Title: A Comparative Evaluation Of Transformer Models For De-Identification Of Clinical Text Data
Authors: Christopher Meaney, Wali Hakimpour, Sumeet Kalia, Rahim Moineddin
Abstract summary: The i2b2/UTHealth 2014 clinical text de-identification challenge corpus contains N=1304 clinical notes. We fine-tune several transformer model architectures on the corpus, including: BERT-base, BERT-large, ROBERTA-base, ROBERTA-large, ALBERT-base and ALBERT-xxlarge. We assess model performance in terms of accuracy, precision (positive predictive value), recall (sensitivity) and F1 score.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Objective: To comparatively evaluate several transformer model architectures at identifying protected health information (PHI) in the i2b2/UTHealth 2014 clinical text de-identification challenge corpus. Methods: The i2b2/UTHealth 2014 corpus contains N=1304 clinical notes obtained from N=296 patients. Using a transfer learning framework, we fine-tune several transformer model architectures on the corpus, including: BERT-base, BERT-large, ROBERTA-base, ROBERTA-large, ALBERT-base and ALBERT-xxlarge. During fine-tuning we vary the following model hyper-parameters: batch size, number of training epochs, learning rate and weight decay. We fine-tune models on a training dataset, evaluate and select optimally performing models on an independent validation dataset, and lastly assess generalization performance on a held-out test dataset. We assess model performance in terms of accuracy, precision (positive predictive value), recall (sensitivity) and F1 score (harmonic mean of precision and recall). We are interested in overall model performance (PHI identified vs. PHI not identified), as well as PHI-specific model performance. Results: We observe that the ROBERTA-large models perform best at identifying PHI in the i2b2/UTHealth 2014 corpus, achieving >99% overall accuracy and 96.7% recall/precision on the held-out test corpus. Performance was good across many PHI classes; however, accuracy/precision/recall decreased for identification of the following entity classes: professions, organizations, ages, and certain locations. Conclusions: Transformers are a promising model class/architecture for clinical text de-identification. With minimal hyper-parameter tuning, transformers afford researchers/clinicians the opportunity to obtain (near) state-of-the-art performance.
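The four metrics named in the abstract can be illustrated with a minimal Python sketch for the overall binary case (PHI identified vs. PHI not identified). This is an illustration only, not the authors' evaluation code: the function name `binary_phi_metrics`, the token-level 0/1 label encoding, and the toy labels are assumptions made here, and the paper may score at a different granularity.

```python
from typing import Dict, Sequence


def binary_phi_metrics(gold: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for binary labels.

    Assumed encoding (hypothetical, for illustration): 1 = PHI, 0 = not PHI.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision is the positive predictive value; recall is the sensitivity.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels, invented for this example only.
    gold = [0, 0, 1, 1, 0, 1, 0, 0]
    pred = [0, 0, 1, 0, 0, 1, 1, 0]
    for name, value in binary_phi_metrics(gold, pred).items():
        print(f"{name}: {value:.3f}")
```

Because most tokens in a clinical note are not PHI, accuracy can stay high even when some PHI is missed, which is one reason to read precision and recall alongside it.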

Related papers

Efficient Brain Tumor Classification with Lightweight CNN Architecture: A Novel Approach [0.0]
Brain tumor classification using MRI images is critical in medical diagnostics, where early and accurate detection significantly impacts patient outcomes. Recent advancements in deep learning (DL) have shown promise, but many models struggle with balancing accuracy and computational efficiency. We propose a novel model architecture integrating separable convolutions and squeeze and excitation (SE) blocks, designed to enhance feature extraction while maintaining computational efficiency.
arXiv Detail & Related papers (2025-02-01T21:06:42Z)
Phikon-v2, A large and public feature extractor for biomarker prediction [42.52549987351643]
We train a vision transformer using DINOv2 and publicly release one iteration of this model for further experimentation, coined Phikon-v2. While trained on publicly available histology slides, Phikon-v2 surpasses our previously released model (Phikon) and performs on par with other histopathology foundation models (FM) trained on proprietary data.
arXiv Detail & Related papers (2024-09-13T20:12:29Z)
Comparative Performance Analysis of Transformer-Based Pre-Trained Models for Detecting Keratoconus Disease [0.0]
This study compares eight pre-trained CNNs for diagnosing keratoconus, a degenerative eye disease. MobileNetV2 was the most accurate model in identifying keratoconus and normal cases with few misclassifications.
arXiv Detail & Related papers (2024-08-16T20:15:24Z)
Predictive Analytics of Varieties of Potatoes [2.336821989135698]
We explore the application of machine learning algorithms specifically to enhance the selection process of Russet potato clones in breeding trials. This study addresses the challenge of efficiently identifying high-yield, disease-resistant, and climate-resilient potato varieties.
arXiv Detail & Related papers (2024-04-04T00:49:05Z)
The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation. We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare. Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z)
Large Language Models to Identify Social Determinants of Health in Electronic Health Records [2.168737004368243]
Social determinants of health (SDoH) have an important impact on patient outcomes but are incompletely collected from the electronic health records (EHRs). This study researched the ability of large language models to extract SDoH from free text in EHRs, where they are most commonly documented. 800 patient notes were annotated for SDoH categories, and several transformer-based models were evaluated.
arXiv Detail & Related papers (2023-08-11T19:18:35Z)
Comparative Analysis of Epileptic Seizure Prediction: Exploring Diverse Pre-Processing Techniques and Machine Learning Models [0.0]
We present a comparative analysis of five machine learning models for the prediction of epileptic seizures using EEG data. The results of our analysis demonstrate the performance of each model in terms of accuracy. The ET model exhibited the best performance with an accuracy of 99.29%.
arXiv Detail & Related papers (2023-08-06T08:50:08Z)
Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world. We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique. By further elaborating the robustness metric, a model is judged to be robust if its performance is consistently accurate on the overall cliques.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models. In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
Clinical Deterioration Prediction in Brazilian Hospitals Based on Artificial Neural Networks and Tree Decision Models [56.93322937189087]
An extremely boosted neural network (XBNet) is used to predict clinical deterioration (CD). The XGBoost model obtained the best results in predicting CD among Brazilian hospitals' data.
arXiv Detail & Related papers (2022-12-17T23:29:14Z)
MoEfication: Conditional Computation of Transformer Models for Efficient Inference [66.56994436947441]
Transformer-based pre-trained language models can achieve superior performance on most NLP tasks due to large parameter capacity, but also lead to huge computation cost. We explore accelerating large-model inference by conditional computation based on the sparse activation phenomenon. We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z)
Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples. We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models. We also observe that the span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community. We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification. We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.