A Comparative Evaluation Of Transformer Models For De-Identification Of
Clinical Text Data
- URL: http://arxiv.org/abs/2204.07056v1
- Date: Fri, 25 Mar 2022 19:42:03 GMT
- Authors: Christopher Meaney, Wali Hakimpour, Sumeet Kalia, Rahim Moineddin
- Abstract summary: The i2b2/UTHealth 2014 clinical text de-identification challenge corpus contains N=1304 clinical notes.
We fine-tune several transformer model architectures on the corpus, including: BERT-base, BERT-large, ROBERTA-base, ROBERTA-large, ALBERT-base and ALBERT-xxlarge.
We assess model performance in terms of accuracy, precision (positive predictive value), recall (sensitivity) and F1 score.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Objective: To comparatively evaluate several transformer model architectures
at identifying protected health information (PHI) in the i2b2/UTHealth 2014
clinical text de-identification challenge corpus.
Methods: The i2b2/UTHealth 2014 corpus contains N=1304 clinical notes
obtained from N=296 patients. Using a transfer learning framework, we fine-tune
several transformer model architectures on the corpus, including: BERT-base,
BERT-large, ROBERTA-base, ROBERTA-large, ALBERT-base and ALBERT-xxlarge. During
fine-tuning we vary the following model hyper-parameters: batch size, number of
training epochs, learning rate and weight decay. We fine-tune models on a
training dataset, evaluate and select optimally performing models on an
independent validation dataset, and lastly assess generalization performance on
a held-out test dataset. We assess model performance in terms of accuracy,
precision (positive predictive value), recall (sensitivity) and F1 score
(harmonic mean of precision and recall). We are interested in overall model
performance (PHI identified vs. PHI not identified), as well as PHI-specific
model performance.
Results: We observe that the ROBERTA-large models perform best at identifying
PHI in the i2b2/UTHealth 2014 corpus, achieving >99% overall accuracy and 96.7%
recall/precision on the held-out test corpus. Performance was good across many
PHI classes; however, accuracy/precision/recall decreased for identification of
the following entity classes: professions, organizations, ages, and certain
locations.
Conclusions: Transformers are a promising model class/architecture for
clinical text de-identification. With minimal hyper-parameter tuning,
transformers afford researchers/clinicians the opportunity to obtain (near)
state-of-the-art performance.
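The four metrics named in the abstract can be illustrated with a minimal sketch for the overall task (PHI identified vs. PHI not identified). This is not the authors' evaluation code: the function name `binary_phi_metrics`, the 0/1 token labels, and the toy data are assumptions made purely for illustration.

```python
from typing import Dict, Sequence


def binary_phi_metrics(gold: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Token-level metrics for PHI (1) vs. non-PHI (0) labels.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # PHI correctly flagged
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # non-PHI wrongly flagged
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # PHI missed
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)  # non-PHI correctly ignored

    # Precision is the positive predictive value; recall is the sensitivity.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy data (invented): 20 tokens, 4 of which are PHI.
gold = [0] * 16 + [1] * 4
pred = [0] * 15 + [1] + [1] * 3 + [0]  # one false positive, one missed PHI token
print(binary_phi_metrics(gold, pred))
```

In this toy example accuracy (0.90) exceeds precision and recall (0.75 each) because the correctly ignored non-PHI tokens count toward accuracy only. A similar effect may partly explain why overall accuracy can sit above recall/precision when most tokens in a corpus are not PHI.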
Related papers
- Phikon-v2, A large and public feature extractor for biomarker prediction [42.52549987351643]
We train a vision transformer using DINOv2 and publicly release one iteration of this model for further experimentation, coined Phikon-v2.
While trained on publicly available histology slides, Phikon-v2 surpasses our previously released model (Phikon) and performs on par with other histopathology foundation models (FM) trained on proprietary data.
arXiv Detail & Related papers (2024-09-13T20:12:29Z)
- Comparative Performance Analysis of Transformer-Based Pre-Trained Models for Detecting Keratoconus Disease [0.0]
This study compares eight pre-trained CNNs for diagnosing keratoconus, a degenerative eye disease.
MobileNetV2 was the most accurate model in identifying keratoconus and normal cases, with few misclassifications.
arXiv Detail & Related papers (2024-08-16T20:15:24Z)
- Predictive Analytics of Varieties of Potatoes [2.336821989135698]
We explore the application of machine learning algorithms specifically to enhance the selection process of Russet potato clones in breeding trials.
This study addresses the challenge of efficiently identifying high-yield, disease-resistant, and climate-resilient potato varieties.
arXiv Detail & Related papers (2024-04-04T00:49:05Z)
- The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation.
We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare.
Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z)
- Large Language Models to Identify Social Determinants of Health in Electronic Health Records [2.168737004368243]
Social determinants of health (SDoH) have an important impact on patient outcomes but are incompletely collected from the electronic health records (EHRs).
This study researched the ability of large language models to extract SDoH from free text in EHRs, where they are most commonly documented.
800 patient notes were annotated for SDoH categories, and several transformer-based models were evaluated.
arXiv Detail & Related papers (2023-08-11T19:18:35Z)
- Comparative Analysis of Epileptic Seizure Prediction: Exploring Diverse Pre-Processing Techniques and Machine Learning Models [0.0]
We present a comparative analysis of five machine learning models for the prediction of epileptic seizures using EEG data.
The results of our analysis demonstrate the performance of each model in terms of accuracy.
The ET model exhibited the best performance with an accuracy of 99.29%.
arXiv Detail & Related papers (2023-08-06T08:50:08Z)
- Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
By further elaborating the robustness metric, a model is judged to be robust if its performance is consistently accurate on the overall cliques.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
- Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
- Clinical Deterioration Prediction in Brazilian Hospitals Based on Artificial Neural Networks and Tree Decision Models [56.93322937189087]
An extremely boosted neural network (XBNet) is used to predict clinical deterioration (CD).
The XGBoost model obtained the best results in predicting CD among Brazilian hospitals' data.
arXiv Detail & Related papers (2022-12-17T23:29:14Z)
- Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
- Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)