Extracting periodontitis diagnosis in clinical notes with RoBERTa and
regular expression
- URL: http://arxiv.org/abs/2311.10809v1
- Date: Fri, 17 Nov 2023 18:09:21 GMT
- Title: Extracting periodontitis diagnosis in clinical notes with RoBERTa and
regular expression
- Authors: Yao-Shun Chuang, Chun-Teh Lee, Ryan Brandon, Trung Duong Tran,
Oluwabunmi Tokede, Muhammad F. Walji, Xiaoqian Jiang
- Abstract summary: Two levels of complexity of regular expression (RE) methods were used to extract and generate the training data.
The SpaCy package and RoBERTa transformer models were used to build the NER model and evaluate its performance with the manual-labeled gold standards.
The NER models demonstrated excellent predictions, with the simple RE method showing 0.84-0.92 in the evaluation metrics, and the advanced and combined RE method demonstrating 0.95-0.99 in the evaluation.
- Score: 6.636721448099117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study aimed to utilize text processing and natural language processing
(NLP) models to mine clinical notes for the diagnosis of periodontitis and to
evaluate the performance of a named entity recognition (NER) model on different
regular expression (RE) methods. Two complexity levels of RE methods were used
to extract and generate the training data. The SpaCy package and RoBERTa
transformer models were used to build the NER model and evaluate its
performance with the manual-labeled gold standards. The comparison of the RE
methods with the gold standard showed that as the complexity increased in the
RE algorithms, the F1 score increased from 0.3-0.4 to around 0.9. The NER
models demonstrated excellent predictions, with the simple RE method showing
0.84-0.92 in the evaluation metrics, and the advanced and combined RE method
demonstrating 0.95-0.99 in the evaluation. This study provided an example of
the benefit of combining NER methods and NLP models in extracting target
information from free-text to structured data and fulfilling the need for
missing diagnoses from unstructured notes.
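The RE-based labeling step described in the abstract can be illustrated with a minimal Python sketch. The patterns, the label names (STAGE, GRADE, EXTENT), the sample note, and the helper functions `weak_label` and `span_f1` are all assumptions made for illustration; they are not the expressions or code used by the authors. The sketch turns regex matches into character-offset spans of the kind an NER trainer consumes, and scores them against a gold set with exact-match F1.

```python
import re

# Hypothetical patterns for illustration only; the paper's actual
# regular expressions are not given in this listing.
PATTERNS = {
    "STAGE": re.compile(r"\b(?i:stage)\s+(?:IV|III|II|I|[1-4])\b"),
    "GRADE": re.compile(r"\b(?i:grade)\s+[A-C]\b"),
    "EXTENT": re.compile(r"\b(?i:localized|generalized)\b"),
}


def weak_label(text):
    """Return sorted (start, end, label) character spans found by the patterns."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def span_f1(predicted, gold):
    """Exact-match F1 between two collections of (start, end, label) spans."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up note text, not taken from any clinical record.
note = "Dx: generalized periodontitis, Stage III, Grade B."
spans = weak_label(note)
# spans == [(4, 15, "EXTENT"), (31, 40, "STAGE"), (42, 49, "GRADE")]
```

Spans in this `(start, end, label)` form can be passed as weak training labels to an NER pipeline, and `span_f1` gives a simple way to compare either the regex output or the model output with manually labeled spans.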
Related papers
- RaTEScore: A Metric for Radiology Report Generation [59.37561810438641]
This paper introduces a novel, entity-aware metric, termed Radiological Report (Text) Evaluation (RaTEScore).
RaTEScore emphasizes crucial medical entities such as diagnostic outcomes and anatomical details, and is robust against complex medical synonyms and sensitive to negation expressions.
Our evaluations demonstrate that RaTEScore aligns more closely with human preference than existing metrics, validated both on established public benchmarks and our newly proposed RaTE-Eval benchmark.
arXiv Detail & Related papers (2024-06-24T17:49:28Z)
- Reshaping Free-Text Radiology Notes Into Structured Reports With Generative Transformers [0.29530625605275984]
Structured reporting (SR) has been recommended by various medical societies.
We propose a pipeline to extract information from free-text reports.
Our work aims to leverage the potential of Natural Language Processing (NLP) and Transformer-based models.
arXiv Detail & Related papers (2024-03-27T18:38:39Z)
- Use GPT-J Prompt Generation with RoBERTa for NER Models on Diagnosis Extraction of Periodontal Diagnosis from Electronic Dental Records [6.636721448099117]
The prompt generation by GPT-J models was utilized to test the gold standard and to generate the seed.
The performance revealed consistency, 0.92-0.97 in the F1 score, in all settings after training with the RoBERTa model.
arXiv Detail & Related papers (2023-11-17T18:14:08Z)
- ssVERDICT: Self-Supervised VERDICT-MRI for Enhanced Prostate Tumour Characterisation [2.755232740505053]
Self-supervised neural network for fitting VERDICT estimates parameter maps without training data.
We compare the performance of ssVERDICT to two established baseline methods for fitting diffusion MRI models.
arXiv Detail & Related papers (2023-09-12T14:31:33Z)
- Less is More: Mitigate Spurious Correlations for Open-Domain Dialogue Response Generation Models by Causal Discovery [52.95935278819512]
We conduct the first study on spurious correlations for open-domain response generation models based on a corpus CGDIALOG curated in our work.
Inspired by causal discovery algorithms, we propose a novel model-agnostic method for training and inference of response generation models.
arXiv Detail & Related papers (2023-03-02T06:33:48Z)
- Learning to diagnose cirrhosis from radiological and histological labels with joint self and weakly-supervised pretraining strategies [62.840338941861134]
We propose to leverage transfer learning from large datasets annotated by radiologists, to predict the histological score available on a small annex dataset.
We compare different pretraining methods, namely weakly-supervised and self-supervised ones, to improve the prediction of cirrhosis.
This method outperforms the baseline classification of the METAVIR score, reaching an AUC of 0.84 and a balanced accuracy of 0.75.
arXiv Detail & Related papers (2023-02-16T17:06:23Z)
- Prediction of drug effectiveness in rheumatoid arthritis patients based on machine learning algorithms [2.5759046095742453]
Rheumatoid arthritis (RA) is an autoimmune condition in which a patient's immune system mistakenly targets their own tissue.
Machine learning (ML) has the potential to identify patterns in patient electronic health records to forecast the best clinical treatment to improve patient outcomes.
This study introduced a Drug Response Prediction (TNF) framework with two main goals: 1) design a data processing pipeline to extract information from clinical data and preprocess it for functional use, and 2) predict RA patients' responses to drugs and evaluate the performance of classification models.
arXiv Detail & Related papers (2022-10-14T15:15:37Z)
- Natural Language Processing Methods to Identify Oncology Patients at High Risk for Acute Care with Clinical Notes [9.49721872804122]
This paper evaluates how natural language processing can be used to identify the risk of acute care use (ACU) in oncology patients.
Risk prediction using structured health data (SHD) is now standard, but predictions using free-text formats are complex.
arXiv Detail & Related papers (2022-09-28T06:31:19Z)
- Improving Classification Model Performance on Chest X-Rays through Lung Segmentation [63.45024974079371]
We propose a deep learning approach to enhance abnormal chest x-ray (CXR) identification performance through segmentations.
Our approach is designed in a cascaded manner and incorporates two modules: a deep neural network with criss-cross attention modules (XLSor) for localizing lung region in CXR images and a CXR classification model with a backbone of a self-supervised momentum contrast (MoCo) model pre-trained on large-scale CXR data sets.
arXiv Detail & Related papers (2022-02-22T15:24:06Z)
- A multi-stage machine learning model on diagnosis of esophageal manometry [50.591267188664666]
The framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage.
This is the first artificial-intelligence-style model to automatically predict CC diagnosis of HRM study from raw multi-swallow data.
arXiv Detail & Related papers (2021-06-25T20:09:23Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.