Related papers: Extracting periodontitis diagnosis in clinical notes with RoBERTa and regular expression

URL: http://arxiv.org/abs/2311.10809v1
Date: Fri, 17 Nov 2023 18:09:21 GMT
Title: Extracting periodontitis diagnosis in clinical notes with RoBERTa and regular expression
Authors: Yao-Shun Chuang, Chun-Teh Lee, Ryan Brandon, Trung Duong Tran, Oluwabunmi Tokede, Muhammad F. Walji, Xiaoqian Jiang
Abstract summary: Two levels of complexity of regular expression (RE) methods were used to extract and generate the training data. The spaCy package and RoBERTa transformer models were used to build the NER model, and its performance was evaluated against manually labeled gold standards. The NER models made excellent predictions: the simple RE method scored 0.84-0.92 across the evaluation metrics, and the advanced and combined RE methods scored 0.95-0.99.
Score: 6.636721448099117
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This study aimed to use text processing and natural language processing (NLP) models to mine clinical notes for the diagnosis of periodontitis and to evaluate the performance of a named entity recognition (NER) model trained with data from different regular expression (RE) methods. Two complexity levels of RE methods were used to extract and generate the training data. The spaCy package and RoBERTa transformer models were used to build the NER model, and its performance was evaluated against manually labeled gold standards. Comparing the RE methods with the gold standard showed that as the complexity of the RE algorithms increased, the F1 score rose from 0.3-0.4 to around 0.9. The NER models made excellent predictions: the simple RE method scored 0.84-0.92 across the evaluation metrics, and the advanced and combined RE methods scored 0.95-0.99. This study illustrates the benefit of combining NER methods and NLP models to turn target information in free text into structured data and to fill in diagnoses that are missing from structured records by recovering them from unstructured notes.
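To make the RE-based weak-labeling idea concrete, here is a minimal Python sketch, not the authors' code. The patterns, the `PERIO_DX` label, and the example note are all hypothetical: a "simple" pattern that matches only the bare term, and an "advanced" pattern that also takes in an optional extent modifier (generalized/localized) plus a stage (I-IV) and grade (A-C), assumed here to follow the periodontitis staging and grading vocabulary. Matches are stored as character-offset `(start, end, label)` spans in the `(text, {"entities": [...]})` layout commonly used for spaCy NER training data, and `span_prf` scores exact-match precision, recall, and F1 against a hand-labeled gold span.

```python
import re

LABEL = "PERIO_DX"  # hypothetical entity label

# Hypothetical "simple" pattern: the bare diagnosis term only.
SIMPLE_RE = re.compile(r"\bperiodontitis\b", re.IGNORECASE)

# Hypothetical "advanced" pattern: optional extent modifier before the term,
# plus an optional stage (I-IV) and grade (A-C) after it.
ADVANCED_RE = re.compile(
    r"\b(?:(?:generalized|localized)\s+)?periodontitis\b"
    r"(?:[,\s]+stage\s+(?:IV|I{1,3})\b)?"  # try IV before I{1,3} so "IV" is not cut to "I"
    r"(?:[,\s]+grade\s+[ABC]\b)?",
    re.IGNORECASE,
)


def extract_spans(text, pattern, label=LABEL):
    """Return (start, end, label) character spans for every regex match."""
    return [(m.start(), m.end(), label) for m in pattern.finditer(text)]


def to_training_example(text, pattern):
    """Package regex matches as (text, {"entities": [...]}) weak labels."""
    return text, {"entities": extract_spans(text, pattern)}


def span_prf(pred, gold):
    """Exact-match span precision, recall and F1 against a gold standard."""
    pred, gold = set(pred), set(gold)
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    note = "Dx: Generalized periodontitis stage III grade B."
    gold = [(4, 47, LABEL)]  # a hand-labeled span, for illustration only
    for name, pattern in [("simple", SIMPLE_RE), ("advanced", ADVANCED_RE)]:
        text, ann = to_training_example(note, pattern)
        print(name, ann["entities"], span_prf(ann["entities"], gold))
```

On this toy note the simple pattern captures only "periodontitis" and so scores zero under exact span matching, while the advanced pattern recovers the full diagnosis string. This only illustrates why RE complexity matters for exact-match F1; it does not reproduce the paper's results.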

Related papers

Vision-Language Model-Based Semantic-Guided Imaging Biomarker for Early Lung Cancer Detection [1.5391321019692428]
This research aims to integrate semantic features derived from radiologists' assessments of nodules, allowing the model to learn clinically relevant, robust, and explainable features for predicting lung cancer. We fine-tuned a pretrained Contrastive Language-Image Pretraining model with a parameter-efficient fine-tuning approach to align imaging and semantic features and predict the one-year lung cancer diagnosis. Our model demonstrated an AUROC of 0.90 and AUPRC of 0.78, outperforming baseline state-of-the-art models on external datasets.
Date: 2025-04-30T06:11:34Z
FACT: Foundation Model for Assessing Cancer Tissue Margins with Mass Spectrometry [1.0183055506531902]
FACT is an adaptation of a foundation model originally designed for text-audio association, pretrained using our proposed supervised contrastive approach based on triplet loss. Our proposed model significantly improves classification performance, achieving state-of-the-art results with an AUROC of 82.4% ± 0.8.
Date: 2025-04-15T16:36:03Z
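The FACT summary names a supervised contrastive approach based on triplet loss without giving its exact form, so the following is only a generic sketch of a supervised triplet margin loss in plain Python, not FACT's implementation. It assumes Euclidean distance and a hypothetical default margin of 1.0: for each anchor, positives are other samples with the same label, negatives are samples with a different label, and each triplet contributes max(0, d(a, p) - d(a, n) + margin).

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on the distance gap: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def batch_triplet_loss(embeddings, labels, margin=1.0):
    """Mean loss over all valid triplets built from class labels (supervised).

    Positives share the anchor's label; negatives carry a different label.
    """
    losses = []
    for i, (a, la) in enumerate(zip(embeddings, labels)):
        for j, (p, lp) in enumerate(zip(embeddings, labels)):
            if i == j or lp != la:
                continue  # skip the anchor itself and different-label samples
            for n, ln in zip(embeddings, labels):
                if ln != la:
                    losses.append(triplet_loss(a, p, n, margin))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Toy 2-D embeddings: two samples of class "a", one of class "b".
    print(batch_triplet_loss([[0, 0], [0, 1], [3, 4]], ["a", "a", "b"]))
    print(batch_triplet_loss([[0, 0], [3, 4], [0, 1]], ["a", "a", "b"]))
```

Averaging over every valid triplet ("batch-all") is the simplest choice; practical implementations usually vectorize this over a batch and often mine only hard negatives.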
RaTEScore: A Metric for Radiology Report Generation [59.37561810438641]
This paper introduces a novel, entity-aware metric, Radiological Report (Text) Evaluation (RaTEScore). RaTEScore emphasizes crucial medical entities such as diagnostic outcomes and anatomical details, is robust to complex medical synonyms, and is sensitive to negation expressions. Our evaluations demonstrate that RaTEScore aligns more closely with human preference than existing metrics, validated both on established public benchmarks and on our newly proposed RaTE-Eval benchmark.
Date: 2024-06-24T17:49:28Z
Reshaping Free-Text Radiology Notes Into Structured Reports With Generative Transformers [0.29530625605275984]
Structured reporting (SR) has been recommended by various medical societies. We propose a pipeline to extract information from free-text reports. Our work aims to leverage the potential of Natural Language Processing (NLP) and Transformer-based models.
Date: 2024-03-27T18:38:39Z
Use GPT-J Prompt Generation with RoBERTa for NER Models on Diagnosis Extraction of Periodontal Diagnosis from Electronic Dental Records [6.636721448099117]
Prompt generation with GPT-J models was used to test the gold standard and to generate the seed. After training with the RoBERTa model, performance was consistent across all settings, with F1 scores of 0.92-0.97.
Date: 2023-11-17T18:14:08Z
ssVERDICT: Self-Supervised VERDICT-MRI for Enhanced Prostate Tumour Characterisation [2.755232740505053]
A self-supervised neural network for fitting VERDICT estimates parameter maps without training data. We compare the performance of ssVERDICT to two established baseline methods for fitting diffusion MRI models.
Date: 2023-09-12T14:31:33Z
Less is More: Mitigate Spurious Correlations for Open-Domain Dialogue Response Generation Models by Causal Discovery [52.95935278819512]
We conduct the first study of spurious correlations in open-domain response generation models, based on CGDIALOG, a corpus curated in our work. Inspired by causal discovery algorithms, we propose a novel model-agnostic method for training and inference of response generation models.
Date: 2023-03-02T06:33:48Z
Learning to diagnose cirrhosis from radiological and histological labels with joint self and weakly-supervised pretraining strategies [62.840338941861134]
We propose to leverage transfer learning from large datasets annotated by radiologists to predict the histological score available on a small annex dataset. We compare different pretraining methods, namely weakly supervised and self-supervised ones, to improve the prediction of cirrhosis. This method outperforms the baseline classification of the METAVIR score, reaching an AUC of 0.84 and a balanced accuracy of 0.75.
Date: 2023-02-16T17:06:23Z
Prediction of drug effectiveness in rheumatoid arthritis patients based on machine learning algorithms [2.5759046095742453]
Rheumatoid arthritis (RA) is an autoimmune condition in which a patient's immune system mistakenly targets their own tissue. Machine learning (ML) has the potential to identify patterns in patients' electronic health records to forecast the best clinical treatment and improve patient outcomes. This study introduced a Drug Response Prediction (TNF) framework with two main goals: 1) design a data processing pipeline to extract information from clinical data and preprocess it for functional use, and 2) predict RA patients' responses to drugs and evaluate the performance of classification models.
Date: 2022-10-14T15:15:37Z
Natural Language Processing Methods to Identify Oncology Patients at High Risk for Acute Care with Clinical Notes [9.49721872804122]
This paper evaluates how natural language processing can be used to identify the risk of acute care use (ACU) in oncology patients. Risk prediction using structured health data (SHD) is now standard, but prediction from free-text formats is more complex.
Date: 2022-09-28T06:31:19Z
Improving Classification Model Performance on Chest X-Rays through Lung Segmentation [63.45024974079371]
We propose a deep learning approach to enhance abnormal chest x-ray (CXR) identification performance through segmentations. Our approach is designed in a cascaded manner and incorporates two modules: a deep neural network with criss-cross attention modules (XLSor) for localizing lung region in CXR images and a CXR classification model with a backbone of a self-supervised momentum contrast (MoCo) model pre-trained on large-scale CXR data sets.
Date: 2022-02-22T15:24:06Z
A multi-stage machine learning model on diagnosis of esophageal manometry [50.591267188664666]
The framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage. This is the first artificial-intelligence-style model to automatically predict the Chicago Classification (CC) diagnosis of a high-resolution manometry (HRM) study from raw multi-swallow data.
Date: 2021-06-25T20:09:23Z
Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model. We introduce two unique positive sampling strategies specifically tailored for EHR data. Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
Date: 2021-04-07T06:02:04Z

This list is automatically generated from the titles and abstracts of the papers on this site.