Leveraging Foundation Models for Clinical Text Analysis
- URL: http://arxiv.org/abs/2303.13314v1
- Date: Mon, 20 Mar 2023 17:05:13 GMT
- Title: Leveraging Foundation Models for Clinical Text Analysis
- Authors: Shaina Raza and Syed Raza Bashir
- Abstract summary: Infectious diseases are a significant public health concern globally.
The large amount of clinical data available presents a challenge for information extraction.
This study proposes a natural language processing (NLP) framework that uses a pre-trained transformer model fine-tuned on task-specific data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Infectious diseases are a significant public health concern globally, and
extracting relevant information from scientific literature can facilitate the
development of effective prevention and treatment strategies. However, the
large amount of clinical data available presents a challenge for information
extraction. To address this challenge, this study proposes a natural language
processing (NLP) framework that uses a pre-trained transformer model fine-tuned
on task-specific data to extract key information related to infectious diseases
from free-text clinical data. The proposed framework includes three components:
a data layer for preparing datasets from clinical texts, a foundation model
layer for entity extraction, and an assessment layer for performance analysis.
The results of the evaluation indicate that the proposed method outperforms
standard methods, and leveraging prior knowledge through the pre-trained
transformer model makes it useful for investigating other infectious diseases
in the future.
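The three-layer structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the example note, and the gold annotations are hypothetical, and a small lookup table (`TOY_LEXICON`) stands in for the fine-tuned transformer so that the example runs without any model download.

```python
# Illustrative only: a toy version of the data / model / assessment layers.
# TOY_LEXICON is a hypothetical stand-in for a fine-tuned transformer tagger.
TOY_LEXICON = {
    "covid-19": "DISEASE",
    "fever": "SYMPTOM",
    "cough": "SYMPTOM",
    "remdesivir": "DRUG",
}

PUNCTUATION = ".,;:()"


def data_layer(text):
    """Data layer: turn free text into lowercase tokens without punctuation."""
    tokens = (tok.strip(PUNCTUATION).lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def model_layer(tokens):
    """Model layer (placeholder): label tokens found in the toy lexicon."""
    return [(tok, TOY_LEXICON[tok]) for tok in tokens if tok in TOY_LEXICON]


def assessment_layer(predicted, gold):
    """Assessment layer: entity-level precision, recall, and F1."""
    pred_set, gold_set = set(predicted), set(gold)
    true_pos = len(pred_set & gold_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical clinical note and gold annotations, made up for this sketch.
note = "Patient with COVID-19 presented with fever, cough and dyspnea; started remdesivir."
gold = [
    ("covid-19", "DISEASE"),
    ("fever", "SYMPTOM"),
    ("cough", "SYMPTOM"),
    ("dyspnea", "SYMPTOM"),
    ("remdesivir", "DRUG"),
]

tokens = data_layer(note)
entities = model_layer(tokens)
scores = assessment_layer(entities, gold)
print(entities)
print(scores)
```

In a real system the lookup in `model_layer` would be replaced by a pre-trained transformer fine-tuned for token classification; the data and assessment layers would keep the same roles.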
Related papers
- A Multi-Dataset Classification-Based Deep Learning Framework for Electronic Health Records and Predictive Analysis in Healthcare [0.5999777817331317]
This study proposes a novel deep learning predictive analysis framework for classifying multiple datasets.
A hybrid deep learning model combining Residual Networks and Artificial Neural Networks is proposed to detect acute and chronic diseases.
Rigorous experimentation and evaluation resulted in high accuracies of 93%, 99%, and 95% for retinal fundus images, cirrhosis stages, and heart disease diagnostic predictions, respectively.
arXiv Detail & Related papers (2024-09-25T08:13:39Z)
- Addressing Data Heterogeneity in Federated Learning of Cox Proportional Hazards Models [8.798959872821962]
This paper outlines an approach in the domain of federated survival analysis, specifically the Cox Proportional Hazards (CoxPH) model.
We present an FL approach that employs feature-based clustering to enhance model accuracy across synthetic datasets and real-world applications.
arXiv Detail & Related papers (2024-07-20T18:34:20Z)
- Leveraging text data for causal inference using electronic health records [1.4182510510164876]
This paper presents a unified framework for leveraging text data to support causal inference with electronic health data.
We show how incorporating text data in a traditional matching analysis can help strengthen the validity of an estimated treatment effect.
We believe these methods have the potential to expand the scope of secondary analysis of clinical data to domains where structured EHR data is limited.
arXiv Detail & Related papers (2023-06-09T16:06:02Z)
- Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system, merging the results of a deep learning model as well as manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
- sEHR-CE: Language modelling of structured EHR data for efficient and generalizable patient cohort expansion [0.0]
sEHR-CE is a novel framework based on transformers to enable integrated phenotyping and analyses of heterogeneous clinical datasets.
We validate our approach using primary and secondary care data from the UK Biobank, a large-scale research study.
arXiv Detail & Related papers (2022-11-30T16:00:43Z)
- Textual Data Augmentation for Patient Outcomes Prediction [67.72545656557858]
We propose a novel data augmentation method to generate artificial clinical notes in patients' Electronic Health Records.
We fine-tune the generative language model GPT-2 to synthesize labeled text with the original training data.
We evaluate our method on the most common patient outcome, i.e., the 30-day readmission rate.
arXiv Detail & Related papers (2022-11-13T01:07:23Z)
- Natural Language Inference with Self-Attention for Veracity Assessment of Pandemic Claims [54.93898455714295]
We first describe the construction of the novel PANACEA dataset consisting of heterogeneous claims on COVID-19.
We then propose novel techniques for automated veracity assessment based on Natural Language Inference.
arXiv Detail & Related papers (2022-05-05T12:11:31Z)
- Improving Early Sepsis Prediction with Multi Modal Learning [5.129463113166068]
Clinical text provides essential information to estimate the severity of sepsis.
We employ state-of-the-art NLP models such as BERT and a highly specialized NLP model in Amazon Comprehend Medical to represent the text.
Our methods significantly outperform qSOFA, a clinical criterion suggested by experts, as well as the winning model of the PhysioNet Computing in Cardiology Challenge for predicting sepsis.
arXiv Detail & Related papers (2021-07-23T09:25:31Z)
- Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z)
- Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on application of elastic principal graphs which can address simultaneously the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z)
- Predictive Modeling of ICU Healthcare-Associated Infections from Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling Approach [55.41644538483948]
This work is focused on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units.
The aim is to support decision making addressed at reducing the incidence rate of infections.
arXiv Detail & Related papers (2020-05-07T16:13:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.