Related papers: Leveraging Foundation Models for Clinical Text Analysis

URL: http://arxiv.org/abs/2303.13314v1
Date: Mon, 20 Mar 2023 17:05:13 GMT
Title: Leveraging Foundation Models for Clinical Text Analysis
Authors: Shaina Raza and Syed Raza Bashir
Abstract summary: Infectious diseases are a significant public health concern globally. The large amount of clinical data available presents a challenge for information extraction. This study proposes a natural language processing (NLP) framework that uses a pre-trained transformer model fine-tuned on task-specific data.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Infectious diseases are a significant public health concern globally, and extracting relevant information from scientific literature can facilitate the development of effective prevention and treatment strategies. However, the large amount of clinical data available presents a challenge for information extraction. To address this challenge, this study proposes a natural language processing (NLP) framework that uses a pre-trained transformer model fine-tuned on task-specific data to extract key information related to infectious diseases from free-text clinical data. The proposed framework includes three components: a data layer for preparing datasets from clinical texts, a foundation model layer for entity extraction, and an assessment layer for performance analysis. The results of the evaluation indicate that the proposed method outperforms standard methods, and leveraging prior knowledge through the pre-trained transformer model makes it useful for investigating other infectious diseases in the future.

Related papers

Clinical NLP with Attention-Based Deep Learning for Multi-Disease Prediction [44.0876796031468]
This paper addresses the challenges posed by the unstructured nature and high-dimensional semantic complexity of electronic health record texts. A deep learning method based on attention mechanisms is proposed to achieve unified modeling for information extraction and multi-label disease prediction.
arXiv Detail & Related papers (2025-07-02T07:45:22Z)
Patient Trajectory Prediction: Integrating Clinical Notes with Transformers [0.0]
We propose an approach that integrates unstructured clinical notes into transformer-based deep learning models for sequential disease prediction. Experiments on MIMIC-IV datasets demonstrate that the proposed approach outperforms traditional models relying solely on structured data.
arXiv Detail & Related papers (2025-02-25T09:14:07Z)
Addressing Data Heterogeneity in Federated Learning of Cox Proportional Hazards Models [8.798959872821962]
This paper outlines an approach in the domain of federated survival analysis, specifically the Cox Proportional Hazards (CoxPH) model. We present an FL approach that employs feature-based clustering to enhance model accuracy across synthetic datasets and real-world applications.
arXiv Detail & Related papers (2024-07-20T18:34:20Z)
Leveraging text data for causal inference using electronic health records [1.4182510510164876]
This paper presents a unified framework for leveraging text data to support causal inference with electronic health data. We show how incorporating text data in a traditional matching analysis can help strengthen the validity of an estimated treatment effect. We believe these methods have the potential to expand the scope of secondary analysis of clinical data to domains where structured EHR data is limited.
arXiv Detail & Related papers (2023-06-09T16:06:02Z)
Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain. We annotated a corpus of clinical documents according to 12 types of identifying entities. We build a hybrid system, merging the results of a deep learning model as well as manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
sEHR-CE: Language modelling of structured EHR data for efficient and generalizable patient cohort expansion [0.0]
sEHR-CE is a novel framework based on transformers to enable integrated phenotyping and analyses of heterogeneous clinical datasets. We validate our approach using primary and secondary care data from the UK Biobank, a large-scale research study.
arXiv Detail & Related papers (2022-11-30T16:00:43Z)
Textual Data Augmentation for Patient Outcomes Prediction [67.72545656557858]
We propose a novel data augmentation method to generate artificial clinical notes in patients' Electronic Health Records. We fine-tune the generative language model GPT-2 to synthesize labeled text with the original training data. We evaluate our method on the most common patient outcome, i.e., the 30-day readmission rate.
arXiv Detail & Related papers (2022-11-13T01:07:23Z)
Natural Language Inference with Self-Attention for Veracity Assessment of Pandemic Claims [54.93898455714295]
We first describe the construction of the novel PANACEA dataset consisting of heterogeneous claims on COVID-19. We then propose novel techniques for automated veracity assessment based on Natural Language Inference.
arXiv Detail & Related papers (2022-05-05T12:11:31Z)
Improving Early Sepsis Prediction with Multi Modal Learning [5.129463113166068]
Clinical text provides essential information to estimate the severity of sepsis. We employ state-of-the-art NLP models such as BERT and a highly specialized NLP model in Amazon Comprehend Medical to represent the text. Our methods significantly outperform qSOFA, a clinical criterion suggested by experts, as well as the winning model of the PhysioNet Computing in Cardiology Challenge for predicting sepsis.
arXiv Detail & Related papers (2021-07-23T09:25:31Z)
Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks. Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets. We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z)
Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values. The methodology is based on application of elastic principal graphs which can address simultaneously the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z)
Predictive Modeling of ICU Healthcare-Associated Infections from Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling Approach [55.41644538483948]
This work is focused on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units. The aim is to support decision making addressed at reducing the incidence rate of infections.
arXiv Detail & Related papers (2020-05-07T16:13:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.