DALL-M: Context-Aware Clinical Data Augmentation with LLMs
- URL: http://arxiv.org/abs/2407.08227v1
- Date: Thu, 11 Jul 2024 07:01:50 GMT
- Title: DALL-M: Context-Aware Clinical Data Augmentation with LLMs
- Authors: Chihcheng Hsieh, Catarina Moreira, Isabel Blanco Nobre, Sandra Costa Sousa, Chun Ouyang, Margot Brereton, Joaquim Jorge, Jacinto C. Nascimento,
- Abstract summary: We present a novel technique to enhance the clinical context through augmentation techniques with clinical data.
We introduce a pioneering approach to clinical data augmentation that employs large language models (LLMs) to generate patient contextual synthetic data.
This methodology is crucial for training more robust deep learning models in healthcare.
- Score: 13.827368628263997
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: X-ray images are vital in medical diagnostics, but their effectiveness is limited without clinical context. Radiologists often find chest X-rays insufficient for diagnosing underlying diseases, necessitating comprehensive clinical features and data integration. We present a novel technique to enhance the clinical context through augmentation techniques with clinical tabular data, thereby improving its applicability and reliability in AI medical diagnostics. To address this, we introduce a pioneering approach to clinical data augmentation that employs large language models (LLMs) to generate patient contextual synthetic data. This methodology is crucial for training more robust deep learning models in healthcare. It preserves the integrity of real patient data while enriching the dataset with contextually relevant synthetic features, significantly enhancing model performance. DALL-M uses a three-phase feature generation process: (i) clinical context storage, (ii) expert query generation, and (iii) context-aware feature augmentation. DALL-M generates new, clinically relevant features by synthesizing chest X-ray images and reports. Applied to 799 cases using nine features from the MIMIC-IV dataset, it created an augmented set of 91 features. This is the first work to generate contextual values for existing and new features based on patients' X-ray reports, gender, and age and to produce new contextual knowledge during data augmentation. Empirical validation with machine learning models, including Decision Trees, Random Forests, XGBoost, and TabNET, showed significant performance improvements. Incorporating augmented features increased the F1 score by 16.5% and Precision and Recall by approximately 25%. DALL-M addresses a critical gap in clinical data augmentation, offering a robust framework for generating contextually enriched datasets.
Related papers
- Generative AI for Synthetic Data Across Multiple Medical Modalities: A Systematic Review of Recent Developments and Challenges [2.1835659964186087]
This paper presents a systematic review of generative models used to synthesize various medical data types.
Our study encompasses a broad array of medical data modalities and explores various generative models.
arXiv Detail & Related papers (2024-06-27T14:00:11Z) - The Impact of Auxiliary Patient Data on Automated Chest X-Ray Report Generation and How to Incorporate It [12.61239008314719]
This study investigates the integration of diverse patient data sources into multimodal language models for automated chest X-ray (CXR) report generation.
Utilising the MIMIC-CXR and MIMIC-IV-ED datasets, we incorporate detailed patient information such as a vital signsperiodic, medications, and clinical history to enhance diagnostic accuracy.
arXiv Detail & Related papers (2024-06-19T03:25:31Z) - Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
Training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
The inference of LlaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z) - Radiology Report Generation Using Transformers Conditioned with
Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z) - Knowledge Graph Representations to enhance Intensive Care Time-Series
Predictions [4.660203987415476]
Our proposed methodology integrates medical knowledge with ICU data, improving clinical decision modeling.
It combines graph representations with vital signs and clinical reports, enhancing performance.
Our model includes an interpretability component to understand how knowledge graph nodes affect predictions.
arXiv Detail & Related papers (2023-11-13T09:11:55Z) - TRIALSCOPE: A Unifying Causal Framework for Scaling Real-World Evidence
Generation with Biomedical Language Models [22.046231408373522]
We present TRIALSCOPE, a unifying framework for distilling real-world evidence from observational data.
We show that TRIALSCOPE can produce high-quality structuring of real-world data and generates comparable results to marquee cancer trials.
arXiv Detail & Related papers (2023-11-02T15:15:47Z) - Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data
Generation with Large Language Models [48.07083163501746]
Clinical natural language processing requires methods that can address domain-specific challenges.
We propose an innovative, resource-efficient approach, ClinGen, which infuses knowledge into the process.
Our empirical study across 7 clinical NLP tasks and 16 datasets reveals that ClinGen consistently enhances performance across various tasks.
arXiv Detail & Related papers (2023-11-01T04:37:28Z) - ChatRadio-Valuer: A Chat Large Language Model for Generalizable
Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset utilized in this study encompasses a remarkable total of textbf332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and GPT-4 et al.
arXiv Detail & Related papers (2023-10-08T17:23:17Z) - Large Language Models for Healthcare Data Augmentation: An Example on
Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM)
Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv Detail & Related papers (2023-03-24T03:14:00Z) - MDF-Net for abnormality detection by fusing X-rays with clinical data [14.347359031598813]
This study investigates the effects of including patients' clinical information on the performance of deep learning (DL) classifiers for disease location in chest X-rays.
We propose a novel architecture consisting of two fusion methods that enable the model to simultaneously process patients' clinical data and chest X-rays.
Results show that incorporating patients' clinical data in a DL model together with the proposed fusion methods improves the disease localization in chest X-rays by 12% in terms of Average Precision.
arXiv Detail & Related papers (2023-02-26T19:16:57Z) - Variational Knowledge Distillation for Disease Classification in Chest
X-Rays [102.04931207504173]
We propose itvariational knowledge distillation (VKD), which is a new probabilistic inference framework for disease classification based on X-rays.
We demonstrate the effectiveness of our method on three public benchmark datasets with paired X-ray images and EHRs.
arXiv Detail & Related papers (2021-03-19T14:13:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.