Data augmentation method for modeling health records with applications
to clopidogrel treatment failure detection
- URL: http://arxiv.org/abs/2402.18046v1
- Date: Wed, 28 Feb 2024 04:47:32 GMT
- Authors: Sunwoong Choi and Samuel Kim
- Abstract summary: The proposed method generates augmented data by rearranging the orders of medical records within a visit.
Applying the proposed method to the clopidogrel treatment failure detection task enabled up to 5.3% absolute improvement in terms of ROC-AUC.
- Score: 0.5957022371135096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a novel data augmentation method to address the challenge of data
scarcity in modeling longitudinal patterns in Electronic Health Records (EHR)
of patients using natural language processing (NLP) algorithms. The proposed
method generates augmented data by rearranging the orders of medical records
within a visit, where the order of elements is not obvious, if there is one at all. Applying
the proposed method to the clopidogrel treatment failure detection task enabled
up to 5.3% absolute improvement in terms of ROC-AUC (from 0.908 without
augmentation to 0.961 with augmentation) when it was used during the
pre-training procedure. It was also shown that the augmentation helped to
improve performance during fine-tuning procedures, especially when the amount
of labeled training data is limited.
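The within-visit reordering described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a patient is stored as a chronological list of visits and each visit as a list of code strings. The name `augment_patient`, the `n_copies` and `seed` parameters, and the example codes are all hypothetical.

```python
import random
from typing import List, Optional

Visit = List[str]      # codes recorded in one visit (hypothetical encoding)
Patient = List[Visit]  # visits in chronological order


def augment_patient(patient: Patient, n_copies: int = 3,
                    seed: Optional[int] = None) -> List[Patient]:
    """Return n_copies variants of a patient record.

    Codes are shuffled inside each visit; the order of visits is kept,
    so the longitudinal structure of the record is unchanged.
    """
    rng = random.Random(seed)
    augmented = []
    for _ in range(n_copies):
        variant = []
        for visit in patient:
            shuffled = list(visit)  # copy so the original record is not mutated
            rng.shuffle(shuffled)
            variant.append(shuffled)
        augmented.append(variant)
    return augmented


# Hypothetical two-visit record; the code strings are illustrative only.
patient = [["DX:I21", "RX:clopidogrel", "PROC:PCI"], ["DX:I25", "RX:aspirin"]]
variants = augment_patient(patient, n_copies=2, seed=0)
```

Each variant contains the same codes per visit as the original record, so only the within-visit ordering differs between the original and the augmented sequences.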
Related papers
- X-ray Insights Unleashed: Pioneering the Enhancement of Multi-Label Long-Tail Data [86.52299247918637]
Long-tailed pulmonary anomalies in chest radiography present formidable diagnostic challenges.
Despite the recent strides in diffusion-based methods for enhancing the representation of tailed lesions, the paucity of rare lesion exemplars curtails the generative capabilities of these approaches.
We propose a novel data synthesis pipeline designed to augment tail lesions utilizing a copious supply of conventional normal X-rays.
arXiv Detail & Related papers (2025-12-24T06:14:55Z) - Prior-informed optimization of treatment recommendation via bandit algorithms trained on large language model-processed historical records [0.6875312133832079]
Current medical practice depends on standardized treatment frameworks and empirical methodologies that neglect individual patient variations.
We develop a comprehensive system integrating Large Language Models (LLMs), Conditional Tabular Generative Adversarial Networks (CTGAN), T-learner counterfactual models, and contextual bandit approaches.
arXiv Detail & Related papers (2025-10-21T18:57:00Z) - Parameterized Diffusion Optimization enabled Autoregressive Ordinal Regression for Diabetic Retinopathy Grading [53.11883409422728]
This work proposes a novel autoregressive ordinal regression method called AOR-DR.
We decompose the diabetic retinopathy grading task into a series of ordered steps by fusing the prediction of the previous steps with extracted image features.
We exploit the diffusion process to facilitate conditional probability modeling, enabling the direct use of continuous global image features for autoregression.
arXiv Detail & Related papers (2025-07-07T13:22:35Z) - Augmentation of EEG and ECG Time Series for Deep Learning Applications: Integrating Changepoint Detection into the iAAFT Surrogates [15.377534937558744]
We introduce a novel method for augmenting nonstationary time series.
This is achieved by combining offline changepoint detection with the iterative amplitude-adjusted Fourier transform (iAAFT).
For the CHB-MIT and Siena datasets respectively, accuracy rose by 4.4% and 1.9%, precision by 10% and 5.5%, recall by 3.6% and 0.9%, and F1 by 4.2% and 1.4%.
arXiv Detail & Related papers (2025-04-02T09:40:04Z) - Improving EEG Classification Through Randomly Reassembling Original and Generated Data with Transformer-based Diffusion Models [12.703528969668062]
We propose a Transformer-based denoising diffusion probabilistic model and a generated data-based augmentation method.
For the characteristics of EEG signals, we propose a constant-factor scaling method to preprocess the signals, which reduces the loss of information.
The proposed augmentation method randomly reassembles the generated data with original data in the time-domain to obtain vicinal data.
arXiv Detail & Related papers (2024-07-20T06:58:14Z) - Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs [54.05511925104712]
We propose a simple, effective, and data-efficient method called Step-DPO.
Step-DPO treats individual reasoning steps as units for preference optimization rather than evaluating answers holistically.
Our findings demonstrate that as few as 10K preference data pairs and fewer than 500 Step-DPO training steps can yield a nearly 3% gain in accuracy on MATH for models with over 70B parameters.
arXiv Detail & Related papers (2024-06-26T17:43:06Z) - Guided Discrete Diffusion for Electronic Health Record Generation [47.129056768385084]
EHRs are a pivotal data source that enables numerous applications in computational medicine, e.g., disease progression prediction, clinical trial design, and health economics and outcomes research.
Despite their wide usability, their sensitive nature raises privacy and confidentiality concerns, which limit potential use cases.
To tackle these challenges, we explore the use of generative models to synthesize artificial, yet realistic EHRs.
arXiv Detail & Related papers (2024-04-18T16:50:46Z) - ArSDM: Colonoscopy Images Synthesis with Adaptive Refinement Semantic Diffusion Models [69.9178140563928]
Colonoscopy analysis is essential for assisting clinical diagnosis and treatment.
The scarcity of annotated data limits the effectiveness and generalization of existing methods.
We propose an Adaptive Refinement Semantic Diffusion Model (ArSDM) to generate colonoscopy images that benefit the downstream tasks.
arXiv Detail & Related papers (2023-09-03T07:55:46Z) - Data Augmentation for Seizure Prediction with Generative Diffusion Model [26.967247641926814]
Seizure prediction is of great importance to improve the life of patients.
The severe imbalance problem between preictal and interictal data still poses a great challenge.
Data augmentation is an intuitive way to solve this problem.
We propose a novel data augmentation method based on a diffusion model, called DiffEEG.
arXiv Detail & Related papers (2023-06-14T05:44:53Z) - Conditional Generative Data Augmentation for Clinical Audio Datasets [36.45569352490318]
We propose a novel data augmentation method for clinical audio datasets based on a conditional Wasserstein Generative Adversarial Network with Gradient Penalty.
To validate our method, we created a clinical audio dataset which was recorded in a real-world operating room during Total Hip Arthroplasty (THA) procedures.
We show that training with the generated augmented samples outperforms classical audio augmentation methods in terms of classification accuracy.
arXiv Detail & Related papers (2022-03-22T09:47:31Z) - Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations [76.82124752950148]
We develop a convenient gradient-based method for selecting the data augmentation.
We use a differentiable Kronecker-factored Laplace approximation to the marginal likelihood as our objective.
arXiv Detail & Related papers (2022-02-22T02:51:11Z) - A random shuffle method to expand a narrow dataset and overcome the associated challenges in a clinical study: a heart failure cohort example [50.591267188664666]
The aim of this study was to design a random shuffle method that enhances the cardinality of an HF dataset while remaining statistically legitimate.
The proposed random shuffle method was able to enhance the HF dataset cardinality circa 10 times and circa 21 times when followed by a random repeated-measures approach.
arXiv Detail & Related papers (2020-12-12T10:59:38Z) - Longitudinal modeling of MS patient trajectories improves predictions of disability progression [2.117653457384462]
This work addresses the task of optimally extracting information from longitudinal patient data in the real-world setting.
We show that with machine learning methods suited for patient trajectories modeling, we can predict disability progression of patients in a two-year horizon.
Compared to the models available in the literature, this work uses the most complete patient history for MS disease progression prediction.
arXiv Detail & Related papers (2020-11-09T20:48:00Z) - Automatic Data Augmentation via Deep Reinforcement Learning for Effective Kidney Tumor Segmentation [57.78765460295249]
We develop a novel automatic learning-based data augmentation method for medical image segmentation.
In our method, we innovatively combine the data augmentation module and the subsequent segmentation module in an end-to-end training manner with a consistent loss.
We extensively evaluated our method on CT kidney tumor segmentation which validated the promising results of our method.
arXiv Detail & Related papers (2020-02-22T14:10:13Z)