Binary Gaussian Copula Synthesis: A Novel Data Augmentation Technique to
Advance ML-based Clinical Decision Support Systems for Early Prediction of
Dialysis Among CKD Patients
- URL: http://arxiv.org/abs/2403.00965v1
- Date: Fri, 1 Mar 2024 20:32:17 GMT
- Authors: Hamed Khosravi, Srinjoy Das, Abdullah Al-Mamun, Imtiaz Ahmed
- Abstract summary: The Center for Disease Control estimates that over 37 million US adults suffer from chronic kidney disease (CKD).
9 out of 10 of these individuals are unaware of their condition due to the absence of symptoms in the early stages.
Early prediction of dialysis is crucial as it can significantly improve patient outcomes.
- Score: 4.80104397397529
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The Center for Disease Control estimates that over 37 million US adults
suffer from chronic kidney disease (CKD), yet 9 out of 10 of these individuals
are unaware of their condition due to the absence of symptoms in the early
stages. CKD has a significant impact on patients' quality of life, particularly
when it progresses to the need for dialysis. Early prediction of dialysis is
crucial as it can significantly improve patient outcomes and assist healthcare
providers in making timely and informed decisions. However, developing an
effective machine learning (ML)-based Clinical Decision Support System (CDSS)
for early dialysis prediction poses a key challenge due to the imbalanced
nature of data. To address this challenge, this study evaluates various data
augmentation techniques to understand their effectiveness on real-world
datasets. We propose a new approach named Binary Gaussian Copula Synthesis
(BGCS). BGCS is tailored for binary medical datasets and excels in generating
synthetic minority data that mirrors the distribution of the original data.
BGCS enhances early dialysis prediction by outperforming traditional methods in
detecting dialysis patients. For the best ML model, Random Forest, BGCS
achieved a 72% improvement, surpassing the state-of-the-art augmentation
approaches. We also present an ML-based CDSS designed to aid clinicians in
making informed decisions. The CDSS, which utilizes decision tree models, is
developed to improve patient outcomes, identify critical variables, and thereby
enable clinicians to make proactive decisions and strategize treatment plans
effectively for CKD patients who are more likely to require dialysis in the
near future. Through comprehensive feature analysis and meticulous data
preparation, we ensure that the CDSS's dialysis predictions are not only
accurate but also actionable, providing a valuable tool in the management and
treatment of CKD.
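The abstract does not spell out how BGCS works, so the following is only a minimal sketch of the general idea behind a Gaussian copula for binary features, not the authors' implementation. It assumes the minority class is given as rows of 0/1 values, uses the Pearson correlation of the binary columns as a rough stand-in for the latent Gaussian correlation, and shrinks that matrix toward the identity so a Cholesky factor exists. The names fit_binary_copula, sample_binary, shrink, and eps are hypothetical and chosen for illustration.

```python
import math
import random
from statistics import NormalDist


def cholesky(a):
    """Lower-triangular factor of a symmetric positive-definite matrix."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def fit_binary_copula(rows, shrink=0.05, eps=1e-3):
    """Return per-column normal thresholds and a Cholesky factor of the correlation."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    std = [math.sqrt(m * (1.0 - m)) for m in mean]  # std of a 0/1 column
    # Threshold t_j with P(z_j < t_j) = P(x_j = 1); clip so the quantile is finite.
    thresholds = [NormalDist().inv_cdf(min(max(m, eps), 1.0 - eps)) for m in mean]
    # Assumption of this sketch: Pearson correlation approximates the latent one.
    corr = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for i in range(d):
        for j in range(i):
            if std[i] > 0 and std[j] > 0:
                cov = sum(r[i] * r[j] for r in rows) / n - mean[i] * mean[j]
                # Shrinking off-diagonals keeps the matrix positive definite.
                corr[i][j] = corr[j][i] = (1.0 - shrink) * cov / (std[i] * std[j])
    return thresholds, cholesky(corr)


def sample_binary(thresholds, low, n_samples, rng):
    """Draw correlated normals and threshold them back to 0/1 rows."""
    d = len(thresholds)
    out = []
    for _ in range(n_samples):
        e = [rng.gauss(0.0, 1.0) for _ in range(d)]
        z = [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(d)]
        out.append([1 if z[j] < thresholds[j] else 0 for j in range(d)])
    return out


# Toy minority-class rows (made up for illustration, not clinical data).
minority = [[1, 1, 0], [1, 1, 0], [1, 0, 0], [0, 0, 1],
            [1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
thresholds, low = fit_binary_copula(minority)
synthetic = sample_binary(thresholds, low, n_samples=5, rng=random.Random(0))
print(synthetic)
```

In an augmentation setting, rows sampled this way would be appended to the minority class of the training split only, before fitting a classifier such as a random forest. A more faithful latent correlation estimate (for example, tetrachoric correlation) could replace the Pearson shortcut used here.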
Related papers
- Enhancing End Stage Renal Disease Outcome Prediction: A Multi-Sourced Data-Driven Approach [7.212939068975618]
We utilized data about 10,326 CKD patients, combining their clinical and claims information from 2009 to 2018.
A 24-month observation window was identified as optimal for balancing early detection and prediction accuracy.
The 2021 eGFR equation improved prediction accuracy and reduced racial bias, notably for African American patients.
arXiv Detail & Related papers (2024-10-02T03:21:01Z)
- SepsisLab: Early Sepsis Prediction with Uncertainty Quantification and Active Sensing [67.8991481023825]
Sepsis is the leading cause of in-hospital mortality in the USA.
Existing predictive models are usually trained on high-quality data with little missing information.
For the potential high-risk patients with low confidence due to limited observations, we propose a robust active sensing algorithm.
arXiv Detail & Related papers (2024-07-24T04:47:36Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- Enhancing Mortality Prediction in Heart Failure Patients: Exploring Preprocessing Methods for Imbalanced Clinical Datasets [0.0]
Heart failure (HF) is a critical condition in which the accurate prediction of mortality plays a vital role in guiding patient management decisions.
We present a comprehensive preprocessing framework including scaling, outlier processing, and resampling.
By leveraging appropriate preprocessing techniques and Machine Learning (ML) algorithms, we aim to improve mortality prediction performance for HF patients.
arXiv Detail & Related papers (2023-09-30T18:31:15Z)
- Clinical Deterioration Prediction in Brazilian Hospitals Based on Artificial Neural Networks and Tree Decision Models [56.93322937189087]
An extremely boosted neural network (XBNet) is used to predict clinical deterioration (CD).
The XGBoost model obtained the best results in predicting CD among Brazilian hospitals' data.
arXiv Detail & Related papers (2022-12-17T23:29:14Z)
- Literature-Augmented Clinical Outcome Prediction [10.46990394710927]
We introduce techniques to help bridge this gap between EBM and AI-based clinical models.
We propose a novel system that automatically retrieves patient-specific literature based on intensive care (ICU) patient information.
Our model is able to substantially boost predictive accuracy on three challenging tasks in comparison to strong recent baselines.
arXiv Detail & Related papers (2021-11-16T11:19:02Z)
- Improvement of a Prediction Model for Heart Failure Survival through Explainable Artificial Intelligence [0.0]
This work presents an explainability analysis and evaluation of a prediction model for heart failure survival.
The model employs a data workflow pipeline able to select the best ensemble tree algorithm as well as the best feature selection technique.
The paper's main contribution is an explainability-driven approach to select the best prediction model for HF survival based on an accuracy-explainability balance.
arXiv Detail & Related papers (2021-08-20T09:03:26Z)
- Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z)
- Longitudinal modeling of MS patient trajectories improves predictions of disability progression [2.117653457384462]
This work addresses the task of optimally extracting information from longitudinal patient data in the real-world setting.
We show that with machine learning methods suited for patient trajectories modeling, we can predict disability progression of patients in a two-year horizon.
Compared to the models available in the literature, this work uses the most complete patient history for MS disease progression prediction.
arXiv Detail & Related papers (2020-11-09T20:48:00Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves an F1 score of up to 0.841 for AD detection and a PR-AUC of up to 0.609 for NASH detection, outperforming the best state-of-the-art baseline by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
- Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of essential services breaking down and health care structures collapsing.
This work describes a machine learning model derived from hemogram exam data collected from symptomatic patients.
The proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)