Cardiac mortality prediction in patients undergoing PCI based on real and synthetic data
- URL: http://arxiv.org/abs/2512.22259v1
- Date: Wed, 24 Dec 2025 10:12:08 GMT
- Title: Cardiac mortality prediction in patients undergoing PCI based on real and synthetic data
- Authors: Daniil Burakov, Ivan Petrov, Dmitrii Khelimskii, Ivan Bessonov, Mikhail Lazarev,
- Abstract summary: The aim of the study was to develop a predictive model for assessing the risk of cardiac death based on the real and synthetic data of patients undergoing PCI.<n>Several machine learning models were applied to predict three-year mortality after PCI.<n>The results show that straightforward augmentation with realistic and extreme cases can expose, quantify, and reduce brittleness in imbalanced clinical prediction.
- Score: 0.13048920509133807
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Patient status, angiographic and procedural characteristics encode crucial signals for predicting long-term outcomes after percutaneous coronary intervention (PCI). The aim of the study was to develop a predictive model for assessing the risk of cardiac death based on the real and synthetic data of patients undergoing PCI and to identify the factors that have the greatest impact on mortality. We analyzed 2,044 patients, who underwent a PCI for bifurcation lesions. The primary outcome was cardiac death at 3-year follow-up. Several machine learning models were applied to predict three-year mortality after PCI. To address class imbalance and improve the representation of the minority class, an additional 500 synthetic samples were generated and added to the training set. To evaluate the contribution of individual features to model performance, we applied permutation feature importance. An additional experiment was conducted to evaluate how the model's predictions would change after removing non-informative features from the training and test datasets. Without oversampling, all models achieve high overall accuracy (0.92-0.93), yet they almost completely ignore the minority class. Across models, augmentation consistently increases minority-class recall with minimal loss of AUROC, improves probability quality, and yields more clinically reasonable risk estimates on the constructed severe profiles. According to feature importance analysis, four features emerged as the most influential: Age, Ejection Fraction, Peripheral Artery Disease, and Cerebrovascular Disease. These results show that straightforward augmentation with realistic and extreme cases can expose, quantify, and reduce brittleness in imbalanced clinical prediction using only tabular records, and motivate routine reporting of probability quality and stress tests alongside headline metrics.
Related papers
- Investigating the Impact of Histopathological Foundation Models on Regressive Prediction of Homologous Recombination Deficiency [52.50039435394964]
We systematically evaluate foundation models for regression-based tasks.<n>We extract patch-level features from whole slide images (WSI) using five state-of-the-art foundation models.<n>Models are trained to predict continuous HRD scores based on these extracted features across breast, endometrial, and lung cancer cohorts.
arXiv Detail & Related papers (2026-01-29T14:06:50Z) - Feature-Enhanced Machine Learning for All-Cause Mortality Prediction in Healthcare Data [0.0]
This study evaluates machine learning models for all-cause in-hospital mortality prediction using the MIMIC-III database.<n>We extracted key features such as vital signs (e.g., heart rate, blood pressure), laboratory results and demographic information.<n>The Random Forest model achieved the highest performance with an AUC of 0.94, significantly outperforming other machine learning and deep learning approaches.
arXiv Detail & Related papers (2025-03-27T08:04:42Z) - Prediction of Lung Metastasis from Hepatocellular Carcinoma using the SEER Database [0.9055332067000195]
Hepatocellular carcinoma (HCC) is a leading cause of cancer-related mortality.<n> predictive models for lung metastasis inHCC remain limited in scope and clinical applicability.<n>We develop and validate an end-to-end machine learning pipeline using data from the Surveillance, Epidemiology, and End Results (SEER) database.
arXiv Detail & Related papers (2025-01-20T20:06:31Z) - SepsisLab: Early Sepsis Prediction with Uncertainty Quantification and Active Sensing [67.8991481023825]
Sepsis is the leading cause of in-hospital mortality in the USA.
Existing predictive models are usually trained on high-quality data with few missing information.
For the potential high-risk patients with low confidence due to limited observations, we propose a robust active sensing algorithm.
arXiv Detail & Related papers (2024-07-24T04:47:36Z) - Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data [44.99833362998488]
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options.
The present investigation, spearheaded by the iDPP@CLEF 2024 challenge, focuses on utilizing sensor-derived data obtained through an app.
arXiv Detail & Related papers (2024-07-10T19:17:23Z) - Improving Diffusion Models for ECG Imputation with an Augmented Template
Prior [43.6099225257178]
noisy and poor-quality recordings are a major issue for signals collected using mobile health systems.
Recent studies have explored the imputation of missing values in ECG with probabilistic time-series models.
We present a template-guided denoising diffusion probabilistic model (DDPM), PulseDiff, which is conditioned on an informative prior for a range of health conditions.
arXiv Detail & Related papers (2023-10-24T11:34:15Z) - MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data
Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z) - Penalized Deep Partially Linear Cox Models with Application to CT Scans
of Lung Cancer Patients [42.09584755334577]
Lung cancer is a leading cause of cancer mortality globally, highlighting the importance of understanding its mortality risks to design effective therapies.
The National Lung Screening Trial (NLST) employed computed tomography texture analysis to quantify the mortality risks of lung cancer patients.
We propose a novel Penalized Deep Partially Linear Cox Model (Penalized DPLC), which incorporates the SCAD penalty to select important texture features and employs a deep neural network to estimate the nonparametric component of the model.
arXiv Detail & Related papers (2023-03-09T15:38:16Z) - Mortality Prediction with Adaptive Feature Importance Recalibration for
Peritoneal Dialysis Patients: a deep-learning-based study on a real-world
longitudinal follow-up dataset [19.7915762858399]
Peritoneal Dialysis (PD) is one of the most widely used life-supporting therapies for patients with End-Stage Renal Disease (ESRD)
Here, our objective is to develop a deep learning model for a real-time, individualized, and interpretable mortality prediction model - AICare.
This study has collected 13,091 clinical follow-up visits and demographic data of 656 PD patients.
arXiv Detail & Related papers (2023-01-17T13:17:54Z) - Clinical Deterioration Prediction in Brazilian Hospitals Based on
Artificial Neural Networks and Tree Decision Models [56.93322937189087]
An extremely boosted neural network (XBNet) is used to predict clinical deterioration (CD)
The XGBoost model obtained the best results in predicting CD among Brazilian hospitals' data.
arXiv Detail & Related papers (2022-12-17T23:29:14Z) - Predicting Post-Concussion Syndrome Outcomes with Machine Learning [0.0]
The results of this study demonstrate that machine learning models can predict PCS outcomes with high accuracy.
With further research, machine learning models may be implemented in healthcare settings to help patients with persistent PCS.
arXiv Detail & Related papers (2021-08-04T09:04:13Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - A Knowledge Distillation Ensemble Framework for Predicting Short and
Long-term Hospitalisation Outcomes from Electronic Health Records Data [5.844828229178025]
Existing outcome prediction models suffer from a low recall of infrequent positive outcomes.
We present a highly-scalable and robust machine learning framework to automatically predict adversity represented by mortality and ICU admission.
arXiv Detail & Related papers (2020-11-18T15:56:28Z) - Hemogram Data as a Tool for Decision-making in COVID-19 Management:
Applications to Resource Scarcity Scenarios [62.997667081978825]
COVID-19 pandemics has challenged emergency response systems worldwide, with widespread reports of essential services breakdown and collapse of health care structure.
This work describes a machine learning model derived from hemogram exam data performed in symptomatic patients.
Proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.