Survival Prediction of Children Undergoing Hematopoietic Stem Cell
Transplantation Using Different Machine Learning Classifiers by Performing
Chi-squared Test and Hyper-parameter Optimization: A Retrospective Analysis
- URL: http://arxiv.org/abs/2201.08987v1
- Date: Sat, 22 Jan 2022 08:01:22 GMT
- Authors: Ishrak Jahan Ratul, Ummay Habiba Wani, Mirza Muntasir Nishat, Abdullah
Al-Monsur, Abrar Mohammad Ar-Rafi, Fahim Faisal, and Mohammad Ridwan Kabir
- Abstract summary: An efficient survival classification model is presented in a comprehensive manner.
A synthetic dataset is generated by imputing the missing values, transforming the data using dummy variable encoding, and compressing the dataset from 59 features to the 11 most correlated features using Chi-squared feature selection.
Several supervised ML methods were trained, including Decision Tree, Random Forest, Logistic Regression, K-Nearest Neighbors, Gradient Boosting, AdaBoost, and XGBoost.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Bone Marrow Transplant, a gradational rescue for a wide range of disorders
emanating from the bone marrow, is an efficacious surgical treatment. Several
risk factors, such as post-transplant illnesses, new malignancies, and even
organ damage, can impair long-term survival. Therefore, technologies like
Machine Learning are deployed for investigating the survival prediction of BMT
receivers along with the influences that limit their resilience. In this study,
an efficient survival classification model is presented in a comprehensive
manner, incorporating the Chi-squared feature selection method to address the
dimensionality problem and Hyper Parameter Optimization (HPO) to increase
accuracy. A synthetic dataset is generated by imputing the missing values,
transforming the data using dummy variable encoding, and compressing the
dataset from 59 features to the 11 most correlated features using Chi-squared
feature selection. The dataset was split into train and test sets at a ratio of
80:20, and the hyperparameters were optimized using Grid Search
Cross-Validation. Several supervised ML methods were trained, including
Decision Tree, Random Forest, Logistic Regression, K-Nearest Neighbors,
Gradient Boosting Classifier, AdaBoost, and XGBoost. The simulations have
been performed for both the default and optimized hyperparameters by using the
original and reduced synthetic dataset. After ranking the features using the
Chi-squared test, it was observed that the top 11 features with HPO resulted
in the same accuracy of prediction (94.73%) as the entire dataset with default
parameters. Moreover, this approach requires less time and resources for
predicting the survivability of children undergoing BMT. Hence, the proposed
approach may aid in the development of a computer-aided diagnostic system with
satisfactory accuracy and minimal computation time by utilizing medical data
records.
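The Chi-squared feature ranking described in the abstract can be illustrated with a minimal sketch. It assumes categorical (for example, dummy-encoded) features and computes the Pearson chi-squared statistic from each feature's contingency table against the survival label, then keeps the k highest-scoring features. The feature names, the toy values, and the helper names `chi2_score` and `select_top_k` are hypothetical and are not taken from the paper; the abstract does not state which implementation the authors used.

```python
from collections import Counter


def chi2_score(feature, target):
    """Pearson chi-squared statistic between two categorical columns."""
    n = len(feature)
    observed = Counter(zip(feature, target))  # joint counts per (feature, label)
    feature_totals = Counter(feature)         # marginal counts per feature value
    target_totals = Counter(target)           # marginal counts per label
    score = 0.0
    for f_val, f_count in feature_totals.items():
        for t_val, t_count in target_totals.items():
            # Expected count if feature and label were independent
            expected = f_count * t_count / n
            score += (observed[(f_val, t_val)] - expected) ** 2 / expected
    return score


def select_top_k(columns, target, k):
    """Rank features by chi-squared score and return the k highest."""
    scores = {name: chi2_score(values, target) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Made-up toy data: three binary features and a binary survival label.
columns = {
    "relapse":       [1, 1, 1, 0, 0, 0, 0, 1],
    "donor_match":   [1, 0, 1, 0, 1, 0, 1, 0],
    "recipient_sex": [0, 0, 1, 1, 0, 1, 1, 0],
}
survived = [0, 0, 0, 1, 1, 1, 1, 0]

top_features, scores = select_top_k(columns, survived, k=2)
print(top_features)
print(scores)
```

In the study, the same idea reduces 59 features to 11; here k=2 is used only to keep the toy example small. A higher score means the feature's distribution differs more strongly between the two outcome classes.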
Related papers
- Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data [44.99833362998488]
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options.
The present investigation, spearheaded by the iDPP@CLEF 2024 challenge, focuses on utilizing sensor-derived data obtained through an app.
arXiv Detail & Related papers (2024-07-10T19:17:23Z)
- A Bio-Medical Snake Optimizer System Driven by Logarithmic Surviving Global Search for Optimizing Feature Selection and its application for Disorder Recognition [1.3755153408022656]
It is paramount to enhance medical practices, given how important it is to protect human life.
Medical therapy can be accelerated by automating patient prediction using machine learning techniques.
Several preprocessing strategies must be adopted for their crucial duty in this field.
arXiv Detail & Related papers (2024-02-22T09:08:18Z)
- The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation.
We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare.
Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z)
- Robust self-healing prediction model for high dimensional data [0.685316573653194]
This work proposes a robust self healing (RSH) hybrid prediction model.
It functions by using the data in its entirety by removing errors and inconsistencies from it rather than discarding any data.
The proposed method is compared with some of the existing high performing models and the results are analyzed.
arXiv Detail & Related papers (2022-10-04T17:55:50Z)
- Detecting Chronic Kidney Disease(CKD) at the Initial Stage: A Novel Hybrid Feature-selection Method and Robust Data Preparation Pipeline for Different ML Techniques [0.0]
Chronic Kidney Disease (CKD) has affected almost 800 million people around the world. Around 1.7 million people die each year because of it.
Many researchers have applied distinct Machine Learning (ML) methods to detect CKD at an early stage, but detailed studies are still missing.
We present a structured and thorough method for dealing with the complexities of medical data with optimal performance.
arXiv Detail & Related papers (2022-03-02T20:38:49Z)
- Cervical Cytology Classification Using PCA & GWO Enhanced Deep Features Selection [1.990876596716716]
Cervical cancer is one of the most deadly and common diseases among women worldwide.
We propose a fully automated framework that utilizes Deep Learning and feature selection.
The framework is evaluated on three publicly available benchmark datasets.
arXiv Detail & Related papers (2021-06-09T08:57:22Z)
- Resource Planning for Hospitals Under Special Consideration of the COVID-19 Pandemic: Optimization and Sensitivity Analysis [87.31348761201716]
Crises like the COVID-19 pandemic pose a serious challenge to health-care institutions.
BaBSim.Hospital is a tool for capacity planning based on discrete event simulation.
We aim to investigate and optimize these parameters to improve BaBSim.Hospital.
arXiv Detail & Related papers (2021-05-16T12:38:35Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to 19% over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of essential services breakdown and collapse of health care structure.
This work describes a machine learning model derived from hemogram exam data performed in symptomatic patients.
The proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity, and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)