Detecting Chronic Kidney Disease(CKD) at the Initial Stage: A Novel
Hybrid Feature-selection Method and Robust Data Preparation Pipeline for
Different ML Techniques
- URL: http://arxiv.org/abs/2203.01394v1
- Date: Wed, 2 Mar 2022 20:38:49 GMT
- Authors: Md. Taufiqul Haque Khan Tusar, Md. Touhidul Islam, Foyjul Islam Raju
- Abstract summary: Chronic Kidney Disease (CKD) affects almost 800 million people around the world. Around 1.7 million people die each year because of it.
Many researchers have applied distinct Machine Learning (ML) methods to detect CKD at an early stage, but detailed studies are still missing.
We present a structured and thorough method for dealing with the complexities of medical data with optimal performance.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chronic Kidney Disease (CKD) affects almost 800 million people around
the world. Around 1.7 million people die each year because of it. Detecting CKD
in the initial stage is essential for saving millions of lives. Many
researchers have applied distinct Machine Learning (ML) methods to detect CKD
at an early stage, but detailed studies are still missing. We present a
structured and thorough method for dealing with the complexities of medical
data with optimal performance. This study also aims to give researchers a
clear picture of the medical data preparation pipeline. In this paper,
we applied KNN Imputation to impute missing values, Local Outlier Factor to
remove outliers, SMOTE to handle data imbalance, K-stratified K-fold
Cross-validation to validate the ML models, and a novel hybrid feature
selection method to remove redundant features. The algorithms applied in this study
are Support Vector Machine, Gaussian Naive Bayes, Decision Tree, Random Forest,
Logistic Regression, K-Nearest Neighbor, Gradient Boosting, Adaptive Boosting,
and Extreme Gradient Boosting. Finally, the Random Forest can detect CKD with
100% accuracy without any data leakage.
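
The abstract's "without any data leakage" claim depends on the order of the preparation steps: oversampling has to happen inside each cross-validation fold, on the training portion only. The sketch below illustrates that ordering in plain Python under stated assumptions. It is not the authors' code: the function names (`stratified_folds`, `smote_like`, `leakage_free_splits`) are hypothetical, the toy data is synthetic, and `smote_like` is a simplified SMOTE-style interpolation for the two-class case rather than a reference SMOTE implementation.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds so each fold keeps similar class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a near-equal share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def smote_like(X, y, minority, n_neighbors=3, seed=0):
    """Balance two classes by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    if len(minority_rows) < 2:
        return list(X), list(y)  # nothing to interpolate between
    n_needed = (len(X) - len(minority_rows)) - len(minority_rows)
    synthetic = []
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority_rows)
        # Nearest minority rows by squared Euclidean distance, excluding base.
        neighbours = sorted(
            (row for row in minority_rows if row is not base),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)),
        )[:n_neighbors]
        other = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return X + synthetic, y + [minority] * len(synthetic)


def leakage_free_splits(X, y, k=5, minority=1, seed=0):
    """Yield (X_train, y_train, X_test, y_test); only training data is resampled."""
    folds = stratified_folds(y, k, seed)
    for i, test_idx in enumerate(folds):
        train_idx = [j for n, fold in enumerate(folds) if n != i for j in fold]
        X_tr = [X[j] for j in train_idx]
        y_tr = [y[j] for j in train_idx]
        # Oversample AFTER the split, so no synthetic point is derived from test rows.
        X_tr, y_tr = smote_like(X_tr, y_tr, minority, seed=seed + i)
        yield X_tr, y_tr, [X[j] for j in test_idx], [y[j] for j in test_idx]


# Toy imbalanced data (assumption): 40 majority rows, 10 minority rows.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(10)]
y = [0] * 40 + [1] * 10

splits = list(leakage_free_splits(X, y, k=5))
for X_tr, y_tr, X_te, y_te in splits:
    print(len(X_tr), y_tr.count(0), y_tr.count(1), len(X_te), y_te.count(1))
```

Resampling before splitting would let synthetic points interpolated from future test rows enter the training set, which tends to inflate reported accuracy. The same reasoning applies to fitting the imputer, outlier detector, and feature selector on training folds only.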
Related papers
- Two new feature selection methods based on learn-heuristic techniques for breast cancer prediction: A comprehensive analysis [6.796017024594715]
We suggest two novel feature selection (FS) methods based upon an imperialist competitive algorithm (ICA) and a bat algorithm (BA).
This study aims to enhance diagnostic models' efficiency and present a comprehensive analysis to help clinical physicians make much more precise and reliable decisions than before.
arXiv Detail & Related papers (2024-07-19T19:07:53Z)
- Survival Prediction from Imbalance colorectal cancer dataset using hybrid sampling methods and tree-based classifiers [0.0]
This paper focuses on developing algorithms to predict 1-, 3-, and 5-year survival of colorectal cancer patients.
We propose a method that creates a pipeline of some standard balancing techniques to increase the true positive rate.
Our proposed method significantly improves mortality prediction for the minority class of colorectal cancer patients.
arXiv Detail & Related papers (2023-09-04T19:48:56Z)
- An Improved Heart Disease Prediction Using Stacked Ensemble Method [0.9187159782788579]
We constructed an ML-based diagnostic system for heart illness forecasting, using a heart disorder dataset.
Our method can easily differentiate between people who have cardiac disease and those who are normal.
arXiv Detail & Related papers (2023-04-12T17:53:59Z)
- Learning to diagnose cirrhosis from radiological and histological labels with joint self and weakly-supervised pretraining strategies [62.840338941861134]
We propose to leverage transfer learning from large datasets annotated by radiologists, to predict the histological score available on a small annex dataset.
We compare different pretraining methods, namely weakly-supervised and self-supervised ones, to improve the prediction of the cirrhosis.
This method outperforms the baseline classification of the METAVIR score, reaching an AUC of 0.84 and a balanced accuracy of 0.75.
arXiv Detail & Related papers (2023-02-16T17:06:23Z)
- Building Brains: Subvolume Recombination for Data Augmentation in Large Vessel Occlusion Detection [56.67577446132946]
A large training data set is required for a standard deep learning-based model to learn this strategy from data.
We propose an augmentation method that generates artificial training samples by recombining vessel tree segmentations of the hemispheres from different patients.
In line with the augmentation scheme, we use a 3D-DenseNet fed with task-specific input, fostering a side-by-side comparison between the hemispheres.
arXiv Detail & Related papers (2022-05-05T10:31:57Z)
- StRegA: Unsupervised Anomaly Detection in Brain MRIs using a Compact Context-encoding Variational Autoencoder [48.2010192865749]
Unsupervised anomaly detection (UAD) can learn a data distribution from an unlabelled dataset of healthy subjects and then be applied to detect out of distribution samples.
This research proposes a compact version of the "context-encoding" VAE (ceVAE) model, combined with pre- and post-processing steps, creating a UAD pipeline (StRegA).
The proposed pipeline achieved a Dice score of 0.642$\pm$0.101 while detecting tumours in T2w images of the BraTS dataset and 0.859$\pm$0.112 while detecting artificially induced anomalies.
arXiv Detail & Related papers (2022-01-31T14:27:35Z)
- Survival Prediction of Children Undergoing Hematopoietic Stem Cell Transplantation Using Different Machine Learning Classifiers by Performing Chi-squared Test and Hyper-parameter Optimization: A Retrospective Analysis [4.067706269490143]
An efficient survival classification model is presented in a comprehensive manner.
A synthetic dataset is generated by imputing the missing values, transforming the data using dummy variable encoding, and compressing the dataset from 59 features to the 11 most correlated features using Chi-squared feature selection.
Several supervised ML methods were trained in this regard, like Decision Tree, Random Forest, Logistic Regression, K-Nearest Neighbors, Gradient Boosting, Ada Boost, and XG Boost.
arXiv Detail & Related papers (2022-01-22T08:01:22Z)
- An Explainable Classification Model for Chronic Kidney Disease Patients [0.0]
Chronic Kidney Disease (CKD) is experiencing a globally increasing incidence and high cost to health systems.
The employment of data mining to discover subtle patterns in CKD indicators would contribute to an early diagnosis.
This work develops a classifier model that would support healthcare professionals in the early diagnosis of CKD patients.
arXiv Detail & Related papers (2021-05-21T14:09:43Z)
- A random shuffle method to expand a narrow dataset and overcome the associated challenges in a clinical study: a heart failure cohort example [50.591267188664666]
The aim of this study was to design a random shuffle method that enhances the cardinality of an HF dataset while remaining statistically legitimate.
The proposed random shuffle method was able to enhance the HF dataset cardinality circa 10 times and circa 21 times when followed by a random repeated-measures approach.
arXiv Detail & Related papers (2020-12-12T10:59:38Z)
- An Uncertainty-Driven GCN Refinement Strategy for Organ Segmentation [53.425900196763756]
We propose a segmentation refinement method based on uncertainty analysis and graph convolutional networks.
We employ the uncertainty levels of the convolutional network in a particular input volume to formulate a semi-supervised graph learning problem.
We show that our method outperforms the state-of-the-art CRF refinement method by improving the dice score by 1% for the pancreas and 2% for spleen.
arXiv Detail & Related papers (2020-12-06T18:55:07Z)
- CovidDeep: SARS-CoV-2/COVID-19 Test Based on Wearable Medical Sensors and Efficient Neural Networks [51.589769497681175]
The novel coronavirus (SARS-CoV-2) has led to a pandemic.
The current testing regime based on Reverse Transcription-Polymerase Chain Reaction for SARS-CoV-2 has been unable to keep up with testing demands.
We propose a framework called CovidDeep that combines efficient DNNs with commercially available WMSs for pervasive testing of the virus.
arXiv Detail & Related papers (2020-07-20T21:47:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.