Predictive Modeling for Breast Cancer Classification in the Context of Bangladeshi Patients: A Supervised Machine Learning Approach with Explainable AI
- URL: http://arxiv.org/abs/2404.04686v1
- Date: Sat, 6 Apr 2024 17:23:21 GMT
- Title: Predictive Modeling for Breast Cancer Classification in the Context of Bangladeshi Patients: A Supervised Machine Learning Approach with Explainable AI
- Authors: Taminul Islam, Md. Alif Sheakh, Mst. Sazia Tahosin, Most. Hasna Hena, Shopnil Akash, Yousef A. Bin Jardan, Gezahign Fentahun Wondmie, Hiba-Allah Nafidi, Mohammed Bourhia,
- Abstract summary: We evaluate and compare the classification accuracy, precision, recall, and F1 scores of five different machine learning methods. XGBoost achieved the best accuracy, 97%.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Breast cancer has rapidly increased in prevalence in recent years, making it one of the leading causes of mortality worldwide. Among all cancers, it is by far the most common. Diagnosing the disease manually requires significant time and expertise, and because detection is so time-consuming, machine-based predictions can help prevent its further spread. Machine learning and explainable AI are valuable for classification because they not only provide accurate predictions but also offer insight into how the model arrives at its decisions, which makes the classification results easier to understand and trust. In this study, we evaluate and compare the classification accuracy, precision, recall, and F1 scores of five supervised machine learning methods (decision tree, random forest, logistic regression, naive Bayes, and XGBoost) on a primary dataset of 500 patients from Dhaka Medical College Hospital. We also apply SHAP analysis to the XGBoost model to interpret its predictions and understand the impact of each feature on its output. We compare the classification accuracy of the algorithms with one another and with results reported in other literature in this field. In the final evaluation, XGBoost achieved the best accuracy, 97%.
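The four evaluation metrics named in the abstract all derive from the confusion matrix. The sketch below computes them in plain Python, assuming a binary task with the positive (e.g., malignant) class coded as 1. The function name `binary_metrics` and the toy labels are hypothetical, and the paper may instead report averaged metrics (for example, weighted over classes), which this sketch does not cover.

```python
from typing import Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for a binary task (1 = positive class)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when the positive class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy labels (1 = malignant, 0 = benign); NOT data from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1]
print(binary_metrics(y_true, y_pred))
```

Here there are 2 true positives, 3 true negatives, 1 false positive, and 2 false negatives, so accuracy is 5/8, precision 2/3, recall 1/2, and F1 is their harmonic mean, 4/7. In practice, scikit-learn's `precision_recall_fscore_support` computes the same quantities, with an `average` parameter controlling how per-class scores are combined.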
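SHAP explains a prediction by assigning each feature its Shapley value: the feature's weighted average marginal contribution over all coalitions of the other features. As a minimal sketch, assuming that a feature "absent" from a coalition is replaced by a fixed baseline value, the function below computes exact Shapley values by brute-force enumeration. The model `risk_score`, its coefficients, and the baseline are hypothetical stand-ins for the study's trained XGBoost model; the shap library's tree explainer uses a specialized polynomial-time algorithm for tree ensembles rather than this exponential enumeration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley values of f at x; absent features take their baseline value."""
    n = len(x)

    def value(coalition: frozenset[int]) -> float:
        # Features in the coalition come from x, all others from the baseline.
        z = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition S.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy "risk score" with an interaction between features 0 and 2;
# it stands in for a trained model such as XGBoost and is NOT from the paper.
def risk_score(z: Sequence[float]) -> float:
    return 0.5 * z[0] + 0.2 * z[1] + 0.3 * z[0] * z[2]


x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(risk_score, x, baseline)
print(phi)
```

The values sum to `risk_score(x) - risk_score(baseline)` (the efficiency property), and the interaction term `0.3 * z[0] * z[2]` is split equally between features 0 and 2, giving approximately `[0.65, 0.4, 0.15]`. Enumeration costs O(n·2^n) model calls, so it is practical only for a handful of features.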
Related papers
- A data balancing approach towards design of an expert system for Heart Disease Prediction [0.9895793818721335]
Heart disease is a serious global health issue that claims millions of lives every year.
We employed five machine learning methods in this paper: Decision Tree (DT), Random Forest (RF), Linear Discriminant Analysis, Extra TreeBoost, and AdaBoost.
The accuracy of the Random Forest and Decision Tree models was 99.83%.
(2024-07-26T08:56:13Z)
- Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but individuals of non-European descent are under-represented in them.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
(2024-04-26T16:39:50Z)
- Clinical Deterioration Prediction in Brazilian Hospitals Based on Artificial Neural Networks and Tree Decision Models [56.93322937189087]
An extremely boosted neural network (XBNet) is used to predict clinical deterioration (CD).
The XGBoost model obtained the best results in predicting CD on data from Brazilian hospitals.
(2022-12-17T23:29:14Z)
- Machine Learning-Assisted Recurrence Prediction for Early-Stage Non-Small-Cell Lung Cancer Patients [10.127130900852405]
Stratifying cancer patients according to risk of relapse can personalize their care.
In this work, we show how machine learning can be used to estimate the probability of relapse in early-stage non-small-cell lung cancer patients.
(2022-11-17T19:34:16Z)
- Machine Learning Approaches to Predict Breast Cancer: Bangladesh Perspective [0.0]
This study focuses on finding the algorithm that predicts the breast cancer classes with the highest accuracy.
After implementing the models, the study achieved its best accuracy, 94%, with Random Forest and XGBoost.
(2022-06-30T01:44:53Z)
- PCA-RF: An Efficient Parkinson's Disease Prediction Model based on Random Forest Classification [3.6704226968275258]
In this paper, a Parkinson's disease prediction approach based on a random forest classifier is proposed.
We compare the accuracy of this model with that of an Artificial Neural Network (ANN) model to which Principal Component Analysis (PCA) is applied, and observe a clear difference.
The model achieved an accuracy of up to 90%.
(2022-03-21T18:59:08Z)
- A multi-stage machine learning model on diagnosis of esophageal manometry [50.591267188664666]
The framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage.
This is the first artificial-intelligence-style model to automatically predict the Chicago Classification (CC) diagnosis of a high-resolution manometry (HRM) study from raw multi-swallow data.
(2021-06-25T20:09:23Z)
- Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
(2021-02-26T02:29:30Z)
- A Machine Learning Challenge for Prognostic Modelling in Head and Neck Cancer Using Multi-modal Data [0.10651507097431492]
We have conducted an institutional machine learning challenge to develop an accurate model for overall survival prediction in head and neck cancer.
We compared 12 different submissions using imaging and clinical data, separately or in combination.
The winning approach used non-linear, multitask learning on clinical data and tumour volume, achieving high prognostic accuracy for 2-year and lifetime survival prediction.
(2021-01-28T11:20:34Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines, improving on the best baseline by up to 19%.
(2020-10-22T02:28:11Z)
- Self-Training with Improved Regularization for Sample-Efficient Chest X-Ray Classification [80.00316465793702]
We present a deep learning framework that enables robust modeling in challenging scenarios.
Our results show that with 85% less labeled data, we can build predictive models that match the performance of classifiers trained in a large-scale data setting.
(2020-05-03T02:36:00Z)