Survival Prediction from Imbalance colorectal cancer dataset using
hybrid sampling methods and tree-based classifiers
- URL: http://arxiv.org/abs/2309.01783v1
- Date: Mon, 4 Sep 2023 19:48:56 GMT
- Title: Survival Prediction from Imbalance colorectal cancer dataset using
hybrid sampling methods and tree-based classifiers
- Authors: Sadegh Soleimani, Mahsa Bahrami, Mansour Vali
- Abstract summary: This paper focuses on developing algorithms to predict 1-, 3-, and 5-year survival of colorectal cancer patients.
We propose a method that pipelines standard balancing techniques to increase the true positive rate.
Our proposed method significantly improves mortality prediction for the minority class of colorectal cancer patients.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Background and Objective: Colorectal cancer is a high-mortality cancer.
Clinical data analysis plays a crucial role in predicting the survival of
colorectal cancer patients, enabling clinicians to make informed treatment
decisions. However, utilizing clinical data can be challenging, especially when
dealing with imbalanced outcomes. This paper focuses on developing algorithms
to predict 1-, 3-, and 5-year survival of colorectal cancer patients using
clinical datasets, with particular emphasis on the highly imbalanced 1-year
survival prediction task. To address this issue, we propose a method that
creates a pipeline of standard balancing techniques to increase the
true positive rate. Evaluation is conducted on a colorectal cancer dataset from
the SEER database. Methods: The pre-processing step consists of removing
records with missing values and merging categories. The minority class of
1-year and 3-year survival tasks consists of 10% and 20% of the data,
respectively. Edited Nearest Neighbor, Repeated Edited Nearest Neighbor (RENN),
Synthetic Minority Over-sampling Technique (SMOTE), and pipelines of SMOTE and
RENN were used and compared for balancing the data with tree-based
classifiers. Decision Trees, Random Forest, Extra Tree, eXtreme Gradient
Boosting, and Light Gradient Boosting (LGBM) are used in this article.
Results: The performance evaluation utilizes a 5-fold cross-validation
approach. On the highly imbalanced 1-year task, our proposed
method with LGBM outperforms other sampling methods with a sensitivity of
72.30%. On the imbalanced 3-year survival task, the combination of RENN
and LGBM achieves a sensitivity of 80.81%, indicating that our proposed method
works best for highly imbalanced datasets. Conclusions: Our proposed method
significantly improves mortality prediction for the minority class of
colorectal cancer patients.
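The resampling idea in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the order (SMOTE-style oversampling first, then repeated edited-nearest-neighbour cleaning of the majority class), the toy 2-D data, and all function names (`smote_like`, `enn_pass`, `renn`, `sensitivity`) are assumptions made for illustration. The paper itself uses SEER data and tree-based classifiers such as LGBM; in practice, libraries such as imbalanced-learn provide tested versions of these samplers.

```python
import random
from math import dist

def k_nearest(point, pool, k):
    """Indices of the (up to) k points in `pool` closest to `point`."""
    order = sorted(range(len(pool)), key=lambda i: dist(point, pool[i]))
    return order[:k]

def smote_like(X, y, minority=1, k=3, seed=0):
    """Simplified SMOTE: interpolate between a minority point and one of its
    nearest minority neighbours until both classes have the same size.
    Assumes binary labels and at least two minority points."""
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(minority_pts)) - len(minority_pts)
    X_new, y_new = list(X), list(y)
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority_pts)
        others = [p for p in minority_pts if p is not base]
        neighbour = others[rng.choice(k_nearest(base, others, k))]
        gap = rng.random()  # position along the segment base -> neighbour
        X_new.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
        y_new.append(minority)
    return X_new, y_new

def enn_pass(X, y, majority=0, k=3):
    """One edited-nearest-neighbour pass: drop majority samples whose
    k nearest neighbours mostly belong to the other class."""
    keep = []
    for i, (x, label) in enumerate(zip(X, y)):
        if label == majority:
            rest = [j for j in range(len(X)) if j != i]
            nn = k_nearest(x, [X[j] for j in rest], k)
            agree = sum(y[rest[j]] == majority for j in nn)
            if agree * 2 < len(nn):
                continue  # most neighbours disagree: remove this sample
        keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

def renn(X, y, majority=0, k=3, max_iter=100):
    """Repeat ENN passes until no further sample is removed."""
    for _ in range(max_iter):
        X2, y2 = enn_pass(X, y, majority, k)
        if len(y2) == len(y):
            break
        X, y = X2, y2
    return X, y

def sensitivity(y_true, y_pred, positive=1):
    """True positive rate: TP / (TP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0

# Toy data (hypothetical): 18 majority points on a grid, 2 minority points (10%).
X = [(float(i % 6), float(i // 6)) for i in range(18)] + [(2.2, 1.1), (3.1, 0.9)]
y = [0] * 18 + [1] * 2

X_bal, y_bal = smote_like(X, y)        # oversample the minority class
X_clean, y_clean = renn(X_bal, y_bal)  # then clean the majority class
print("class counts after SMOTE-like step:", y_bal.count(0), y_bal.count(1))
print("class counts after RENN step:", y_clean.count(0), y_clean.count(1))
```

In a real experiment the resampling would be applied only to the training folds of the 5-fold cross-validation, and `sensitivity` would be computed on the untouched validation fold.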
Related papers
- Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data [44.99833362998488]
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options.
The present investigation, spearheaded by the iDPP@CLEF 2024 challenge, focuses on utilizing sensor-derived data obtained through an app.
arXiv Detail & Related papers (2024-07-10T19:17:23Z)
- Improving Breast Cancer Grade Prediction with Multiparametric MRI Created Using Optimized Synthetic Correlated Diffusion Imaging [71.91773485443125]
Grading plays a vital role in breast cancer treatment planning.
The current tumor grading method involves extracting tissue from patients, leading to stress, discomfort, and high medical costs.
This paper examines using optimized CDI$^s$ to improve breast cancer grade prediction.
arXiv Detail & Related papers (2024-05-13T15:48:26Z)
- Kernel Cox partially linear regression: building predictive models for cancer patients' survival [4.230753712933184]
We build a kernel Cox proportional hazards semi-parametric model and propose a novel regularized garrotized kernel machine (RegGKM) method to fit the model.
We use the kernel machine method to describe the complex relationship between survival and predictors, while automatically removing irrelevant parametric and non-parametric predictors.
Our results can help classify patients into groups with different death risks, facilitating treatment for better clinical outcomes.
arXiv Detail & Related papers (2023-10-11T04:27:54Z)
- Enhancing Mortality Prediction in Heart Failure Patients: Exploring Preprocessing Methods for Imbalanced Clinical Datasets [0.0]
Heart failure (HF) is a critical condition in which the accurate prediction of mortality plays a vital role in guiding patient management decisions.
We present a comprehensive preprocessing framework including scaling, outliers processing and resampling.
By leveraging appropriate preprocessing techniques and Machine Learning (ML) algorithms, we aim to improve mortality prediction performance for HF patients.
arXiv Detail & Related papers (2023-09-30T18:31:15Z)
- Development and external validation of a lung cancer risk estimation tool using gradient-boosting [3.200615329024819]
Lung cancer is a significant cause of mortality worldwide, emphasizing the importance of early detection for improved survival rates.
We propose a machine learning (ML) tool trained on data from the PLCO Cancer Screening Trial and validated on the NLST.
The developed ML tool provides a freely available web application for estimating the likelihood of developing lung cancer within five years.
arXiv Detail & Related papers (2023-08-23T15:25:17Z)
- Pathology-and-genomics Multimodal Transformer for Survival Outcome Prediction [43.1748594898772]
We propose a multimodal transformer (PathOmics) integrating pathology and genomics insights into colon-related cancer survival prediction.
We emphasize the unsupervised pretraining to capture the intrinsic interaction between tissue microenvironments in gigapixel whole slide images.
We evaluate our approach on both TCGA colon and rectum cancer cohorts, showing that the proposed approach is competitive and outperforms state-of-the-art studies.
arXiv Detail & Related papers (2023-07-22T00:59:26Z)
- Multimodal Deep Learning for Personalized Renal Cell Carcinoma Prognosis: Integrating CT Imaging and Clinical Data [3.790959613880792]
Renal cell carcinoma represents a significant global health challenge with a low survival rate.
This research aimed to devise a comprehensive deep-learning model capable of predicting survival probabilities in patients with renal cell carcinoma.
The proposed framework comprises three modules: a 3D image feature extractor, clinical variable selection, and survival prediction.
arXiv Detail & Related papers (2023-07-07T13:09:07Z)
- Learning to diagnose cirrhosis from radiological and histological labels with joint self and weakly-supervised pretraining strategies [62.840338941861134]
We propose to leverage transfer learning from large datasets annotated by radiologists, to predict the histological score available on a small annex dataset.
We compare different pretraining methods, namely weakly-supervised and self-supervised ones, to improve the prediction of the cirrhosis.
This method outperforms the baseline classification of the METAVIR score, reaching an AUC of 0.84 and a balanced accuracy of 0.75.
arXiv Detail & Related papers (2023-02-16T17:06:23Z)
- Gene selection from microarray expression data: A Multi-objective PSO with adaptive K-nearest neighborhood [0.0]
This paper deals with the classification problem of human cancer diseases by using gene expression data.
A new methodology is presented to analyze microarray datasets and efficiently classify cancer diseases.
arXiv Detail & Related papers (2022-05-27T04:22:10Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Increasing the efficiency of randomized trial estimates via linear adjustment for a prognostic score [59.75318183140857]
Estimating causal effects from randomized experiments is central to clinical research.
Most methods for historical borrowing achieve reductions in variance by sacrificing strict type-I error rate control.
arXiv Detail & Related papers (2020-12-17T21:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.