Enhancing Generalization in Sickle Cell Disease Diagnosis through Ensemble Methods and Feature Importance Analysis
- URL: http://arxiv.org/abs/2601.13021v1
- Date: Mon, 19 Jan 2026 13:00:41 GMT
- Title: Enhancing Generalization in Sickle Cell Disease Diagnosis through Ensemble Methods and Feature Importance Analysis
- Authors: Nataša Petrović, Gabriel Moyà-Alcover, Antoni Jaume-i-Capó, Jose Maria Buades Rubio,
- Abstract summary: This work presents a novel approach for selecting the optimal ensemble-based classification method and features. It provides diagnostic support for Sickle Cell Disease using peripheral blood smear images of red blood cells.
- Score: 2.8601718604194706
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This work presents a novel approach for selecting the optimal ensemble-based classification method and features, with a primary focus on achieving generalization, based on the state of the art, to provide diagnostic support for Sickle Cell Disease using peripheral blood smear images of red blood cells. We pre-processed and segmented the microscopic images to ensure the extraction of high-quality features. To ensure the reliability of our proposed system, we conducted an in-depth analysis of interpretability. Leveraging techniques established in the literature, we extracted features from blood cells and employed ensemble machine learning methods to classify their morphology. Furthermore, we have devised a methodology to identify the most critical features for classification, aimed at reducing complexity and training time and enhancing interpretability in opaque models. Lastly, we validated our results using a new dataset, where our model outperformed state-of-the-art models in terms of generalization. An ensemble of Random Forest and Extra Trees classifiers achieved a harmonic mean of precision and recall (F1-score) of 90.71% and a Sickle Cell Disease diagnosis support score (SDS-score) of 93.33%. These results are a notable improvement over previous results obtained with a Gradient Boosting classifier (F1-score 87.32% and SDS-score 89.51%). To foster scientific progress, we have made available the parameters for each model, the implemented code library, and the confusion matrices with the raw data.
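The abstract names three technical ingredients: an ensemble of Random Forest and Extra Trees classifiers, the F1-score as the harmonic mean of precision and recall, and a step that keeps only the most critical features. The Python sketch below illustrates each one using only the standard library, under stated assumptions. The abstract does not say how the two classifiers are combined, so soft voting (averaging their class probabilities) is an assumption; macro averaging of F1 over the morphology classes is also an assumption; and the cumulative-importance threshold is a hypothetical stand-in, not the authors' feature-selection methodology. The SDS-score is not reproduced because the abstract does not define it. All function names, feature names, and probabilities are hypothetical.

```python
"""Hypothetical sketch of three ingredients named in the abstract.

Everything here (function names, feature names, probabilities) is an
illustrative assumption, not the authors' published code or data.
"""

from typing import Dict, List, Sequence


def soft_vote(prob_a: Sequence[Sequence[float]],
              prob_b: Sequence[Sequence[float]]) -> List[int]:
    """Average two models' per-class probabilities and return argmax labels.

    Assumes both models (e.g. a Random Forest and an Extra Trees model)
    report probabilities over the same class order for the same samples.
    """
    labels = []
    for row_a, row_b in zip(prob_a, prob_b):
        avg = [(pa + pb) / 2 for pa, pb in zip(row_a, row_b)]
        # The index of the highest averaged probability is the predicted class.
        labels.append(max(range(len(avg)), key=avg.__getitem__))
    return labels


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> float:
    """F1 = harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined): report 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


def select_top_features(importances: Dict[str, float],
                        threshold: float = 0.9) -> List[str]:
    """Keep top-ranked features until their share of total importance >= threshold."""
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must sum to a positive value")
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected: List[str] = []
    cumulative = 0.0
    for name, score in ranked:
        selected.append(name)
        cumulative += score / total
        if cumulative >= threshold:
            break
    return selected


if __name__ == "__main__":
    # Hypothetical class probabilities for 4 cells and 3 morphology classes.
    prob_rf = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7], [0.5, 0.4, 0.1]]
    prob_et = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.2, 0.5, 0.3], [0.3, 0.6, 0.1]]
    y_pred = soft_vote(prob_rf, prob_et)
    print(y_pred)                                    # [0, 1, 2, 1]
    print(round(macro_f1([0, 1, 2, 0], y_pred), 3))  # 0.778

    # Hypothetical importances, e.g. read from a fitted tree ensemble.
    importances = {"area": 0.30, "perimeter": 0.25,
                   "circularity": 0.35, "eccentricity": 0.10}
    print(select_top_features(importances, threshold=0.8))
    # ['circularity', 'area', 'perimeter']
```

In scikit-learn, the same soft-voting rule is available through `VotingClassifier` with `voting="soft"`, per-feature scores through the `feature_importances_` attribute of `RandomForestClassifier` and `ExtraTreesClassifier`, and the metric through `f1_score(..., average="macro")`. Whether the authors used these exact settings is not stated in the abstract.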
Related papers
- DOGMA: Weaving Structural Information into Data-centric Single-cell Transcriptomics Analysis [43.565183518761984]
We propose DOGMA, a data-centric framework designed for the structural reshaping and semantic enhancement of raw data.
In complex multi-species and multi-organ benchmarks, DOGMA achieves SOTA performance, exhibiting superior zero-shot robustness and sample efficiency.
(2026-02-02T09:10:09Z)
- Investigating the Impact of Histopathological Foundation Models on Regressive Prediction of Homologous Recombination Deficiency [52.50039435394964]
We systematically evaluate foundation models for regression-based tasks.
We extract patch-level features from whole slide images (WSI) using five state-of-the-art foundation models.
Models are trained to predict continuous HRD scores based on these extracted features across breast, endometrial, and lung cancer cohorts.
(2026-01-29T14:06:50Z)
- Foundation Models in Dermatopathology: Skin Tissue Classification [0.05397680436511065]
This study evaluates the performance of two foundation models, UNI and Virchow2, as feature extractors for classifying whole-slide images.
Patch-level embeddings were aggregated into slide-level features using a mean-aggregation strategy.
Results demonstrate that patch-level features extracted using Virchow2 outperformed those extracted via UNI across most slide-level classifiers.
(2025-10-24T17:21:43Z)
- Skin Cancer Classification: Hybrid CNN-Transformer Models with KAN-Based Fusion [0.0]
We explore Sequential and Parallel Hybrid CNN-Transformer models with Convolutional Kolmogorov-Arnold Network (CKAN).
Our approach integrates transfer learning and extensive data augmentation: CNNs extract local spatial features, Transformers model global dependencies, and CKAN facilitates nonlinear feature fusion for improved representation learning.
Our proposed approach achieves competitive performance in skin cancer classification, demonstrating 92.81% accuracy and 92.47% F1-score on the HAM10000 dataset, 97.83% accuracy and 97.83% F1-score on the PAD-UFES dataset, and 91.17% accuracy with 91.79% F1-score on
(2025-08-17T19:57:34Z)
- Enhancing Orthopox Image Classification Using Hybrid Machine Learning and Deep Learning Models [40.325359811289445]
This paper uses Machine Learning models combined with pretrained Deep Learning models to extract deep feature representations without the need for augmented data.
The findings show that this feature extraction method, when paired with other state-of-the-art methods, produces excellent classification outcomes.
(2025-06-06T11:52:07Z)
- Enhanced Tuberculosis Bacilli Detection using Attention-Residual U-Net and Ensemble Classification [0.0]
Tuberculosis, caused by Mycobacterium tuberculosis, remains a critical global health issue, necessitating timely diagnosis and treatment.
Current methods for detecting tuberculosis bacilli from bright-field microscopic sputum smear images suffer from low automation, inadequate segmentation performance, and limited classification accuracy.
This paper proposes an efficient hybrid approach that combines deep learning for segmentation and an ensemble model for classification.
(2025-01-07T05:21:13Z)
- Stochastic gradient descent estimation of generalized matrix factorization models with application to single-cell RNA sequencing data [39.146761527401424]
Single-cell RNA sequencing allows the quantification of gene expression at the individual cell level.
Dimensionality reduction is a common preprocessing step critical for the visualization, clustering, and phenotypic characterization of samples.
We present a generalized matrix factorization model assuming a general exponential dispersion family distribution.
We show that our method scales seamlessly to millions of cells, enabling dimensionality reduction in large single-cell datasets.
(2024-12-29T16:02:15Z)
- Neural Cellular Automata for Lightweight, Robust and Explainable Classification of White Blood Cell Images [40.347953893940044]
We introduce a novel approach for white blood cell classification based on neural cellular automata (NCA).
Our NCA-based method is significantly smaller in terms of parameters and exhibits robustness to domain shifts.
Our results demonstrate that NCA can be used for image classification, and they address key challenges of conventional methods.
(2024-04-08T14:59:53Z)
- Classification of lung cancer subtypes on CT images with synthetic pathological priors [41.75054301525535]
Cross-scale associations exist in the image patterns between the same case's CT images and its pathological images.
We propose self-generating hybrid feature network (SGHF-Net) for accurately classifying lung cancer subtypes on CT images.
(2023-08-09T02:04:05Z)
- Improving Classification Model Performance on Chest X-Rays through Lung Segmentation [63.45024974079371]
We propose a deep learning approach to enhance abnormal chest x-ray (CXR) identification performance through lung segmentation.
Our approach is designed in a cascaded manner and incorporates two modules: a deep neural network with criss-cross attention modules (XLSor) for localizing lung region in CXR images and a CXR classification model with a backbone of a self-supervised momentum contrast (MoCo) model pre-trained on large-scale CXR data sets.
(2022-02-22T15:24:06Z)
- Ensemble of CNN classifiers using Sugeno Fuzzy Integral Technique for Cervical Cytology Image Classification [1.6986898305640261]
We propose a fully automated computer-aided diagnosis tool for classifying single-cell and slide images of cervical cancer.
We use the Sugeno Fuzzy Integral to ensemble the decision scores from three popular deep learning models, namely, Inception v3, DenseNet-161 and ResNet-34.
(2021-08-21T08:41:41Z)
- A multi-stage machine learning model on diagnosis of esophageal manometry [50.591267188664666]
The framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage.
This is the first artificial-intelligence-style model to automatically predict the CC diagnosis of an HRM study from raw multi-swallow data.
(2021-06-25T20:09:23Z)
- Sickle-cell disease diagnosis support selecting the most appropriate machine learning method: Towards a general and interpretable approach for cell morphology analysis from microscopy images [0.0]
We propose an approach to select the classification method and features, based on the state of the art.
We used samples from patients with sickle-cell disease; the approach can be generalized to other study cases.
(2020-10-09T11:46:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.