Deep Modeling and Optimization of Medical Image Classification
- URL: http://arxiv.org/abs/2505.23040v1
- Date: Thu, 29 May 2025 03:27:51 GMT
- Title: Deep Modeling and Optimization of Medical Image Classification
- Authors: Yihang Wu, Muhammad Owais, Reem Kateb, Ahmad Chaddad
- Abstract summary: We introduce a novel CLIP variant using four CNNs and eight ViTs as image encoders for the classification of brain cancer and skin cancer. We also involve traditional machine learning (ML) methods to improve the generalization ability of those deep models on unseen-domain data.
- Score: 5.195343321287341
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep models, such as convolutional neural networks (CNNs) and vision transformers (ViTs), demonstrate remarkable performance in image classification. However, these deep models require large amounts of data for fine-tuning, which is impractical in the medical domain due to data privacy concerns. Furthermore, despite the feasible performance of contrastive language-image pre-training (CLIP) in the natural domain, the potential of CLIP has not been fully investigated in the medical field. To address these challenges, we considered three scenarios: 1) we introduce a novel CLIP variant using four CNNs and eight ViTs as image encoders for the classification of brain cancer and skin cancer, 2) we combine 12 deep models with two federated learning techniques to protect data privacy, and 3) we involve traditional machine learning (ML) methods to improve the generalization ability of those deep models on unseen-domain data. The experimental results indicate that maxvit shows the highest averaged (AVG) test metrics (AVG = 87.03\%) on the HAM10000 dataset with multimodal learning, while convnext\_l demonstrates remarkable test performance with an F1-score of 83.98\% compared to swin\_b with 81.33\% in the FL model. Furthermore, the use of a support vector machine (SVM) can improve the overall test metrics by an AVG of $\sim 2\%$ for the Swin Transformer series on ISIC2018. Our codes are available at https://github.com/AIPMLab/SkinCancerSimulation.
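The CLIP-variant idea in the abstract (scoring an image embedding from a CNN or ViT encoder against text embeddings of class prompts) can be sketched minimally as follows. This is not the paper's actual pipeline: the random embeddings stand in for real encoder outputs, and the temperature value, embedding dimension, and 7-class setup (mirroring HAM10000) are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Project embeddings onto the unit sphere so dot products equal cosine similarity.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def clip_style_classify(image_emb, text_embs, temperature=0.07):
    """Score one image embedding against one text embedding per class label.

    image_emb: (d,) feature vector from the image encoder (CNN or ViT).
    text_embs: (num_classes, d) embeddings of prompts such as
               "a dermatoscopic image of melanoma" (hypothetical prompt).
    Returns a probability distribution over the classes.
    """
    image_emb = l2_normalize(image_emb)
    text_embs = l2_normalize(text_embs)
    logits = text_embs @ image_emb / temperature  # scaled cosine similarities
    logits -= logits.max()                        # numerical stability for softmax
    probs = np.exp(logits)
    return probs / probs.sum()

# Toy example: random vectors stand in for real encoder outputs.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=512)
text_embs = rng.normal(size=(7, 512))  # e.g. the 7 HAM10000 classes
probs = clip_style_classify(image_emb, text_embs)
print(probs.argmax(), probs.sum())
```

The predicted class is the prompt whose embedding is most cosine-similar to the image embedding; the temperature controls how peaked the resulting distribution is.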
Related papers
- An empirical study for the early detection of Mpox from skin lesion images using pretrained CNN models leveraging XAI technique [0.471858286267785]
Mpox is a zoonotic disease caused by the Mpox virus, which shares similarities with other skin conditions. This study aims to evaluate the effectiveness of pre-trained CNN models for the early detection of monkeypox. It also seeks to enhance model interpretability using Grad-CAM, an XAI technique.
arXiv Detail & Related papers (2025-07-21T17:30:08Z) - Performance Evaluation of Deep Learning and Transformer Models Using Multimodal Data for Breast Cancer Classification [7.405837346783951]
Rising breast cancer (BC) occurrence and mortality are major global concerns for women.
Deep learning (DL) has demonstrated superior diagnostic performance in BC classification compared to human expert readers.
This study proposes a multimodal DL architecture for BC classification, utilising images (mammograms; four views) and textual data (radiological reports) from our new in-house dataset.
arXiv Detail & Related papers (2024-10-14T04:22:24Z) - Brain Tumor Classification on MRI in Light of Molecular Markers [61.77272414423481]
Co-deletion of the 1p/19q gene is associated with clinical outcomes in low-grade gliomas. This study aims to utilize a specialized MRI-based convolutional neural network for brain cancer detection.
arXiv Detail & Related papers (2024-09-29T07:04:26Z) - MedSegMamba: 3D CNN-Mamba Hybrid Architecture for Brain Segmentation [15.514511820130474]
We develop a 3D patch-based hybrid CNN-Mamba model for subcortical brain segmentation.
Our model's performance was validated against several benchmarks.
arXiv Detail & Related papers (2024-09-12T02:19:19Z) - Brain Tumor Radiogenomic Classification [1.8276368987462532]
The RSNA-MICCAI brain tumor radiogenomic classification challenge aimed to predict MGMT biomarker status in glioblastoma through binary classification.
The dataset is split into three main cohorts: a training set and a validation set, which were used during training, and a test set, which was used only during final evaluation.
arXiv Detail & Related papers (2024-01-11T10:30:09Z) - Raising the Bar of AI-generated Image Detection with CLIP [50.345365081177555]
The aim of this work is to explore the potential of pre-trained vision-language models (VLMs) for universal detection of AI-generated images.
We develop a lightweight detection strategy based on CLIP features and study its performance in a wide variety of challenging scenarios.
arXiv Detail & Related papers (2023-11-30T21:11:20Z) - The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation.
We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare.
Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z) - Stacking Ensemble Learning in Deep Domain Adaptation for Ophthalmic Image Classification [61.656149405657246]
Domain adaptation is effective in image classification tasks where obtaining sufficient label data is challenging.
We propose a novel method, named SELDA, for stacking ensemble learning via extending three domain adaptation methods.
The experimental results using Age-Related Eye Disease Study (AREDS) benchmark ophthalmic dataset demonstrate the effectiveness of the proposed model.
arXiv Detail & Related papers (2022-09-27T14:19:00Z) - Vision Transformers for femur fracture classification [59.99241204074268]
The Vision Transformer (ViT) was able to correctly predict 83% of the test images.
Good results were also obtained on sub-fracture classification with the largest and richest dataset of its kind.
arXiv Detail & Related papers (2021-08-07T10:12:42Z) - Classification of COVID-19 in CT Scans using Multi-Source Transfer Learning [91.3755431537592]
We propose the use of Multi-Source Transfer Learning to improve upon traditional Transfer Learning for the classification of COVID-19 from CT scans.
With our multi-source fine-tuning approach, our models outperformed baseline models fine-tuned with ImageNet.
Our best performing model was able to achieve an accuracy of 0.893 and a Recall score of 0.897, outperforming its baseline Recall score by 9.3%.
arXiv Detail & Related papers (2020-09-22T11:53:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.