Comparative Analysis of Vision Transformers and Convolutional Neural Networks for Medical Image Classification
- URL: http://arxiv.org/abs/2507.21156v1
- Date: Thu, 24 Jul 2025 19:40:13 GMT
- Title: Comparative Analysis of Vision Transformers and Convolutional Neural Networks for Medical Image Classification
- Authors: Kunal Kawadkar
- Abstract summary: Vision Transformers (ViTs) have revolutionized computer vision, yet their effectiveness compared to traditional Convolutional Neural Networks (CNNs) in medical imaging remains under-explored. This study presents a comparative analysis of CNN and ViT architectures across three critical medical imaging tasks: chest X-ray pneumonia detection, brain tumor classification, and skin cancer melanoma detection.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The emergence of Vision Transformers (ViTs) has revolutionized computer vision, yet their effectiveness compared to traditional Convolutional Neural Networks (CNNs) in medical imaging remains under-explored. This study presents a comprehensive comparative analysis of CNN and ViT architectures across three critical medical imaging tasks: chest X-ray pneumonia detection, brain tumor classification, and skin cancer melanoma detection. We evaluated four state-of-the-art models (ResNet-50, EfficientNet-B0, ViT-Base, and DeiT-Small) on datasets totaling 8,469 medical images. Our results demonstrate task-specific model advantages: ResNet-50 achieved 98.37% accuracy on chest X-ray classification, DeiT-Small excelled at brain tumor detection with 92.16% accuracy, and EfficientNet-B0 led skin cancer classification at 81.84% accuracy. These findings provide crucial insights for practitioners selecting architectures for medical AI applications, highlighting the importance of task-specific architecture selection in clinical decision support systems.
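Unlike ResNet-50 and EfficientNet-B0, the two transformer models in this comparison (ViT-Base and DeiT-Small) do not slide convolutional filters over the image; they cut it into fixed-size patches and turn each patch into a token. The sketch below illustrates that patch-tokenization step in NumPy. It assumes the common 224×224 RGB input and 16×16 patches (the abstract does not state the input resolution or patch size used), and the random projection weights are hypothetical stand-ins for learned parameters.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping patch_size x patch_size
    patches and flatten each one, returning shape (num_patches, P*P*C).
    Patches are ordered row by row (left to right, top to bottom)."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, P, gw, P, C): split each spatial axis into (grid index, offset in patch)
    grid = image.reshape(gh, patch_size, gw, patch_size, c)
    # (gh, gw, P, P, C): bring the two grid axes to the front
    grid = grid.transpose(0, 2, 1, 3, 4)
    return grid.reshape(gh * gw, patch_size * patch_size * c)


def embed_patches(patches: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Linear patch embedding: (N, P*P*C) @ (P*P*C, D) + (D,) -> (N, D) tokens."""
    return patches @ weight + bias


# Illustrative usage; weights are random placeholders, not trained parameters.
rng = np.random.default_rng(0)
image = rng.random((224, 224, 3), dtype=np.float32)
patches = patchify(image)  # 14 x 14 grid -> shape (196, 768)
embed_dim = 384            # DeiT-Small token width; ViT-Base uses 768
weight = rng.normal(0.0, 0.02, (patches.shape[1], embed_dim)).astype(np.float32)
tokens = embed_patches(patches, weight, np.zeros(embed_dim, dtype=np.float32))
```

This flatten-then-project step is equivalent to a convolution whose kernel size and stride both equal the patch size, which is how it is often implemented in practice. The full models additionally prepend a learnable class token and add position embeddings before the Transformer encoder; this sketch omits both.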
Related papers
- Comparative Analysis of Vision Transformers and Traditional Deep Learning Approaches for Automated Pneumonia Detection in Chest X-Rays [1.2310316230437004]
Pneumonia, particularly when induced by diseases like COVID-19, remains a critical global health challenge requiring rapid and accurate diagnosis. This study presents a comprehensive comparison of traditional machine learning and state-of-the-art deep learning approaches for automated pneumonia detection using chest X-rays. We demonstrate that Vision Transformers, particularly the Cross-ViT architecture, achieve superior performance with 88.25% accuracy and 99.42% recall.
Date: 2025-07-11T16:26:24Z
- Lung Disease Detection with Vision Transformers: A Comparative Study of Machine Learning Methods [0.0]
This study explores the application of Vision Transformers (ViT), a state-of-the-art architecture in machine learning, to chest X-ray analysis.
I present a comparative analysis of two ViT-based approaches: one utilizing full chest X-ray images and another focusing on segmented lung regions.
Date: 2024-11-18T08:40:25Z
- Advanced Hybrid Deep Learning Model for Enhanced Classification of Osteosarcoma Histopathology Images [0.0]
This study focuses on osteosarcoma (OS), the most common bone cancer in children and adolescents, which affects the long bones of the arms and legs.
We propose a novel hybrid model that combines convolutional neural networks (CNN) and vision transformers (ViT) to improve diagnostic accuracy for OS.
The model achieved an accuracy of 99.08%, precision of 99.10%, recall of 99.28%, and an F1-score of 99.23%.
Date: 2024-10-29T13:54:08Z
- Evaluating Deep Learning Models for Breast Cancer Classification: A Comparative Study [9.392940888377423]
The Vision Transformer (ViT) model, with its attention-based mechanisms, achieved the highest validation accuracy of 94%. The study demonstrates the potential of advanced machine learning methods to enhance precision and efficiency in breast cancer diagnosis in clinical settings.
Date: 2024-08-29T18:49:32Z
- Breast Ultrasound Tumor Classification Using a Hybrid Multitask CNN-Transformer Network [63.845552349914186]
Capturing global contextual information plays a critical role in breast ultrasound (BUS) image classification.
Vision Transformers are better at capturing global contextual information but may distort local image patterns because of tokenization.
In this study, we propose a hybrid multitask deep neural network called Hybrid-MT-ESTAN, designed to perform BUS tumor classification and segmentation.
Date: 2023-08-04T01:19:32Z
- Liver Tumor Screening and Diagnosis in CT with Pixel-Lesion-Patient Network [37.931408083443074]
Pixel-Lesion-pAtient Network (PLAN) is proposed to jointly segment and classify each lesion with improved anchor queries and a foreground-enhanced sampling loss.
PLAN achieves 95% patient-level sensitivity and 96% patient-level specificity.
On contrast-enhanced CT, our lesion-level detection precision, recall, and classification accuracy are 92%, 89%, and 86%, respectively, outperforming widely used CNNs and transformers for lesion segmentation.
Date: 2023-07-17T06:21:45Z
- Brain Tumor MRI Classification using a Novel Deep Residual and Regional CNN [0.0]
A novel deep residual and regional-based Convolutional Neural Network (CNN), Res-BRNet, is developed for effective classification of brain tumor Magnetic Resonance Imaging (MRI) scans.
The efficiency of Res-BRNet is evaluated on a standard dataset, collected from Kaggle and Figshare, that contains various tumor categories.
Experiments show that Res-BRNet outperforms standard CNN models and achieves excellent performance.
Date: 2022-11-29T20:14:13Z
- EMT-NET: Efficient multitask network for computer-aided diagnosis of breast cancer [58.720142291102135]
We propose an efficient and lightweight learning architecture to classify and segment breast tumors simultaneously.
We incorporate a segmentation task into a tumor classification network, which makes the backbone network learn representations focused on tumor regions.
The accuracy, sensitivity, and specificity of tumor classification are 88.6%, 94.1%, and 85.3%, respectively.
Date: 2022-01-13T05:24:40Z
- Medulloblastoma Tumor Classification using Deep Transfer Learning with Multi-Scale EfficientNets [63.62764375279861]
We propose an end-to-end medulloblastoma (MB) tumor classification approach and explore transfer learning with various input sizes and matching network dimensions.
Using a data set with 161 cases, we demonstrate that pre-trained EfficientNets with larger input resolutions lead to significant performance improvements.
Date: 2021-09-10T13:07:11Z
- COVID-Net CT-2: Enhanced Deep Neural Networks for Detection of COVID-19 from Chest CT Images Through Bigger, More Diverse Learning [70.92379567261304]
We introduce COVID-Net CT-2, enhanced deep neural networks for COVID-19 detection from chest CT images.
We leverage explainability to investigate the decision-making behaviour of COVID-Net CT-2.
Results are promising and suggest the strong potential of deep neural networks as an effective tool for computer-aided COVID-19 assessment.
Date: 2021-01-19T03:04:09Z
- Residual Attention U-Net for Automated Multi-Class Segmentation of COVID-19 Chest CT Images [46.844349956057776]
Coronavirus disease 2019 (COVID-19) has spread rapidly around the world and has had a significant impact on public health and the economy.
There is still a lack of studies on effectively quantifying the lung infection caused by COVID-19.
We propose a novel deep learning algorithm for automated segmentation of multiple COVID-19 infection regions.
Date: 2020-04-12T16:24:59Z
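Several of the summaries above report accuracy, sensitivity (recall), specificity, precision, and F1-score. The sketch below shows the standard definitions for a binary task computed from confusion-matrix counts. The class and function names are illustrative, and the counts in the usage line are hypothetical, not taken from any paper listed here. Multi-class results are usually averaged over classes, and the summaries do not say which averaging was used.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with 'positive' meaning the disease class."""
    tp: int  # diseased, predicted diseased
    fp: int  # healthy, predicted diseased
    tn: int  # healthy, predicted healthy
    fn: int  # diseased, predicted healthy


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero.
    return num / den if den else 0.0


def binary_metrics(c: ConfusionCounts) -> dict[str, float]:
    precision = _ratio(c.tp, c.tp + c.fp)
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # also called recall
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": _ratio(c.tn, c.tn + c.fp),
        # F1 is the harmonic mean of precision and recall.
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(ConfusionCounts(tp=90, fp=10, tn=80, fn=20))
```

Because accuracy pools both classes, it can look high on an imbalanced dataset even when sensitivity is low. This is one reason several summaries report sensitivity and specificity alongside accuracy.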