Comparative Analysis of Vision Transformers and Traditional Deep Learning Approaches for Automated Pneumonia Detection in Chest X-Rays
- URL: http://arxiv.org/abs/2507.10589v1
- Date: Fri, 11 Jul 2025 16:26:24 GMT
- Title: Comparative Analysis of Vision Transformers and Traditional Deep Learning Approaches for Automated Pneumonia Detection in Chest X-Rays
- Authors: Gaurav Singh
- Abstract summary: Pneumonia, particularly when induced by diseases like COVID-19, remains a critical global health challenge requiring rapid and accurate diagnosis. This study presents a comprehensive comparison of traditional machine learning and state-of-the-art deep learning approaches for automated pneumonia detection using chest X-rays. We demonstrate that Vision Transformers, particularly the Cross-ViT architecture, achieve superior performance with 88.25% accuracy and 99.42% recall.
- Score: 1.2310316230437004
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pneumonia, particularly when induced by diseases like COVID-19, remains a critical global health challenge requiring rapid and accurate diagnosis. This study presents a comprehensive comparison of traditional machine learning and state-of-the-art deep learning approaches for automated pneumonia detection using chest X-rays (CXRs). We evaluate multiple methodologies, ranging from conventional machine learning techniques (PCA-based clustering, Logistic Regression, and Support Vector Classification) to advanced deep learning architectures including Convolutional Neural Networks (Modified LeNet, DenseNet-121) and various Vision Transformer (ViT) implementations (Deep-ViT, Compact Convolutional Transformer, and Cross-ViT). Using a dataset of 5,856 pediatric CXR images, we demonstrate that Vision Transformers, particularly the Cross-ViT architecture, achieve superior performance with 88.25% accuracy and 99.42% recall, surpassing traditional CNN approaches. Our analysis reveals that architectural choices impact performance more significantly than model size, with Cross-ViT's 75M parameters outperforming larger models. The study also addresses practical considerations including computational efficiency, training requirements, and the critical balance between precision and recall in medical diagnostics. Our findings suggest that Vision Transformers offer a promising direction for automated pneumonia detection, potentially enabling more rapid and accurate diagnosis during health crises.
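The abstract reports accuracy and recall together and points to the balance between precision and recall. As a rough illustration of how these metrics relate, the sketch below computes them from binary confusion-matrix counts. The counts are hypothetical values chosen for illustration and are not taken from the paper; the names `ConfusionCounts` and `binary_metrics` are likewise assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with pneumonia as the positive class."""
    tp: int  # pneumonia images predicted as pneumonia
    fp: int  # normal images predicted as pneumonia
    tn: int  # normal images predicted as normal
    fn: int  # pneumonia images predicted as normal


def binary_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return accuracy, recall, precision, and specificity.

    Assumes every denominator is non-zero, i.e. both classes are present
    and at least one positive prediction was made.
    """
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "recall": c.tp / (c.tp + c.fn),        # sensitivity
        "precision": c.tp / (c.tp + c.fp),
        "specificity": c.tn / (c.tn + c.fp),
    }


# Hypothetical counts, NOT results from the paper: a classifier that
# misses almost no pneumonia cases but raises many false alarms.
counts = ConfusionCounts(tp=99, fp=20, tn=30, fn=1)
metrics = binary_metrics(counts)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these made-up counts, recall is far higher than accuracy because nearly all errors are false positives. This is the kind of trade-off a recall-oriented screening model accepts in exchange for missing very few positive cases.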
Related papers
- An Integrated Deep Learning Framework Leveraging NASNet and Vision Transformer with MixProcessing for Accurate and Precise Diagnosis of Lung Diseases [0.12277343096128711]
The NASNet-ViT model performs at state of the art, achieving an accuracy of 98.9%, sensitivity of 0.99, an F1-score of 0.989, and specificity of 0.987. These results reflect the high-quality capability of NASNet-ViT in extracting meaningful features and recognizing various types of lung diseases with very high accuracy.
arXiv Detail & Related papers (2025-02-27T22:17:38Z)
- Lung Disease Detection with Vision Transformers: A Comparative Study of Machine Learning Methods [0.0]
This study explores the application of Vision Transformers (ViT), a state-of-the-art architecture in machine learning, to chest X-ray analysis.
I present a comparative analysis of two ViT-based approaches: one utilizing full chest X-ray images and another focusing on segmented lung regions.
arXiv Detail & Related papers (2024-11-18T08:40:25Z)
- Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
Training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
Inference with LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
- A novel method to enhance pneumonia detection via a model-level ensembling of CNN and vision transformer [0.7499722271664147]
Pneumonia remains a leading cause of morbidity and mortality worldwide.
Deep learning has shown immense potential for pneumonia detection from Chest X-ray (CXR) imaging.
We developed a novel model fusing Convolutional Neural Networks (CNN) and Vision Transformer networks via model-level ensembling.
arXiv Detail & Related papers (2024-01-04T16:58:31Z)
- Swin-Tempo: Temporal-Aware Lung Nodule Detection in CT Scans as Video Sequences Using Swin Transformer-Enhanced UNet [2.7547288571938795]
We present an innovative model that harnesses the strengths of both convolutional neural networks and vision transformers.
Inspired by object detection in videos, we treat each 3D CT image as a video, individual slices as frames, and lung nodules as objects, enabling a time-series application.
arXiv Detail & Related papers (2023-10-05T07:48:55Z)
- Vision Transformer-based Model for Severity Quantification of Lung Pneumonia Using Chest X-ray Images [11.12596879975844]
We present a Vision Transformer-based neural network model that relies on a small number of trainable parameters to quantify the severity of COVID-19 and other lung diseases.
Our model can provide peak performance in quantifying severity with high generalizability at a relatively low computational cost.
arXiv Detail & Related papers (2023-03-18T12:38:23Z)
- Lung Cancer Lesion Detection in Histopathology Images Using Graph-Based Sparse PCA Network [93.22587316229954]
We propose a graph-based sparse principal component analysis (GS-PCA) network for automated detection of cancerous lesions on histological lung slides stained by hematoxylin and eosin (H&E).
We evaluate the performance of the proposed algorithm on H&E slides obtained from an SVM K-rasG12D lung cancer mouse model using precision/recall rates, F-score, Tanimoto coefficient, and area under the curve (AUC) of the receiver operating characteristic (ROC).
arXiv Detail & Related papers (2021-10-27T19:28:36Z)
- In-Line Image Transformations for Imbalanced, Multiclass Computer Vision Classification of Lung Chest X-Rays [91.3755431537592]
This study aims to leverage a body of literature in order to apply image transformations that would serve to balance the lack of COVID-19 LCXR data.
Deep learning techniques such as convolutional neural networks (CNNs) are able to select features that distinguish between healthy and disease states.
This study utilizes a simple CNN architecture for high-performance multiclass LCXR classification at 94 percent accuracy.
arXiv Detail & Related papers (2021-04-06T02:01:43Z)
- Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
- Diagnosis of Coronavirus Disease 2019 (COVID-19) with Structured Latent Multi-View Representation Learning [48.05232274463484]
Recently, the outbreak of Coronavirus Disease 2019 (COVID-19) has spread rapidly across the world.
Because of the large number of affected patients and the heavy workload for doctors, computer-aided diagnosis with machine learning algorithms is urgently needed.
In this study, we propose to conduct the diagnosis of COVID-19 with a series of features extracted from CT images.
arXiv Detail & Related papers (2020-05-06T15:19:15Z)
- Residual Attention U-Net for Automated Multi-Class Segmentation of COVID-19 Chest CT Images [46.844349956057776]
Coronavirus disease 2019 (COVID-19) has been spreading rapidly around the world and has had a significant impact on public health and the economy.
There is still a lack of studies on effectively quantifying the lung infection caused by COVID-19.
We propose a novel deep learning algorithm for automated segmentation of multiple COVID-19 infection regions.
arXiv Detail & Related papers (2020-04-12T16:24:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.