Vision Transformer for Efficient Chest X-ray and Gastrointestinal Image Classification
- URL: http://arxiv.org/abs/2304.11529v1
- Date: Sun, 23 Apr 2023 04:07:03 GMT
- Title: Vision Transformer for Efficient Chest X-ray and Gastrointestinal Image Classification
- Authors: Smriti Regmi, Aliza Subedi, Ulas Bagci, Debesh Jha
- Abstract summary: This study uses different CNNs and transformer-based methods with a wide range of data augmentation techniques.
We evaluated their performance on three medical image datasets from different modalities.
- Score: 2.3293678240472517
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Medical image analysis is an active research area because of its usefulness in
clinical applications such as early disease diagnosis and treatment. Convolutional neural
networks (CNNs) have become the de facto standard in medical image analysis because they
learn complex features from the available datasets, allowing them to surpass humans in many
image-understanding tasks. In addition to CNNs, transformer architectures have also gained
popularity for medical image analysis. However, despite progress in the field, there is
still room for improvement. This study applies different CNN- and transformer-based methods
with a wide range of data augmentation techniques and evaluates their performance on three
medical image datasets from different modalities, comparing the vision transformer (ViT)
model against state-of-the-art (SOTA) pre-trained CNNs. For chest X-ray, our vision
transformer model achieved the highest F1 score of 0.9532, recall of 0.9533, Matthews
correlation coefficient (MCC) of 0.9259, and ROC-AUC score of 0.97. Similarly, for the
Kvasir dataset, we achieved an F1 score of 0.9436, recall of 0.9437, MCC of 0.9360, and
ROC-AUC score of 0.97. For Kvasir-Capsule, a large-scale video capsule endoscopy (VCE)
dataset, our ViT model achieved a weighted F1 score of 0.7156, recall of 0.7182, MCC of
0.3705, and ROC-AUC score of 0.57. We found that our transformer-based models outperformed
various CNN models at classifying different anatomical structures, findings, and
abnormalities. Our model improved over the CNN-based approaches, suggesting it could serve
as a new benchmark for algorithm development.
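As a rough sketch of the recipe the abstract describes, the example below fine-tunes an ImageNet-pretrained vision transformer for multi-class medical image classification with standard augmentations and computes the metrics reported above (weighted F1, recall, MCC, and one-vs-rest ROC-AUC). The backbone (torchvision's ViT-B/16), the hypothetical `data/train` / `data/test` ImageFolder layout, and all hyperparameters are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: fine-tune an ImageNet-pretrained ViT for multi-class
# medical image classification and report the metrics used in the paper.
# The backbone, augmentations, paths, and hyperparameters are illustrative
# assumptions; the paper's summary does not pin them down.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.models import vit_b_16, ViT_B_16_Weights
from sklearn.metrics import f1_score, recall_score, matthews_corrcoef, roc_auc_score

device = "cuda" if torch.cuda.is_available() else "cpu"

# Data augmentation of the kind the study sweeps over (illustrative choices).
train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
eval_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# Hypothetical ImageFolder layout: one sub-directory per class.
train_ds = datasets.ImageFolder("data/train", transform=train_tf)
test_ds = datasets.ImageFolder("data/test", transform=eval_tf)
train_dl = DataLoader(train_ds, batch_size=32, shuffle=True)
test_dl = DataLoader(test_ds, batch_size=32)

# ImageNet-pretrained ViT-B/16 with a fresh classification head.
model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
model.heads.head = nn.Linear(model.heads.head.in_features, len(train_ds.classes))
model = model.to(device)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):  # epoch count is an arbitrary placeholder
    model.train()
    for x, y in train_dl:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

# Weighted F1, recall, MCC, and one-vs-rest ROC-AUC (multi-class case),
# as reported in the paper.
model.eval()
probs, labels = [], []
with torch.no_grad():
    for x, y in test_dl:
        probs.append(torch.softmax(model(x.to(device)), dim=1).cpu())
        labels.append(y)
probs = torch.cat(probs).numpy()
labels = torch.cat(labels).numpy()
preds = probs.argmax(axis=1)
print("F1:    ", f1_score(labels, preds, average="weighted"))
print("Recall:", recall_score(labels, preds, average="weighted"))
print("MCC:   ", matthews_corrcoef(labels, preds))
print("AUC:   ", roc_auc_score(labels, probs, multi_class="ovr", average="weighted"))
```

Reproducing the paper's reported scores would additionally require its exact augmentation policy, training schedule, and dataset splits.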
Related papers
- Brain Tumor Classification on MRI in Light of Molecular Markers [61.77272414423481]
Co-deletion of the 1p/19q gene is associated with clinical outcomes in low-grade gliomas.
This study aims to utilize a specialized MRI-based convolutional neural network for brain cancer detection.
arXiv Detail & Related papers (2024-09-29T07:04:26Z) - Classification of Endoscopy and Video Capsule Images using CNN-Transformer Model [1.0994755279455526]
This study proposes a hybrid model that combines the advantages of Transformers and Convolutional Neural Networks (CNNs) to enhance classification performance.
For the GastroVision dataset, our proposed model demonstrates excellent performance with Precision, Recall, F1 score, Accuracy, and Matthews Correlation Coefficient (MCC) of 0.8320, 0.8386, 0.8324, 0.8386, and 0.8191, respectively.
arXiv Detail & Related papers (2024-08-20T11:05:32Z) - TotalSegmentator MRI: Sequence-Independent Segmentation of 59 Anatomical Structures in MR images [62.53931644063323]
In this study, we extended the capabilities of TotalSegmentator to MR images.
We trained an nnU-Net segmentation algorithm on this dataset and calculated Dice similarity coefficients to evaluate the model's performance.
The model significantly outperformed two other publicly available segmentation models (Dice score 0.824 versus 0.762, p < 0.001; and 0.762 versus 0.542, p < 0.001).
arXiv Detail & Related papers (2024-05-29T20:15:54Z) - A Federated Learning Framework for Stenosis Detection [70.27581181445329]
This study explores the use of Federated Learning (FL) for stenosis detection in coronary angiography (CA) images.
Two heterogeneous datasets from two institutions were considered: dataset 1 includes 1219 images from 200 patients, which we acquired at the Ospedale Riuniti of Ancona (Italy); dataset 2 includes 7492 sequential images from 90 patients from a previous study available in the literature.
arXiv Detail & Related papers (2023-10-30T11:13:40Z) - Virtual imaging trials improved the transparency and reliability of AI systems in COVID-19 imaging [1.6040478776985583]
This study focuses on using convolutional neural networks (CNNs) for COVID-19 diagnosis from computed tomography (CT) and chest radiography (CXR).
We developed and tested multiple AI models with 3D ResNet-like and 2D EfficientNetv2 architectures across diverse datasets.
Models trained on the most diverse datasets showed the highest external testing performance, with AUC values ranging from 0.73 to 0.76 for CT and 0.70 to 0.73 for CXR.
arXiv Detail & Related papers (2023-08-17T19:12:32Z) - Using Multiple Dermoscopic Photographs of One Lesion Improves Melanoma
Classification via Deep Learning: A Prognostic Diagnostic Accuracy Study [0.0]
This study evaluated the impact of multiple real-world dermoscopic views of a single lesion of interest on a CNN-based melanoma classifier.
Using multiple real-world images is an inexpensive method to positively impact the performance of a CNN-based melanoma classifier.
arXiv Detail & Related papers (2023-06-05T11:55:57Z) - Attention-based Saliency Maps Improve Interpretability of Pneumothorax
Classification [52.77024349608834]
This study investigates the chest radiograph (CXR) classification performance of vision transformers (ViTs) and the interpretability of attention-based saliency maps.
ViTs were fine-tuned for lung disease classification using four public data sets: CheXpert, Chest X-Ray 14, MIMIC CXR, and VinBigData.
ViTs had comparable CXR classification AUCs compared with state-of-the-art CNNs.
arXiv Detail & Related papers (2023-03-03T12:05:41Z) - Vision Transformers for femur fracture classification [59.99241204074268]
The Vision Transformer (ViT) correctly predicted 83% of the test images.
Good results were also obtained on sub-fracture classification with the largest and richest dataset of its kind.
arXiv Detail & Related papers (2021-08-07T10:12:42Z) - Classification of COVID-19 in CT Scans using Multi-Source Transfer
Learning [91.3755431537592]
We propose the use of Multi-Source Transfer Learning to improve upon traditional Transfer Learning for the classification of COVID-19 from CT scans.
With our multi-source fine-tuning approach, our models outperformed baseline models fine-tuned with ImageNet.
Our best performing model was able to achieve an accuracy of 0.893 and a Recall score of 0.897, outperforming its baseline Recall score by 9.3%.
arXiv Detail & Related papers (2020-09-22T11:53:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.