Deep Learning for Oral Health: Benchmarking ViT, DeiT, BEiT, ConvNeXt, and Swin Transformer
- URL: http://arxiv.org/abs/2509.23100v1
- Date: Sat, 27 Sep 2025 04:17:04 GMT
- Title: Deep Learning for Oral Health: Benchmarking ViT, DeiT, BEiT, ConvNeXt, and Swin Transformer
- Authors: Ajo Babu George, Sadhvik Bathini, Niranjana S R
- Abstract summary: The study specifically focused on addressing real-world challenges such as data imbalance. ConvNeXt, Swin Transformer, and BEiT showed reliable diagnostic performance.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Objective: The aim of this study was to systematically evaluate and compare the performance of five state-of-the-art transformer-based architectures - Vision Transformer (ViT), Data-efficient Image Transformer (DeiT), ConvNeXt, Swin Transformer, and Bidirectional Encoder Representation from Image Transformers (BEiT) - for multi-class dental disease classification. The study specifically focused on addressing real-world challenges such as data imbalance, which is often overlooked in the existing literature. Study Design: The Oral Diseases dataset was used to train and validate the selected models. Performance metrics, including validation accuracy, precision, recall, and F1-score, were measured, with special emphasis on how well each architecture managed imbalanced classes. Results: ConvNeXt achieved the highest validation accuracy at 81.06%, followed by BEiT at 80.00% and Swin Transformer at 79.73%, all demonstrating strong F1-scores. ViT and DeiT achieved accuracies of 79.37% and 78.79%, respectively, but both struggled particularly with Caries-related classes. Conclusions: ConvNeXt, Swin Transformer, and BEiT showed reliable diagnostic performance, making them promising candidates for clinical application in dental imaging. These findings provide guidance for model selection in future AI-driven oral disease diagnostic tools and highlight the importance of addressing data imbalance in real-world scenarios.
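The paper does not publish code, but the benchmarking protocol the abstract describes (fine-tune five pretrained backbones, then score validation accuracy, precision, recall, and F1 on an imbalanced multi-class dataset) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the timm model names, the ImageFolder directory layout for the Oral Diseases dataset, the inverse-frequency class weighting, and all hyperparameters are assumptions.

```python
# Minimal sketch (not the authors' code): fine-tune five transformer backbones on an
# imbalanced multi-class dental-disease dataset and report weighted validation metrics.
import torch
import timm
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Backbone identifiers follow timm's model registry (assumed variants, not from the paper).
BACKBONES = [
    "vit_base_patch16_224",          # ViT
    "deit_base_patch16_224",         # DeiT
    "beit_base_patch16_224",         # BEiT
    "convnext_base",                 # ConvNeXt
    "swin_base_patch4_window7_224",  # Swin Transformer
]

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
# "oral_diseases/{train,val}" is a placeholder layout, not the published dataset path.
train_ds = datasets.ImageFolder("oral_diseases/train", transform=tfm)
val_ds = datasets.ImageFolder("oral_diseases/val", transform=tfm)

# Inverse-frequency class weights: one simple way to counter class imbalance.
counts = torch.bincount(torch.tensor(train_ds.targets))
weights = counts.sum() / (len(counts) * counts.float())

device = "cuda" if torch.cuda.is_available() else "cpu"

def evaluate(model, loader):
    """Collect predictions and compute accuracy plus weighted precision/recall/F1."""
    model.eval()
    preds, labels = [], []
    with torch.no_grad():
        for x, y in loader:
            preds.extend(model(x.to(device)).argmax(1).cpu().tolist())
            labels.extend(y.tolist())
    p, r, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0)
    return accuracy_score(labels, preds), p, r, f1

for name in BACKBONES:
    model = timm.create_model(name, pretrained=True, num_classes=len(counts)).to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=3e-5)
    loss_fn = nn.CrossEntropyLoss(weight=weights.to(device))  # class-weighted loss
    train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)
    val_loader = DataLoader(val_ds, batch_size=32)
    for epoch in range(5):  # epoch budget is illustrative only
        model.train()
        for x, y in train_loader:
            opt.zero_grad()
            loss_fn(model(x.to(device)), y.to(device)).backward()
            opt.step()
    acc, p, r, f1 = evaluate(model, val_loader)
    print(f"{name}: acc={acc:.4f} precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```

Class-weighted cross-entropy is only one way to address the imbalance the abstract highlights; resampling or a focal loss would slot into the same loop.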
Related papers
- Validating Vision Transformers for Otoscopy: Performance and Data-Leakage Effects [42.465094107111646]
This study evaluates the efficacy of vision transformer models, specifically Swin transformers, in enhancing the diagnostic accuracy of ear diseases. The research utilised a real-world dataset from the Department of Otolaryngology at the Clinical Hospital of the Universidad de Chile.
arXiv Detail & Related papers (2025-11-06T23:20:37Z) - GastroViT: A Vision Transformer Based Ensemble Learning Approach for Gastrointestinal Disease Classification with Grad CAM & SHAP Visualization [6.752543644823974]
This paper presents an ensemble of pre-trained vision transformers (ViTs) for accurately classifying endoscopic images of the GI tract. ViTs, attention-based neural networks, have revolutionized image recognition by leveraging the transformative power of the transformer architecture. The proposed model was evaluated on the publicly available HyperKvasir dataset with 10,662 images of 23 different GI diseases.
arXiv Detail & Related papers (2025-09-30T16:44:41Z) - Psoriasis Detection Using Computer Vision: A Comparative Approach Between CNNs and Vision Transformers [0.0]
This paper presents a comparison of the performance of CNNs and ViTs on the task of multi-class classification of images containing psoriasis lesions and visually similar diseases. The ViTs stood out for their superior performance with smaller models. This article reinforces the potential of ViTs for medical image classification tasks.
arXiv Detail & Related papers (2025-06-11T19:00:32Z) - Explainable AI-Driven Detection of Human Monkeypox Using Deep Learning and Vision Transformers: A Comprehensive Analysis [0.20482269513546453]
Mpox is a zoonotic viral illness that poses a significant public health concern. Early clinical diagnosis is difficult because its symptoms closely match those of measles and chickenpox. Medical imaging combined with deep learning (DL) techniques has shown promise in improving disease detection by analyzing affected skin areas. Our study explores the feasibility of training deep learning and vision transformer-based models from scratch on a publicly available skin lesion image dataset.
arXiv Detail & Related papers (2025-04-03T19:45:22Z) - Comparative Performance Analysis of Transformer-Based Pre-Trained Models for Detecting Keratoconus Disease [0.0]
This study compares eight pre-trained CNNs for diagnosing keratoconus, a degenerative eye disease.
MobileNetV2 was the most accurate model in identifying keratoconus and normal cases, with few misclassifications.
arXiv Detail & Related papers (2024-08-16T20:15:24Z) - Enhancing Skin Disease Classification Leveraging Transformer-based Deep Learning Architectures and Explainable AI [2.3149142745203326]
Skin diseases affect over a third of the global population, yet their impact is often underestimated.
Deep learning techniques have shown much promise for various tasks, including dermatological disease identification.
This study uses a skin disease dataset with 31 classes and compares all versions of Vision Transformers, Swin Transformers, and DINOv2 on it.
arXiv Detail & Related papers (2024-07-20T05:38:00Z) - The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation.
We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare.
Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z) - Breast Ultrasound Tumor Classification Using a Hybrid Multitask CNN-Transformer Network [63.845552349914186]
Capturing global contextual information plays a critical role in breast ultrasound (BUS) image classification.
Vision Transformers have an improved capability of capturing global contextual information but may distort the local image patterns due to the tokenization operations.
In this study, we proposed a hybrid multitask deep neural network called Hybrid-MT-ESTAN, designed to perform BUS tumor classification and segmentation.
arXiv Detail & Related papers (2023-08-04T01:19:32Z) - Data-Efficient Vision Transformers for Multi-Label Disease Classification on Chest Radiographs [55.78588835407174]
Vision Transformers (ViTs) have not been applied to this task despite their high classification performance on generic images.
ViTs do not rely on convolutions but on patch-based self-attention and in contrast to CNNs, no prior knowledge of local connectivity is present.
Our results show that ViTs and CNNs perform on par, with a small benefit for ViTs, while DeiTs outperform plain ViTs when a reasonably large dataset is available for training.
arXiv Detail & Related papers (2022-08-17T09:07:45Z) - A Comparative Evaluation Of Transformer Models For De-Identification Of Clinical Text Data [0.0]
The i2b2/UTHealth 2014 clinical text de-identification challenge corpus contains N=1304 clinical notes.
We fine-tune several transformer model architectures on the corpus, including BERT-base, BERT-large, RoBERTa-base, RoBERTa-large, ALBERT-base, and ALBERT-xxlarge.
We assess model performance in terms of accuracy, precision (positive predictive value), recall (sensitivity) and F1 score.
arXiv Detail & Related papers (2022-03-25T19:42:03Z) - Vision Transformers for femur fracture classification [59.99241204074268]
The Vision Transformer (ViT) was able to correctly predict 83% of the test images.
Good results were also obtained on sub-fracture classes, using the largest and richest such dataset assembled to date.
arXiv Detail & Related papers (2021-08-07T10:12:42Z) - Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples.
We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
arXiv Detail & Related papers (2021-05-17T02:39:22Z)